Last active
February 1, 2024 10:07
N-gram of DBS Bank
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "view-in-github", | |
| "colab_type": "text" | |
| }, | |
| "source": [ | |
| "<a href=\"https://colab.research.google.com/gist/aeyage/0b7ac3f9d832eaf317554da44948c1a5/cpc353-n-gram-of-dbs-bank.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "D3pibhj5iJd_" | |
| }, | |
| "source": [ | |
| "# Text and Sentiment Analysis of DBS Bank Ltd. Banking Performance Using N-grams\n", | |
| "\n", | |
| "" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "5-TuJNm7pZR0" | |
| }, | |
| "source": [ | |
| "**Project Overview:**\n", | |
| "\n", | |
| "This project applies text and sentiment analysis to consumer feedback on DBS Bank's performance, using n-gram analysis on a total of 107 consumer reviews posted between July 2019 and November 2023 and extracted from:\n", | |
| "\n", | |
| " i. *Trustpilot* <br>\n", | |
| " ii. *BankQuality* <br>\n", | |
| "```\n", | |
| "Author: Aiman Hakimi (153153)\n", | |
| "```\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "OiC_7b_4d9Nj" | |
| }, | |
| "source": [ | |
| "Suggestion: Best viewed on *Google Colab*." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "KOqOzWdprajS" | |
| }, | |
| "source": [ | |
| "## **Import Necessary Libraries**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "6QefOW-YA3fd", | |
| "outputId": "c6fb9337-c214-4b8f-9269-5fa7b31c5e37" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "Collecting demoji\n", | |
| " Downloading demoji-1.1.0-py3-none-any.whl (42 kB)\n", | |
| "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.9/42.9 kB 1.9 MB/s eta 0:00:00\n", | |
| "Installing collected packages: demoji\n", | |
| "Successfully installed demoji-1.1.0\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!pip install demoji" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "VU55EWjAsRIj", | |
| "outputId": "c8ef1197-3659-4a19-a126-1c6fe16cd6db" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stderr", | |
| "text": [ | |
| "[nltk_data] Downloading package punkt to /root/nltk_data...\n", | |
| "[nltk_data] Unzipping tokenizers/punkt.zip.\n", | |
| "[nltk_data] Downloading package stopwords to /root/nltk_data...\n", | |
| "[nltk_data] Unzipping corpora/stopwords.zip.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "%config Completer.use_jedi=False\n", | |
| "\n", | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "import seaborn as sns\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "from matplotlib import rcParams # stores various configuration settings for matplotlib plots\n", | |
| "%matplotlib inline\n", | |
| "\n", | |
| "# Natural Language Processing Packages\n", | |
| "import nltk\n", | |
| "nltk.download('punkt')\n", | |
| "from nltk import word_tokenize\n", | |
| "from nltk.corpus import stopwords\n", | |
| "nltk.download('stopwords')\n", | |
| "stop = stopwords.words('english')\n", | |
| "from nltk.util import ngrams\n", | |
| "from collections import Counter\n", | |
| "from wordcloud import WordCloud\n", | |
| "\n", | |
| "from sklearn import feature_extraction, linear_model, model_selection, preprocessing\n", | |
| "from sklearn.feature_extraction.text import TfidfTransformer\n", | |
| "from sklearn.feature_extraction.text import CountVectorizer\n", | |
| "from sklearn.feature_extraction.text import TfidfVectorizer\n", | |
| "\n", | |
| "from sklearn.model_selection import train_test_split\n", | |
| "from sklearn.linear_model import LogisticRegression\n", | |
| "\n", | |
| "from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, ConfusionMatrixDisplay\n", | |
| "\n", | |
| "from sklearn.pipeline import Pipeline\n", | |
| "\n", | |
| "import spacy\n", | |
| "# from langcat.langdetect import detect\n", | |
| "from demoji import replace\n", | |
| "\n", | |
| "import re\n", | |
| "import csv\n", | |
| "import string\n", | |
| "\n", | |
| "import warnings\n", | |
| "warnings.filterwarnings(\"ignore\")" | |
| ] | |
| }, | |
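| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The imports above include `nltk.util.ngrams` and `collections.Counter`. Below is a minimal sketch of how the two combine to count bigrams. It assumes a made-up review and uses plain whitespace splitting instead of `word_tokenize`, and the notebook's own preprocessing may differ." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from collections import Counter\n", | |
| "from nltk.util import ngrams\n", | |
| "\n", | |
| "# Hypothetical review text; str.split() stands in for word_tokenize here\n", | |
| "review = 'customer service was slow and customer service was rude'\n", | |
| "tokens = review.split()\n", | |
| "\n", | |
| "# ngrams(tokens, 2) yields consecutive word pairs (bigrams)\n", | |
| "bigram_counts = Counter(ngrams(tokens, 2))\n", | |
| "\n", | |
| "# Most frequent bigrams first; ties keep first-seen order\n", | |
| "print(bigram_counts.most_common(2))\n", | |
| "# [(('customer', 'service'), 2), (('service', 'was'), 2)]" | |
| ] | |
| }, | |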
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "ozU4Sl4b183A" | |
| }, | |
| "source": [ | |
| "## **Load the Sample Datasets**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "m29fpSXK2OXU" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# The CSV files mark missing values with '?' as a pre-emptive measure, because\n", | |
| "# some reviews may not be visible due to the reviewers' privacy settings.\n", | |
| "dbs_rev = pd.read_csv('DBS_reviews_trustpilot_scrapped.csv', na_values='?')\n", | |
| "dbs_pos_rev = pd.read_csv('dbs_pos_reviews.csv', na_values='?')\n", | |
| "dbs_neg_rev = pd.read_csv('dbs_neg_reviews.csv', na_values='?')" | |
| ] | |
| }, | |
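| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Below is a minimal sketch of what `na_values='?'` does in the loading cell. Any field whose value is exactly `?` is read as `NaN`, so `isna()` can count it afterwards. The two-row CSV is hypothetical and only illustrates the behaviour. It is not part of the scraped data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import io\n", | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Hypothetical two-row CSV (not the scraped data): the second review is hidden and stored as '?'\n", | |
| "sample_csv = io.StringIO('Author,Rating,Review\\nAlice,5,Great app\\nBob,1,?')\n", | |
| "sample = pd.read_csv(sample_csv, na_values='?')\n", | |
| "\n", | |
| "# The '?' placeholder is now a real missing value (NaN) that isna() can count\n", | |
| "print(sample.isna().sum())  # Author 0, Rating 0, Review 1\n", | |
| "print(sample['Review'].tolist())  # ['Great app', nan]" | |
| ] | |
| }, | |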
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "zXVsIKyQdETO" | |
| }, | |
| "source": [ | |
| "## **Understanding the Sample Datasets**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 206 | |
| }, | |
| "id": "peYYEGj45FFl", | |
| "outputId": "809a0ef1-b947-4001-fb09-88fe6478b26f" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 Samantha SG 12/7/2019 4 \n", | |
| "1 Yasmine SG 19/2/2020 5 \n", | |
| "2 Richard Rowe TH 27/5/2020 5 \n", | |
| "3 Charlie S SG 7/11/2020 4 \n", | |
| "4 Ananya Agarwal IN 11/12/2020 5 \n", | |
| "\n", | |
| " Review \n", | |
| "0 Pretty good. Just ATM can be crowded at times. \n", | |
| "1 A great bank for us Singaporeans!! The smart p... \n", | |
| "2 I've been a customer for four years and have n... \n", | |
| "3 Moved to Singapore from the UK a couple of yea... \n", | |
| "4 wow ...bhumi meri hai " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 4 | |
| } | |
| ], | |
| "source": [ | |
| "# Show the first 5 rows of the positive reviews\n", | |
| "dbs_pos_rev.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 206 | |
| }, | |
| "id": "sN4ewJ4a4v2u", | |
| "outputId": "7503b646-a2a4-4e94-dd9a-649eac3e9e7c" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 William Miller GB 20/12/2019 1 \n", | |
| "1 Sridhar Iyer SG 7/10/2020 1 \n", | |
| "2 Ioana C GB 16/12/2020 1 \n", | |
| "3 Matt B US 18/12/2020 1 \n", | |
| "4 yi xiong yao SG 19/12/2020 1 \n", | |
| "\n", | |
| " Review \n", | |
| "0 I moved to Bangkok from Singapore in September... \n", | |
| "1 DBS is a terrible bank when it comes to how lo... \n", | |
| "2 Very disappointed with the customer service at... \n", | |
| "3 Called one month ago to ask to close an accoun... \n", | |
| "4 The staff at counter 8 (2nd storey,last servic... " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 5 | |
| } | |
| ], | |
| "source": [ | |
| "# Show the first 5 rows of the negative reviews\n", | |
| "dbs_neg_rev.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "V4s4s9dFjEi6", | |
| "outputId": "29b9d541-7ea4-40a3-c952-381adff53508" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "Author 0\n", | |
| "Author Country 0\n", | |
| "Date 0\n", | |
| "Rating 0\n", | |
| "Review 0\n", | |
| "dtype: int64" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 6 | |
| } | |
| ], | |
| "source": [ | |
| "# Count missing values in each column of the full scraped dataset\n", | |
| "dbs_rev.isna().sum()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "Sf8iOqtDz8rh", | |
| "outputId": "8c3a78d5-0805-4d53-c9fd-8ec7349d3d0f" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "1 75.0\n", | |
| "5 15.0\n", | |
| "4 5.0\n", | |
| "3 3.0\n", | |
| "2 2.0\n", | |
| "Name: Rating, dtype: float64" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 7 | |
| } | |
| ], | |
| "source": [ | |
| "pc_rating = 100 * dbs_rev['Rating'].value_counts() / len(dbs_rev)\n", | |
| "pc_rating  # percentage of each rating in the full scraped dataset" | |
| ] | |
| }, | |
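| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The percentage table above can also be sketched with `value_counts(normalize=True)`. The ratings below are hypothetical. On the scraped data, both approaches should agree as long as no `Rating` is missing, which the `isna()` check above confirms." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Hypothetical ratings, used only to compare the two computations\n", | |
| "ratings = pd.Series([1, 1, 1, 5], name='Rating')\n", | |
| "\n", | |
| "# Manual share (as in the cell above) vs. the built-in normalize=True option\n", | |
| "manual = 100 * ratings.value_counts() / len(ratings)\n", | |
| "built_in = 100 * ratings.value_counts(normalize=True)\n", | |
| "\n", | |
| "# Equal here because no rating is missing: normalize=True divides by the\n", | |
| "# number of non-missing values, whereas len() counts every row.\n", | |
| "print(manual.equals(built_in))  # True\n", | |
| "print(built_in.tolist())  # [75.0, 25.0]" | |
| ] | |
| }, | |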
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "iouUY6KV5S0J", | |
| "outputId": "edda2f8f-e7e1-4e75-adc9-ec676c933240" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "5 15\n", | |
| "4 5\n", | |
| "Name: Rating, dtype: int64" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 8 | |
| } | |
| ], | |
| "source": [ | |
| "dbs_pos_rev.Rating.value_counts()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "QDp6L0u655hm" | |
| }, | |
| "source": [ | |
| "Exactly three quarters (75%) of the scraped dataset has a Rating of '1', the lowest possible score and the strongest expression of dissatisfaction, so the data is heavily skewed towards negative reviews." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 842 | |
| }, | |
| "id": "B620qD6G2I-a", | |
| "outputId": "0199d061-3736-41d9-982a-dcd3a0ffb159" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "<Axes: >" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 9 | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "text/plain": [ | |
| "<Figure size 1500x1000 with 1 Axes>" | |
| ], | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAABLkAAAMoCAYAAAAjpjKlAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAlM0lEQVR4nO3df6yWdf348deBo7eEnFsh+XHyIKxZhhhWqDPNQbLcGZK29UOnxmjrh5Fkp0zOFiozO9iao5Jh2graxB9/JJmVzuEPcoIKpMutUArxTAfYsnPLcd6xw/354/v1XkdAPHrdHF6cx2O7/riv633u9+sPNu49d93X3VSr1WoBAAAAAIkNG+wBAAAAAOC9ErkAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0mgd7gLfas2dPvPzyyzFq1Khoamoa7HEAAAAAGES1Wi1ee+21aG1tjWHD9n+/1iEXuV5++eVoa2sb7DEAAAAAOIR0d3fH8ccfv9/rh1zkGjVqVET8v8FbWloGeRoAAAAABlOlUom2trZ6M9qfQy5yvfkVxZaWFpELAAAAgIiIAz7WyoPnAQAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIr3mwB4BGmbTwD4M9AjDEvbBk9mCPAAAAQ4Y7uQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSG3DkWrt2bcyZMydaW1ujqakpVq9evd+13/jGN6KpqSmWLl36HkYEAAAAgLc34MjV29sb06ZNi2XLlr3tunvuuSfWr18fra2t73o4AAAAAHgnmgf6B+3t7dHe3v62a1566aW44oor4oEHHojZs2e/6+EAAAAA4J0YcOQ6kD179sRll10WV111VZx88skHXF+tVqNardZfVyqVokcCAAAA4DBX+IPnb7zxxmhubo4FCxa8o/VdXV1RLpfrR1tbW9EjAQAAAHCYKzRybdy4MX7605/GihUroqmp6R39TWdnZ/T09NSP7u7uIkcCAAAAYAgoNHL9+c9/jp07d8bEiROjubk5mpubY9u2bfHd7343Jk2atM+/KZVK0dLS0u8AAAAAgIEo9Jlcl112WcyaNavfufPOOy8uu+yymDdvXpFbAQAAAEDdgCPXrl27YsuWLfXXW7dujaeffjpGjx4dEydOjDFjxvRbf8QRR8T48ePjwx/+8HufFgAAAAD2YcCRa8OGDTFz5sz6646OjoiImDt3bqxYsaKwwQAAAADgnRpw5JoxY0bUarV3vP6FF14Y6BYAAAAAMCCFPngeAAAAAAaDyAUA
AABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOkNOHKtXbs25syZE62trdHU1BSrV6+uX9u9e3dcffXVccopp8TIkSOjtbU1vvzlL8fLL79c5MwAAAAA0M+AI1dvb29MmzYtli1btte1119/PTZt2hSLFi2KTZs2xW9/+9vYvHlzfPazny1kWAAAAADYl+aB/kF7e3u0t7fv81q5XI4HH3yw37mbb745Tj/99HjxxRdj4sSJ725KAAAAAHgbA45cA9XT0xNNTU1xzDHH7PN6tVqNarVaf12pVBo9EgAAAACHmYY+eP6NN96Iq6++Oi6++OJoaWnZ55qurq4ol8v1o62trZEjAQAAAHAYaljk2r17d3zxi1+MWq0Wy5cv3++6zs7O6OnpqR/d3d2NGgkAAACAw1RDvq74ZuDatm1bPPTQQ/u9iysiolQqRalUasQYAAAAAAwRhUeuNwPX888/Hw8//HCMGTOm6C0AAAAAoJ8BR65du3bFli1b6q+3bt0aTz/9dIwePTomTJgQn//852PTpk1x3333RV9fX2zfvj0iIkaPHh1HHnlkcZMDAAAAwP834Mi1YcOGmDlzZv11R0dHRETMnTs3rrvuurj33nsjIuLUU0/t93cPP/xwzJgx491PCgAAAAD7MeDINWPGjKjVavu9/nbXAAAAAKARGvbrigAAAABwsIhcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAA
AJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQ3oAj19q1a2POnDnR2toaTU1NsXr16n7Xa7VaXHPNNTFhwoQYMWJEzJo1K55//vmi5gUAAACAvQw4cvX29sa0adNi2bJl+7z+4x//OH72s5/FLbfcEk888USMHDkyzjvvvHjjjTfe87AAAAAAsC/NA/2D9vb2aG9v3+e1Wq0WS5cujR/84AdxwQUXRETEb37zmxg3blysXr06Lrroovc2LQAAAADsQ6HP5Nq6dWts3749Zs2aVT9XLpfjjDPOiHXr1u3zb6rValQqlX4HAAAAAAxEoZFr+/btERExbty4fufHjRtXv/ZWXV1dUS6X60dbW1uRIwEAAAAwBAz6ryt2dnZGT09P/eju7h7skQAAAABIptDINX78+IiI2LFjR7/zO3bsqF97q1KpFC0tLf0OAAAAABiIQiPX5MmTY/z48bFmzZr6uUqlEk888USceeaZRW4FAAAAAHUD/nXFXbt2xZYtW+qvt27dGk8//XSMHj06Jk6cGFdeeWX88Ic/jBNPPDEmT54cixYtitbW1rjwwguLnBsAAAAA6gYcuTZs2BAzZ86sv+7o6IiIiLlz58aKFSvi+9//fvT29sbXvva1+M9//hNnn3123H///XHUUUcVNzUAAAAA/I+mWq1WG+wh/lelUolyuRw9PT2ez8V7MmnhHwZ7BGCIe2HJ7MEeAQAA0nunrWjQf10RAAAAAN4rkQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9AqPXH19fbFo0aKYPHlyjBgxIj74wQ/G9ddfH7VareitAAAAACAiIpqLfsMbb7wxli9fHitXroyTTz45NmzYEPPmzYtyuRwLFiwoejsAAAAAKD5yPf7443HBBRfE7NmzIyJi0qRJcccdd8STTz5Z9FYAAAAAEBEN+Lri
Jz/5yVizZk0899xzERHxzDPPxGOPPRbt7e37XF+tVqNSqfQ7AAAAAGAgCr+Ta+HChVGpVOKkk06K4cOHR19fX9xwww1xySWX7HN9V1dXLF68uOgxAAAAABhCCr+T6+67747bb789Vq1aFZs2bYqVK1fGT37yk1i5cuU+13d2dkZPT0/96O7uLnokAAAAAA5zhd/JddVVV8XChQvjoosuioiIU045JbZt2xZdXV0xd+7cvdaXSqUolUpFjwEAAADAEFL4nVyvv/56DBvW/22HDx8ee/bsKXorAAAAAIiIBtzJNWfOnLjhhhti4sSJcfLJJ8df/vKXuOmmm+IrX/lK0VsBAAAAQEQ0IHL9/Oc/j0WLFsU3v/nN2LlzZ7S2tsbXv/71uOaaa4reCgAAAAAiogGRa9SoUbF06dJYunRp0W8NAAAAAPtU+DO5AAAAAOBgE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACC9hkSul156KS699NIYM2ZMjBgxIk455ZTYsGFDI7YCAAAAgGgu+g1fffXVOOuss2LmzJnxpz/9KY477rh4/vnn49hjjy16KwAAAACIiAZErhtvvDHa2tri17/+df3c5MmTi94GAAAAAOoK/7rivffeG9OnT48vfOELMXbs2PjYxz4Wt912237XV6vVqFQq/Q4AAAAAGIjCI9c///nPWL58eZx44onxwAMPxOWXXx4LFiyIlStX7nN9V1dXlMvl+tHW1lb0SAAAAAAc5ppqtVqtyDc88sgjY/r06fH444/Xzy1YsCCeeuqpWLdu3V7rq9VqVKvV+utKpRJtbW3R09MTLS0tRY7GEDNp4R8GewRgiHthyezBHgEAANKrVCpRLpcP2IoKv5NrwoQJMWXKlH7nPvKRj8SLL764z/WlUilaWlr6HQAAAAAwEIVHrrPOOis2b97c79xzzz0XJ5xwQtFbAQAAAEBENCByfec734n169fHj370o9iyZUusWrUqbr311pg/f37RWwEAAABARDQgcp122mlxzz33xB133BFTp06N66+/PpYuXRqXXHJJ0VsBAAAAQERENDfiTc8///w4//zzG/HWAAAAALCXwu/kAgAAAICDTeQCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA
9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgvebBHgAAAGicSQv/MNgjAMQLS2YP9ggMAe7kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEiv4ZFryZIl0dTUFFdeeWWjtwIAAABgiGpo5HrqqafiF7/4RXz0ox9t5DYAAAAADHENi1y7du2KSy65JG677bY49thjG7UNAAAAADQucs2fPz9mz54ds2bNett11Wo1KpVKvwMAAAAABqK5EW965513xqZNm+Kpp5464Nqurq5YvHhxI8YAAAAAYIgo/E6u7u7u+Pa3vx233357HHXUUQdc39nZGT09PfWju7u76JEAAAAAOMwVfifXxo0bY+fOnfHxj3+8fq6vry/Wrl0bN998c1Sr1Rg+fHj9WqlUilKpVPQYAAAAAAwhhUeuc889N/7617/2Ozdv3rw46aST4uqrr+4XuAAAAACgCIVHrlGjRsXUqVP7nRs5cmSMGTNmr/MAAAAAUISG/boiAAAAABwsDfl1xbd65JFHDsY2AAAAAAxR7uQCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0RC4AAAAA0hO5AAAAAEhP5AIAAAAgPZELAAAAgPRELgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABI
T+QCAAAAID2RCwAAAID0RC4AAAAA0is8cnV1dcVpp50Wo0aNirFjx8aFF14YmzdvLnobAAAAAKgrPHI9+uijMX/+/Fi/fn08+OCDsXv37vjMZz4Tvb29RW8FAAAAABER0Vz0G95///39Xq9YsSLGjh0bGzdujHPOOafo7QAAAACg+Mj1Vj09PRERMXr06H1er1arUa1W668rlUqjRwIAAADgMNPQB8/v2bMnrrzyyjjrrLNi6tSp+1zT1dUV5XK5frS1tTVyJAAAAAAOQw2NXPPnz49nn3027rzzzv2u6ezsjJ6envrR3d3dyJEAAAAAOAw17OuK3/rWt+K+++6LtWvXxvHHH7/fdaVSKUqlUqPGAAAAAGAIKDxy1Wq1uOKKK+Kee+6JRx55JCZPnlz0FgAAAADQT+GRa/78+bFq1ar43e9+F6NGjYrt27dHRES5XI4RI0YUvR0AAAAAFP9MruXLl0dPT0/MmDEjJkyYUD/uuuuuorcCAAAAgIho0NcVAQAAAOBgauivKwIAAADAwSByAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeiIXAAAAAOmJXAAAAACkJ3IBAAAAkJ7IBQAAAEB6IhcAAAAA6YlcAAAAAKQncgEAAACQnsgFAAAAQHoiFwAAAADpiVwAAAAApCdyAQAAAJCeyAUAAABAeg2LXMuWLYtJkybFUUcdFWeccUY8+eSTjdoKAAAAgCGuIZHrrrvuio6Ojrj22mtj06ZNMW3atDjvvPNi586djdgOAAAAgCGuIZHrpptuiq9+9asxb968mDJlStxyyy3xvve9L371q181YjsAAAAAhrjmot/wv//9b2zcuDE6Ozvr54YNGxazZs2KdevW7bW+Wq1GtVqtv+7p6YmIiEqlUvRoDDF7qq8P9gjAEOf/MuBQ4DMRcCjwuYj34s1/P7Va7W3XFR65/vWvf0VfX1+MGzeu3/lx48bF3//+973Wd3V1xeLFi/c639bWVvRoAHBQlZcO9gQAAIcGn4sowmuvvRblcnm/1wuPXAPV2dkZHR0d9dd79uyJf//73zFmzJhoamoaxMmAoaxSqURbW1t0d3dHS0vLYI8DADAofCYCDgW1Wi1ee+21aG1tfdt1hUeu97///TF8+PDYsWNHv/M7duyI8ePH77W+VCpFqVTqd+6YY44peiyAd6WlpcUHOgBgyPOZCBhsb3cH15sKf/D8kUceGZ/4xCdizZo19XN79uyJNWvWxJlnnln0dgAAAADQmK8rdnR0xNy5c2P69Olx+umnx9KlS6O3tzfmzZvXiO0AAAAAGOIaErm+9KUv
xSuvvBLXXHNNbN++PU499dS4//7793oYPcChqlQqxbXXXrvX16kBAIYSn4mATJpqB/r9RQAAAAA4xBX+TC4AAAAAONhELgAAAADSE7kAAAAASE/kAgAAYL88xhnIQuQCAABgv0qlUvztb38b7DEADqh5sAcAONT09vbG3XffHVu2bIkJEybExRdfHGPGjBnssQAAGqqjo2Of5/v6+mLJkiX1z0M33XTTwRwL4B1rqrn3FBjipkyZEo899liMHj06uru745xzzolXX301PvShD8U//vGPaG5ujvXr18fkyZMHe1QAgIYZNmxYTJs2LY455ph+5x999NGYPn16jBw5MpqamuKhhx4anAEBDkDkAoa8YcOGxfbt22Ps2LFx6aWXxtatW+OPf/xjlMvl2LVrV3zuc5+L4447LlatWjXYowIANMySJUvi1ltvjV/+8pfx6U9/un7+iCOOiGeeeSamTJkyiNMBHJhncgH8j3Xr1sV1110X5XI5IiKOPvroWLx4cTz22GODPBkAQGMtXLgw7rrrrrj88svje9/7XuzevXuwRwIYEJELICKampoiIuKNN96ICRMm9Lv2gQ98IF555ZXBGAsA4KA67bTTYuPGjfHKK6/E9OnT49lnn61/TgI41HnwPEBEnHvuudHc3ByVSiU2b94cU6dOrV/btm2bB88DAEPG0UcfHStXrow777wzZs2aFX19fYM9EsA7InIBQ961117b7/XRRx/d7/Xvf//7+NSnPnUwRwIAGHQXXXRRnH322bFx48Y44YQTBnscgAPy4HkAAAAA0vNMLgAAAADSE7kAAAAASE/kAgAAACA9kQsAAACA9EQuAAAAANITuQAAAABIT+QCAAAAID2RCwAAAID0/g8V5Z9GeBqxdwAAAABJRU5ErkJggg==\n" | |
| }, | |
| "metadata": {} | |
| } | |
| ], | |
| "source": [ | |
| "rcParams['figure.figsize'] = 15,10\n", | |
| "dbs_pos_rev.Rating.value_counts().plot(kind=\"bar\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "6nfQFmpu2YPa" | |
| }, | |
| "source": [ | |
| "The visualisation above shows that the target column (Rating) of the positive scraped dataset is not evenly distributed and is skewed towards Rating '5'." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 842 | |
| }, | |
| "id": "cxesvnJP6z3a", | |
| "outputId": "021daec8-e738-4eb8-9053-d7f9be2975ab" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "<Axes: >" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 10 | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "text/plain": [ | |
| "<Figure size 1500x1000 with 1 Axes>" | |
| ], | |
| "image/png": "\n" | |
| }, | |
| "metadata": {} | |
| } | |
| ], | |
| "source": [ | |
| "rcParams['figure.figsize'] = 15,10\n", | |
| "dbs_neg_rev.Rating.value_counts().plot(kind=\"bar\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "klpZdAsU67fL" | |
| }, | |
| "source": [ | |
| "The visualisation above shows that the target column (Rating) of the negative scraped dataset is heavily skewed towards Rating '5'." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "AX2mr8PH9y-j", | |
| "outputId": "f6c077fd-e634-4750-f5be-ce35ad2f3560" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "(15, 5)\n", | |
| "(5, 5)\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Split the positive dataset into separate dataframes by Rating\n", | |
| "# (.copy() avoids pandas' SettingWithCopyWarning when the Review column is modified later)\n", | |
| "best_pos_rev = dbs_pos_rev[dbs_pos_rev['Rating'] == 5].copy() # Reviews rated 5\n", | |
| "secbest_pos_rev = dbs_pos_rev[dbs_pos_rev['Rating'] == 4].copy() # Reviews rated 4\n", | |
| "\n", | |
| "print(best_pos_rev.shape)\n", | |
| "print(secbest_pos_rev.shape)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "IGvxfDdf-qPg", | |
| "outputId": "e0059e51-662b-41f1-c994-2b3367441953" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "The maximum length of a review with a rating of 5 is 2000 characters.\n", | |
| "The maximum length of a review with a rating of 4 is 1375 characters.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# .str.len() counts characters, not words\n", | |
| "best_pos_rev_len = best_pos_rev['Review'].str.len()\n", | |
| "secbest_pos_rev_len = secbest_pos_rev['Review'].str.len()\n", | |
| "\n", | |
| "print(f\"The maximum length of a review with a rating of 5 is {best_pos_rev_len.max()} characters.\")\n", | |
| "print(f\"The maximum length of a review with a rating of 4 is {secbest_pos_rev_len.max()} characters.\")" | |
| ] | |
| }, | |
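| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "`Series.str.len()` counts characters rather than words. A minimal sketch, using a small made-up Series rather than the scraped reviews, of how the word count could be measured instead by splitting each review on whitespace with `Series.str.split()`:\n", | |
| "\n", | |
| "```python\n", | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Illustrative reviews only; not taken from the scraped dataset\n", | |
| "reviews = pd.Series(['Great service', 'Internet banking is the best in the world'])\n", | |
| "\n", | |
| "char_len = reviews.str.len()              # characters per review\n", | |
| "word_len = reviews.str.split().str.len()  # whitespace-separated words per review\n", | |
| "\n", | |
| "print(char_len.max())  # 41\n", | |
| "print(word_len.max())  # 8\n", | |
| "```\n", | |
| "\n", | |
| "Applied to `best_pos_rev['Review']`, the same `str.split().str.len()` call would give word counts that can be compared with the character counts reported above." | |
| ] | |
| }, | |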
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "wJgyl1OWC9yg" | |
| }, | |
| "source": [ | |
| "##**Data Pre-processing**\n", | |
| "Perform data pre-processing: **tokenisation**, **stop-word removal**, **special-character handling** and **normalisation (lowercase conversion)**." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "giVe6or_-uDf" | |
| }, | |
| "source": [ | |
| "###**Tokenisation, Stop Words Removal, Special Characters Handling and Normalisation (Lowercase Conversion)**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "ZsYO9TzbW9q7", | |
| "outputId": "186536f7-9055-40f7-ef53-f0f15a8ff1d3" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stderr", | |
| "text": [ | |
| "[nltk_data] Downloading package punkt to /root/nltk_data...\n", | |
| "[nltk_data] Package punkt is already up-to-date!\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import nltk\n", | |
| "import spacy\n", | |
| "\n", | |
| "nltk.download('punkt')\n", | |
| "\n", | |
| "# Load the spaCy English model once, instead of reloading it for every review\n", | |
| "nlp = spacy.load(\"en_core_web_sm\")\n", | |
| "\n", | |
| "def tokeniseandstopwords(Review):\n", | |
| "    # Tokenise the review with spaCy\n", | |
| "    doc = nlp(Review)\n", | |
| "    # Keep only the tokens that are not in spaCy's stop-word list\n", | |
| "    tokens = [token.text for token in doc if not token.is_stop]\n", | |
| "    # Join the remaining tokens back into a single string\n", | |
| "    return \" \".join(tokens)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "7dT5IW9xg_z-", | |
| "outputId": "cc11166e-567d-4080-fa3f-09dedfd3b9d5" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "Requirement already satisfied: demoji in /usr/local/lib/python3.10/dist-packages (1.1.0)\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!pip install demoji\n", | |
| "import re\n", | |
| "import string\n", | |
| "import demoji\n", | |
| "\n", | |
| "def _remove_emojis(Review):\n", | |
| "    # demoji.replace() substitutes each emoji with repl (an empty string by default)\n", | |
| "    return demoji.replace(Review)\n", | |
| "\n", | |
| "def clean_reviews(Review):\n", | |
| "    '''Lowercase a review and remove text in square brackets, links, HTML tags,\n", | |
| "    punctuation, newlines, words containing numbers, emojis and extra spaces.'''\n", | |
| "    crev = str(Review).lower()\n", | |
| "    crev = re.sub(r'\\[.*?\\]', '', crev)  # text in square brackets\n", | |
| "    crev = re.sub(r'https?://\\S+|www\\.\\S+', '', crev)  # links\n", | |
| "    crev = re.sub(r'<.*?>+', '', crev)  # HTML tags\n", | |
| "    crev = re.sub('[%s]' % re.escape(string.punctuation), '', crev)  # punctuation\n", | |
| "    crev = re.sub(r'\\n', ' ', crev)  # newlines become spaces so adjacent words stay separate\n", | |
| "    crev = re.sub(r'\\w*\\d\\w*', '', crev)  # words containing numbers\n", | |
| "    crev = _remove_emojis(crev)\n", | |
| "    crev = re.sub(r'\\s{2,}', ' ', crev).strip()  # collapse repeated whitespace\n", | |
| "\n", | |
| "    return crev  # cleaned review text" | |
| ] | |
| }, | |
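| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A minimal sketch using only the standard library that mirrors the regex steps of `clean_reviews` on a made-up review, so the effect of each step can be checked in isolation. `clean_text` and the sample text are hypothetical, and the `demoji` emoji step is left out so the sketch runs without extra packages. Newlines are replaced with a space rather than deleted, because deleting them glues the last word of one line to the first word of the next:\n", | |
| "\n", | |
| "```python\n", | |
| "import re\n", | |
| "import string\n", | |
| "\n", | |
| "def clean_text(text):\n", | |
| "    '''Hypothetical helper mirroring the regex steps of clean_reviews (no emoji step).'''\n", | |
| "    text = str(text).lower()\n", | |
| "    text = re.sub(r'\\[.*?\\]', '', text)                # drop text in square brackets\n", | |
| "    text = re.sub(r'https?://\\S+|www\\.\\S+', '', text)  # drop links\n", | |
| "    text = re.sub(r'<.*?>+', '', text)                 # drop HTML tags\n", | |
| "    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop punctuation\n", | |
| "    text = text.replace('\\n', ' ')                     # newline -> space keeps words apart\n", | |
| "    text = re.sub(r'\\w*\\d\\w*', '', text)               # drop words containing digits\n", | |
| "    return re.sub(r'\\s{2,}', ' ', text).strip()        # collapse repeated whitespace\n", | |
| "\n", | |
| "sample = 'Fast transfers\\nFriendly staff [edited]! Visit https://example.com <b>now</b> 24x7'\n", | |
| "print(clean_text(sample))  # fast transfers friendly staff visit now\n", | |
| "\n", | |
| "# Deleting the newline instead of replacing it glues two words together\n", | |
| "print('fast transfers\\nfriendly staff'.replace('\\n', ''))  # fast transfersfriendly staff\n", | |
| "```" | |
| ] | |
| }, | |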
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Fth539I2DO8u" | |
| }, | |
| "source": [ | |
| "####**Pre-processing of the Positive and Negative Scraped Datasets**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "8OLaApJUDhhI" | |
| }, | |
| "source": [ | |
| "#####Positive Scraped Dataset" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "-PHnRppYErQV" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Clean each review, then tokenise it and drop stop words\n", | |
| "best_pos_rev['Review'] = best_pos_rev['Review'].apply(clean_reviews)\n", | |
| "secbest_pos_rev['Review'] = secbest_pos_rev['Review'].apply(clean_reviews)\n", | |
| "\n", | |
| "best_pos_rev['Review'] = best_pos_rev['Review'].apply(tokeniseandstopwords)\n", | |
| "secbest_pos_rev['Review'] = secbest_pos_rev['Review'].apply(tokeniseandstopwords)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 206 | |
| }, | |
| "id": "CWcgsNqCm_Cd", | |
| "outputId": "9e694b9c-dc71-4d69-a64d-4b804f102ec7" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 NaN NaN NaN NaN \n", | |
| "1 Richard Rowe TH 27/5/2020 5.0 \n", | |
| "2 Ananya Agarwal IN 11/12/2020 5.0 \n", | |
| "3 NaN NaN NaN NaN \n", | |
| "4 Ashwin Krishnan P C IN 1/4/2021 5.0 \n", | |
| "\n", | |
| " Review \n", | |
| "0 NaN \n", | |
| "1 ve customer years exemplary experiences \n", | |
| "2 wow bhumi meri hai \n", | |
| "3 NaN \n", | |
| "4 internet banking best world " | |
| ], | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"df-f92ba57f-82af-4167-a776-79a62c4aa6e7\" class=\"colab-df-container\">\n", | |
| " <div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Author</th>\n", | |
| " <th>Author Country</th>\n", | |
| " <th>Date</th>\n", | |
| " <th>Rating</th>\n", | |
| " <th>Review</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>Richard Rowe</td>\n", | |
| " <td>TH</td>\n", | |
| " <td>27/5/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>ve customer years exemplary experiences</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>Ananya Agarwal</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>11/12/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>wow bhumi meri hai</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>Ashwin Krishnan P C</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>1/4/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>internet banking best world</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>\n", | |
| " </div>\n" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 16 | |
| } | |
| ], | |
| "source": [ | |
| "# Merge the cleaned rating-5 and rating-4 reviews into one dataframe\n", | |
| "merged_pos_reviews = pd.concat([best_pos_rev, secbest_pos_rev]).reset_index(drop=True)\n", | |
| "\n", | |
| "# Identify indices of best_pos_rev and secbest_pos_rev rows in dbs_pos_rev\n", | |
| "best_pos_indices = dbs_pos_rev[dbs_pos_rev['Rating'] == 5].index\n", | |
| "secbest_pos_indices = dbs_pos_rev[dbs_pos_rev['Rating'] == 4].index\n", | |
| "\n", | |
| "# Write the cleaned reviews back into dbs_pos_rev. .loc aligns on index labels, so assign\n", | |
| "# from best_pos_rev/secbest_pos_rev (which keep the original labels) rather than from\n", | |
| "# merged_pos_reviews, whose reset index would misalign rows and introduce NaN.\n", | |
| "dbs_pos_rev.loc[best_pos_indices, 'Review'] = best_pos_rev['Review']\n", | |
| "dbs_pos_rev.loc[secbest_pos_indices, 'Review'] = secbest_pos_rev['Review']\n", | |
| "\n", | |
| "dbs_pos_rev.head()" | |
| ] | |
| }, | |
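| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "When a Series or DataFrame is assigned through `.loc`, pandas aligns it on index labels rather than on row position. A minimal sketch with a hypothetical toy frame (not the scraped data) showing how a subset whose index was reset with `reset_index(drop=True)` misaligns the write-back, and how assigning from a subset that keeps its original labels avoids it:\n", | |
| "\n", | |
| "```python\n", | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Toy frame: ratings are interleaved, so the rating-4 rows sit at labels 0 and 2\n", | |
| "df = pd.DataFrame({'Rating': [4, 5, 4, 5], 'Review': ['A', 'B', 'C', 'D']})\n", | |
| "subset = df[df['Rating'] == 4].copy()\n", | |
| "subset['Review'] = subset['Review'].str.lower()\n", | |
| "\n", | |
| "# Pitfall: reset_index relabels the subset rows 0 and 1, so label 2 finds no match\n", | |
| "reset = subset.reset_index(drop=True)\n", | |
| "broken = df.copy()\n", | |
| "broken.loc[subset.index, 'Review'] = reset['Review']\n", | |
| "print(broken['Review'].tolist())  # ['a', 'B', nan, 'D']\n", | |
| "\n", | |
| "# Label-preserving assignment writes each cleaned review back to its source row\n", | |
| "fixed = df.copy()\n", | |
| "fixed.loc[subset.index, 'Review'] = subset['Review']\n", | |
| "print(fixed['Review'].tolist())  # ['a', 'B', 'c', 'D']\n", | |
| "```\n", | |
| "\n", | |
| "The same alignment rule applies when whole rows are assigned with `.loc[indices, :]`." | |
| ] | |
| }, | |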
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 206 | |
| }, | |
| "id": "tmiAFNuaNvaD", | |
| "outputId": "c0d66bd8-1e1e-452c-f11d-ee2d04b64c61" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "1 Richard Rowe TH 27/5/2020 5.0 \n", | |
| "2 Ananya Agarwal IN 11/12/2020 5.0 \n", | |
| "4 Ashwin Krishnan P C IN 1/4/2021 5.0 \n", | |
| "5 Adeline Tan SG 9/7/2021 5.0 \n", | |
| "6 cindy win SG 17/7/2021 5.0 \n", | |
| "\n", | |
| " Review \n", | |
| "1 ve customer years exemplary experiences \n", | |
| "2 wow bhumi meri hai \n", | |
| "4 internet banking best world \n", | |
| "5 thank yr kind assistance service oriented n ch... \n", | |
| "6 customer service adviser yanty helpful happy a... " | |
| ], | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"df-d4df789a-7994-43bc-b60c-8f9e585291f7\" class=\"colab-df-container\">\n", | |
| " <div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Author</th>\n", | |
| " <th>Author Country</th>\n", | |
| " <th>Date</th>\n", | |
| " <th>Rating</th>\n", | |
| " <th>Review</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>Richard Rowe</td>\n", | |
| " <td>TH</td>\n", | |
| " <td>27/5/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>ve customer years exemplary experiences</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>Ananya Agarwal</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>11/12/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>wow bhumi meri hai</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>Ashwin Krishnan P C</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>1/4/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>internet banking best world</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>5</th>\n", | |
| " <td>Adeline Tan</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>9/7/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>thank yr kind assistance service oriented n ch...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>6</th>\n", | |
| " <td>cindy win</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>17/7/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>customer service adviser yanty helpful happy a...</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>\n", | |
| " </div>\n" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 17 | |
| } | |
| ], | |
| "source": [ | |
| "dbs_pos_rev = dbs_pos_rev.dropna()\n", | |
| "dbs_pos_rev.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "8wQfk3xea62N" | |
| }, | |
| "source": [ | |
| "#####Negative Scraped Dataset" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "ymKq5hkhbP-a" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Split the negative reviews into subsets based on the Rating column.\n", | |
| "# .copy() makes each subset independent, so the column updates below do not raise SettingWithCopyWarning\n", | |
| "worst_neg_rev = dbs_neg_rev[(dbs_neg_rev['Rating'] == 1) |\n", | |
| " (dbs_neg_rev['Rating'] == 2)].copy() # Review Rating of 1 and 2\n", | |
| "bad_neg_rev = dbs_neg_rev[dbs_neg_rev['Rating'] == 3].copy() # Review Rating of 3\n", | |
| "\n", | |
| "# Clean the raw review text\n", | |
| "worst_neg_rev['Review'] = worst_neg_rev['Review'].apply(clean_reviews)\n", | |
| "bad_neg_rev['Review'] = bad_neg_rev['Review'].apply(clean_reviews)\n", | |
| "\n", | |
| "# Tokenise the reviews and remove stopwords\n", | |
| "worst_neg_rev['Review'] = worst_neg_rev['Review'].apply(tokeniseandstopwords)\n", | |
| "bad_neg_rev['Review'] = bad_neg_rev['Review'].apply(tokeniseandstopwords)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "LBmz7olV0W_I", | |
| "outputId": "38188b46-86c5-4655-b9b9-af2b6f874edc" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 William Miller GB 20/12/2019 1.0 \n", | |
| "1 Sridhar Iyer SG 7/10/2020 1.0 \n", | |
| "2 Ioana C GB 16/12/2020 1.0 \n", | |
| "3 Matt B US 18/12/2020 1.0 \n", | |
| "4 yi xiong yao SG 19/12/2020 1.0 \n", | |
| "\n", | |
| " Review \n", | |
| "0 moved bangkok singapore september intended lar... \n", | |
| "1 dbs terrible bank comes long wait branches tim... \n", | |
| "2 disappointed customer service dbs customer yea... \n", | |
| "3 called month ago ask close account said hold a... \n", | |
| "4 staff counter storeylast servicing counter num... " | |
| ], | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"df-2d0aa552-9b5d-43d8-9828-7f3032c45185\" class=\"colab-df-container\">\n", | |
| " <div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Author</th>\n", | |
| " <th>Author Country</th>\n", | |
| " <th>Date</th>\n", | |
| " <th>Rating</th>\n", | |
| " <th>Review</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>William Miller</td>\n", | |
| " <td>GB</td>\n", | |
| " <td>20/12/2019</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>moved bangkok singapore september intended lar...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>Sridhar Iyer</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>7/10/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>dbs terrible bank comes long wait branches tim...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>Ioana C</td>\n", | |
| " <td>GB</td>\n", | |
| " <td>16/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>disappointed customer service dbs customer yea...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>Matt B</td>\n", | |
| " <td>US</td>\n", | |
| " <td>18/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>called month ago ask close account said hold a...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>yi xiong yao</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>19/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>staff counter storeylast servicing counter num...</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>\n", | |
| " </div>\n" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 19 | |
| } | |
| ], | |
| "source": [ | |
| "# Recombine the cleaned and tokenised subsets after cleaning and tokenisation.\n", | |
| "# Both subsets keep the row labels of dbs_neg_rev, so sorting by index restores the original order.\n", | |
| "# Assigning a filtered frame through .loc would align on index labels and leave NaN rows\n", | |
| "# wherever the labels differ, so the subsets are concatenated directly instead.\n", | |
| "dbs_neg_rev = pd.concat([worst_neg_rev, bad_neg_rev]).sort_index()\n", | |
| "\n", | |
| "dbs_neg_rev.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "uAYCKdCUOHKF", | |
| "outputId": "db115b34-3bc1-44bc-c40b-33993eb9b636" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 William Miller GB 20/12/2019 1.0 \n", | |
| "1 Sridhar Iyer SG 7/10/2020 1.0 \n", | |
| "2 Ioana C GB 16/12/2020 1.0 \n", | |
| "3 Matt B US 18/12/2020 1.0 \n", | |
| "4 yi xiong yao SG 19/12/2020 1.0 \n", | |
| "\n", | |
| " Review \n", | |
| "0 moved bangkok singapore september intended lar... \n", | |
| "1 dbs terrible bank comes long wait branches tim... \n", | |
| "2 disappointed customer service dbs customer yea... \n", | |
| "3 called month ago ask close account said hold a... \n", | |
| "4 staff counter storeylast servicing counter num... " | |
| ], | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"df-6048edd5-b39f-4edb-b76d-74cd9f997181\" class=\"colab-df-container\">\n", | |
| " <div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Author</th>\n", | |
| " <th>Author Country</th>\n", | |
| " <th>Date</th>\n", | |
| " <th>Rating</th>\n", | |
| " <th>Review</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>William Miller</td>\n", | |
| " <td>GB</td>\n", | |
| " <td>20/12/2019</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>moved bangkok singapore september intended lar...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>Sridhar Iyer</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>7/10/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>dbs terrible bank comes long wait branches tim...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>Ioana C</td>\n", | |
| " <td>GB</td>\n", | |
| " <td>16/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>disappointed customer service dbs customer yea...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>Matt B</td>\n", | |
| " <td>US</td>\n", | |
| " <td>18/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>called month ago ask close account said hold a...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>yi xiong yao</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>19/12/2020</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>staff counter storeylast servicing counter num...</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>\n", | |
| " </div>\n" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 20 | |
| } | |
| ], | |
| "source": [ | |
| "dbs_neg_rev = dbs_neg_rev.dropna()\n", | |
| "dbs_neg_rev.head()" | |
| ] | |
| }, | |
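| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A minimal sketch of what `dropna()` does in the cell above, using hypothetical toy rows rather than the scraped reviews. With its default `how='any'`, it drops every row that has a missing value in any column. Passing `subset=` limits the check to the listed columns, which keeps rows where only other fields are missing." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Hypothetical rows (not the scraped data): one missing Rating, one missing Review\n", | |
| "toy = pd.DataFrame({'Rating': [1.0, np.nan, 1.0], 'Review': ['long queue', 'rude staff', np.nan]})\n", | |
| "\n", | |
| "# Default how='any': a row is dropped if any of its columns is NaN\n", | |
| "print(len(toy.dropna()))  # 1\n", | |
| "\n", | |
| "# subset= limits the check to Review, so only the missing-Review row is dropped\n", | |
| "print(len(toy.dropna(subset=['Review'])))  # 2" | |
| ] | |
| }, | |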
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "27ozNql5VBWo" | |
| }, | |
| "source": [ | |
| "#####Merging Preprocessed Positive and Negative Scraped Datasets" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "H_iHvevcVH-Q", | |
| "outputId": "200e6739-fd51-4d85-d7a0-79ad1e928d80" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| " Author Author Country Date Rating \\\n", | |
| "0 Richard Rowe TH 27/5/2020 5.0 \n", | |
| "1 Ananya Agarwal IN 11/12/2020 5.0 \n", | |
| "2 Ashwin Krishnan P C IN 1/4/2021 5.0 \n", | |
| "3 Adeline Tan SG 9/7/2021 5.0 \n", | |
| "4 cindy win SG 17/7/2021 5.0 \n", | |
| "\n", | |
| " Review \n", | |
| "0 ve customer years exemplary experiences \n", | |
| "1 wow bhumi meri hai \n", | |
| "2 internet banking best world \n", | |
| "3 thank yr kind assistance service oriented n ch... \n", | |
| "4 customer service adviser yanty helpful happy a... " | |
| ], | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"df-760cb867-fad6-4be8-83d5-685d5495295b\" class=\"colab-df-container\">\n", | |
| " <div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Author</th>\n", | |
| " <th>Author Country</th>\n", | |
| " <th>Date</th>\n", | |
| " <th>Rating</th>\n", | |
| " <th>Review</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>Richard Rowe</td>\n", | |
| " <td>TH</td>\n", | |
| " <td>27/5/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>ve customer years exemplary experiences</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>Ananya Agarwal</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>11/12/2020</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>wow bhumi meri hai</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>Ashwin Krishnan P C</td>\n", | |
| " <td>IN</td>\n", | |
| " <td>1/4/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>internet banking best world</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>Adeline Tan</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>9/7/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>thank yr kind assistance service oriented n ch...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>cindy win</td>\n", | |
| " <td>SG</td>\n", | |
| " <td>17/7/2021</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>customer service adviser yanty helpful happy a...</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>\n", | |
| " </div>\n" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 21 | |
| } | |
| ], | |
| "source": [ | |
| "dbs_rev = pd.concat([dbs_pos_rev, dbs_neg_rev]).reset_index(drop = True)\n", | |
| "dbs_rev.head()" | |
| ] | |
| }, | |
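| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A minimal sketch of why the merge above calls `reset_index(drop = True)`, using two hypothetical toy frames in place of `dbs_pos_rev` and `dbs_neg_rev`. `pd.concat` keeps each input's own index labels, so the labels repeat until the index is reset. Passing `ignore_index=True` to `pd.concat` gives the same fresh 0..n-1 index in one step." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Hypothetical toy frames (not the scraped data)\n", | |
| "pos = pd.DataFrame({'Rating': [5.0, 4.0], 'Review': ['great service', 'helpful staff']})\n", | |
| "neg = pd.DataFrame({'Rating': [1.0], 'Review': ['long queue']})\n", | |
| "\n", | |
| "# concat keeps the original labels, so index 0 appears twice\n", | |
| "print(pd.concat([pos, neg]).index.tolist())  # [0, 1, 0]\n", | |
| "\n", | |
| "# drop = True discards the old labels instead of adding them as an 'index' column\n", | |
| "merged = pd.concat([pos, neg]).reset_index(drop = True)\n", | |
| "print(merged.index.tolist())  # [0, 1, 2]\n", | |
| "\n", | |
| "# Equivalent one-step form\n", | |
| "print(pd.concat([pos, neg], ignore_index=True).index.tolist())  # [0, 1, 2]" | |
| ] | |
| }, | |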
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "x37EXCJJc9W7" | |
| }, | |
| "source": [ | |
| "##**Word Cloud Generation for Text Visualisation**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "3VjpGtPcdcX0" | |
| }, | |
| "source": [ | |
| "A tag cloud (also known as a word cloud, wordle, or weighted list in visual design) is a visual representation of text data, typically used to depict keyword metadata (tags) on websites or to visualise free-form text. Tags are usually single words, and the importance of each tag is shown with font size or colour. This format makes it quick to spot the most frequent terms and compare their relative weight. When used as website navigation aids, the terms are hyperlinked to items associated with the tag. In this notebook, each word's size reflects how often it occurs in the preprocessed reviews." | |
| ] | |
| }, | |
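| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A rough sketch of the frequency weighting described above, assuming simple whitespace tokenisation on hypothetical preprocessed reviews. It counts each word and scales the counts relative to the most frequent one. The `wordcloud` library does more than this: it removes stopwords, can pair words into two-word collocations, and fits font sizes to the canvas. A word-to-weight dictionary like `weights` can be passed to its `generate_from_frequencies` method." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from collections import Counter\n", | |
| "\n", | |
| "# Hypothetical preprocessed reviews (not taken from the dataset)\n", | |
| "reviews = ['customer service helpful', 'internet banking best', 'customer service slow']\n", | |
| "\n", | |
| "# Count every word across all reviews\n", | |
| "counts = Counter(word for review in reviews for word in review.split())\n", | |
| "\n", | |
| "# Weight in (0, 1]: the most frequent words would be drawn largest\n", | |
| "top = max(counts.values())\n", | |
| "weights = {word: count / top for word, count in counts.items()}\n", | |
| "\n", | |
| "print(counts.most_common(2))  # [('customer', 2), ('service', 2)]\n", | |
| "print(weights['internet'])  # 0.5" | |
| ] | |
| }, | |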
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "3dU6mzVvdRSd" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
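| "def generate_word_cloud(Review):\n", | |
| "    # Join the reviews into one string; str() on a pandas Series would only give its\n", | |
| "    # truncated repr (index numbers, '...', 'Name'/'dtype' labels) instead of the full text\n", | |
| "    text = ' '.join(Review.astype(str)) if hasattr(Review, 'astype') else str(Review)\n", | |
| "    wordcloud = WordCloud(\n", | |
| "        width = 3000,\n", | |
| "        height = 2000,\n", | |
| "        background_color = 'black').generate(text)\n", | |
| "    # Black figure background so the margins match the cloud\n", | |
| "    plt.figure(\n", | |
| "        figsize = (40, 30),\n", | |
| "        facecolor = 'k',\n", | |
| "        edgecolor = 'k')\n", | |
| "    plt.imshow(wordcloud, interpolation = 'bilinear')\n", | |
| "    plt.axis('off')\n", | |
| "    plt.tight_layout(pad=0)\n", | |
| "    plt.show()" | |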
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "R_T3pjyrd8Eb" | |
| }, | |
| "source": [ | |
| "###Text Visualisation (Positive Reviews)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "71xTloD7eWq_" | |
| }, | |
| "source": [ | |
| "####Rating 5" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "RiK1UfSpeCY2", | |
| "outputId": "766f888f-ab77-46d6-b0cf-1fd87024bfbc" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "text/plain": [ | |
| "<Figure size 4000x3000 with 1 Axes>" | |
| ], |