{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kXCyEtDav_TR", "outputId": "09ba2cbd-cc6e-4569-e8ea-d94e3b81e071" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import time\n", "#from google.colab import drive, output\n", "#drive.mount('/content/drive')\n", "from geopy.distance import geodesic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instructions \n", "\n", "* Submit your code both as a notebook file (.ipynb) and as a Python script (.py) on LMS. Both files should be named 'RollNo_PA01', for example: \"23100214_PA01\". Failing to submit either of them will result in a reduction of marks.\n", "* All cells must be run once before submission and must display their results (graphs, plots, etc.). If the output of a cell is not displayed, marks will be deducted.\n", "* The code MUST be implemented independently. Any plagiarism or copying of work from others or the internet will be immediately referred to the DC.\n", "* A 10% penalty applies per day for up to 3 days after the due date. No submissions will be accepted after that. 
\n", "* Use procedural programming style and comment your code properly.\n", "* **Deadline to submit this assignment is 27/02/2023 (23:55).**" ] }, { "cell_type": "markdown", "metadata": { "id": "I4c2QQ2Q0Hca" }, "source": [ "# Task 1\n", "You are required to implement a simple linear regression (`simple_LR`) class with 2 methods that fits and plots a self-generated linear distribution,\n", "\n", "\\begin{equation}\n", "    \\hat y = a + bX.\n", "\\end{equation}\n", "\n", "\n", "\n", "* The class consists of two functions: the parameterised constructor and `plot_equation`, which plots the fitted model on top of a scatter plot of the data.\n", "* The constructor receives 5 arguments, in this order:\n", "    * $a$, y-intercept of the line used to generate the data.\n", "    * $b$, gradient of the line used to generate the data.\n", "    * $x_{min}$, the minimum value that x can take in the interval.\n", "    * $x_{max}$, the maximum value that x can take in the interval.\n", "    * $n$, the number of evenly spaced points to plot.\n", "* The `plot_equation` function receives no arguments and only outputs a plot.\n", " \n", "\n", "**Steps to follow:**\n", "\n", "\n", "1. Initialize the arguments inside the constructor.\n", "2. Inside `plot_equation`, generate a list $X$ of $n$ evenly spaced values between $x_{min}$ and $x_{max}$.\n", "3. For the generated list, find $y$ using $y = a + bX$.\n", "4. Add random normal noise to $y$ with $\\mu = 0$ and $\\sigma = 0.5$.\n", "5. For these $X$ and $y$, find the optimal weights using the analytical solution given by,\n", "\n", "\\begin{equation}\n", "w = (X^TX)^{-1}X^Ty.\n", "\\end{equation}\n", "\n", "6. On the same figure, plot a scatter of the original $X$ and $y$ and a line plot of the fitted line. The plot should have axis labels and a legend showing the equation of the line.\n", "\n", "The code should be as **vectorized** as possible; for loops are not allowed. All of steps 2 to 6 should be done inside the `plot_equation` method." 
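, "\n", "\n", "As a hedged aside on step 5 (an assumption about the intended setup, not something the brief states): for the analytical solution to recover both the intercept and the gradient, the matrix $X$ in the formula is usually taken to be a design matrix with a leading column of ones, so that $w$ stacks the two estimates,\n", "\n", "\\begin{equation}\n", "X = \\begin{bmatrix} 1 & x_1 \\\\ \\vdots & \\vdots \\\\ 1 & x_n \\end{bmatrix}, \\qquad w = \\begin{bmatrix} \\hat a \\\\ \\hat b \\end{bmatrix} = (X^TX)^{-1}X^Ty.\n", "\\end{equation}\n", "\n", "Here $\\hat a$ and $\\hat b$ are illustrative names for the estimated intercept and gradient; with low noise they should land close to the $a$ and $b$ used to generate the data."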
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "CbjDhTBv0cn9" }, "outputs": [], "source": [ "class simple_LR():\n", "    def __init__(self, a, b, x_min, x_max, n):\n", "        #Write your code here\n", "        pass\n", "    \n", "    def plot_equation(self):\n", "        #Write your code here\n", "        pass" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 803 }, "id": "JhFfOAdJu8EY", "outputId": "8517a971-ce1b-4f28-f3b6-aed217fc7231" }, "outputs": [], "source": [ "#Do not modify this cell\n", "vals = [(3, -4), (5, -5), (1, 2)]\n", "for a, b in vals:\n", "    L = simple_LR(a, b, 0, 5, 10)\n", "    L.plot_equation()" ] }, { "cell_type": "markdown", "metadata": { "id": "6dqDzI9RHHch" }, "source": [ "# Task 2\n", "## Part 1\n", "You are required to implement a multivariate linear regression (`multivariate_LR`) class whose constructor takes 9 arguments.\n", "\n", "\\begin{equation}\n", "    \\hat y = \\theta_0 + \\sum_{i = 1}^{d} \\theta_i X_i\n", "\\end{equation}\n", "\n", "\n", "\n", "* The class consists of **8** functions: the parameterised constructor, `predict`, `mean_square_loss`, `loss_derivative`, `gradient_descent`, `plot_loss`, `animate_gradient_descent` and `adjusted_R_squared`.\n", "* The constructor is passed 9 arguments, described as follows:\n", "    * $X$, the feature matrix of the data. 
It should be a numpy matrix of dimension $n \\times m$.\n", "    * $y$, the output vector. It should be a numpy column vector of dimension $n \\times 1$.\n", "    * `train_size`, between **0** and **1**, corresponds to the fraction of the data allocated to the training set.\n", "    * `epochs`, the maximum number of epochs for gradient descent.\n", "    * `learning_rate`, the learning rate for the gradient descent.\n", "    * `intercept`, a boolean variable: True if an intercept is to be fitted, otherwise False.\n", "    * `normalize`, a boolean variable: True if $X$ is to be normalized.\n", "    * `method`, signifies the gradient descent method: `'batch'`, `'sgd'`, `'minibatch'`.\n", "    * `batch_size`, number of batches for the minibatch method, set to 10 by default.\n", "\n", "**Description:**\n", "* Before passing $X$ and $y$ to the class, make sure they are **2-D** numpy arrays.\n", "* Initialize the arguments passed inside the constructor, and normalize $X$ if `normalize = True`. Add an intercept column to $X$ if `intercept = True`. Also initialize a parameter $w$, a column vector that stores the weights. You should also randomly split $X$ and $y$ here based on the `train_size` argument. 
Also initialize the `loss_history` and `weight_history` parameters, which store the loss and the weights respectively at each iteration.\n", "* The `predict` function receives no arguments and returns the prediction vector for the test $X$.\n", "* The `mean_square_loss` function receives 2 vectors $x$ and $y$ and returns the mean squared error between them.\n", "* The `loss_derivative` function receives 2 arguments $X$ and $y$ and returns the value of the **GRADIENT** at the current weights.\n", "* The `gradient_descent` function receives no arguments and performs gradient descent using one of the 3 specified descent methods.\n", "* The `plot_loss` function receives no arguments and plots the loss curve using `loss_history`.\n", "* The `animate_gradient_descent` function receives no arguments and animates on a plot how the fitted line evolves for **1-D** data, using the weights stored in `weight_history`. Use `time.sleep(1)` inside a for loop that iterates over `weight_history` and, at each iteration, plots the scatter of the original $X$ and $y$ variables as well as the fitted line at that iteration.\n", "* The `adjusted_R_squared` function receives no arguments and returns the adjusted $R^2$ value.\n", "\n", "The code should be as vectorized as possible." 
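, "\n", "\n", "For reference, a hedged sketch of the quantities involved, assuming the loss is the mean squared error over $n$ samples and that $p$ denotes the number of features excluding the intercept (both are assumptions; the symbols $L$, $\\alpha$, $n$ and $p$ are not defined in the brief):\n", "\n", "\\begin{equation}\n", "L(w) = \\frac{1}{n}\\lVert Xw - y \\rVert^2, \\qquad \\nabla_w L = \\frac{2}{n}X^T(Xw - y), \\qquad w \\leftarrow w - \\alpha \\nabla_w L,\n", "\\end{equation}\n", "\n", "\\begin{equation}\n", "R^2_{adj} = 1 - (1 - R^2)\\frac{n - 1}{n - p - 1}.\n", "\\end{equation}\n", "\n", "Under these assumptions, `'batch'` evaluates the gradient on the whole training set, `'sgd'` on a single randomly chosen sample, and `'minibatch'` on a small random subset for each update, with $\\alpha$ playing the role of `learning_rate`."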
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 279 }, "id": "p27BJtg8cEXa", "outputId": "2a22b283-36e6-467b-dcbe-266e408f7aeb" }, "outputs": [], "source": [ "#Do not modify this cell\n", "X = np.linspace(0, 10, 20)\n", "Y = 3 + 3 * X + np.random.normal(loc = 0, scale = 4, size = len(X))\n", "X = X.reshape((len(X), 1))\n", "Y = Y.reshape((len(X), 1))\n", "graph = plt.figure()\n", "graph = plt.scatter(X, Y)\n", "graph = plt.xlabel('x')\n", "graph = plt.ylabel('y')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OfF_JVhfGhYG" }, "outputs": [], "source": [ "class multivariate_LR():\n", "    def __init__(self, X, Y, train_size, epochs, learning_rate, intercept, normalize, method, batch_size = 10):\n", "        #Write your code here\n", "        pass\n", "    \n", "    def predict(self):\n", "        pass\n", "    \n", "    def mean_square_loss(self, x, y):\n", "        pass\n", "\n", "    def loss_derivative(self, x, y):\n", "        pass\n", "    \n", "    def gradient_descent(self):\n", "        for i in range(self.epochs):\n", "            if self.method == 'batch':\n", "                pass\n", "                #Write your code here\n", "            elif self.method == 'sgd':\n", "                pass\n", "                #Write your code here\n", "            elif self.method == 'minibatch':\n", "                pass\n", "                #Write your code here\n", "\n", "    def plot_loss(self):\n", "        pass\n", "    \n", "    def animate_gradient_descent(self):\n", "        pass\n", "    \n", "    def adjusted_R_squared(self):\n", "        pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 281 }, "id": "0GE7ucszJdcg", "outputId": "882480a7-62d7-4b5b-92f5-009f33d11856" }, "outputs": [], "source": [ "#Do not modify this cell as this is for testing\n", "multivariatelr = multivariate_LR(X, Y, 0.8, 20, 0.02, False, False, 'sgd')\n", "multivariatelr.gradient_descent()\n", "multivariatelr.animate_gradient_descent()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2\n", "Read the Bykea delivery dataset and split it into 
a feature matrix and an output vector that contains the **delivery charge**. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8qyIDvyVHJSt" }, "outputs": [], "source": [ "#Write your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the already imported `geodesic` function to calculate the distance between the pickup and dropoff points. Add a distance column to the feature matrix and drop the latitude and longitude columns. `geodesic` calculates distances as follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nX_aLoH-HRxD" }, "outputs": [], "source": [ "geodesic((24.8607, 67.0011),(31.5204, 74.3587)).km" ] }, { "cell_type": "markdown", "metadata": { "id": "s69RpbJ-3QgO" }, "source": [ "Use the `multivariate_LR` class with appropriate arguments to plot the loss, return the adjusted $R^2$ value for the model, and return the model's prediction for the test data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 308 }, "id": "a5xh2iLT2GrL", "outputId": "933bb1b0-9b3e-400c-900b-f630ceef0523" }, "outputs": [], "source": [ "#Write your code here" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 1 }