{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear regression exercises\n", "\n", "Credits: Matthew Graham, Pavlos Protopapas\n", "\n", "This notebook provides exercises on linear regression.\n", "\n", "First we need to do some Python setup." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import sys\n", "import numpy as np\n", "import scipy as sp\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import sklearn as sk\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "sns.set(style=\"ticks\")\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load and Explore Data\n", "\n", "For these exercises, we're going to use a data set of galaxies with known (spectroscopically confirmed) redshifts and SDSS colors, i.e. differences between the magnitudes measured in adjacent *ugriz* bands (u-g, g-r, r-i, i-z). We want to estimate the redshift of a galaxy from its colors alone (photometric redshift).\n", "\n", "First we will load the data and have a look at it. The data can be downloaded from http://www.astro.caltech.edu/~mjg/sdss_gal.csv.gz\n", "\n", "Note that you will need to uncompress the file to `sdss_gal.csv` before loading it." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sdss_gal_df = pd.read_csv('sdss_gal.csv', low_memory=False)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | u-g | g-r | r-i | i-z | redshift |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 1.88235 | 0.95459 | 0.44631 | 0.32659 | 0.091214 |\n",
"| 1 | 1.97871 | 0.95931 | 0.46358 | 0.32285 | 0.117409 |\n",
"| 2 | 1.84007 | 0.92670 | 0.40268 | 0.32295 | 0.091852 |\n",
"| 3 | 1.89717 | 1.09666 | 0.47545 | 0.34684 | 0.153276 |\n",
"| 4 | 0.98144 | 0.38145 | 0.34404 | 0.04365 | 0.090731 |\n", "