Lesson 2 Project: Using the Position-Salary Dataset on Kaggle
In this project, we'll be using the "Position Salary" dataset from Kaggle, which is perfect for learning. Why? Because the data is relatively small, and it shows a clear relationship between job positions and salaries. You all like having jobs, right?
We'll cover the following topics:
- What linear regression is and how it works
- What polynomial regression is and how it differs from linear regression
- When to use polynomial regression instead of linear regression
- How to implement both regression types using Python and scikit-learn
- How to visualize and interpret your results
Instructions
Setting Up Your Environment
Before we begin, you'll need to install the required libraries on your system, if you haven't done so already. In a terminal, run the install command for your package manager to make sure you have all the right packages installed.
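If you're not sure what is already installed, a quick check like the one below can help. This is a minimal sketch that assumes the lesson relies on numpy, pandas, matplotlib, and scikit-learn (an assumption, since only scikit-learn is named above). It only reports what is missing and suggests a pip command; it does not install anything.

```python
import importlib.util

# Assumed package list: import name -> pip name.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",  # imported as sklearn, installed as scikit-learn
}


def find_missing(required=REQUIRED):
    """Return the pip names of packages that cannot be imported."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing()
if missing:
    print("Missing packages. Try: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

If the script lists anything as missing, install it from your terminal and run the check again.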
The Dataset
You can't do machine learning without any data, now. We're going to use the website Kaggle for this. Kaggle is a platform dedicated to hosting datasets and machine learning competitions. It's where today's dataset is hosted.
- Go to Kaggle and search for "Position Salary Dataset" or use this direct link: Position Salary Dataset
- Download the dataset (CSV file) to your system.
- Save it to your project folder.
The Program
Now that you have everything installed, let's begin.
- First, let's import all the necessary libraries:
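A plausible set of imports for this project looks like the following. The exact list is an assumption based on the steps in this lesson: pandas for loading the CSV, matplotlib for plotting, and scikit-learn for the two regression models.

```python
import numpy as np                                    # array handling
import pandas as pd                                   # loading and inspecting the CSV
import matplotlib.pyplot as plt                       # plotting the results
from sklearn.linear_model import LinearRegression     # the regression model itself
from sklearn.preprocessing import PolynomialFeatures  # builds x, x^2, x^3, ... columns
```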
- Now let's load the dataset and take a look at it:
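Here is one way to load and inspect the file. The file name `Position_Salaries.csv` and the column names `Position`, `Level`, and `Salary` are assumptions, so adjust them to match your download. So that the sketch runs even without the file, the hypothetical helper `load_dataset` falls back to a small stand-in table whose values are illustrative only.

```python
from pathlib import Path

import pandas as pd

CSV_PATH = Path("Position_Salaries.csv")  # assumed file name; change to match your download


def load_dataset(path=CSV_PATH):
    """Load the CSV if it exists, otherwise return a small stand-in table."""
    if Path(path).exists():
        return pd.read_csv(path)
    # Stand-in rows with illustrative values, used only when the file is absent.
    levels = list(range(1, 11))
    return pd.DataFrame({
        "Position": [f"Level {n} role" for n in levels],
        "Level": levels,
        "Salary": [45000, 50000, 60000, 80000, 110000,
                   150000, 200000, 300000, 500000, 1000000],
    })


dataset = load_dataset()
print(dataset.head())   # first five rows
print(dataset.shape)    # (number of rows, number of columns)
dataset.info()          # column names, types, and missing-value counts
```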
- For regression, we need to separate our features (X) and target variable (y):
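A sketch of the split, assuming the numeric `Level` column is the feature and `Salary` is the target (both column names are assumptions). The small table is a stand-in with illustrative values so the example runs on its own; in your project, use the DataFrame you loaded from the CSV.

```python
import pandas as pd

# Stand-in table with illustrative values; replace with your loaded dataset.
dataset = pd.DataFrame({
    "Position": [f"Level {n} role" for n in range(1, 11)],
    "Level": list(range(1, 11)),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

# scikit-learn expects X to be 2-D (rows x features), hence the double brackets.
X = dataset[["Level"]].to_numpy()
# The target is a plain 1-D array.
y = dataset["Salary"].to_numpy()

print(X.shape, y.shape)
```

The `Position` column is left out because it is text; in this sketch the numeric level carries the ordering of the positions.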
- Before implementing polynomial regression, let's first create a linear regression model to use as a baseline for comparison:
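A minimal baseline might look like this. The data is a stand-in with illustrative values so the block runs by itself; swap in your own `X` and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in data with illustrative values: position level vs. salary.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Fit a straight line: salary = intercept + slope * level.
linear_model = LinearRegression()
linear_model.fit(X, y)

print("slope:", linear_model.coef_[0])
print("intercept:", linear_model.intercept_)
print("R^2 on the training data:", linear_model.score(X, y))
```

The R^2 score gives you a number to beat when you move on to the polynomial model.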
- Now let's implement polynomial regression:
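Polynomial regression in scikit-learn is linear regression on expanded features: `PolynomialFeatures` turns each level x into the columns 1, x, x^2, and so on, and `LinearRegression` fits those columns. The sketch below uses degree 4, which is an assumption worth experimenting with, and the same illustrative stand-in data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data with illustrative values: position level vs. salary.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Expand the single feature into polynomial terms (degree is an assumed choice).
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear model on the expanded features.
poly_model = LinearRegression()
poly_model.fit(X_poly, y)

print("expanded feature shape:", X_poly.shape)
print("R^2 on the training data:", poly_model.score(X_poly, y))
```

Try a few different degrees. A higher degree follows the training points more closely, but it can also start to overfit such a small dataset.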
- Let's visualize both models to compare them:
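One way to compare the two fits is to draw both curves over the data points on a single chart. This sketch refits both models on illustrative stand-in data so it runs on its own; the colors, labels, and degree 4 are all assumed choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data with illustrative values: position level vs. salary.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

linear_model = LinearRegression().fit(X, y)
poly = PolynomialFeatures(degree=4)
poly_model = LinearRegression().fit(poly.fit_transform(X), y)

# A dense grid of levels makes the polynomial curve look smooth.
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(X.ravel(), y, color="red", label="Actual salaries")
ax.plot(X_grid.ravel(), linear_model.predict(X_grid),
        color="blue", label="Linear fit")
ax.plot(X_grid.ravel(), poly_model.predict(poly.transform(X_grid)),
        color="green", label="Polynomial fit (degree 4)")
ax.set_xlabel("Position level")
ax.set_ylabel("Salary")
ax.set_title("Linear vs. polynomial regression")
ax.legend()
plt.show()
```

When reading the chart, look at where each curve misses the red points. That gap is what the R^2 score summarizes in a single number.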
- Now that you have built your models, you can use them to make predictions:
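For example, you could ask both models for the salary at a level that sits between two rows of the table. The level 6.5 is a hypothetical input, and the data is again an illustrative stand-in. Note that the polynomial model needs the new value passed through the same `PolynomialFeatures` transform used for training.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data with illustrative values: position level vs. salary.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

linear_model = LinearRegression().fit(X, y)
poly = PolynomialFeatures(degree=4)
poly_model = LinearRegression().fit(poly.fit_transform(X), y)

# predict() expects a 2-D array, even for a single value.
new_level = np.array([[6.5]])  # hypothetical level between 6 and 7

linear_pred = linear_model.predict(new_level)[0]
poly_pred = poly_model.predict(poly.transform(new_level))[0]

print(f"Linear prediction:     {linear_pred:,.0f}")
print(f"Polynomial prediction: {poly_pred:,.0f}")
```

Compare the two numbers with the salaries for the neighboring levels to judge which model gives the more believable answer.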
Extend Your Project
- Now that you know how to set up a dataset like this on your Jetson, apply the same technique to these other datasets:
- CalCOFI: Over 60 years of oceanographic data: Is there a relationship between water salinity & water temperature? Can you predict the water temperature based on salinity?
- Weather in Szeged 2006-2016: Is there a relationship between humidity and temperature? What about between humidity and apparent temperature? Can you predict the apparent temperature given the humidity?
- Weather Conditions in World War Two: Is there a relationship between the daily minimum and maximum temperature? Can you predict the maximum temperature given the minimum temperature?