Lesson 2 Lecture

What is Regression Analysis?

You're probably already familiar with how cause and effect works: when thing A changes, thing B changes too. When demand for a product goes up, its price tends to go up. When more lanes are added to a freeway, more cars use it. People frequently plot data on charts to see how things relate and to identify trends. Most commonly, you'll see this as a typical X-Y coordinate graph with points plotted and a line running through them.

This process is called regression analysis: plotting points, identifying statistical trends, and then using those trends to make predictions. If we sold 9 muffins on Friday, 12 muffins on Saturday, and 14 muffins on Sunday, what can we say about the relationship between the day of the week and the muffins we sell? More importantly, how many muffins should we expect to sell on Monday?

Here's another example: imagine you're trying to figure out how the price of a house might be connected to things like its size, number of bedrooms, or location. Regression analysis helps us understand these relationships and make predictions based on what we know. For example, if you've ever noticed that larger houses tend to cost more, or that houses closer to the city center are generally more expensive, you're already thinking in terms of regression analysis.

Linear Regression

The Concept

Linear regression is a form of regression where we suspect that data on a graph follows a pretty linear, straightforward trend. For example, because of how the points are arranged, you could reasonably draw a line through the points on our muffin sales chart and use that line to predict how many muffins you will sell on a certain day. We call this the line of best fit. Linear regression is really just programmatically drawing that line.

Let's use a simple example: imagine you're running a lemonade stand. You notice that on hotter days, you sell more lemonade. The relationship between temperature and sales might follow a simple pattern:

Sales = (some number) × Temperature + Base sales

This is just like the equation y = mx + b that we use in linear regression, where:

  • Sales (y) is what we're trying to predict
  • Temperature (x) is what we're using to make predictions
  • The "some number" (m) tells us how much more lemonade we sell for each degree hotter
  • Base sales (b) is how much we sell even on a cool day
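As a rough illustration of where m and b come from, the sketch below computes them with the standard least-squares formulas in plain Python. The temperatures and sales figures are made up for this example, and fit_line is a hypothetical helper name, not something from a library.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx             # extra cups sold per extra degree
    b = y_mean - m * x_mean   # where the line crosses temperature zero
    return m, b


# Made-up lemonade stand data (hypothetical, for illustration only).
temperatures = [20, 25, 30, 35]
cups_sold = [50, 62, 68, 80]

m, b = fit_line(temperatures, cups_sold)
predicted_cups = m * 32 + b  # prediction for a 32-degree day

print(f"m = {m:.2f}, b = {b:.2f}, predicted cups at 32 degrees = {predicted_cups:.1f}")
```

Strictly speaking, b is the value the line gives at a temperature of zero, which is why it reads as "base sales" in the lemonade story.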

Strengths and Limitations

Linear regression is a reliable tool: it's simple to understand, easy to use, and gets the job done in many situations. It performs well when the relationship is roughly a straight line but struggles with more complex patterns.

For instance, while linear regression might work well for predicting house prices based on square footage in a small neighborhood, it might not capture the complex relationship between a house's features and its price in a diverse city where factors like school districts, historical significance, or architectural style play important roles.

When to Use Linear Regression

Linear regression is perfect for situations where you expect a straightforward relationship. For example, it works well when predicting:

  • How much more product you'll sell when you lower the price
  • How a student's study time relates to their test scores
  • How the amount of fertilizer affects crop yield

Polynomial Regression

The Concept

Here is a short exercise: draw a line of best fit through these points.

You will notice that it's not really possible. At best, you can draw a straight horizontal line that connects the start and end, but that line becomes very inaccurate around the maximum and minimum of the data. In other words, this data is just not appropriate for a straight line of best fit.

Instead, a polynomial is a better option. For those who aren't super math-savvy, a polynomial is a kind of math equation that adds together several powers of a variable (x, x^2, x^3, and so on), each multiplied by a number. To oversimplify things, they're just curves.

While linear regression draws straight lines, polynomial regression creates curved lines that can bend and twist to better fit the data. Picture a flexible garden hose that can curve around obstacles, compared to a straight pipe that cannot.

The trick behind polynomials is that a polynomial of high enough degree can be set up to pass through any set of plotted points, as long as the points don't do something weird like sit directly above one another or form a loop-de-loop (which they never will, anyway). The general formula for a polynomial looks like this:

y = b_0 + b_1 x + b_2 x^2 + ... + b_n x^n

Polynomial regression techniques all center around finding values of b_0, b_1, b_2, etc., as well as the exponents to use, such that the final equation, when plotted, will fit the points on a graph.
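As a small sketch of what "finding b_0, b_1, b_2" can look like in practice, the example below uses NumPy's polynomial fitting routine. The data points are made up for illustration, and the choice of degree 2 is an assumption we make by hand here; the routine finds the coefficients, not the degree.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Made-up points that follow a curve (hypothetical data for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.5, 3.0, 2.5, 1.0, -1.5])

# Assumption: we pick degree 2 ourselves; polyfit then finds the coefficients.
degree = 2
coeffs = P.polyfit(x, y, degree)  # ordered lowest power first: b_0, b_1, b_2
b_0, b_1, b_2 = coeffs

# Evaluate the fitted curve b_0 + b_1*x + b_2*x^2 at a new x value.
prediction = float(P.polyval(2.5, coeffs))

print(f"b_0 = {b_0:.2f}, b_1 = {b_1:.2f}, b_2 = {b_2:.2f}")
print(f"prediction at x = 2.5: {prediction:.3f}")
```

numpy.polynomial.polynomial.polyfit returns coefficients from the lowest power upward, which matches the b_0, b_1, b_2 ordering used in this lesson. The older np.polyfit function returns them in the opposite order.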

Strengths and Limitations

Polynomial regression is a versatile tool: because its curve can bend, it can handle many different situations. However, it can be unnecessarily complex for straightforward relationships.

For example, while it might work great for predicting how temperature affects ice cream sales (which typically peak at moderate temperatures), it could lead to unrealistic predictions if we try to predict sales at extreme temperatures we haven't observed before.
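The sketch below shows how this can go wrong. It fits a curve to invented ice cream sales figures observed between 10 and 30 degrees, then asks for a prediction at 45 degrees, well outside anything we measured. All numbers are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Invented data: sales peak at a moderate temperature (hypothetical numbers).
temps = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
sales = np.array([20.0, 45.0, 60.0, 45.0, 20.0])

# Fit a degree-2 polynomial (an assumed choice for this example).
coeffs = P.polyfit(temps, sales, 2)

# A prediction inside the observed range versus one far outside it.
inside = float(P.polyval(22.0, coeffs))
outside = float(P.polyval(45.0, coeffs))

print(f"predicted sales at 22 degrees: {inside:.1f}")
print(f"predicted sales at 45 degrees: {outside:.1f}")
```

Inside the observed range the prediction looks sensible, but at 45 degrees the curve keeps bending downward and the predicted sales come out negative, which is impossible. The curve only knows the shape of the data it was given.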

When to Use Polynomial Regression

Polynomial regression is particularly useful when you notice curved patterns in your data. Some real-world examples include:

  • Predicting crop yields based on rainfall (too little or too much rain both reduce yields)
  • Modeling the relationship between car speed and fuel efficiency
  • Understanding how study time relates to test scores (diminishing returns after a certain point)

Comparing Linear and Polynomial Regression

Visual and Mathematical Differences

Consider the key differences between linear and polynomial regression:

  • Linear regression creates straight-line relationships - it's direct and predictable
  • Polynomial regression creates curved relationships - it can model complex patterns in the data

Choosing Between Them

Selecting between these methods requires understanding your data's characteristics. For simple, linear relationships, linear regression is sufficient. For complex, non-linear patterns, polynomial regression provides the necessary flexibility.

Frankly, the most practical way to go about it is to try both and see which one is more accurate. Alternatively, you can use a program to plot your data on a graph and decide based on how the graph looks. If the points roughly form a line, use a linear regression technique. Otherwise, consider using a polynomial regression technique.
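One way to "try both and see which is more accurate" is sketched below. It is only an illustration under some assumptions: the data is made up, the polynomial degree is fixed at 2, and accuracy is measured as mean squared error on points held back from fitting. Holding points back matters because a more flexible curve will always look at least as good on the points it was fitted to.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Made-up curved data (hypothetical, for illustration only).
x = np.arange(10, dtype=float)
y = np.array([16.2, 8.9, 4.1, 0.8, 0.1, 1.2, 3.9, 9.1, 15.8, 25.1])

# Fit on every other point, then check accuracy on the points left out.
train = np.arange(10) % 2 == 0
test = ~train


def held_out_mse(degree):
    """Fit a polynomial of the given degree and score it on unseen points."""
    coeffs = P.polyfit(x[train], y[train], degree)
    predictions = P.polyval(x[test], coeffs)
    return float(np.mean((y[test] - predictions) ** 2))


mse_linear = held_out_mse(1)     # straight line
mse_quadratic = held_out_mse(2)  # degree-2 polynomial (assumed choice)

better = "polynomial" if mse_quadratic < mse_linear else "linear"
print(f"linear error: {mse_linear:.2f}, polynomial error: {mse_quadratic:.2f}")
print(f"better fit for this data: {better}")
```

For this curved data the polynomial has the smaller error. On data that really does follow a line, the two errors would be close, and the simpler linear model would be the more sensible pick.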