
Day 7 Project: Movie Budgets

Welcome to the first end-of-week project in the 30 Days of Python series! For this project, we're going to write a program to analyse some movie data.

In particular, we're going to find the average budget of the films in our data set, and we're going to identify high-budget films that exceed the average budget we calculate.

The brief

Below you'll find a list which contains the relevant data about a selection of movies. Each item in the list is a tuple containing a movie name and movie budget in that order:

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

For this project, your program should do the following:

Calculate the average budget of all movies in the data set.
Print out every movie that has a budget higher than the average you calculated. You should also print out how much higher than the average the movie's budget was.
Print out how many movies spent more than the average you calculated.

If you want a little extra challenge, allow users to add more movies to the data set before running the calculations.

You can do this by asking the user how many movies they want to add, which will allow you to use a for loop and range to repeat some code a given number of times. Inside the for loop, you can write some code that takes in some user input and appends a movie tuple containing the collected data to the movie list.

Our solution

Below is our solution walkthrough, including the extra challenge for this project.

We've got a video walkthrough as well which covers the main parts of this project.

Before we can do anything else, we really need to know the average budget for our data set, so this should be our first step.

In order to calculate the average budget, we need two things:

The total budget spent across all movies.
The number of movies.

To calculate the sum of all of the budgets, we can start by creating a variable called total_budget and setting an initial value of 0.

Then using a for loop, we can go through each movie in the movies list. We'll take the budget for each movie and add it to our total, like so:

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]
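
As an aside, the same total can be reached without writing the loop by hand. The sketch below assumes Python's built-in sum function, which this walkthrough doesn't otherwise use, and works on a shortened copy of the movies list to keep things brief:

```python
# Shortened copy of the data set, for illustration only.
movies = [
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Incredibles 2", 200000000)
]

# The loop-based approach from the walkthrough.
total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

# Alternative: sum() adds up each budget produced by the generator expression.
total_with_sum = sum(movie[1] for movie in movies)

print(total_budget == total_with_sum)  # True
```

Both approaches give the same number. The explicit loop is used in the rest of this walkthrough because it only relies on the tools covered so far.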

Now that we have that, we can calculate the average by dividing total_budget by the number of movies.

We'll use the len() function to get the number of movies.

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = total_budget / len(movies)

Great! We've got the average budget.

However, since we're dealing with millions of dollars here, it doesn't make a lot of sense for us to keep this value as a float. I think we should change it to an integer so that we don't get stray decimal places in our output. We can do that by passing the result of the division to int, which discards anything after the decimal point.

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = int(total_budget / len(movies))
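
It's worth knowing that int doesn't round; it simply drops the decimal part. The small sketch below uses made-up numbers (sample_budgets is not part of the movie data) to show how that differs from round, and that floor division with // gives the same result as int for positive values:

```python
# Made-up budgets, chosen so that the average is not a whole number.
sample_budgets = [10, 11, 12, 14]

total = 0

for budget in sample_budgets:
    total = total + budget

average = total / len(sample_budgets)   # 47 / 4 is 11.75

truncated = int(average)                # drops the .75
rounded = round(average)                # rounds to the nearest whole number
floored = total // len(sample_budgets)  # floor division, no float involved

print(truncated, rounded, floored)  # 11 12 11
```

With the seven movies in our list the average happens to be a whole number, 190500000, so all three approaches would agree. The difference only shows up once the data changes, for example when users add their own movies.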

Next up, we need to iterate through the movies once again, checking whether each one is over the average budget. If a movie has a budget greater than the average budget we just calculated, we'll print it. Otherwise we'll just move on to the next movie.

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = int(total_budget / len(movies))

for movie in movies:
    if movie[1] > average_budget:
        over_average_cost = movie[1] - average_budget
        print(f"{movie[0]} cost ${movie[1]}: ${over_average_cost} over average.")

For each movie that passes the check, we're also calculating how far over the average budget it is, and printing that out as part of a nicely formatted string. Here we've used an f-string, but you could also use the format method if you prefer.
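
For comparison, here is a rough sketch of the same message built both ways. It uses a single movie from the list, and hard-codes the average for the original seven movies (190500000) so that the example stands on its own:

```python
# One movie from the data set, plus the average for the original seven movies.
movie = ("Incredibles 2", 200000000)
average_budget = 190500000

over_average_cost = movie[1] - average_budget

# f-string version, as used in the walkthrough.
f_string_line = f"{movie[0]} cost ${movie[1]}: ${over_average_cost} over average."

# The same message using the format method with empty {} placeholders,
# which are filled in order by the arguments passed to format.
format_line = "{} cost ${}: ${} over average.".format(movie[0], movie[1], over_average_cost)

print(format_line)  # Incredibles 2 cost $200000000: $9500000 over average.
```

The two strings are identical, so which one you use is mostly a matter of taste.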

Finally, we need to keep track of how many movies are over the average budget, so that we can tell the user that information at the end.

We can do this in two ways:

We can create a new variable that will keep count of the number of movies that are over the average budget. We could add 1 to this variable inside the if statement.
We can create a list where we put the movies that are over the average budget. At the end we can calculate the length of that list using len, which will tell us the number of movies over the average budget.
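
As a point of reference, the first option might look something like the sketch below. The variable name high_budget_count is just a suggestion, and the printing of each individual movie is left out to keep the focus on the counter:

```python
movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = int(total_budget / len(movies))

# Start the counter at 0 and add 1 for every movie over the average.
high_budget_count = 0

for movie in movies:
    if movie[1] > average_budget:
        high_budget_count = high_budget_count + 1

print(f"There were {high_budget_count} movies with over average budgets.")
```

The counter is a little lighter, since it doesn't store anything but a number. The list version keeps the movies themselves around, which is handy if we want to do something else with them later.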

I'm going to go for the second option in this case:

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

high_budget_movies = []
total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = int(total_budget / len(movies))

for movie in movies:
    if movie[1] > average_budget:
        high_budget_movies.append(movie)
        over_average_cost = movie[1] - average_budget
        print(f"{movie[0]} cost ${movie[1]}: ${over_average_cost} over average.")

print(f"There were {len(high_budget_movies)} movies with over average budgets.")

This looks great, but if we want to be really fancy, we can add comma separators to the long budget figures to make them easier to read. In order to do this, we need to change this line:

print(f"{movie[0]} cost ${movie[1]}: ${over_average_cost} over average.")

To this:

print(f"{movie[0]} cost ${movie[1]:,}: ${over_average_cost:,} over average.")

This :, is part of a special formatting language that is built into Python, and it allows us to format values inside strings in various ways. It's not something that we're going to be covering in depth in this series, but you can find more information in the official documentation.
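
To give a small taste of what that formatting language can do, here are a few variations applied to one of the budgets from our list. These are only illustrative, and only the comma version is used in our solution:

```python
budget = 379000000

with_commas = f"{budget:,}"            # comma as the thousands separator
with_underscores = f"{budget:_}"       # underscore as the thousands separator
with_decimals = f"{budget:,.2f}"       # commas plus two decimal places
in_millions = f"{budget / 1000000:.1f} million"  # one decimal place

print(with_commas)       # 379,000,000
print(with_underscores)  # 379_000_000
print(with_decimals)     # 379,000,000.00
print(in_millions)       # 379.0 million
```

Everything after the colon inside the curly braces is the format specification, and the value before the colon is left unchanged.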

Extra

For the extra part of this assignment, we're going to:

Ask the user how many movies they want to add to the list.
Use range and a for loop to perform some operation the specified number of times.
Ask the user for a movie name and budget during each iteration of the loop, and append a tuple to the movies list containing this information.

We'll add this code directly below the definition of our movies variable.

movies = [
    ("Eternal Sunshine of the Spotless Mind", 20000000),
    ("Memento", 9000000),
    ("Requiem for a Dream", 4500000),
    ("Pirates of the Caribbean: On Stranger Tides", 379000000),
    ("Avengers: Age of Ultron", 365000000),
    ("Avengers: Endgame", 356000000),
    ("Incredibles 2", 200000000)
]

new_movie_count = int(input("Enter how many new movies you wish to add: "))

for _ in range(new_movie_count):
    name = input("Enter new movie name: ")
    budget = int(input("Enter new movie budget: "))
    new_movie = (name, budget)
    movies.append(new_movie)

high_budget_movies = []
total_budget = 0

for movie in movies:
    total_budget = total_budget + movie[1]

average_budget = int(total_budget / len(movies))

for movie in movies:
    if movie[1] > average_budget:
        high_budget_movies.append(movie)
        over_average_cost = movie[1] - average_budget
        print(f"{movie[0]} cost ${movie[1]:,}: ${over_average_cost:,} over average.")

print(f"There were {len(high_budget_movies)} movies with over average budgets.")
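
One thing to be aware of is that int(input(...)) will stop the program with a ValueError if the user types something that isn't a whole number, such as 4,500,000. If you'd like to guard against that, one possible approach is sketched below. It goes beyond what we've covered in the first week, since it uses a function and try / except, and parse_budget is a hypothetical helper name rather than anything built into Python:

```python
def parse_budget(text):
    """Hypothetical helper: return the budget as an int, or None if the text isn't valid."""
    # Remove surrounding spaces, plus any commas or dollar signs the user typed.
    cleaned = text.strip().replace(",", "").replace("$", "")

    try:
        budget = int(cleaned)
    except ValueError:
        # int() could not make sense of the text.
        return None

    # A negative budget doesn't make sense for a movie.
    if budget < 0:
        return None

    return budget


print(parse_budget("20000000"))    # 20000000
print(parse_budget("$4,500,000"))  # 4500000
print(parse_budget("lots"))        # None
```

Inside the loop, you could pass the result of input to a helper like this and only append the new movie when the result is not None.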

And that's it!

This project was a fair bit longer than the previous ones we've tried, so well done for getting this far. I hope completing this project has helped cement your knowledge of the first week. Now you're ready for week 2!