Love baseball? Want to code? Start here. Using Python and baseball's biggest stars and stats, we'll learn variables, loops, lists, and more!
This project is in progress and open to feedback. 💡
Why Python?
Python is one of the easiest programming languages to read and write, which makes it a good fit for learning to code for the first time. It is also lightweight and well suited to working with data. Python doesn’t force you to memorize lots of symbols, so you can focus on what you want to do, not on how complicated the language is.
Python is like learning the rules of baseball before learning advanced analytics — it builds a foundation you can grow on.
Open Terminal
Applications → Utilities → Terminal
Run:
python3 --version
What you’ll see
✅ Python 3.x.x → Python is installed
❌ command not found → we’ll install it
macOS sometimes ships with a system Python — we will NOT use python, only python3.
What is Homebrew? Homebrew is a popular, free, open-source package manager for macOS and Linux that simplifies installing, updating, and managing software from the command line.
To install Homebrew, copy the following command into the Terminal application and run it:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
After it finishes, let's verify that it was installed by checking the version.
brew --version
Next, install Python with Homebrew:
brew install python
Verify:
python3 --version
pip3 --version
We should see something like this:
Python 3.12.x
pip 23.x
In Terminal, first navigate to the Desktop and create a project folder. (Any other location works too, but for beginners I recommend the Desktop.)
cd ~/Desktop
mkdir baseball_python
cd baseball_python
cd changes our directory to baseball_python.
Create our first file:
touch baseball.py
💥 You created your first Python file! Congrats.
Install VSCode
Download from VSCode
Once downloaded, we can open up our project.
In your terminal window, type:
code .
The project should open in VSCode.
You can also open VSCode, select Open, and choose your baseball.py file.
If the code command doesn't work, press Cmd + Shift + P in VSCode and run Shell Command: Install 'code' command in PATH.
Open baseball.py and type (or copy):
print("Hello, baseball Python!")
❗️ You will need to add the Python extension in VSCode. It should prompt you when you add a Python file.
In VS Code:
- Click the Extensions icon on the left sidebar
- Search for Python
- Install Python
This extension gives us:
- Syntax highlighting
- Helpful error messages
- Easy run buttons
Run your file either in VSCode or in Terminal.
In VSCode, select ▶ Run. The integrated terminal should open and you should see:
Hello, baseball Python!
Or in Terminal, type:
python3 baseball.py
You'll then see:
Hello, baseball Python!
🎉 That’s it. You just ran your first Python file.
Why this matters
- print() shows output
- .py files are Python programs
- python3 filename.py runs the file in Terminal
Python really shines when working with data, and baseball is full of it.
We’ll install a few libraries that help us work with stats.
pip3 install pandas numpy matplotlib
What these do:
- pandas → tables of player stats
- numpy → math and calculations
- matplotlib → charts and graphs
Create a new file:
touch stats.py
Open it and add:
import pandas as pd
data = {
    "player": ["Judge", "Ohtani", "Trout"],
    "hits": [2, 3, 1],
    "at_bats": [4, 5, 3]
}
df = pd.DataFrame(data)
df["average"] = df["hits"] / df["at_bats"]
print(df)
Run it with python3 stats.py and you should see a table like this (pandas prints it as plain text, using the column names from the code and adding a row number on the left):
| | player | hits | at_bats | average |
|---|---|---|---|---|
| 0 | Judge | 2 | 4 | 0.500000 |
| 1 | Ohtani | 3 | 5 | 0.600000 |
| 2 | Trout | 1 | 3 | 0.333333 |
👏 Now we're ready to code with baseball ⚾️
Before we can do anything interesting in Python, we need to understand variables.
What is a variable?
A variable is a named container that stores a value.
In baseball terms:
- A stat (hits, at-bats, runs) is a value
- The stat name is the variable
Example: open baseball.py and replace the code with this:
player_name = "Judge"
hits = 2
at_bats = 4
Here's what's happening:
- player_name stores text (a string)
- hits stores a number
- at_bats stores a number
Using variables together
Now let’s calculate a batting average:
batting_average = hits / at_bats
print(player_name)
print(batting_average)
When we run the file, we should see:
Judge
0.5
Making the output easier to read
Python lets us format output so it looks nicer:
print(f"{player_name}'s batting average is {batting_average:.3f}")
Output:
Judge's batting average is 0.500
Why variables matter:
- Store baseball stats
- Reuse values
- Change data without rewriting code
If Judge gets another hit, we only change one number:
hits = 3
When we run the file again, everything else is recalculated from the new value.
Takeaways
- Variables store information
- Numbers don’t need quotes
- Text does need quotes
- Variables make stats reusable and flexible
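One detail is worth seeing in action: a calculated value such as `batting_average` is a snapshot of the variables at the moment that line ran. The short sketch below reuses the variable names from this step and shows that changing `hits` afterwards has no effect until the calculation runs again.

```python
hits = 2
at_bats = 4
batting_average = hits / at_bats   # calculated while hits is 2
print(batting_average)             # 0.5

hits = 3                           # Judge gets another hit
print(batting_average)             # still 0.5, the old result was stored

batting_average = hits / at_bats   # recalculate with the new value
print(batting_average)             # 0.75
```

In a file this happens naturally, because Python runs every line from top to bottom each time you run it.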
Python needs to know what kind of data it’s working with.
These are called data types.
For now, we’ll focus on the two most important ones:
- Strings (text)
- Numbers (integers and decimals)
Strings (text)
A string is any text wrapped in quotes.
Baseball examples:
player_name = "Judge"
team = "Yankees"
position = "RF"
Key rule:
- Strings must be inside quotes (" " or ' ')
If you forget the quotes, Python treats the word as a variable name and raises a NameError when no variable with that name exists.
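To see what that error looks like without crashing the program, here is a small sketch. The try/except wrapper is only there so the example keeps running; it is covered properly later in the tutorial.

```python
try:
    player_name = Judge          # no quotes, so Python looks for a variable named Judge
except NameError as error:
    message = str(error)
    print(f"Python says: {message}")

player_name = "Judge"            # with quotes, this is a string
print(player_name)
```

The first print shows Python's own complaint about the missing name, and the second prints Judge.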
Numbers
Numbers are used for math and do NOT use quotes.
hits = 2
at_bats = 4
batting_average = 0.500
There are two main number types you’ll see:
Integers (whole numbers)
hits = 2
at_bats = 4
Floats (decimals)
batting_average = 0.500
Why data types matter
Python treats strings and numbers very differently.
This works (math with numbers):
hits = 2
at_bats = 4
print(hits / at_bats)
Output:
0.5
This does NOT work (math with strings):
hits = "2"
at_bats = "4"
print(hits / at_bats)
❌ Python raises a TypeError because text can’t be divided.
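If the numbers really do arrive as text (typed in by a user, for example), one common fix is to convert them with the built-in `int()` before doing math. A minimal sketch:

```python
hits = "2"       # text, not a number
at_bats = "4"

try:
    print(hits / at_bats)
except TypeError as error:
    print(f"Python says: {error}")

# Convert the text to integers first, then the math works
average = int(hits) / int(at_bats)
print(average)   # 0.5
```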
Mixing strings and numbers (the right way)
If you want to print text and numbers together, use an f-string:
player_name = "Judge"
hits = 2
at_bats = 4
average = hits / at_bats
print(f"{player_name} has a batting average of {average:.3f}")
Output:
Judge has a batting average of 0.500
Checking a variable’s data type
You can ask Python what type something is:
print(type(player_name))
print(type(hits))
print(type(average))
Output:
<class 'str'>
<class 'int'>
<class 'float'>
Common beginner mistake 🚨
❌ This looks right but is wrong if you plan to do math with it:
hits = "2"
✅ This is correct:
hits = 2
Remember:
- Quotes = text
- No quotes = number
Key takeaways
- Strings = text (names, teams, positions)
- Integers = whole numbers (hits, at-bats)
- Floats = decimals (averages)
- Python needs correct data types to do math
So far, we’ve worked with one player at a time. But baseball is a team sport — we need a way to store multiple players together.
That’s where lists come in.
Creating a List
players = ["Judge", "Ohtani", "Trout"]
Things to remember:
- Lists use square brackets []
- Items are separated by commas
- Order matters
Accessing items in a list
Each item in a list has a position called an index. Indexes start at 0, not 1.
print(players[0])
print(players[1])
print(players[2])
Output:
Judge
Ohtani
Trout
Adding players to a list
You can add a new player to the roster using .append():
players.append("Betts")
print(players)
Output:
['Judge', 'Ohtani', 'Trout', 'Betts']
Counting players on the roster
To see how many players are in the list, use len():
print(len(players))
Output:
4
Lists of numbers (baseball stats)
Lists aren’t just for names. They’re great for stats too:
hits = [2, 3, 1]
at_bats = [4, 5, 3]
Each position lines up with the same player:
- hits[0] → Judge
- hits[1] → Ohtani
- hits[2] → Trout
Using list values in calculations
average = hits[0] / at_bats[0]
print(average)
Output:
0.5
Common beginner mistakes 🚨
❌ Using parentheses instead of brackets (this creates a tuple, which can't be changed with .append()):
players = ("Judge", "Ohtani", "Trout")
❌ Forgetting indexes start at 0:
print(players[1])
(This prints the second player, not the first.)
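The parentheses mistake is easy to miss because Python does not complain right away. Parentheses create a tuple, which looks like a list but cannot be changed. The sketch below (variable names are just for illustration) shows where the problem finally appears:

```python
players_tuple = ("Judge", "Ohtani", "Trout")   # parentheses: a tuple
players_list = ["Judge", "Ohtani", "Trout"]    # square brackets: a list

print(type(players_tuple))   # <class 'tuple'>
print(type(players_list))    # <class 'list'>

players_list.append("Betts")         # lists can grow
try:
    players_tuple.append("Betts")    # tuples cannot
except AttributeError as error:
    print(f"Python says: {error}")

print(players_list)
```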
Key takeaways
- Lists store multiple values
- Lists use square brackets
- Indexes start at 0
- Lists are perfect for rosters and stat groups
Lists are great for storing multiple values, but they don’t tell us what each value represents.
For example, this works:
hits = [2, 3, 1]
But which number belongs to which player? And where would at-bats or walks go?
That’s where dictionaries come in.
What is a dictionary?
A dictionary stores data as key–value pairs.
Think of a dictionary like a player card:
- The key is the stat name
- The value is the stat itself
Creating a dictionary
Here’s a dictionary for one player:
player = {
    "name": "Judge",
    "hits": 2,
    "at_bats": 4
}
Things to notice:
- Dictionaries use curly braces { }
- Keys are strings
- Each key maps to a value using :
Accessing values in a dictionary
You access values by using the key name:
print(player["name"])
print(player["hits"])
print(player["at_bats"])
Output:
Judge
2
4
Using dictionary values in calculations
average = player["hits"] / player["at_bats"]
print(average)
Output:
0.5
Now the code clearly shows what stat is being used.
Updating values
If a player gets another hit, you can update the dictionary:
player["hits"] = 3
Recalculate:
average = player["hits"] / player["at_bats"]
print(average)
Output:
0.75
Adding new stats
You can add new key–value pairs at any time:
player["walks"] = 1
player["rbi"] = 2
Common beginner mistakes 🚨
❌ Forgetting quotes around keys:
player[hits]
✅ Correct:
player["hits"]
Now we’re going to combine what you’ve learned so far:
- Lists (multiple players)
- Dictionaries (player stats)
This is how you represent a real roster in Python.
What are we building
Instead of one player:
player = {"name": "Judge", "hits": 2, "at_bats": 4}
We'll store many players:
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
Each item in the list is a dictionary representing one player.
Accessing a player
print(roster[0])
Output:
{'name': 'Judge', 'hits': 2, 'at_bats': 4}
Accessing a specific stat
print(roster[0]["name"])
print(roster[0]["hits"])
Output:
Judge
2
Looping through the roster
This is where Python starts to feel powerful.
for player in roster:
    average = player["hits"] / player["at_bats"]
    print(f"{player['name']} batting average: {average:.3f}")
Output:
Judge batting average: 0.500
Ohtani batting average: 0.600
Trout batting average: 0.333
Why this matters
This pattern is used everywhere:
- Sports analytics
- Databases
- APIs
- Real-world applications
If you understand this, you’re officially past the beginner line.
So far, we’ve manually accessed players one at a time. That doesn’t scale. Baseball has lineups, innings, and seasons.
That’s where for loops come in.
What is a for loop?
A for loop lets Python repeat an action for each item in a collection.
In baseball terms:
- A lineup = list
- Each at-bat = one loop
Basic for loop example
players = ["Judge", "Ohtani", "Trout"]
for player in players:
    print(player)
Output:
Judge
Ohtani
Trout
Python reads this as: “For each player in the list, print the player’s name.”
Looping through a roster of player stats
Using our roster from the previous step:
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
for player in roster:
    average = player["hits"] / player["at_bats"]
    print(f"{player['name']} average: {average:.3f}")
Output:
Judge average: 0.500
Ohtani average: 0.600
Trout average: 0.333
Why indentation matters 🚨
Python uses indentation, not braces.
This works:
for player in roster:
    print(player["name"])
This does NOT (Python raises an IndentationError):
for player in roster:
print(player["name"])
Indentation tells Python what belongs inside the loop.
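For the curious, the sketch below lets Python check both versions without running them. It uses the built-in `compile()` function, which this tutorial does not otherwise cover; the `check` helper and the two code strings are made up for this illustration.

```python
good_code = 'for name in ["Judge", "Ohtani"]:\n    print(name)\n'
bad_code = 'for name in ["Judge", "Ohtani"]:\nprint(name)\n'

def check(source):
    """Return 'OK' if Python accepts the code, otherwise describe the error."""
    try:
        compile(source, "<example>", "exec")
        return "OK"
    except IndentationError as error:
        return f"IndentationError: {error.msg}"

print(check(good_code))   # OK
print(check(bad_code))    # starts with "IndentationError:"
```

The only difference between the two strings is the four spaces before print, and that is enough for Python to reject the second one.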
Simulating an inning
Think of a loop like an inning where each batter comes up once:
batters = ["Judge", "Ohtani", "Trout"]
for batter in batters:
    print(f"{batter} is at the plate")
Output:
Judge is at the plate
Ohtani is at the plate
Trout is at the plate
Key takeaways
- For loops repeat actions
- Loops work perfectly with lists
- Indentation matters in Python
- Loops model real baseball sequences
So far, we’ve written the same calculations more than once. In real programming, we don’t want to repeat ourselves.
That’s where functions come in.
What is a function?
A function is a reusable block of code that performs a specific task.
In baseball terms:
- A function is like a stat formula
- You give it inputs (hits, at-bats)
- It gives you an output (batting average)
Your first function
def batting_average(hits, at_bats):
    return hits / at_bats
What this means:
- def starts a function
- hits and at_bats are inputs (parameters)
- return sends back a result
Using the function
avg = batting_average(2, 4)
print(avg)
Output:
0.5
Using the function with player data
player = {"name": "Judge", "hits": 2, "at_bats": 4}
avg = batting_average(player["hits"], player["at_bats"])
print(f"{player['name']} average: {avg:.3f}")
Output:
Judge average: 0.500
Using functions inside loops
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
for player in roster:
    avg = batting_average(player["hits"], player["at_bats"])
    print(f"{player['name']} average: {avg:.3f}")
Output:
Judge average: 0.500
Ohtani average: 0.600
Trout average: 0.333
Why functions matter
Functions:
- Prevent duplicated code
- Make programs easier to read
- Let you change logic in one place
- Mirror real baseball formulas
Key takeaways
- Functions package logic into reusable blocks
- Inputs go in parentheses
- return sends data back
- Functions model real baseball stats
Batting average is useful, but baseball uses multiple stats to measure performance.
In this step, we’ll:
- Create multiple stat formulas
- Use functions for each one
- Combine stats like real baseball analytics
The stats we’ll calculate
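We’ll start with three common ones, plus OPS, which combines two of them:
- Batting Average (BA): BA = hits / at_bats
- On-Base Percentage (OBP): OBP = (hits + walks) / (at_bats + walks)
- Slugging Percentage (SLG): SLG = total_bases / at_bats
- On-base Plus Slugging (OPS): OPS = OBP + SLG
The OBP formula here is simplified to keep the code short. The official version also counts hit-by-pitches and sacrifice flies.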
Batting Average function (review)
def batting_average(hits, at_bats):
    return hits / at_bats
On-Base Percentage (OBP)
def on_base_percentage(hits, walks, at_bats):
    return (hits + walks) / (at_bats + walks)
Slugging Percentage (SLG)
def slugging_percentage(total_bases, at_bats):
    return total_bases / at_bats
OPS (combining stats)
def ops(obp, slg):
    return obp + slg
Using the functions with a player
player = {
    "name": "Judge",
    "hits": 2,
    "walks": 1,
    "at_bats": 4,
    "total_bases": 5
}
ba = batting_average(player["hits"], player["at_bats"])
obp = on_base_percentage(player["hits"], player["walks"], player["at_bats"])
slg = slugging_percentage(player["total_bases"], player["at_bats"])
player_ops = ops(obp, slg)
print(f"{player['name']} BA: {ba:.3f}")
print(f"{player['name']} OBP: {obp:.3f}")
print(f"{player['name']} SLG: {slg:.3f}")
print(f"{player['name']} OPS: {player_ops:.3f}")
Output:
Judge BA: 0.500
Judge OBP: 0.600
Judge SLG: 1.250
Judge OPS: 1.850
Why this matters
This is real-world programming:
- Small, focused functions
- Reusable logic
- Clear formulas
- Readable output
This is how analytics code is actually written.
Key takeaways
- One stat = one function
- Functions can build on each other
- Code mirrors real baseball formulas
- This is foundational analytics logic
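The on-base percentage function in this step leaves out two inputs that the official stat uses: hit-by-pitches (HBP) and sacrifice flies (SF). The full formula is OBP = (H + BB + HBP) / (AB + BB + HBP + SF). One possible extension, sketched below, adds both as optional inputs that default to 0. The function name, parameter names, and sample numbers are illustrative and not part of the tutorial's code.

```python
def full_on_base_percentage(hits, walks, at_bats, hit_by_pitch=0, sac_flies=0):
    """OBP including hit-by-pitches and sacrifice flies."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        return 0.0   # avoid dividing by zero for a player with no chances
    return times_on_base / chances

# Same inputs as the simplified version: (2 + 1) / (4 + 1)
simple = full_on_base_percentage(2, 1, 4)
print(f"{simple:.3f}")        # 0.600

# One hit-by-pitch and one sacrifice fly: (2 + 1 + 1) / (4 + 1 + 1 + 1)
with_extras = full_on_base_percentage(2, 1, 4, hit_by_pitch=1, sac_flies=1)
print(f"{with_extras:.3f}")   # 0.571
```

Because the extra parameters have default values, calls that pass only hits, walks, and at-bats give the same result as the simplified function.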
In baseball, decisions are everywhere:
- Is the player a starter or a bench player?
- Did they qualify for the batting title?
- Is their average above .300?
Python uses conditionals to make decisions like these.
The if statement
average = 0.325
if average >= 0.300:
    print("This player is hitting .300 or better!")
Output:
This player is hitting .300 or better!
if / else
What if the condition isn't true?
average = 0.275
if average >= 0.300:
    print("Hitting .300 or better!")
else:
    print("Below .300")
Output:
Below .300
if / elif / else (multiple conditions)
average = 0.285
if average >= 0.300:
    print("Elite hitter")
elif average >= 0.250:
    print("Solid hitter")
else:
    print("Struggling at the plate")
Output:
Solid hitter
Using conditionals with player data
player = {"name": "Judge", "hits": 2, "at_bats": 4}
average = player["hits"] / player["at_bats"]
if average >= 0.300:
    status = "All-Star caliber"
elif average >= 0.250:
    status = "Starting lineup"
else:
    status = "Needs work"
print(f"{player['name']}: {average:.3f} - {status}")
Output:
Judge: 0.500 - All-Star caliber
Conditionals inside loops
Now let's evaluate an entire roster:
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 5}
]
for player in roster:
    avg = player["hits"] / player["at_bats"]
    if avg >= 0.300:
        status = "🔥 Hot"
    elif avg >= 0.250:
        status = "✓ Solid"
    else:
        status = "❄️ Cold"
    print(f"{player['name']}: {avg:.3f} {status}")
Output:
Judge: 0.500 🔥 Hot
Ohtani: 0.600 🔥 Hot
Trout: 0.200 ❄️ Cold
Comparison operators
| Operator | Meaning |
|---|---|
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
Common beginner mistake 🚨
❌ Using = instead of == for comparison:
if average = 0.300:  # Wrong!
✅ Correct:
if average == 0.300:  # Right!
Remember: = assigns a value, == compares values.
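One caution when using == with decimals: floats are stored in binary, so a calculated value can differ from the number you expect by a tiny amount. The sketch below shows the classic case and one way around it, `math.isclose` from Python's standard library.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False: total is 0.30000000000000004
print(math.isclose(total, 0.3))   # True: equal within a tiny tolerance

average = 3 / 10
print(average == 0.300)           # True here, but not guaranteed for every calculation
print(average >= 0.300)           # True
```

For thresholds like ".300 or better", comparisons such as >= are usually what you want anyway.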
Key takeaways
- if checks a condition
- elif checks additional conditions
- else handles everything else
- Conditionals let your code make decisions
- Indentation matters!
Real baseball data isn't perfect. What happens when a player has 0 at-bats?
player = {"name": "Pinch Runner", "hits": 0, "at_bats": 0}
average = player["hits"] / player["at_bats"]
❌ Error: ZeroDivisionError: division by zero
In Python, dividing by zero raises an error that crashes your program unless you handle it. Let's fix that.
Using conditionals to prevent errors
def safe_batting_average(hits, at_bats):
    if at_bats == 0:
        return 0.0
    return hits / at_bats
Now it's safe:
avg = safe_batting_average(0, 0)
print(avg)  # Output: 0.0
Applying this to a roster
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Pinch Runner", "hits": 0, "at_bats": 0}
]
def safe_batting_average(hits, at_bats):
    if at_bats == 0:
        return 0.0
    return hits / at_bats
for player in roster:
    avg = safe_batting_average(player["hits"], player["at_bats"])
    print(f"{player['name']}: {avg:.3f}")
Output:
Judge: 0.500
Ohtani: 0.600
Pinch Runner: 0.000
Using try/except for error handling
Python also has a way to catch errors when they happen:
def batting_average_safe(hits, at_bats):
    try:
        return hits / at_bats
    except ZeroDivisionError:
        return 0.0
This says: "Try to divide. If it fails with a ZeroDivisionError, return 0 instead."
When to use each approach
| Approach | When to use |
|---|---|
| if check | When you can predict the problem |
| try/except | When errors might come from many sources |
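To make "many sources" concrete, here is a sketch of a function that reads stats stored as text and can fail in several different ways. The function name and the sample records are invented for this example, and returning None for unusable records is just one possible choice.

```python
def batting_average_from_record(record):
    """Return a batting average, or None if the record can't be used."""
    try:
        return int(record["hits"]) / int(record["at_bats"])
    except ZeroDivisionError:
        return 0.0            # no at-bats yet
    except (KeyError, ValueError, TypeError):
        return None           # stat missing, not a number, or wrong type

records = [
    {"name": "Judge", "hits": "2", "at_bats": "4"},
    {"name": "Pinch Runner", "hits": "0", "at_bats": "0"},
    {"name": "Rookie", "hits": "1"},                      # at_bats is missing
    {"name": "Typo", "hits": "two", "at_bats": "4"},      # hits is not a number
]

results = []
for record in records:
    results.append(batting_average_from_record(record))
print(results)   # [0.5, 0.0, None, None]
```

Writing an if check for every one of these cases would be long and easy to get wrong, which is where try/except earns its place.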
Real-world example: Qualifying for batting title
In MLB, a hitter needs 3.1 plate appearances per team game to qualify for the batting title, which works out to 502 over a 162-game season.
def qualifies_for_batting_title(plate_appearances, team_games=162):
    # The requirement is rounded to a whole number: 162 * 3.1 = 502.2 -> 502
    required = round(team_games * 3.1)
    return plate_appearances >= required
player = {"name": "Judge", "plate_appearances": 550}
if qualifies_for_batting_title(player["plate_appearances"]):
    print(f"{player['name']} qualifies for the batting title")
else:
    print(f"{player['name']} does not qualify")
Output:
Judge qualifies for the batting title
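To see where the threshold comes from, the arithmetic can be pulled into its own small helper. This is a sketch: the helper name is hypothetical, and rounding to the nearest whole number reflects the commonly quoted 502 figure for a 162-game season.

```python
def required_plate_appearances(team_games=162):
    """Plate appearances needed to qualify, rounded to a whole number."""
    return round(team_games * 3.1)

print(required_plate_appearances())      # 502 for a full 162-game season
print(required_plate_appearances(60))    # 186 for a 60-game season
```

Passing a different team_games value is handy for shortened seasons or for checking qualification partway through a year.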
Key takeaways
- Always consider edge cases (zero, missing data, etc.)
- Use if statements to check before calculating
- Use try/except to catch unexpected errors
- Defensive coding prevents crashes
Now let's do something every baseball fan loves: rankings.
We'll learn to:
- Sort players by a stat
- Filter players who meet criteria
- Find the league leader
Sorting a list
Python's sorted() function arranges items in order:
averages = [0.275, 0.320, 0.298, 0.345]
sorted_averages = sorted(averages)
print(sorted_averages)
Output:
[0.275, 0.298, 0.32, 0.345]
Sorting in descending order (highest first)
sorted_averages = sorted(averages, reverse=True)
print(sorted_averages)
Output:
[0.345, 0.32, 0.298, 0.275]
Sorting a roster by batting average
Here's where it gets useful:
roster = [
    {"name": "Judge", "hits": 150, "at_bats": 450},
    {"name": "Ohtani", "hits": 180, "at_bats": 500},
    {"name": "Trout", "hits": 140, "at_bats": 400},
    {"name": "Betts", "hits": 170, "at_bats": 520}
]
# Add batting average to each player
for player in roster:
    player["average"] = player["hits"] / player["at_bats"]
# Sort by average (highest first)
sorted_roster = sorted(roster, key=lambda p: p["average"], reverse=True)
print("Batting Average Leaders:")
for i, player in enumerate(sorted_roster, 1):
    print(f"{i}. {player['name']}: {player['average']:.3f}")
Output:
Batting Average Leaders:
1. Ohtani: 0.360
2. Trout: 0.350
3. Judge: 0.333
4. Betts: 0.327
What is lambda?
lambda p: p["average"] is a mini-function that tells Python what to sort by.
Read it as: "For each player p, use their average value."
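If lambda still feels mysterious, it may help to see that it does the same job as a small named function. In the sketch below, `get_average` is a made-up helper and the roster uses pre-calculated averages to keep the example short.

```python
roster = [
    {"name": "Judge", "average": 0.333},
    {"name": "Ohtani", "average": 0.360},
    {"name": "Trout", "average": 0.350},
]

def get_average(p):
    """Named version of: lambda p: p["average"]"""
    return p["average"]

by_function = sorted(roster, key=get_average, reverse=True)
by_lambda = sorted(roster, key=lambda p: p["average"], reverse=True)

for p in by_function:
    print(p["name"])              # Ohtani, Trout, Judge

print(by_function == by_lambda)   # True: both give the same order
```

The lambda form is simply shorter when the rule is only needed once.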
Filtering players
What if we only want players hitting .300 or better?
hot_hitters = [p for p in roster if p["average"] >= 0.300]
print("Players hitting .300+:")
for player in hot_hitters:
    print(f" {player['name']}: {player['average']:.3f}")
Output:
Players hitting .300+:
Judge: 0.333
Ohtani: 0.360
Trout: 0.350
Betts: 0.327
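The one-line filter used above is called a list comprehension. It is shorthand for a loop with an if inside it. The sketch below writes the same filter both ways, using a stricter threshold of .340 (an arbitrary value chosen here so that some players are actually filtered out).

```python
roster = [
    {"name": "Judge", "average": 0.333},
    {"name": "Ohtani", "average": 0.360},
    {"name": "Trout", "average": 0.350},
    {"name": "Betts", "average": 0.327},
]
threshold = 0.340

# List comprehension: build the filtered list in one line
hot_hitters = [p for p in roster if p["average"] >= threshold]

# The same filter written as a regular loop
hot_hitters_loop = []
for p in roster:
    if p["average"] >= threshold:
        hot_hitters_loop.append(p)

names = [p["name"] for p in hot_hitters]
print(names)                              # ['Ohtani', 'Trout']
print(hot_hitters == hot_hitters_loop)    # True
```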
Finding the leader
To get just the top player:
leader = max(roster, key=lambda p: p["average"])
print(f"Batting champ: {leader['name']} ({leader['average']:.3f})")
Output:
Batting champ: Ohtani (0.360)
Combining it all: A leaderboard function
def print_leaderboard(roster, stat, title, top_n=5):
    sorted_players = sorted(roster, key=lambda p: p[stat], reverse=True)
    print(f"\n{title}")
    print("-" * 25)
    for i, player in enumerate(sorted_players[:top_n], 1):
        print(f"{i}. {player['name']}: {player[stat]:.3f}")
# Usage
print_leaderboard(roster, "average", "Batting Average Leaders")
Key takeaways
- sorted() arranges items in order
- reverse=True sorts highest to lowest
- lambda creates quick sorting rules
- List comprehensions filter data efficiently
- max() and min() find extremes
So far, we've typed our data by hand. Real baseball analysis uses data files.
The most common format is CSV (Comma-Separated Values).
What a CSV file looks like
name,team,hits,at_bats,home_runs
Judge,Yankees,177,550,62
Ohtani,Angels,160,497,34
Trout,Angels,123,400,40
Creating a sample CSV file
First, let's create one to practice with:
# Create a sample CSV file
csv_content = """name,team,hits,at_bats,home_runs
Judge,Yankees,177,550,62
Ohtani,Angels,160,497,34
Trout,Angels,123,400,40
Betts,Dodgers,180,550,35
Soto,Padres,156,480,30"""
with open("players.csv", "w") as file:
    file.write(csv_content)
print("Created players.csv")
Reading a CSV file
Python's csv module makes this easy:
import csv
roster = []
with open("players.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        player = {
            "name": row["name"],
            "team": row["team"],
            "hits": int(row["hits"]),
            "at_bats": int(row["at_bats"]),
            "home_runs": int(row["home_runs"])
        }
        roster.append(player)
print(f"Loaded {len(roster)} players")
Output:
Loaded 5 players
Why int()?
CSV files store everything as text. We convert numbers using int() so we can do math with them.
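Two related details are worth a quick sketch: decimals need `float()` instead of `int()`, and a value that is not a number at all raises a ValueError. The row below is invented for illustration, and falling back to 0 for an unreadable value is an assumption, not a rule.

```python
row = {"name": "Judge", "hits": "177", "average": "0.322", "home_runs": "N/A"}

hits = int(row["hits"])            # "177" -> 177
average = float(row["average"])    # "0.322" -> 0.322 (int() would fail on a decimal)

try:
    home_runs = int(row["home_runs"])
except ValueError:
    home_runs = 0                  # assumption: treat unreadable values as 0

print(hits, average, home_runs)    # 177 0.322 0
```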
Processing the loaded data
Now we can use everything we've learned:
# Add batting average
for player in roster:
    player["average"] = player["hits"] / player["at_bats"]
# Find home run leader
hr_leader = max(roster, key=lambda p: p["home_runs"])
print(f"Home Run Leader: {hr_leader['name']} ({hr_leader['home_runs']} HR)")
# Find batting average leader
avg_leader = max(roster, key=lambda p: p["average"])
print(f"Batting Champ: {avg_leader['name']} ({avg_leader['average']:.3f})")
# List all players hitting .300+
print("\n.300 Hitters:")
for p in roster:
    if p["average"] >= 0.300:
        print(f" {p['name']}: {p['average']:.3f}")
Output:
Home Run Leader: Judge (62 HR)
Batting Champ: Betts (0.327)

.300 Hitters:
 Judge: 0.322
 Ohtani: 0.322
 Trout: 0.307
 Betts: 0.327
 Soto: 0.325
Using pandas (the easier way)
Remember pandas from Step 6? It makes CSV handling much simpler:
import pandas as pd
df = pd.read_csv("players.csv")
df["average"] = df["hits"] / df["at_bats"]
print(df.sort_values("average", ascending=False))
Output:
     name     team  hits  at_bats  home_runs   average
3   Betts  Dodgers   180      550         35  0.327273
4    Soto   Padres   156      480         30  0.325000
1  Ohtani   Angels   160      497         34  0.321932
0   Judge  Yankees   177      550         62  0.321818
2   Trout   Angels   123      400         40  0.307500
Key takeaways
- CSV files store data as text with commas
- csv.DictReader loads CSV rows as dictionaries
- Convert strings to numbers for math
- pandas makes data work even easier
Numbers tell a story, but charts make it visual.
Let's use matplotlib to create baseball stat visualizations.
Setup
Make sure matplotlib is installed:
pip3 install matplotlib
Your first chart: Bar chart of home runs
import matplotlib.pyplot as plt
players = ["Judge", "Ohtani", "Trout", "Betts", "Soto"]
home_runs = [62, 34, 40, 35, 30]
plt.figure(figsize=(10, 6))
plt.bar(players, home_runs, color='dodgerblue')
plt.title("2022 Home Run Leaders")
plt.xlabel("Player")
plt.ylabel("Home Runs")
plt.savefig("home_runs.png")
plt.show()
print("Chart saved as home_runs.png")
Breaking it down:
- plt.figure() - Creates a new chart
- plt.bar() - Makes a bar chart
- plt.title() - Adds a title
- plt.xlabel() / plt.ylabel() - Labels the axes
- plt.savefig() - Saves the image
- plt.show() - Displays the chart
Horizontal bar chart (better for names)
import matplotlib.pyplot as plt
players = ["Judge", "Ohtani", "Trout", "Betts", "Soto"]
averages = [0.322, 0.322, 0.308, 0.327, 0.325]
# Sort for visual clarity
sorted_data = sorted(zip(averages, players))
sorted_averages, sorted_players = zip(*sorted_data)
plt.figure(figsize=(10, 6))
plt.barh(sorted_players, sorted_averages, color='forestgreen')
plt.title("Batting Average Leaders")
plt.xlabel("Batting Average")
plt.xlim(0.250, 0.350) # Set x-axis range
plt.savefig("batting_averages.png")
plt.show()
Scatter plot: Power vs. Contact
import matplotlib.pyplot as plt
players = ["Judge", "Ohtani", "Trout", "Betts", "Soto"]
home_runs = [62, 34, 40, 35, 30]
averages = [0.322, 0.322, 0.308, 0.327, 0.325]
plt.figure(figsize=(10, 6))
plt.scatter(averages, home_runs, s=100, color='red')
# Add player names as labels
for i, name in enumerate(players):
    plt.annotate(name, (averages[i], home_runs[i]),
                 textcoords="offset points", xytext=(5, 5))
plt.title("Power vs. Contact")
plt.xlabel("Batting Average")
plt.ylabel("Home Runs")
plt.savefig("power_vs_contact.png")
plt.show()
Using data from our roster
import matplotlib.pyplot as plt
roster = [
    {"name": "Judge", "hits": 177, "at_bats": 550, "home_runs": 62},
    {"name": "Ohtani", "hits": 160, "at_bats": 497, "home_runs": 34},
    {"name": "Trout", "hits": 123, "at_bats": 400, "home_runs": 40},
    {"name": "Betts", "hits": 180, "at_bats": 550, "home_runs": 35},
    {"name": "Soto", "hits": 156, "at_bats": 480, "home_runs": 30}
]
# Calculate averages
for player in roster:
    player["average"] = player["hits"] / player["at_bats"]
# Sort by home runs
roster_sorted = sorted(roster, key=lambda p: p["home_runs"], reverse=True)
names = [p["name"] for p in roster_sorted]
hrs = [p["home_runs"] for p in roster_sorted]
plt.figure(figsize=(10, 6))
plt.bar(names, hrs, color=['gold', 'silver', 'peru', 'gray', 'gray'])
plt.title("Home Run Standings")
plt.ylabel("Home Runs")
plt.savefig("hr_standings.png")
plt.show()
Key takeaways
- matplotlib creates professional charts
- plt.bar() for bar charts
- plt.barh() for horizontal bars
- plt.scatter() for comparing two stats
- Always label your axes and add a title
- plt.savefig() saves your work
Congratulations! You've learned:
✅ Variables and data types
✅ Lists and dictionaries
✅ Loops and conditionals
✅ Functions
✅ File handling
✅ Data visualization
Now let's combine everything into a complete baseball analytics program.
The Final Project: Baseball Stats Dashboard
Create a new file called baseball_dashboard.py:
import csv
import matplotlib.pyplot as plt

# ============================================
# FUNCTIONS
# ============================================

def load_players(filename):
    """Load players from a CSV file."""
    roster = []
    try:
        with open(filename, "r") as file:
            reader = csv.DictReader(file)
            for row in reader:
                player = {
                    "name": row["name"],
                    "team": row["team"],
                    "hits": int(row["hits"]),
                    "at_bats": int(row["at_bats"]),
                    "home_runs": int(row["home_runs"]),
                    "walks": int(row.get("walks", 0)),
                    "total_bases": int(row.get("total_bases", row["hits"]))
                }
                roster.append(player)
    except FileNotFoundError:
        print(f"File {filename} not found. Using sample data.")
        roster = get_sample_data()
    return roster

def get_sample_data():
    """Return sample player data."""
    return [
        {"name": "Judge", "team": "Yankees", "hits": 177, "at_bats": 550,
         "home_runs": 62, "walks": 111, "total_bases": 391},
        {"name": "Ohtani", "team": "Angels", "hits": 160, "at_bats": 497,
         "home_runs": 34, "walks": 72, "total_bases": 300},
        {"name": "Trout", "team": "Angels", "hits": 123, "at_bats": 400,
         "home_runs": 40, "walks": 63, "total_bases": 244},
        {"name": "Betts", "team": "Dodgers", "hits": 180, "at_bats": 550,
         "home_runs": 35, "walks": 70, "total_bases": 320},
        {"name": "Soto", "team": "Padres", "hits": 156, "at_bats": 480,
         "home_runs": 30, "walks": 135, "total_bases": 270}
    ]

def calculate_stats(player):
    """Calculate advanced stats for a player."""
    hits = player["hits"]
    at_bats = player["at_bats"]
    walks = player["walks"]
    total_bases = player["total_bases"]
    # Batting Average
    player["avg"] = hits / at_bats if at_bats > 0 else 0
    # On-Base Percentage
    player["obp"] = (hits + walks) / (at_bats + walks) if (at_bats + walks) > 0 else 0
    # Slugging Percentage
    player["slg"] = total_bases / at_bats if at_bats > 0 else 0
    # OPS
    player["ops"] = player["obp"] + player["slg"]
    return player

def print_leaderboard(roster, stat, title, top_n=5):
    """Print a formatted leaderboard."""
    sorted_players = sorted(roster, key=lambda p: p[stat], reverse=True)
    print(f"\n{'='*40}")
    print(f" {title}")
    print(f"{'='*40}")
    for i, player in enumerate(sorted_players[:top_n], 1):
        value = player[stat]
        if stat in ["avg", "obp", "slg", "ops"]:
            print(f" {i}. {player['name']:12} ({player['team']:8}) {value:.3f}")
        else:
            print(f" {i}. {player['name']:12} ({player['team']:8}) {value}")

def create_comparison_chart(roster, filename="player_comparison.png"):
    """Create a multi-stat comparison chart."""
    # Sort by OPS
    sorted_roster = sorted(roster, key=lambda p: p["ops"], reverse=True)[:5]
    names = [p["name"] for p in sorted_roster]
    avg = [p["avg"] for p in sorted_roster]
    obp = [p["obp"] for p in sorted_roster]
    slg = [p["slg"] for p in sorted_roster]
    x = range(len(names))
    width = 0.25
    fig, ax = plt.subplots(figsize=(12, 6))
    bars1 = ax.bar([i - width for i in x], avg, width, label='AVG', color='#1f77b4')
    bars2 = ax.bar(x, obp, width, label='OBP', color='#2ca02c')
    bars3 = ax.bar([i + width for i in x], slg, width, label='SLG', color='#d62728')
    ax.set_ylabel('Value')
    ax.set_title('Player Stat Comparison (AVG / OBP / SLG)')
    ax.set_xticks(x)
    ax.set_xticklabels(names)
    ax.legend()
    ax.set_ylim(0, 0.8)
    plt.tight_layout()
    plt.savefig(filename)
    print(f"\nChart saved as {filename}")

def get_player_grade(ops):
    """Return a letter grade based on OPS."""
    if ops >= 1.000:
        return "A+ (MVP Caliber)"
    elif ops >= 0.900:
        return "A (All-Star)"
    elif ops >= 0.800:
        return "B (Above Average)"
    elif ops >= 0.700:
        return "C (Average)"
    elif ops >= 0.600:
        return "D (Below Average)"
    else:
        return "F (Struggling)"

def print_player_report(player):
    """Print a detailed report for one player."""
    print(f"\n{'='*40}")
    print(f" PLAYER REPORT: {player['name']}")
    print(f"{'='*40}")
    print(f" Team: {player['team']}")
    print(f" Hits: {player['hits']}")
    print(f" At Bats: {player['at_bats']}")
    print(f" Home Runs: {player['home_runs']}")
    print(f" Walks: {player['walks']}")
    print(f"{'─'*40}")
    print(f" AVG: {player['avg']:.3f}")
    print(f" OBP: {player['obp']:.3f}")
    print(f" SLG: {player['slg']:.3f}")
    print(f" OPS: {player['ops']:.3f}")
    print(f"{'─'*40}")
    print(f" Grade: {get_player_grade(player['ops'])}")
    print(f"{'='*40}")

# ============================================
# MAIN PROGRAM
# ============================================

def main():
    print("\n" + "🔷"*20)
    print(" BASEBALL STATS DASHBOARD")
    print("🔷"*20)
    # Load data
    roster = load_players("players.csv")
    print(f"\nLoaded {len(roster)} players")
    # Calculate stats for all players
    for player in roster:
        calculate_stats(player)
    # Print leaderboards
    print_leaderboard(roster, "avg", "BATTING AVERAGE LEADERS")
    print_leaderboard(roster, "home_runs", "HOME RUN LEADERS")
    print_leaderboard(roster, "obp", "ON-BASE PERCENTAGE LEADERS")
    print_leaderboard(roster, "ops", "OPS LEADERS")
    # Print individual report for top OPS player
    top_player = max(roster, key=lambda p: p["ops"])
    print_player_report(top_player)
    # Create visualization
    create_comparison_chart(roster)
    # Summary stats
    print("\n" + "="*40)
    print(" LEAGUE SUMMARY")
    print("="*40)
    avg_ops = sum(p["ops"] for p in roster) / len(roster)
    avg_hr = sum(p["home_runs"] for p in roster) / len(roster)
    print(f" Average OPS: {avg_ops:.3f}")
    print(f" Average Home Runs: {avg_hr:.1f}")
    print(f" Players .300+: {len([p for p in roster if p['avg'] >= 0.300])}")
    print("="*40)
    print("\n✅ Dashboard complete!\n")

if __name__ == "__main__":
    main()
Running the Final Project
python3 baseball_dashboard.py
Expected Output:
🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷
BASEBALL STATS DASHBOARD
🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷🔷
Loaded 5 players
========================================
BATTING AVERAGE LEADERS
========================================
1. Betts (Dodgers ) 0.327
2. Soto (Padres ) 0.325
3. Ohtani (Angels ) 0.322
4. Judge (Yankees ) 0.322
5. Trout (Angels ) 0.307
========================================
HOME RUN LEADERS
========================================
1. Judge (Yankees ) 62
2. Trout (Angels ) 40
3. Betts (Dodgers ) 35
4. Ohtani (Angels ) 34
5. Soto (Padres ) 30
...
✅ Dashboard complete!
Congratulations! 🎉 You've completed the Learn to Code with Baseball tutorial.
Here's everything you now know:
| Concept | Baseball Application |
|---|---|
| Variables | Store player stats |
| Data Types | Strings (names) vs Numbers (stats) |
| Lists | Rosters and lineups |
| Dictionaries | Player cards with multiple stats |
| Loops | Process entire lineups |
| Conditionals | Evaluate performance |
| Functions | Reusable stat formulas |
| Error Handling | Handle missing data |
| Sorting/Filtering | Create leaderboards |
| File I/O | Load real data |
| Visualization | Charts and graphs |
Ready to keep learning? Here are some ideas:
- Get Real Data
  - Check out Baseball Reference
  - Explore the pybaseball library
- Add More Stats
  - WAR (Wins Above Replacement)
  - wOBA (Weighted On-Base Average)
  - Exit Velocity and Launch Angle
- Build a Web App
  - Learn Flask or Streamlit
  - Create an interactive dashboard
- Explore Machine Learning
  - Predict player performance
  - Project prospect development
- Connect to APIs
  - MLB's official Stats API
  - Real-time game data
You started with print("Hello, baseball Python!") and ended with a full analytics dashboard.
That's real programming. That's real progress.
Now go build something amazing. ⚾🐍
Questions or feedback? Open an issue on GitHub!
