This directory (/Game-AI-Agent/) contains three core programming assignments (PAs) for an Artificial Intelligence course. Each PA implements an AI agent for a distinct game, using algorithms such as A*, game-tree search, and Q-learning with neural networks. Below is a summary of the goals, content, and key games covered.
- Apply AI Concepts: Implement algorithms taught in class (e.g., search, reinforcement learning) to solve real-world game scenarios.
- Hands-on Learning: Develop agents that interact with stochastic environments, adversarial opponents, and complex reward systems.
- Performance Optimization: Meet autograder criteria for win rates, scores, and efficiency.
# PA1: Stealth Maze Agent

- Game: Stealth-based strategy (Sepia engine).
- Goal: Navigate a maze, destroy an enemy townhall, and escape undetected by archers.
- Key AI:
- A* for pathfinding with risk-weighted edges.
- Reward engineering for "danger" (e.g., proximity to enemies).
- Files:
  - `StealthAgent.java` (main agent)
  - Maze configurations (e.g., `BigMaze.xml`)
# PA2: Stochastic Pokémon Agent

- Game: Turn-based Pokémon battles with randomness.
- Goal: Defeat opponents using tree-search agents that handle stochastic outcomes (e.g., move priority, status effects).
- Key AI:
- Expectiminimax with chance nodes.
- Utility heuristics for infinite game trees.
- Files:
  - `TreeTraversalAgent.java` (implements battle logic)
  - Difficulty tiers (Easy/Medium/Hard opponent teams)
# PA3: Tetris Q-Learning Agent

- Game: Tetris with accelerated piece placement.
- Goal: Maximize score using Q-learning and neural networks.
- Key AI:
- Q-learning with replay buffers.
- Neural network for Q-value approximation.
- Reward function engineering (e.g., penalizing gaps).
- Files:
  - `TetrisQAgent.java` (RL agent)
  - `params.model` (trained network weights)
| Game | Type | Core Challenge | AI Technique |
|---|---|---|---|
| Stealth Maze | Real-time strategy | Avoid detection, optimize path under risk | A* with custom edge weights |
| Stochastic Pokémon | Turn-based battle | Handle randomness (priority, status effects) | Expectiminimax + chance nodes |
| Tetris | Puzzle | Maximize score through piece placement | Q-learning + neural networks |
- PA1: Demonstrated search algorithm adaptability in adversarial environments.
- PA2: Mastered stochastic game trees and heuristic design.
- PA3: Implemented RL from scratch, including neural network training.
For detailed instructions, refer to individual PA documentation in the /pas/ subdirectories.
# PA1: Stealth Agent Design

Modifies standard A* to prioritize safety over shortest distance:
- Danger Scoring: each map position gets a danger value based on:
  `danger_score = Σ (1 / distance_to_archer)` over all archers
- Weighted Path Cost:
  `total_cost = base_distance + (danger_factor * danger_score)`
- Infiltration Phase:
  - Finds a path to the townhall avoiding high-danger zones
  - Uses conservative danger weights (λ = 0.7)
- Exfiltration Phase:
  - After destroying the townhall, uses aggressive weights (λ = 0.3)
  - Prioritizes speed over safety
- Path Caching: Stores safe routes for reuse
- Lazy Updates: Only recalculates danger for changed areas
- Adaptive λ: Automatically adjusts based on remaining HP
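The two formulas above can be sketched in Java as follows. This is a minimal, illustrative version: class and method names are hypothetical, and the real logic lives in `StealthAgent.java` against Sepia's map types.

```java
import java.util.List;

// Hypothetical sketch of the risk-weighted cost; archers are (x, y) pairs.
public class RiskCost {
    // danger_score = Σ (1 / distance_to_archer) over all archers
    static double dangerScore(double x, double y, List<double[]> archers) {
        double score = 0.0;
        for (double[] a : archers) {
            double d = Math.hypot(x - a[0], y - a[1]);
            score += 1.0 / Math.max(d, 1e-6); // guard against standing on an archer
        }
        return score;
    }

    // total_cost = base_distance + (danger_factor * danger_score)
    static double edgeCost(double baseDistance, double dangerFactor,
                           double x, double y, List<double[]> archers) {
        return baseDistance + dangerFactor * dangerScore(x, y, archers);
    }
}
```

During infiltration the agent would call `edgeCost` with the conservative weight (0.7); after destroying the townhall it switches to the aggressive weight (0.3), so the same danger map yields faster, riskier paths.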
| Method | Purpose |
|---|---|
| findSafePath() | Main A* implementation |
| getDangerScore() | Calculates threat levels |
| updateWeights() | Adjusts λ dynamically |
- Success Rates:
  - Small maps: 100%
  - Large maps: ~20% (due to time constraints)
- Speed: <50ms per decision
# Quick Run

```shell
java -cp "./lib/*:." Main data/small_map.xml
```

# PA2: Pokémon Battle Agent Design

The agent implements a modified expectiminimax algorithm to handle Pokémon battles' stochastic nature. Unlike classic minimax, it incorporates:
- Chance nodes for probabilistic outcomes (move accuracy, status effects)
- Depth-limited search due to the game's branching factor
- Utility heuristics for non-terminal states
- State Evaluation
  - Primary heuristic: HP differential between teams
  - Terminal states:
    - Win: +999,999 utility
    - Loss: -999,999 utility
  - Non-terminal: Σ(active HP) − Σ(opponent HP)
- Chance Node Handling
  - Uses the `getPotentialEffects()` API to:
    - Generate all possible move outcomes
    - Get associated probabilities
  - Implements outcome trimming:
    - Sorts outcomes by probability
    - Keeps top N outcomes (configurable)
    - Renormalizes probabilities
- Opponent Modeling
  - Assumes the opponent minimizes our utility
  - Evaluates all opponent moves
  - Propagates the worst-case (for us) scenario
- Branch Pruning
  - Limits evaluated outcomes per move (`MAX_OUTCOMES_PER_MOVE`)
  - Reduces computation from O(b^d) to O(k^d), where k is the number of outcomes kept per move (k < b)
- Asynchronous Execution
  - Runs the search in a background thread
  - Implements a timeout (6 minutes/move)
  - Falls back to a simple heuristic if the timeout occurs
- Efficient State Representation
  - Uses the game engine's `BattleView`
  - Minimizes object creation
  - Reuses probability calculations
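The background-thread-plus-timeout pattern above can be sketched with standard `java.util.concurrent` primitives. Names here are hypothetical; the actual agent wires this into its own search entry point.

```java
import java.util.concurrent.*;

// Illustrative sketch: run a deep search with a deadline, fall back to a
// cheap heuristic move if it does not finish in time.
public class TimedSearch {
    static String searchWithTimeout(Callable<String> search, String fallbackMove, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(search);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // abandon the deep search
            return fallbackMove;   // use the simple heuristic move instead
        } catch (Exception e) {
            return fallbackMove;
        } finally {
            pool.shutdownNow();
        }
    }
}
```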
- `stochasticTreeSearch(BattleView)`
  - Entry point for move selection
  - Orchestrates the 2-ply search:
    our move → possible outcomes → opponent responses
- `computeMyMoveValue()`
  - Calculates expected utility for our move
  - Aggregates weighted outcomes
  - Propagates to the opponent's turn
- `computeOpponentMinValue()`
  - Simulates the opponent's optimal play
  - Selects the move minimizing our utility
  - Handles the opponent's chance nodes
- `MAX_OUTCOMES_PER_MOVE = 8`
- `maxDepth = 1000` (effectively 2-ply)
- `maxThinkingTimePerMoveInMS = 360,000` (6 minutes)
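The outcome-trimming step described above (sort by probability, keep the top k, renormalize, then take the expectation) can be sketched as follows. Names are illustrative; the real agent reads outcome lists from `getPotentialEffects()`.

```java
import java.util.Arrays;

// Hypothetical sketch of a trimmed chance node: probs[i] is the probability
// of outcome i, utils[i] the utility propagated up from its subtree.
public class ChanceNode {
    static double expectedUtility(double[] probs, double[] utils, int maxOutcomes) {
        Integer[] idx = new Integer[probs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort outcome indices by descending probability
        Arrays.sort(idx, (a, b) -> Double.compare(probs[b], probs[a]));
        int k = Math.min(maxOutcomes, probs.length);
        double mass = 0.0;
        for (int i = 0; i < k; i++) mass += probs[idx[i]];
        double expected = 0.0;
        for (int i = 0; i < k; i++) {
            // renormalize the kept probabilities so they sum to 1
            expected += (probs[idx[i]] / mass) * utils[idx[i]];
        }
        return expected;
    }
}
```

With all outcomes kept the result is the ordinary expectation; with `maxOutcomes` small, low-probability branches are dropped, which is what shrinks the effective branching factor from b to k.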
- Easy (Brock): ~90%
- Medium (Sabrina): ~60%
- Hard (Lance): ~30%
- Average: 2–3 minutes/move
- Worst-case: 6 minutes (timeout)
- ~500MB heap space
- Linear in search depth
```shell
# Run against Brock's team (Easy)
java -cp "./lib/*:." edu.bu.labs.pokemon.Easy

# Use custom agent
java -cp "./lib/*:." edu.bu.labs.pokemon.Main --tilAgent src.labs.pokemon.agents.MinimaxAgent
```

- Improved Heuristics
  - Incorporate status effects
  - Add type-advantage calculations
  - Consider stat boosts
- Advanced Pruning
  - Alpha-beta for deterministic branches
  - Probability-threshold cutting
- Opening Book
  - Cache common early-game sequences
  - Reduce computation time
This design effectively balances computation time with decision quality, handling Pokémon's stochastic nature while remaining practical for turn-based play. The agent demonstrates competency across all difficulty levels while maintaining reasonable runtime performance.
# PA3: Tetris RL Agent Design

The agent implements a reinforcement learning system that:
- Learns a Q-function approximation using a neural network
- Balances exploration vs exploitation with decaying ε-greedy policy
- Uses engineered features to represent board states and piece placements
- 3-layer fully connected network
- Input: Board state (flattened) + 7 engineered features
- Hidden layers: 512 and 256 ReLU units
- Output: Single Q-value (unbounded)
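As a shape-only sketch of the architecture above (weights zero-initialized, biases omitted for brevity): a flattened 20×10 board plus 7 engineered features gives 207 inputs, followed by 512 and 256 ReLU units and a single unbounded Q-value. The real network is built in `initQFunction()` with the course's neural-net library; this class is purely illustrative.

```java
// Hypothetical 3-layer fully connected Q-network, forward pass only.
public class QNet {
    final double[][] w1, w2;
    final double[] w3; // weight matrices/vector; biases omitted for brevity

    QNet(int inputSize) {
        w1 = new double[512][inputSize]; // input -> 512 ReLU units
        w2 = new double[256][512];       // 512 -> 256 ReLU units
        w3 = new double[256];            // 256 -> single Q-value
    }

    static double[] dense(double[][] w, double[] x, boolean relu) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) s += w[i][j] * x[j];
            out[i] = relu ? Math.max(0.0, s) : s;
        }
        return out;
    }

    double forward(double[] x) {
        double[] h1 = dense(w1, x, true);
        double[] h2 = dense(w2, h1, true);
        double q = 0.0; // linear output layer: single unbounded Q-value
        for (int j = 0; j < h2.length; j++) q += w3[j] * h2[j];
        return q;
    }
}
```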
- Experience Collection:
  - Plays games with the ε-greedy policy
  - Stores transitions (state, action, reward, next_state) in the replay buffer
- Learning Phase:
  - Samples mini-batches from the replay buffer
  - Updates the network using the Bellman equation:
    `Q(s, a) = R(s, a) + γ * max(Q(s', a'))`
  - Uses Mean Squared Error loss
- Evaluation:
  - Periodically tests the current policy
  - Tracks average score over evaluation games
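The Bellman target and MSE loss from the learning phase above can be sketched for replay-buffer transitions like this (illustrative helper methods, not the course library's API):

```java
// Hypothetical sketch of the Q-learning update targets.
public class Bellman {
    // Target for one transition: R(s,a) + γ · max_a' Q(s',a'),
    // with no future term once the game is over.
    static double target(double reward, double[] nextQValues, double gamma, boolean terminal) {
        if (terminal || nextQValues.length == 0) return reward;
        double best = nextQValues[0];
        for (double q : nextQValues) best = Math.max(best, q);
        return reward + gamma * best;
    }

    // Mean Squared Error between predicted Q-values and Bellman targets
    static double mseLoss(double[] predicted, double[] targets) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - targets[i];
            sum += d * d;
        }
        return sum / predicted.length;
    }
}
```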
- Raw board state (20x10 grayscale image)
- Engineered features:
  - Max stack height
  - Number of holes
  - Bottom row completion
  - Piece type encoding
  - Pivot position
  - Rotation state
  - Column balance
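Three of the engineered features listed above can be sketched on a plain `int[][]` board (1 = filled cell, row 0 at the top). The actual `getQFunctionInput()` works on the engine's board representation; these helpers only illustrate the definitions.

```java
// Hypothetical feature extractors for a 20x10 (or smaller) Tetris board.
public class TetrisFeatures {
    // Height of the tallest column: rows from the first filled cell down.
    static int maxStackHeight(int[][] board) {
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board[r].length; c++)
                if (board[r][c] != 0) return board.length - r;
        return 0;
    }

    // A hole is an empty cell with at least one filled cell above it.
    static int countHoles(int[][] board) {
        int holes = 0;
        for (int c = 0; c < board[0].length; c++) {
            boolean seenBlock = false;
            for (int r = 0; r < board.length; r++) {
                if (board[r][c] != 0) seenBlock = true;
                else if (seenBlock) holes++;
            }
        }
        return holes;
    }

    // Bottom row completion: number of filled columns in the last row.
    static int bottomRowFill(int[][] board) {
        int filled = 0;
        for (int cell : board[board.length - 1]) if (cell != 0) filled++;
        return filled;
    }
}
```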
- Line clears: +500 per line
- Height penalty: -0.4 per row
- Hole penalty: -0.4 per hole
- Survival bonus: +2 per step
- T-Spin bonus: +50
- Game over penalty: -15
- Bottom row completion: +4 per filled column
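The reward terms above can be collected into one function. The constants mirror the listed values, but the signature is hypothetical; the real `getReward()` reads these quantities from the game view.

```java
// Illustrative shaped-reward function for one step of play.
public class TetrisReward {
    static double reward(int linesCleared, int stackHeight, int holes,
                         int bottomRowFilledCols, boolean tSpin, boolean gameOver) {
        double r = 0.0;
        r += 500.0 * linesCleared;      // +500 per cleared line
        r -= 0.4 * stackHeight;         // height penalty per row
        r -= 0.4 * holes;               // hole penalty
        r += 2.0;                       // survival bonus per step
        r += 4.0 * bottomRowFilledCols; // bottom-row completion bonus
        if (tSpin) r += 50.0;           // T-Spin bonus
        if (gameOver) r -= 15.0;        // game-over penalty
        return r;
    }
}
```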
- Starts with ε = 0.9
- Decays to ε = 0.03 over training
- Exploration moves use heuristic evaluation with noise
- Scores potential moves based on:
  - Line completion potential
  - Hole filling
  - Neighbor blocks
  - Height considerations
  - Column balancing
  - Special bonuses for T-pieces
- Target performance: 20+ average score
- Extra credit: 40+ average score
- Typical training time: Days on SCC cluster
- Initial exploration: 90%
- Min exploration: 3%
- Exploration decay: 0.95 per cycle
- Discount factor (γ): 0.99
- Learning rate: 0.001 (via Adam optimizer)
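The decaying ε-greedy schedule above (start at 0.9, multiply by 0.95 each training cycle, floor at 0.03) can be sketched as:

```java
import java.util.Random;

// Hypothetical sketch of the exploration schedule; the agent's actual
// decision lives in shouldExplore().
public class EpsilonSchedule {
    static double epsilonAt(int cycle) {
        double eps = 0.9 * Math.pow(0.95, cycle); // multiplicative decay per cycle
        return Math.max(eps, 0.03);               // never below the minimum rate
    }

    static boolean shouldExplore(double epsilon, Random rng) {
        return rng.nextDouble() < epsilon; // explore with probability ε
    }
}
```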
- `initQFunction()` – Neural network architecture
- `getQFunctionInput()` – State feature engineering
- `getReward()` – Custom reward shaping
- `shouldExplore()` – ε-greedy policy
- `getExplorationMove()` – Heuristic move selection
- Experience replay buffer
- Mini-batch training
- Target network (optional extension)
- Feature normalization
```shell
# Train the agent (with logging)
java -cp "./lib/*:." edu.bu.tetris.Main | tee training.log

# Plot learning curve
python learning_curve.py training.log
```

- Advanced Features:
  - Add lookahead for next pieces
  - Incorporate attack/defense metrics
  - Better T-Spin detection
- Network Improvements:
  - Convolutional layers for spatial features
  - Dueling network architecture
  - Prioritized experience replay
- Training Enhancements:
  - Curriculum learning
  - Dynamic reward shaping
  - Parallel self-play
- A* Algorithm: Through PA1’s stealth navigation task, I learned how to adapt classical algorithms—such as incorporating risk-aware weights—to solve real-world problems.
- Game Tree Search: PA2’s Pokémon battle simulation deepened my understanding of how the Expectiminimax algorithm can be applied to stochastic game environments.
- Deep Reinforcement Learning: PA3’s Tetris project taught me how to design neural network architectures and reward functions to implement Q-learning.
- Java Development: All three projects were implemented in Java, which strengthened my object-oriented programming skills.
- Performance Optimization: I learned techniques such as pruning, caching, and asynchronous computation to improve performance in compute-intensive tasks.
- Debugging Techniques: I analyzed game logs and used visualization tools to iteratively improve AI behavior.
- Importance of Feature Engineering: In the Tetris project, I realized that designing effective state representations is more critical than the complexity of the model architecture.
- The Art of Exploration vs. Exploitation: Tuning the ε-greedy policy helped me understand how to balance short-term performance with long-term learning.
- Reward Shaping Sensitivity: I discovered that poorly designed reward functions can lead to unintended AI behaviors.
- Value of Incremental Development: I learned the importance of first building a functional baseline before progressively refining the solution.
- Evaluation Metric Design: Effective evaluation requires not only tracking final outcomes but also monitoring signals during the learning process.
- Resource-Aware Planning: I practiced designing long-term training tasks under limited computational resources.
I am deeply grateful for these three projects, which helped me move from theoretical knowledge to practical implementation. They not only improved my programming abilities but also shaped my approach to solving complex problems. These experiences will be valuable assets in my future technical journey and have strengthened my passion and respect for the field of Artificial Intelligence.
"These projects were like three keys that opened the door to AI application development. From search algorithms to reinforcement learning, I witnessed the limitless potential of intelligent systems—and clearly realized how much more I have to learn."