This directory (/Game-AI-Agent/) contains three core programming assignments (PAs) for an Artificial Intelligence course. Each PA implements an AI agent for a distinct game, using algorithms such as A*, game-tree search, and Q-learning with neural networks. Below is a summary of the goals, content, and key games covered.
- Apply AI Concepts: Implement algorithms taught in class (e.g., search, reinforcement learning) to solve real-world game scenarios.
- Hands-on Learning: Develop agents that interact with stochastic environments, adversarial opponents, and complex reward systems.
- Performance Optimization: Meet autograder criteria for win rates, scores, and efficiency.
# PA1: Stealth Maze Agent

- Game: Stealth-based strategy (Sepia engine).
- Goal: Navigate a maze, destroy an enemy townhall, and escape undetected by archers.
- Key AI:
- A* for pathfinding with risk-weighted edges.
- Reward engineering for "danger" (e.g., proximity to enemies).
- Files:
  - `StealthAgent.java` (main agent)
  - Maze configurations (e.g., `BigMaze.xml`)
# PA2: Stochastic Pokémon Agent

- Game: Turn-based Pokémon battles with randomness.
- Goal: Defeat opponents using tree-search agents that handle stochastic outcomes (e.g., move priority, status effects).
- Key AI:
- Expectiminimax with chance nodes.
- Utility heuristics for infinite game trees.
- Files:
  - `TreeTraversalAgent.java` (implements battle logic)
  - Difficulty tiers (Easy/Medium/Hard opponent teams)
# PA3: Tetris Q-Learning Agent

- Game: Tetris with accelerated piece placement.
- Goal: Maximize score using Q-learning and neural networks.
- Key AI:
- Q-learning with replay buffers.
- Neural network for Q-value approximation.
- Reward function engineering (e.g., penalizing gaps).
- Files:
  - `TetrisQAgent.java` (RL agent)
  - `params.model` (trained network weights)
| Game | Type | Core Challenge | AI Technique |
|---|---|---|---|
| Stealth Maze | Real-time strategy | Avoid detection, optimize path under risk | A* with custom edge weights |
| Stochastic Pokémon | Turn-based battle | Handle randomness (priority, status effects) | Expectiminimax + chance nodes |
| Tetris | Puzzle | Maximize score through piece placement | Q-learning + neural networks |
- PA1: Demonstrated search algorithm adaptability in adversarial environments.
- PA2: Mastered stochastic game trees and heuristic design.
- PA3: Implemented RL from scratch, including neural network training.
For detailed instructions, refer to individual PA documentation in the /pas/ subdirectories.
# PA1: Stealth Agent Design

Modifies standard A* to prioritize safety over shortest distance:
- Danger Scoring: each map position gets a danger value based on:
  `danger_score = Σ (1 / distance_to_archer)` over all archers
- Weighted Path Cost:
  `total_cost = base_distance + (danger_factor * danger_score)`
- Infiltration Phase:
  - Finds a path to the townhall avoiding high-danger zones
  - Uses conservative danger weights (λ = 0.7)
- Exfiltration Phase:
  - After destroying the townhall, uses aggressive weights (λ = 0.3)
  - Prioritizes speed over safety
- Path Caching: Stores safe routes for reuse
- Lazy Updates: Only recalculates danger for changed areas
- Adaptive λ: Automatically adjusts based on remaining HP
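The two formulas above can be sketched in Java as follows. This is a minimal, illustrative version: class and method names are hypothetical, and the real logic lives in `StealthAgent.java` against Sepia's map types.

```java
import java.util.List;

// Hypothetical sketch of the risk-weighted cost; archers are (x, y) pairs.
public class RiskCost {
    // danger_score = Σ (1 / distance_to_archer) over all archers
    static double dangerScore(double x, double y, List<double[]> archers) {
        double score = 0.0;
        for (double[] a : archers) {
            double d = Math.hypot(x - a[0], y - a[1]);
            score += 1.0 / Math.max(d, 1e-6); // guard against standing on an archer
        }
        return score;
    }

    // total_cost = base_distance + (danger_factor * danger_score)
    static double edgeCost(double baseDistance, double dangerFactor,
                           double x, double y, List<double[]> archers) {
        return baseDistance + dangerFactor * dangerScore(x, y, archers);
    }
}
```

During infiltration the agent would call `edgeCost` with the conservative weight (0.7); after destroying the townhall it switches to the aggressive weight (0.3), so the same danger map yields faster, riskier paths.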
| Method | Purpose |
|---|---|
| findSafePath() | Main A* implementation |
| getDangerScore() | Calculates threat levels |
| updateWeights() | Adjusts λ dynamically |
- Success Rates:
  - Small maps: 100%
  - Large maps: ~20% (due to time constraints)
- Speed: <50ms per decision
# Quick Run

```shell
java -cp "./lib/*:." Main data/small_map.xml
```

# PA2: Pokémon Battle Agent Design

The agent implements a modified expectiminimax algorithm to handle Pokémon battles' stochastic nature. Unlike classic minimax, it incorporates:
- Chance nodes for probabilistic outcomes (move accuracy, status effects)
- Depth-limited search due to the game's branching factor
- Utility heuristics for non-terminal states
- State Evaluation
  - Primary heuristic: HP differential between teams
  - Terminal states:
    - Win: +999,999 utility
    - Loss: -999,999 utility
  - Non-terminal: Σ(active HP) − Σ(opponent HP)
- Chance Node Handling
  - Uses the `getPotentialEffects()` API to:
    - Generate all possible move outcomes
    - Get associated probabilities
  - Implements outcome trimming:
    - Sorts outcomes by probability
    - Keeps top N outcomes (configurable)
    - Renormalizes probabilities
- Opponent Modeling
  - Assumes the opponent minimizes our utility
  - Evaluates all opponent moves
  - Propagates the worst-case (for us) scenario
- Branch Pruning
  - Limits evaluated outcomes per move (`MAX_OUTCOMES_PER_MOVE`)
  - Reduces computation from O(b^d) to O(k^d), where k is the number of outcomes kept per move (k < b)
- Asynchronous Execution
  - Runs the search in a background thread
  - Implements a timeout (6 minutes/move)
  - Falls back to a simple heuristic if the timeout occurs
- Efficient State Representation
  - Uses the game engine's `BattleView`
  - Minimizes object creation
  - Reuses probability calculations
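The background-thread-plus-timeout pattern above can be sketched with standard `java.util.concurrent` primitives. Names here are hypothetical; the actual agent wires this into its own search entry point.

```java
import java.util.concurrent.*;

// Illustrative sketch: run a deep search with a deadline, fall back to a
// cheap heuristic move if it does not finish in time.
public class TimedSearch {
    static String searchWithTimeout(Callable<String> search, String fallbackMove, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(search);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // abandon the deep search
            return fallbackMove;   // use the simple heuristic move instead
        } catch (Exception e) {
            return fallbackMove;
        } finally {
            pool.shutdownNow();
        }
    }
}
```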
- `stochasticTreeSearch(BattleView)`
  - Entry point for move selection
  - Orchestrates the 2-ply search:
    our move → possible outcomes → opponent responses
- `computeMyMoveValue()`
  - Calculates expected utility for our move
  - Aggregates weighted outcomes
  - Propagates to the opponent's turn
- `computeOpponentMinValue()`
  - Simulates the opponent's optimal play
  - Selects the move minimizing our utility
  - Handles the opponent's chance nodes
- `MAX_OUTCOMES_PER_MOVE = 8`
- `maxDepth = 1000` (effectively 2-ply)
- `maxThinkingTimePerMoveInMS = 360,000` (6 minutes)
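The outcome-trimming step described above (sort by probability, keep the top k, renormalize, then take the expectation) can be sketched as follows. Names are illustrative; the real agent reads outcome lists from `getPotentialEffects()`.

```java
import java.util.Arrays;

// Hypothetical sketch of a trimmed chance node: probs[i] is the probability
// of outcome i, utils[i] the utility propagated up from its subtree.
public class ChanceNode {
    static double expectedUtility(double[] probs, double[] utils, int maxOutcomes) {
        Integer[] idx = new Integer[probs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort outcome indices by descending probability
        Arrays.sort(idx, (a, b) -> Double.compare(probs[b], probs[a]));
        int k = Math.min(maxOutcomes, probs.length);
        double mass = 0.0;
        for (int i = 0; i < k; i++) mass += probs[idx[i]];
        double expected = 0.0;
        for (int i = 0; i < k; i++) {
            // renormalize the kept probabilities so they sum to 1
            expected += (probs[idx[i]] / mass) * utils[idx[i]];
        }
        return expected;
    }
}
```

With all outcomes kept the result is the ordinary expectation; with `maxOutcomes` small, low-probability branches are dropped, which is what shrinks the effective branching factor from b to k.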
- Easy (Brock): ~90%
- Medium (Sabrina): ~60%
- Hard (Lance): ~30%
- Average: 2–3 minutes/move
- Worst-case: 6 minutes (timeout)
- ~500MB heap space
- Linear in search depth
```shell
# Run against Brock's team (Easy)
java -cp "./lib/*:." edu.bu.labs.pokemon.Easy

# Use custom agent
java -cp "./lib/*:." edu.bu.labs.pokemon.Main --tilAgent src.labs.pokemon.agents.MinimaxAgent
```

- Improved Heuristics
  - Incorporate status effects
  - Add type-advantage calculations
  - Consider stat boosts
- Advanced Pruning
  - Alpha-beta for deterministic branches
  - Probability-threshold cutting
- Opening Book
  - Cache common early-game sequences
  - Reduce computation time
This design effectively balances computation time with decision quality, handling Pokémon's stochastic nature while remaining practical for turn-based play. The agent demonstrates competency across all difficulty levels while maintaining reasonable runtime performance.
# PA3: Tetris RL Agent Design

The agent implements a reinforcement learning system that:
- Learns a Q-function approximation using a neural network
- Balances exploration vs exploitation with decaying ε-greedy policy
- Uses engineered features to represent board states and piece placements
- 3-layer fully connected network
- Input: Board state (flattened) + 7 engineered features
- Hidden layers: 512 and 256 ReLU units
- Output: Single Q-value (unbounded)
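As a shape-only sketch of the architecture above (weights zero-initialized, biases omitted for brevity): a flattened 20×10 board plus 7 engineered features gives 207 inputs, followed by 512 and 256 ReLU units and a single unbounded Q-value. The real network is built in `initQFunction()` with the course's neural-net library; this class is purely illustrative.

```java
// Hypothetical 3-layer fully connected Q-network, forward pass only.
public class QNet {
    final double[][] w1, w2;
    final double[] w3; // weight matrices/vector; biases omitted for brevity

    QNet(int inputSize) {
        w1 = new double[512][inputSize]; // input -> 512 ReLU units
        w2 = new double[256][512];       // 512 -> 256 ReLU units
        w3 = new double[256];            // 256 -> single Q-value
    }

    static double[] dense(double[][] w, double[] x, boolean relu) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) s += w[i][j] * x[j];
            out[i] = relu ? Math.max(0.0, s) : s;
        }
        return out;
    }

    double forward(double[] x) {
        double[] h1 = dense(w1, x, true);
        double[] h2 = dense(w2, h1, true);
        double q = 0.0; // linear output layer: single unbounded Q-value
        for (int j = 0; j < h2.length; j++) q += w3[j] * h2[j];
        return q;
    }
}
```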
- Experience Collection:
  - Plays games with the ε-greedy policy
  - Stores transitions (state, action, reward, next_state) in the replay buffer
- Learning Phase:
  - Samples mini-batches from the replay buffer
  - Updates the network using the Bellman equation:
    `Q(s, a) = R(s, a) + γ * max(Q(s', a'))`
  - Uses Mean Squared Error loss
- Evaluation:
  - Periodically tests the current policy
  - Tracks average score over evaluation games
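The Bellman target and MSE loss from the learning phase above can be sketched for replay-buffer transitions like this (illustrative helper methods, not the course library's API):

```java
// Hypothetical sketch of the Q-learning update targets.
public class Bellman {
    // Target for one transition: R(s,a) + γ · max_a' Q(s',a'),
    // with no future term once the game is over.
    static double target(double reward, double[] nextQValues, double gamma, boolean terminal) {
        if (terminal || nextQValues.length == 0) return reward;
        double best = nextQValues[0];
        for (double q : nextQValues) best = Math.max(best, q);
        return reward + gamma * best;
    }

    // Mean Squared Error between predicted Q-values and Bellman targets
    static double mseLoss(double[] predicted, double[] targets) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - targets[i];
            sum += d * d;
        }
        return sum / predicted.length;
    }
}
```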
- Raw board state (20x10 grayscale image)
- Engineered features:
  - Max stack height
  - Number of holes
  - Bottom row completion
  - Piece type encoding
  - Pivot position
  - Rotation state
  - Column balance
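Three of the engineered features listed above can be sketched on a plain `int[][]` board (1 = filled cell, row 0 at the top). The actual `getQFunctionInput()` works on the engine's board representation; these helpers only illustrate the definitions.

```java
// Hypothetical feature extractors for a 20x10 (or smaller) Tetris board.
public class TetrisFeatures {
    // Height of the tallest column: rows from the first filled cell down.
    static int maxStackHeight(int[][] board) {
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board[r].length; c++)
                if (board[r][c] != 0) return board.length - r;
        return 0;
    }

    // A hole is an empty cell with at least one filled cell above it.
    static int countHoles(int[][] board) {
        int holes = 0;
        for (int c = 0; c < board[0].length; c++) {
            boolean seenBlock = false;
            for (int r = 0; r < board.length; r++) {
                if (board[r][c] != 0) seenBlock = true;
                else if (seenBlock) holes++;
            }
        }
        return holes;
    }

    // Bottom row completion: number of filled columns in the last row.
    static int bottomRowFill(int[][] board) {
        int filled = 0;
        for (int cell : board[board.length - 1]) if (cell != 0) filled++;
        return filled;
    }
}
```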
- Line clears: +500 per line
- Height penalty: -0.4 per row
- Hole penalty: -0.4 per hole
- Survival bonus: +2 per step
- T-Spin bonus: +50
- Game over penalty: -15
- Bottom row completion: +4 per filled column
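The reward terms above can be collected into one function. The constants mirror the listed values, but the signature is hypothetical; the real `getReward()` reads these quantities from the game view.

```java
// Illustrative shaped-reward function for one step of play.
public class TetrisReward {
    static double reward(int linesCleared, int stackHeight, int holes,
                         int bottomRowFilledCols, boolean tSpin, boolean gameOver) {
        double r = 0.0;
        r += 500.0 * linesCleared;      // +500 per cleared line
        r -= 0.4 * stackHeight;         // height penalty per row
        r -= 0.4 * holes;               // hole penalty
        r += 2.0;                       // survival bonus per step
        r += 4.0 * bottomRowFilledCols; // bottom-row completion bonus
        if (tSpin) r += 50.0;           // T-Spin bonus
        if (gameOver) r -= 15.0;        // game-over penalty
        return r;
    }
}
```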
- Starts with ε = 0.9
- Decays to ε = 0.03 over training
- Exploration moves use heuristic evaluation with noise
- Scores potential moves based on:
  - Line completion potential
  - Hole filling
  - Neighbor blocks
  - Height considerations
  - Column balancing
  - Special bonuses for T-pieces
- Target performance: 20+ average score
- Extra credit: 40+ average score
- Typical training time: Days on SCC cluster
- Initial exploration: 90%
- Min exploration: 3%
- Exploration decay: 0.95 per cycle
- Discount factor (γ): 0.99
- Learning rate: 0.001 (via Adam optimizer)
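The decaying ε-greedy schedule above (start at 0.9, multiply by 0.95 each training cycle, floor at 0.03) can be sketched as:

```java
import java.util.Random;

// Hypothetical sketch of the exploration schedule; the agent's actual
// decision lives in shouldExplore().
public class EpsilonSchedule {
    static double epsilonAt(int cycle) {
        double eps = 0.9 * Math.pow(0.95, cycle); // multiplicative decay per cycle
        return Math.max(eps, 0.03);               // never below the minimum rate
    }

    static boolean shouldExplore(double epsilon, Random rng) {
        return rng.nextDouble() < epsilon; // explore with probability ε
    }
}
```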
- `initQFunction()` – Neural network architecture
- `getQFunctionInput()` – State feature engineering
- `getReward()` – Custom reward shaping
- `shouldExplore()` – ε-greedy policy
- `getExplorationMove()` – Heuristic move selection
- Experience replay buffer
- Mini-batch training
- Target network (optional extension)
- Feature normalization
```shell
# Train the agent (with logging)
java -cp "./lib/*:." edu.bu.tetris.Main | tee training.log

# Plot learning curve
python learning_curve.py training.log
```

- Advanced Features:
  - Add lookahead for next pieces
  - Incorporate attack/defense metrics
  - Better T-Spin detection
- Network Improvements:
  - Convolutional layers for spatial features
  - Dueling network architecture
  - Prioritized experience replay
- Training Enhancements:
  - Curriculum learning
  - Dynamic reward shaping
  - Parallel self-play
- A* Algorithm: Through PA1’s stealth navigation task, I learned how to adapt classical algorithms—such as incorporating risk-aware weights—to solve real-world problems.
- Game Tree Search: PA2’s Pokémon battle simulation deepened my understanding of how the Expectiminimax algorithm can be applied to stochastic game environments.
- Deep Reinforcement Learning: PA3’s Tetris project taught me how to design neural network architectures and reward functions to implement Q-learning.
- Java Development: All three projects were implemented in Java, which strengthened my object-oriented programming skills.
- Performance Optimization: I learned techniques such as pruning, caching, and asynchronous computation to improve performance in compute-intensive tasks.
- Debugging Techniques: I analyzed game logs and used visualization tools to iteratively improve AI behavior.
- Importance of Feature Engineering: In the Tetris project, I realized that designing effective state representations is more critical than the complexity of the model architecture.
- The Art of Exploration vs. Exploitation: Tuning the ε-greedy policy helped me understand how to balance short-term performance with long-term learning.
- Reward Shaping Sensitivity: I discovered that poorly designed reward functions can lead to unintended AI behaviors.
- Value of Incremental Development: I learned the importance of first building a functional baseline before progressively refining the solution.
- Evaluation Metric Design: Effective evaluation requires not only tracking final outcomes but also monitoring signals during the learning process.
- Resource-Aware Planning: I practiced designing long-term training tasks under limited computational resources.
I am deeply grateful for these three projects, which helped me move from theoretical knowledge to practical implementation. They not only improved my programming abilities but also shaped my approach to solving complex problems. These experiences will be valuable assets in my future technical journey and have strengthened my passion and respect for the field of Artificial Intelligence.
"These projects were like three keys that opened the door to AI application development. From search algorithms to reinforcement learning, I witnessed the limitless potential of intelligent systems—and clearly realized how much more I have to learn."