
Reinforcement Learning Homework Solution

This repository contains my homework solution for the Machine Learning course at HTWG Konstanz.

Prerequisites

To run the MATLAB code in this project, the following toolboxes are required:

  • Statistics and Machine Learning Toolbox
  • Deep Learning Toolbox
  • Reinforcement Learning Toolbox

Project Description

The RL environment is a 3x4 grid of 12 states. The agent's goal is to navigate to the target state (s12), which yields a reward of +1. Two states are holes (s5 and s7) that give a reward of -1 (a penalty) when entered; all other non-terminal states yield no immediate reward.

Movement is stochastic: the agent moves in the chosen direction with an 80% probability and has a combined 20% chance of slipping into one of the two orthogonal directions. Colliding with a wall leaves the agent in its current state. The agent starts in state s1 and can move in four directions: left (a1), down (a2), right (a3), and up (a4).

The grid is illustrated below:

+----+----+----+----+
| s9 | s10| s11| s12| <- s12 is the target state with a reward of +1.
+----+----+----+----+
| s8 | s7 | s6 | s5 | <- s7 and s5 are holes with a reward of -1.
+----+----+----+----+
| s1 | s2 | s3 | s4 | <- s1 is the starting state.
+----+----+----+----+
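
For reference, below is a minimal, toolbox-free MATLAB sketch of how this environment could be encoded as transition and reward matrices. The names (P, R, stateAt, terminal) and the even 10%/10% split of the orthogonal slip probability are assumptions for illustration, not taken from the actual homework code:

% build_mdp.m (hypothetical name) -- encode the 3x4 grid as an MDP.
nS = 12; nA = 4;                       % 12 states, 4 actions
terminal = [5 7 12];                   % holes s5, s7 and target s12

% Coordinates follow the snake-like numbering of the grid above:
% bottom row s1..s4 left-to-right, middle row s5..s8 right-to-left,
% top row s9..s12 left-to-right.
col = [1 2 3 4  4 3 2 1  1 2 3 4];
row = [1 1 1 1  2 2 2 2  3 3 3 3];
stateAt = zeros(3, 4);                 % stateAt(r, c) -> state index
for s = 1:nS, stateAt(row(s), col(s)) = s; end

dc = [-1 0 1 0];                       % column shift for a1=left .. a4=up
dr = [ 0 -1 0 1];                      % row shift

P = zeros(nS, nA, nS);                 % P(s, a, s') transition probabilities
R = zeros(nS, 1);                      % reward for *entering* a state
R(12) = +1;  R([5 7]) = -1;

for s = 1:nS
    if any(s == terminal)
        P(s, :, s) = 1;                % terminal states are absorbing
        continue
    end
    for a = 1:nA
        % chosen direction with prob 0.8; assumed 0.1 per orthogonal slip
        dirs  = [a, mod(a, 4) + 1, mod(a + 2, 4) + 1];
        probs = [0.8, 0.1, 0.1];
        for k = 1:3
            r2 = row(s) + dr(dirs(k));
            c2 = col(s) + dc(dirs(k));
            if r2 < 1 || r2 > 3 || c2 < 1 || c2 > 4
                s2 = s;                % wall collision: stay in place
            else
                s2 = stateAt(r2, c2);
            end
            P(s, a, s2) = P(s, a, s2) + probs(k);
        end
    end
end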

Tasks Accomplished

The following tasks were completed in this project:

  1. Environment Implementation: The RL environment was implemented in MATLAB, modeling the stochastic nature of the agent's movements and the reward associated with each state (a toolbox-free sketch of the transition model is shown above).

  2. State-Value Function for "Up" Policy: Using the policy-evaluation algorithm introduced in the lectures, the state-value function was computed for the fixed policy that always moves 'up' (action a4); see the first sketch after this list.

  3. Optimal Strategy Computation: An optimal strategy was computed using the policy-improvement algorithm discussed in the lectures (second sketch below).

  4. Value Iteration Algorithm: A value-iteration algorithm was implemented that takes a Markov Decision Process (MDP) and a discount factor as inputs and returns the state-value function and an optimal strategy. Applied to the given MDP with a discount factor of 0.9, it determines the most beneficial actions without a predefined strategy (third sketch below).
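
The first sketch below shows how iterative policy evaluation for the always-up policy might look in plain MATLAB. It reuses P, R, nS, and terminal from the environment sketch above and assumes a discount factor of 0.9 (the value stated for task 4); the actual homework code may differ:

% Iterative policy evaluation for the fixed "always up" policy (a4).
gamma  = 0.9;                          % assumed; task 4 uses this value
policy = 4 * ones(nS, 1);              % choose a4 = up in every state
V   = zeros(nS, 1);
tol = 1e-8;
while true
    V_new = V;
    for s = 1:nS
        if any(s == terminal), continue, end   % V stays 0 at terminals
        a = policy(s);
        % expected reward for entering s' plus discounted value of s'
        V_new(s) = squeeze(P(s, a, :))' * (R + gamma * V);
    end
    if max(abs(V_new - V)) < tol, break, end
    V = V_new;
end
disp(V')                               % state-value function for "up"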
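A single greedy policy-improvement step with respect to the evaluated value function could look like the following; alternating it with the evaluation loop above yields policy iteration:

% One greedy improvement step with respect to the current V.
Q = zeros(nS, nA);
for s = 1:nS
    for a = 1:nA
        Q(s, a) = squeeze(P(s, a, :))' * (R + gamma * V);
    end
end
[~, policy] = max(Q, [], 2);           % greedy action per state
% (the greedy action in terminal states is arbitrary but harmless)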
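Finally, a sketch of a value-iteration function matching the interface described in task 4: the MDP and a discount factor go in, the state-value function and an optimal strategy come out. Names and structure are again illustrative:

% value_iteration.m (hypothetical file) -- MDP + discount factor in,
% state-value function + optimal strategy out.
function [V, policy] = value_iteration(P, R, terminal, gamma)
    [nS, nA, ~] = size(P);
    V   = zeros(nS, 1);
    tol = 1e-8;
    while true
        V_new = V;
        for s = 1:nS
            if any(s == terminal), continue, end
            q = zeros(nA, 1);
            for a = 1:nA
                q(a) = squeeze(P(s, a, :))' * (R + gamma * V);
            end
            V_new(s) = max(q);         % Bellman optimality backup
        end
        if max(abs(V_new - V)) < tol, break, end
        V = V_new;
    end
    % read off a greedy (optimal) policy from the converged V
    Q = zeros(nS, nA);
    for s = 1:nS
        for a = 1:nA
            Q(s, a) = squeeze(P(s, a, :))' * (R + gamma * V);
        end
    end
    [~, policy] = max(Q, [], 2);
end

Usage, with the matrices from the environment sketch: [V, piOpt] = value_iteration(P, R, terminal, 0.9);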
