mcp_vs_mdp_explained

janMagnusHeimann
This document contrasts Markov Reward Processes (MRPs) and Markov Decision Processes (MDPs) in reinforcement learning. An MRP models states and rewards with no actions, and its Bellman equation is used to evaluate the state values V(s). An MDP adds actions and policies (π), and its Bellman optimality equations are used to find optimal strategies through the optimal value functions V* and Q*.
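As a rough illustration of the contrast, the sketch below evaluates a tiny MRP by iterating its Bellman equation, then runs value iteration on a tiny MDP using the Bellman optimality backup. The function names, the two-state transition tables, the rewards, and the discount factor 0.9 are all hypothetical choices made for this example, not taken from the document.

```python
# Minimal sketch; every number below is an illustrative assumption.
GAMMA = 0.9  # assumed discount factor


def evaluate_mrp(P, R, gamma=GAMMA, tol=1e-10):
    """Iterate V(s) = R(s) + gamma * sum_t P[s][t] * V(t) until it stops changing."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def value_iteration(P, R, gamma=GAMMA, tol=1e-10):
    """Iterate V(s) = max_a [R[a][s] + gamma * sum_t P[a][s][t] * V(t)].

    P is indexed as P[action][state][next_state], R as R[action][state].
    Returns (V, Q) with Q indexed as Q[state][action].
    """
    n_actions, n_states = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q_s) for q_s in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new, Q
        V = V_new


# MRP: state 0 pays reward 1 and moves to absorbing state 1 half the time.
mrp_P = [[0.5, 0.5], [0.0, 1.0]]
mrp_R = [1.0, 0.0]
mrp_V = evaluate_mrp(mrp_P, mrp_R)

# MDP: action 0 stays in place (reward 1 only in state 1); action 1 swaps states.
mdp_P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move to the other state
]
mdp_R = [
    [0.0, 1.0],  # rewards for action 0 in states 0 and 1
    [0.0, 0.0],  # rewards for action 1 in states 0 and 1
]
mdp_V, mdp_Q = value_iteration(mdp_P, mdp_R)

# Greedy policy read off Q: the best action index in each state.
greedy_policy = [max(range(len(q_s)), key=lambda a: q_s[a]) for q_s in mdp_Q]
```

In the MRP there is nothing to choose, so the only question is what each state is worth. In the MDP the `max` over actions is what turns evaluation into optimization, and the greedy policy is read directly from Q.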

Content