
Proof of Bellman equation

Bellman equation for state-value function

Notation 1

$$
\begin{aligned}
v_\pi(s) &= E\left[ G_t \mid S_t = s \right] \\
&= E\left[ r_{t+1} + \gamma G_{t+1} \mid S_t = s \right] \\
&= \sum_{a,r,s'} r\,\pi(a|s)\,p(s',r|s,a) + \gamma \sum_{a,r,s'} E\left[ G_{t+1} \mid S_{t+1} = s' \right] p(s',r|s,a)\,\pi(a|s) \\
&= \sum_a \pi(a|s) \left\{ \sum_{r,s'} p(s',r|s,a) \left[ r + \gamma v_\pi(s') \right] \right\}
\end{aligned}
$$
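
Since the last line expresses $v_\pi$ as a fixed point, it can be iterated directly. Below is a minimal sketch, assuming a hypothetical 2-state, 2-action MDP with made-up dynamics (the `dynamics` table maps each $(s, a)$ to $(s', r, p)$ triples, i.e. the four-argument $p(s',r|s,a)$):

```python
# Minimal sketch: iterative policy evaluation on a hypothetical toy MDP,
# using the Bellman equation above as a fixed-point update.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9

# pi[s, a]: probability of taking action a in state s (uniform random policy)
pi = np.full((n_states, n_actions), 0.5)

# dynamics[(s, a)] = list of (s_prime, reward, probability) triples,
# i.e. p(s', r | s, a); all numbers here are made up for illustration.
dynamics = {
    (0, 0): [(0, 0.0, 0.8), (1, 1.0, 0.2)],
    (0, 1): [(1, 1.0, 1.0)],
    (1, 0): [(0, 0.0, 1.0)],
    (1, 1): [(1, 2.0, 0.5), (0, 0.0, 0.5)],
}

# v(s) <- sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma * v(s')]
v = np.zeros(n_states)
for _ in range(1000):
    v_new = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            for s_prime, r, p in dynamics[(s, a)]:
                v_new[s] += pi[s, a] * p * (r + gamma * v[s_prime])
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

print(v)  # converges to v_pi for this toy MDP
```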

Notation 2

$$
\begin{aligned}
R^a_{ss'} &= E\left[ r_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s' \right] \\
P^a_{ss'} &= P\left( S_{t+1} = s' \mid S_t = s, A_t = a \right)
\end{aligned}
$$
$$
\begin{aligned}
v_\pi(s) &= E\left[ G_t \mid S_t = s \right] \\
&= E\left[ r_{t+1} + \gamma G_{t+1} \mid S_t = s \right] \\
&= \sum_a \pi(a|s) \sum_{r,s'} r\,p(s',r|s,a) + \gamma \sum_a \pi(a|s) \sum_{s'} p(s'|s,a)\,E\left[ G_{t+1} \mid S_{t+1} = s' \right] \\
&= \sum_a \pi(a|s) \left\{ \sum_{r,s'} r\,p(r|s',s,a)\,p(s'|s,a) + \gamma \sum_{s'} p(s'|s,a)\,E\left[ G_{t+1} \mid S_{t+1} = s' \right] \right\} \\
&= \sum_a \pi(a|s) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma v_\pi(s') \right]
\end{aligned}
$$
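
Because the last line is linear in $v_\pi$, Notation 2 also yields $v_\pi$ in closed form: stacking states into vectors, $(I - \gamma P_\pi)\,v_\pi = r_\pi$. A minimal sketch, reusing the same hypothetical MDP as above with $P^a_{ss'}$ and $R^a_{ss'}$ stored as arrays:

```python
# Minimal sketch: solve the Bellman equation exactly as a linear system,
# using the Notation-2 quantities P[a, s, s'] and R[a, s, s'].
import numpy as np

gamma = 0.9
# Same hypothetical MDP as the iterative example above.
P = np.array([[[0.8, 0.2], [1.0, 0.0]],
              [[0.0, 1.0], [0.5, 0.5]]])   # P[a, s, s'] = P^a_{ss'}
R = np.array([[[0.0, 1.0], [0.0, 0.0]],
              [[1.0, 1.0], [0.0, 2.0]]])   # R[a, s, s'] = R^a_{ss'}
pi = np.full((2, 2), 0.5)                  # pi[s, a], uniform random policy

# Average the dynamics over the policy:
# P_pi[s, s'] = sum_a pi(a|s) P^a_{ss'}
P_pi = np.einsum('sa,asp->sp', pi, P)
# r_pi[s] = sum_a pi(a|s) sum_{s'} P^a_{ss'} R^a_{ss'}
r_pi = np.einsum('sa,asp,asp->s', pi, P, R)

# (I - gamma * P_pi) v = r_pi
v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
print(v)  # matches the fixed-point iteration above
```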

Bellman equation for action-value function

Notation 1
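
Expanding $Q_\pi(s,a)$ with the same steps used for $v_\pi(s)$ above gives the four-argument form:

$$
\begin{aligned}
Q_\pi(s,a) &= E\left[ G_t \mid S_t = s, A_t = a \right] \\
&= E\left[ r_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a \right] \\
&= \sum_{r,s'} p(s',r|s,a) \left[ r + \gamma \sum_{a'} \pi(a'|s')\, Q_\pi(s',a') \right]
\end{aligned}
$$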

Notation 2

$$
Q_\pi(s,a) = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma \sum_{a'} \pi(a'|s')\, Q_\pi(s',a') \right]
$$
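
Like the state-value equation, this is a fixed point and can be iterated. A minimal sketch on the same hypothetical MDP, where the inner sum $\sum_{a'} \pi(a'|s')\,Q(s',a')$ is just $v(s')$:

```python
# Minimal sketch: fixed-point iteration of the action-value Bellman equation
# Q(s,a) <- sum_{s'} P^a_{ss'} [ R^a_{ss'} + gamma * sum_{a'} pi(a'|s') Q(s',a') ]
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [1.0, 0.0]],
              [[0.0, 1.0], [0.5, 0.5]]])   # P[a, s, s']
R = np.array([[[0.0, 1.0], [0.0, 0.0]],
              [[1.0, 1.0], [0.0, 2.0]]])   # R[a, s, s']
pi = np.full((2, 2), 0.5)                  # pi[s, a]

Q = np.zeros((2, 2))                       # Q[s, a]
for _ in range(1000):
    # v[s'] = sum_{a'} pi(a'|s') Q(s', a'), then back up one step
    v = (pi * Q).sum(axis=1)
    Q_new = np.einsum('asp,asp->sa', P, R + gamma * v)  # v broadcasts over s'
    if np.max(np.abs(Q_new - Q)) < 1e-10:
        break
    Q = Q_new

print(Q)  # converges to Q_pi; note (pi * Q).sum(axis=1) recovers v_pi
```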

Reference: https://joshgreaves.com/reinforcement-learning/understanding-rl-the-bellman-equations/