Suppose we are given a set of samples from some unknown distribution. How can we generate more samples from that distribution? In this post, we will see how to solve this problem using energy-based models and the Langevin Monte Carlo (LMC) sampling algorithm.
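As a rough illustration of the sampling half of this idea, here is a minimal sketch of unadjusted Langevin Monte Carlo. It assumes the energy function $E(x)$ is already known, so that the target density is proportional to $e^{-E(x)}$, and uses the update $x_{k+1} = x_k - \epsilon \nabla E(x_k) + \sqrt{2\epsilon}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$. The function name `langevin_sample`, the toy quadratic energy, and the step size are illustrative assumptions, not taken from the post; in an energy-based model the gradient would come from a learned network instead.

```python
import numpy as np

def langevin_sample(grad_energy, x0, step=0.01, n_steps=1000, rng=None):
    """Run unadjusted Langevin dynamics and return the final state.

    grad_energy: callable returning the gradient of the energy E at x
                 (hypothetical stand-in for a learned energy model).
    x0:          initial state; each entry here is an independent chain.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step towards low energy plus Gaussian noise.
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise
    return x

# Toy assumption: E(x) = x^2 / 2, so the target is a standard normal
# and the gradient of the energy is simply x.
grad_energy = lambda x: x

rng = np.random.default_rng(0)
samples = langevin_sample(grad_energy, x0=np.zeros(5000), rng=rng)
```

With a small step size, the empirical mean and variance of `samples` should be close to 0 and 1. Because there is no accept/reject correction, the chain carries a small discretization bias that shrinks as the step size decreases.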
In policy gradient algorithms, we looked for an optimal stochastic policy $\pi_\theta$ in a setting with continuous state and action spaces. However, there are scenarios where we want a deterministic policy in the same setting.