$$
\newcommand{\ind}{\perp\!\!\!\!\perp}
\newcommand{\gG}{\mathcal{G}}
\newcommand{\gH}{\mathcal{H}}
\newcommand{\gD}{\mathcal{D}}
\newcommand{\sE}{\mathbb{E}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\coloneqq}{≔}
\newcommand{\Ls}{\mathcal{L}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\emp}{\tilde{p}}
\newcommand{\lr}{\alpha}
\newcommand{\reg}{\lambda}
\newcommand{\rect}{\mathrm{rectifier}}
\newcommand{\softmax}{\mathrm{softmax}}
\newcommand{\sigmoid}{\sigma}
\newcommand{\softplus}{\zeta}
\newcommand{\KL}{D_{\mathrm{KL}}}
\newcommand{\Var}{\mathrm{Var}}
\newcommand{\standarderror}{\mathrm{SE}}
\newcommand{\Cov}{\mathrm{Cov}}
% Wolfram Mathworld says $L^2$ is for function spaces and $\ell^2$ is for vectors
% But then they seem to use $L^2$ for vectors throughout the site, and so does
% wikipedia.
\newcommand{\normlzero}{L^0}
\newcommand{\normlone}{L^1}
\newcommand{\normltwo}{L^2}
\newcommand{\normlp}{L^p}
\newcommand{\normmax}{L^\infty}
\newcommand{\parents}{Pa} % See usage in notation.tex. Chosen to match Daphne's book.
$$

Note Blog
This post is about diffusion models, which have received quite a bit of attention recently, for example through DALLE [1] and IMAGEN [2] in the text-to-image use case.
My motivation was rather to understand the underlying mechanics of the method, which is surprisingly intuitive but requires a non-trivial amount of mathematics to be “rigorous” about.
The main paper that motivates this blog post is the [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) paper by Ho et al. (2020).
As of this moment the paper is quite recent, so I dare to say that this is a very novel method for learning generative models.

This is one of the most typical approaches to reducing the variance of MC estimates. Consider the setup where we want to estimate $\E[f(x)]$ with MC, and imagine that we have access to a function $h(x)$. We can then compute the expectation of the difference, $\E[f(x) - h(x)]$. This might not seem that interesting, since by linearity of expectation we just arrive at $\E[f(x)] - \E[h(x)]$, but things get interesting when we estimate the quantity via MC.
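To make the idea concrete, here is a minimal sketch of this trick, which is usually called a control variate. Everything specific in it is my own assumption for illustration: $x \sim \mathrm{Uniform}(0, 1)$, $f(x) = e^x$, $h(x) = 1 + x$, and the premise that $\E[h(x)] = 1.5$ is known in closed form, so it can be added back after estimating $\E[f(x) - h(x)]$ with MC. The function name `control_variate_demo` is hypothetical.

```python
import math
import random
import statistics


def control_variate_demo(n=10_000, seed=0):
    """Compare a plain MC estimate of E[f(x)] with a control-variate estimate.

    Assumed toy setup (not from any paper): x ~ Uniform(0, 1), f(x) = exp(x),
    h(x) = 1 + x, with E[h(x)] = 1.5 known analytically.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]

    # Plain MC: average f over the samples.
    f_samples = [math.exp(x) for x in xs]
    plain_estimate = statistics.fmean(f_samples)

    # Control variate: average f - h over the same samples,
    # then add back the known expectation of h.
    h_mean = 1.5
    diff_samples = [math.exp(x) - (1.0 + x) for x in xs]
    cv_estimate = statistics.fmean(diff_samples) + h_mean

    # Per-sample variances: the smaller one gives the tighter estimate.
    return (
        plain_estimate,
        cv_estimate,
        statistics.variance(f_samples),
        statistics.variance(diff_samples),
    )


if __name__ == "__main__":
    plain, cv, var_f, var_diff = control_variate_demo()
    print(f"true value        : {math.e - 1:.5f}")
    print(f"plain MC estimate : {plain:.5f} (sample variance {var_f:.4f})")
    print(f"control variate   : {cv:.5f} (sample variance {var_diff:.4f})")
```

Both estimators target the same quantity, but because $h$ tracks $f$ closely on this interval, the samples of $f(x) - h(x)$ fluctuate much less than the samples of $f(x)$, which is where the variance reduction comes from.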

New work that is being strongly pushed by Yoshua Bengio: a model class that enables tractable probabilistic inference, has connections to Markov chain Monte Carlo, and is also able to learn densities.

The problem of credit assignment is a long-standing issue in reinforcement learning. In short, it’s about which action actually “caused” the reward in the future. Here I am looking at two papers that address this problem.

- A Convergent and Efficient Deep Q Network Algorithm
- Offline Reinforcement Learning with Soft Behavior Regularization

Here I will continually update the list of research papers that I have read, comment on them, and brainstorm some ideas for improvement.

It all began with AlphaGo. Where are we now? MuZero.

My notes on the causal model learning and modular computation line of work, which connects to learning independent mechanisms. (Very roughly written, just capturing the gist of a couple of papers with excerpts.)

First we start with the question: what does it mean for data to be out of distribution? In traditional statistical inference we concern ourselves with the generalization gap. This is defined as the difference between the expected error of our algorithm on the training set and on the test set. Now, in classical approaches the test set is assumed to be sampled from the **same** distribution, that is, the true data distribution \(p(x)\).
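
One way to write this down, as a sketch in my own notation (the loss \(\ell\), the learned predictor \(\hat{f}\), and the training samples \(x_1, \dots, x_n \sim p(x)\) are assumptions introduced here for illustration):

$$
\mathrm{gap}(\hat{f}) = \E_{x \sim p(x)}\big[\ell(\hat{f}, x)\big] - \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{f}, x_i)
$$

The first term is the expected error on test data drawn from \(p(x)\) and the second is the average error on the training set. Under this reading, out-of-distribution data would correspond to evaluating the first term under some other distribution \(q(x) \neq p(x)\).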