The Padding Oracle Attack
There are a bunch of writeups on this attack, such as Robert Heaton's blog post, SkullSecurity (and their follow-up), GDS, mpgn, Grymoire, this crypto.SX post and the paper by Serge Vaudenay. Dan Boneh has a lecture on it and this video is good too. Here’s my digestion of it.
Let there be $n$ blocks of plaintext, each of blocksize $k$. For $i \in [1, n]$, $$\begin{aligned} c_i &= E(p_i \oplus c_{i-1}), \\ p_i &= D(c_i) \oplus c_{i-1} \end{aligned}$$ where $c_i$ is the $i$th cyphertext block, $p_i$ the $i$th plaintext block and $c_0$ the initialisation vector.
Construct some block, which we will call $c'$. If we concatenate it with the last cyphertext block $c_n$, the corresponding plaintext of $c'|c_n$ would be $p_1'|p_2'$, $$\begin{aligned} p'_1 = D(c') \oplus c_0, \quad p'_2 &= D(c_n) \oplus c' \\ &= D(E(p_n \oplus c_{n-1})) \oplus c' \\ &= p_n \oplus c_{n-1} \oplus c', \end{aligned}$$ giving $$ p_n = p_2' \oplus c' \oplus c_{n-1}. $$
As $p_2'$ is at the end of our plaintext block it must have valid padding. This puts a constraint on $c'$. If only there was some way to tell if the plaintext had valid padding...
The Padding Oracle
For $p$ the plaintext corresponding to cyphertext $c$, define the padding oracle $\mathcal{O}$ as $$ \mathcal{O}(c) = \begin{cases} 1 & \text{if } p \text{ has valid padding}, \\ 0 & \text{otherwise}. \end{cases} $$ We can now construct $c'$ such that $p_2' = D(c_n) \oplus c'$ has valid padding, by submitting $c'|c_n$ to the oracle. If $\mathcal{O}(c'|c_n) = 1$, the padding in $p_2'$ is valid.
The Last Byte
Borrowing some programming notation, if we consider the last byte in $p_2'$ to be $p_2'[k] = \mathtt{\backslash x01}$, $p_2'$ will (trivially) have valid padding. If we can construct a $c'$ such that this is the case, we will know that the last byte of $p_2'$ is one.
For such a $c'$, $c'_k$, say, $$\begin{aligned} p_n[k] &= p_2'[k] \oplus c'_k[k] \oplus c_{n-1}[k] \\ &= 1 \oplus c'_k[k] \oplus c_{n-1}[k]. \end{aligned}$$ (Here $1 = \mathtt{\backslash x01}$. We’ll use this shorthand throughout.)
To construct $c'_k$, it is enough to notice that since $p'_2 = D(c_n) \oplus c'_k$ and we are only interested in $p'_2[k] = D(c_n)[k] \oplus c'_k[k]$, we can let $$ c'_k = (\mathtt{\backslash x00}, \mathtt{\backslash x00}, \ldots, \mathtt{\backslash x00}, b)$$ (i.e. 15 bytes of zero, or whatever, and the final byte $c'_k[k] = b$). Iterating over all possible values of $b$ and submitting the cyphertext to the oracle will give us the value of $b$ that produces valid padding. Thus, we’ve found the last byte of the plaintext.
The Second-Last Byte
Another valid plaintext would be $p_2'[k] = p_2'[k-1] = \mathtt{\backslash x02}$.
To construct $c'_{k-1}$, notice that $c'_{k-1}[k]$ will simply be the previous $c'_k[k]$ xored appropriately. $$\begin{aligned} p_2'[k] \oplus c'_k[k] &= 1, \\ p_2'[k] \oplus c'_k[k] &\oplus 1 \oplus 2 = 2, \end{aligned}$$ but $$ p_2'[k] \oplus c'_{k-1}[k] = 2 $$ by definition of $c'_{k-1}$, so equating these we get $$ p_2'[k] \oplus c'_{k-1}[k] = p_2'[k] \oplus c'_k[k] \oplus 1 \oplus 2, $$ giving $$ c'_{k-1}[k] = c'_k[k] \oplus 1 \oplus 2. $$
$c'_{k-1}[k-1]$ is found by testing all possible values until $\mathcal{O}(c'_{k-1}|c_n) = 1$. Then the second-last byte is $$ p_n[k-1] = 2 \oplus c'_{k-1}[k-1] \oplus c_{n-1}[k-1]. $$
All the Bytes
In general, we get each byte of the plaintext block via $$ p_n[k - i] = (i + 1) \oplus c_{k-i}'[k-i] \oplus c_{n-1}[k-i], $$ for $i \in [0, k)$, where $c_i'$ is $c'$ constructed for the $i$th byte via $$ c'_{k-i}[k - j] = c'_{k - i + 1}[k - j] \oplus i \oplus (i + 1) $$ where $j \leq i$.
All the Blocks
This is repeated for each block, excluding the first (because we don’t know the initialisation vector).
Caveats
Notice that for finding the last byte $p_n[k]$ we iterated over all possible values of $c'[k] = b$ until $\mathcal{O}(c'|c_n) = 1$ and concluded then that $p_2'[k] = \mathtt{\backslash x01}$. However, there is a small probability ($\frac{1}{256}$) that $p_2'[k-1] = \mathtt{\backslash x02}$ and our oracle is satisfied because we’ve inadvertently found the cyphertext byte that corresponds to $p_2'[k] = \mathtt{\backslash x02}$.
This could similarly happen if $p_2'[k-2] = p_2'[k-1] = \mathtt{\backslash x03}$, where we would get a positive result if $p_2' = \mathtt{\backslash x03} \neq \mathtt{\backslash x01}$, and so on. This becomes increasingly unlikely, but is easy to mitigate by instead approaching the problem from the left.
Take $c_{n-1}|c_n$, which will obviously satisfy the oracle (and give $p_2' = p_n$). Then vary the bytes in $c_{n-1}$ from left to right and query the oracle each time. If the oracle returns false on the $i$th byte, then the remaining $k - i$ bytes are all equal to $k-i$. We also gain a substantial speedup on the last block as we get all the plaintext at once.
Alternatively, we can check that when we determine the last byte it really is $\mathtt{\backslash x01}$. Then all the other bytes must be correct, too. To do this, vary the second last byte. If the padding is invalid, we know that we were wrong about the last byte so we try again. This only needs to be checked once per block. We lose the speedup advantage above, but it is simpler.
Code
I wrote a program that does this.
Here it is in action.
Update: I gave a talk on this, the slides are here.