## The XOR Fourier and quantum algorithms

I am teaching a course in quantum computing this semester, and the hardest part is building some intuition on why quantum algorithms work. Right now, we are discussing the most basic of quantum algorithms, the Deutsch-Jozsa and the Bernstein-Vazirani algorithms. The popular textbooks on quantum computing don't do justice to why these algorithms work, but Ryan O'Donnel quantum computing course does. Both video lectures and notes are available. Here, I want to synthesize at a high level, Ryan's explanation.

We are familiar with the standard Fourier series. Given any periodic function over the reals, we can decompose it into a sum of sinusoidal functions with different frequencies. We can think of each sinusoid as a pattern, and any function as a sum of these patterns. The different patterns are orthogonal and complete. And the Fourier transform is the process of going from the "data basis" to the "pattern basis". This give rise to the idea of trying to decompose functions defined over other fields, using appropriate patterns.

In the quantum algorithms mentioned above, the classical functions take binary strings to a single binary bit. When implemented as a quantum gate, we get the transformation $∣x⟩_{n}∣y⟩_{1}→∣x⟩_{n}∣y+f(x)⟩_{1}$, where the plus is the bitwise xor operation. It is quite easy to show that if the last bit is in the $∣−⟩$ state, then the last bit remains unchanged by the function, and instead the state picks up an overall phase of $(−1)_{f(x)}$. This phase will either be a $+1$ or a $−1$ depending on the value of $f(x)$. Hence, we can ignore the last qubit, concentrate on the input qubits alone. If we have some superposition in the state of the input qubits, then $∑_{x}∣x⟩→∑_{x}(−1)_{f(x)}∣x⟩$. So, the action of a function implemented quantumly is to change the amplitudes of the various states in the superposition.

What will the patterns look like in this case? They will be some special superpositions, that are orthogonal and complete. We want to stick to superpositions, where each entry is a $+1$ or $−1$. We know that for $n$-bit functions, we should have $2_{n}$ patterns, because there are this many $n$-bit to $1$-bit functions. It is a combinatorial exercise to construct a list of such patterns that are orthogonal. As it turns out for one-bit to one-bit functions, these is just the columns of the Hadamard gate, and for $n$-bit to $1$-bit functions, these are just the columns of the tensor product of $n$ Hadamard gates (all up to a pesky normalization constant). The most important of these patterns is the one with all positive signs, $∑_{x}∣x⟩$. We can also visualize this as a vector with all ones in it.

Now, we turn to quantum algorithms. The structure of these basis algorithms is the same. Suppose, we have some unknown function (from a set of known functions), which we want to identify. Classically, we would have to query the function many times to determine what values it takes. Quantumly, we can query it with a superposition of basis states. Let's query it with the all positive pattern. The reason being that all basis states are treated identically and equally by this pattern. We can create the all positive pattern by applying the Hadamard gate to the all zero basis state. Another way of saying the above is that the all zero basis state is in the pattern basis. We use the Hadamards to take it to the data basis, to create the state $[1,…,1]$.

Next, we query the function with this pattern to get a state that encodes all the values that the function takes as a phase. This can be visualized as a vector with each entry as a phase, $[(−1)_{f(0…0)},…,(−1)_{f(1…1)}]$. In order to extract information about the function, we rotate back to pattern basis. This can be done by applying the inverse of the Hadamard gates, but the Hadamard is fortunately self-inverse. So, we apply a tensor product of Hadamards again. This is equivalent to projecting onto the pattern basis. Now, we have a vector $[p(0…0),…,p(1…1)]$, where each $p$ is the strength of the relevant pattern.

Now, the core idea is that if our set of known functions can be divided into groups, where each one group outputs to a different subspace in the pattern basis, then the above formulation works. For instance, the Bernstein-Vazirani problem is simply to determine the unknown $s$, where $f(x)=s⋅x$. But there are $2_{n}$ such functions (because $s$ is $n$ bits), and each maps onto a unique pattern. Hence, the above formulation will yield at the output exactly one pattern, and on measurement we will get exactly that pattern, and hence can identify $s$.

The Deutsch-Jozsa algorithm is a bit more complicated. Here, we have two subsets of functions, balanced and constant. As it turns out constant functions map only to the all positive pattern, while the balanced functions never map onto this pattern. Hence, the presence or absence of the all positive pattern on measurement is an indicator variable for constant and balanced functions respectively.