The Anti-Infinity Parade

Mar 17, 2021

The XOR Fourier and quantum algorithms

I am teaching a course in quantum computing this semester, and the hardest part is building some intuition for why quantum algorithms work. Right now, we are discussing the most basic of quantum algorithms, the Deutsch-Jozsa and the Bernstein-Vazirani algorithms. The popular textbooks on quantum computing don't do justice to why these algorithms work, but Ryan O'Donnell's quantum computing course does. Both video lectures and notes are available. Here, I want to synthesize Ryan's explanation at a high level.

We are familiar with the standard Fourier series. Given any periodic function over the reals, we can decompose it into a sum of sinusoidal functions with different frequencies. We can think of each sinusoid as a pattern, and any function as a sum of these patterns. The different patterns are orthogonal and complete. And the Fourier transform is the process of going from the "data basis" to the "pattern basis". This gives rise to the idea of trying to decompose functions defined over other fields, using appropriate patterns.

In the quantum algorithms mentioned above, the classical functions take binary strings to a single binary bit. When implemented as a quantum gate, we get the transformation $|x\rangle_n |y\rangle_1 \to |x\rangle_n |y+f(x)\rangle_1$, where the plus is the XOR operation (addition mod 2). It is quite easy to show that if the last bit is in the $|-\rangle$ state, then the last bit remains unchanged by the function, and instead the state picks up an overall phase of $(-1)^{f(x)}$.
This phase will either be a $+1$ or a $-1$ depending on the value of $f(x)$. Hence, we can ignore the last qubit and concentrate on the input qubits alone. If we have some superposition in the state of the input qubits, then $\sum_x |x\rangle \to \sum_x (-1)^{f(x)} |x\rangle$. So, the action of a function implemented quantumly is to change the amplitudes of the various states in the superposition.
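
For concreteness, here is the short "phase kickback" computation behind that claim, written in the notation above (with $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$):

$$
U_f\,|x\rangle_n\,|-\rangle = \frac{1}{\sqrt{2}}\Big(|x\rangle_n\,|0 \oplus f(x)\rangle - |x\rangle_n\,|1 \oplus f(x)\rangle\Big) = (-1)^{f(x)}\,|x\rangle_n\,|-\rangle,
$$

since when $f(x) = 1$ the two terms of $|-\rangle$ are swapped, which flips the overall sign.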

What will the patterns look like in this case? They will be some special superpositions that are orthogonal and complete. We want to stick to superpositions where each entry is a $+1$ or a $-1$. We know that for $n$-bit functions we should have $2^n$ patterns, because there are this many $n$-bit strings. It is a combinatorial exercise to construct a list of such patterns that are orthogonal. As it turns out, for one-bit to one-bit functions these are just the columns of the Hadamard gate, and for $n$-bit to $1$-bit functions these are just the columns of the tensor product of $n$ Hadamard gates (all up to a pesky normalization constant).
The most important of these patterns is the one with all positive signs, $\sum_x |x\rangle$. We can also visualize this as a vector with all ones in it.
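
Here is a small numpy sketch of these patterns (my own code, not from the course notes): the unnormalized $n$-fold tensor product of Hadamards has $\pm 1$ entries, its columns are mutually orthogonal, and the column labelled by the bit string $s$ is exactly the pattern with entries $(-1)^{s\cdot x}$.

```python
import numpy as np
from itertools import product

n = 3
H1 = np.array([[1, 1], [1, -1]])           # single-qubit Hadamard, unnormalized
H = H1
for _ in range(n - 1):
    H = np.kron(H, H1)                     # n-fold tensor product: a 2^n x 2^n matrix of +-1 entries

# Orthogonality and completeness: the 2^n columns form a basis (H H^T = 2^n * I).
assert np.array_equal(H @ H.T, 2**n * np.eye(2**n, dtype=int))

# The pattern labelled by the bit string s has entries (-1)^(s.x), with the mod-2 dot product.
s = (1, 0, 1)
col = H[:, int("".join(map(str, s)), 2)]
for i, x in enumerate(product([0, 1], repeat=n)):
    assert col[i] == (-1) ** (sum(si * xi for si, xi in zip(s, x)) % 2)
print("2^n orthogonal +-1 patterns; pattern s has entries (-1)^(s.x)")
```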

Now, we turn to quantum algorithms. The structure of these basic algorithms is the same. Suppose we have some unknown function (drawn from a set of known functions) that we want to identify. Classically, we would have to query the function many times to determine what values it takes. Quantumly, we can query it with a superposition of basis states. Let's query it with the all-positive pattern, since this pattern treats all basis states identically and equally. We can create the all-positive pattern by applying a Hadamard gate to each qubit of the all-zeros basis state. Another way of saying the above is that the all-zeros basis state is in the pattern basis. We use the Hadamards to take it to the data basis, to create the state $[1,\dots,1]$.

Next, we query the function with this pattern to get a state that encodes all the values that the function takes as phases. This can be visualized as a vector with each entry a phase, $[(-1)^{f(0\dots 0)},\dots,(-1)^{f(1\dots 1)}]$. In order to extract information about the function, we rotate back to the pattern basis. This can be done by applying the inverse of the Hadamard gates, but the Hadamard is fortunately self-inverse, so we apply a tensor product of Hadamards again. This is equivalent to projecting onto the pattern basis. Now, we have a vector $[p(0\dots 0),\dots,p(1\dots 1)]$, where each $p$ is the strength of the relevant pattern.
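
A minimal numpy simulation of this pipeline (a sketch only; the function name `pattern_strengths` and the binary ordering of the amplitudes are my own conventions):

```python
import numpy as np

def pattern_strengths(f, n):
    """Amplitudes in the pattern basis after the H -> phase-query -> H circuit."""
    H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H = H1
    for _ in range(n - 1):
        H = np.kron(H, H1)                        # n-qubit Hadamard transform

    state = np.zeros(2**n); state[0] = 1.0        # the all-zeros basis state
    state = H @ state                             # the all-positive pattern [1, ..., 1] / sqrt(2^n)
    phases = np.array([(-1) ** f(x) for x in range(2**n)])
    state = phases * state                        # the quantum query, acting as a phase
    return H @ state                              # rotate back to the pattern basis

# Example: f(x) = parity of the bits of x puts all the weight on a single pattern.
print(np.round(pattern_strengths(lambda x: bin(x).count("1") % 2, 3), 3))
```

Measuring at the end of the circuit samples a pattern with probability equal to the square of its strength; for the parity function above, the entire strength lands on the single pattern labelled by the all-ones string.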

Now, the core idea is that if our set of known functions can be divided into groups, where each group maps onto a different subspace of the pattern basis, then the above procedure works. For instance, the Bernstein-Vazirani problem is simply to determine the unknown $s$, where $f(x) = s\cdot x$. There are $2^n$ such functions (because $s$ is $n$ bits), and each maps onto a unique pattern. Hence, the above procedure will yield exactly one pattern at the output; on measurement we will get exactly that pattern, and hence can identify $s$.
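
To see why, write out the final Hadamard step explicitly (with $x\cdot y$ the mod-2 inner product). After the query the state is $\frac{1}{\sqrt{2^n}}\sum_x (-1)^{s\cdot x}|x\rangle$, and the final Hadamards give

$$
\frac{1}{2^n}\sum_{y}\Big(\sum_{x}(-1)^{s\cdot x}(-1)^{x\cdot y}\Big)|y\rangle
= \frac{1}{2^n}\sum_{y}\Big(\sum_{x}(-1)^{(s\oplus y)\cdot x}\Big)|y\rangle
= |s\rangle,
$$

because the inner sum equals $2^n$ when $y = s$ and vanishes otherwise.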

The Deutsch-Jozsa algorithm is a bit more complicated. Here, we have two subsets of functions, balanced and constant. As it turns out, constant functions map only onto the all-positive pattern, while balanced functions never map onto this pattern. Hence, the presence or absence of the all-positive pattern on measurement is an indicator for constant and balanced functions respectively.
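
A quick, self-contained check of this criterion (my own toy code): the strength of the all-positive pattern works out to be $p(0\dots 0) = \tfrac{1}{2^n}\sum_x (-1)^{f(x)}$, which is $\pm 1$ for a constant function and exactly $0$ for a balanced one.

```python
from itertools import product

def all_positive_strength(f, n):
    """Strength of the all-positive pattern: (1/2^n) * sum over x of (-1)^f(x)."""
    return sum((-1) ** f(x) for x in product([0, 1], repeat=n)) / 2**n

n = 3
constant = lambda x: 1        # a constant function
balanced = lambda x: x[0]     # a balanced function: 0 on half the inputs, 1 on the other half

print(all_positive_strength(constant, n))   # +-1 -> measurement gives the all-positive pattern with certainty
print(all_positive_strength(balanced, n))   # 0   -> the all-positive pattern never shows up
```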

Dec 04, 2020

Impossibility proofs

Patrick Honner over at Quanta Magazine has an essay on what impossibility proofs in mathematics can teach us. He brings up a number of beneficial effects of exploring such proofs, described here with my interpretation.

  • At some point in our mathematical training, we start encountering problems that simply don't have a solution. But what we can do is prove that a solution does not exist. This can be done by showing that the various conditions imposed by the problem cannot be simultaneously satisfied. We learn to check, for every future problem, whether the conditions even admit a solution. And if we build this habit, we have leveled up as mathematicians.

  • "Proving that something is impossible is a powerful act of mathematics. It shifts our perspective from that of rule follower to that of rule enforcer." This is a even more advanced version of the previous point. The maturity to become a rule enforcer is another leveling up act. In the journey to becoming an intellectual, this is when we start to build intellectual honesty.

  • One of the goals in mathematics is to understand the consequences of a set of axioms. We realize that this means figuring out the true statements or theorems that can be derived from these axioms, but we must also list out the important false statements within this system of axioms. In this way, we determine the boundary of our mathematical system, the one that discriminates between the true and the false.

  • Finally, once we have conjectured some statement as false, we might need to create new mathematics, sometimes new fields, to prove that conjecture. This is the domain of the expert mathematician.

As a physicist, I have always been interested in the impossibility results within physics. These have a different character than those in mathematics. For instance, the impossibility of going faster than the speed of light in flat spacetime is axiomatic in relativity. In quantum information, there is a wealth of impossibility theorems, such as the no-cloning theorem, the monogamy of entanglement, etc. These place strong constraints on which types of universe can exist and which cannot. These theorems often take a central place in the philosophy of physics, and are often understood to characterize a theory as much as its axioms do. In fact, one of the favorite exercises for bored researchers is to figure out if various theories of physics can be derived from such results taken as axioms.

Perhaps, one day I will write a review article, which will discuss the many major and minor impossibility theorems in physics.

Nov 29, 2020

Quantum Mechanics in Your Face

Recently, a 1994 lecture by Sidney Coleman was transcribed and put on arXiv. The whole lecture is filled with wisdom and clarity of thought regarding the conceptual framework of quantum mechanics. One particular point really struck me as something to permanently keep in mind, and I describe it here from my own perspective.

One of the worst pedagogical techniques involving quantum mechanics is talking about the wave-particle duality. This concept needs to die, because it serves to confuse more than illuminate. The issue is that in quantum mechanics, the fundamental building blocks of matter (and massless particles) are these entities whose state is modeled as a vector in Hilbert space. On the other hand, classical physics postulates that every entity in the universe is either a particle or a wave. The quantum entities behave differently from particles or waves (obviously, otherwise what need would there be for a new theory), but people continue to try to teach quantum mechanics by pigeonholing quantum entities into the particle cubbyhole or the wave one.

For the last decade, I have wondered why, until Sidney Coleman hit the nail on the head. He says people believe, deep down, that the world is really governed by classical mechanics, and therefore they explain everything in terms of classical mechanics, or at least try to (and let the student walk away even more confused). Instead, SC gives the correct pedagogy, aptly summarized on his slide:

"Every successful physical theory swallows its predecessor alive."

But it does so by interpreting the concepts of the old theory in terms of the new, NOT the other way around.

Thus our aim is NOT "the interpretations of quantum mechanics." It is the interpretation of classical mechanics

You can't say it much better than this. Once you have a more fundamental theory, you need to start your pedagogy from a clean slate. Construct the concepts of the new theory, and only once they are fully defined do you discuss how they reduce to the concepts of the old theory in certain circumstances.

He gives the example of thermodynamics and statistical mechanics, the latter of which subsumes the former. You don't want to explain the concept of molecular motion (found only within statistical mechanics) by appealing to the concept of heat. That is nonsensical. You explain heat in terms of molecular motion, the more fundamental quantity.

The final example he pulls out is one from the Fun to Imagine movie with Richard Feynman. In one segment, the host asks Feynman to explain why magnets attract or repel each other. Feynman simply answers "because magnets repel each other". But the host insists on an explanation, which results in Feynman monologuing on what it means to explain something. I insist you listen to Feynman himself. But the short of it is that he says: I can't explain it to you in terms of anything more fundamental that you understand (or that even he understands). Electromagnetism is the most fundamental theory in its domain and explains a wealth of other phenomena, but we don't know why it is itself true. Our framework of explanation stops when we get to the axioms of electromagnetism. We can't explain electromagnetism in terms of everyday phenomena, because everyday phenomena are themselves only explainable in terms of electromagnetism. The golden quote is,

"But I can't really do a good job, any job, of explaining magnetic force in terms of something else you are more familiar with, because I don't understand it in terms of anything else you are more familiar with."

Finally, to come back to quantum mechanics. Stop trying to explain the behavior of quantum entities in terms of classical entities, the fundamental quantity in terms of the derived. Instead, figure out how you can recover the conceptual framework of classical mechanics from that of quantum mechanics.

Nov 22, 2020

Qworld, QPakistan and the QBronze workshops

I recently have had the pleasure of joining QWorld, an initiative to spread knowledge and skills related to quantum technologies across the world.

This started with the hardworking Jibran Rashid putting in the effort to get a group of Pakistani academics to start QPakistan, as one of the QCousins within QWorld. Our first big event was a QBronze workshop, which is an introduction to programming quantum computers. In just five days, participants learn to write very simple programs using the popular quantum computing framework and simulators provided by IBM's qiskit.

Despite the very short notice, we got almost 250 people from across Pakistan to sign up. This was far more than we expected, since this is near the end of the Fall semester in the middle of a pandemic, and if students at my university are a representative sample, most students are struggling to keep up with their classes. Only about half showed up on the first day, though, and every day we lost some. By the end we were able to award diplomas to about 15-20 students who completed enough coding exercises.

The learning methodology of the workshop is very hands-on. We did a one-hour lecture every day, followed by two hours in which students worked through very detailed Jupyter notebooks consisting of a mix of explanations, example code and exercises. The workshop was conducted via Discord, which was a delight. We had a number of text channels for support, plus one voice channel per mentor. I would hang out in my channel and students would drop by one by one to ask questions or get help on problems. This replicated some of the features of in-person workshops.

One thing we failed to do, which I think is a great idea for online workshops, is to help students pair program. We should have set up a system for students to be randomly paired up. Then they could jump into a voice call with each other, screen share, and code together. This would really help people feel like they are part of a group working towards a shared objective, besides helping them learn better and faster. Hopefully, we will implement this in a future iteration.

Which brings me to our next workshop, happening this week, a global QBronze workshop. I am looking forward to introducing students from across the world to the wonder that is quantum computing.

Nov 14, 2020

The difference between classical and quantum

Once you imagine physics through the lens of information theory, you can answer some fundamental questions in completely different ways. One oft-asked question among physicists, where you can get an interesting answer, is the following: is a given system classical or quantum mechanical in nature?

The information-theoretic answer is to ask: what resources are required to simulate the behavior of the system? The state of computational complexity (informed by the relevant physics), at the moment, is as follows. There are certain computational problems, naturally thought of as simulating certain quantum mechanical systems, for which we have not found any efficient classical algorithms. Such problems fall within the class called BQP. We also believe that quantum mechanical systems can be engineered to create quantum computers, which can simulate Turing machines efficiently and can solve BQP problems efficiently.

Really, the difference between BQP and P (the problems efficiently solvable on a Turing machine) is that problems in the former have a solution space that blows up exponentially as the problem size grows, but blows up in a way that a quantum computer can exploit, using the exponentially large multi-qubit Hilbert space, to solve the problem efficiently. The second condition is necessary so that we don't sweep NP-hard problems into BQP.

Now, there are many systems that are conventionally thought of as quantum mechanical by physicists, because they usually describe these systems using the framework of quantum mechanics. But try simulating these systems. Use every trick in the book to reduce the computational time of the simulation. If you discover that the system can be simulated efficiently on a Turing machine, then the information theorist will say that the system is classical. The Turing machine algorithm captures a description of the system that does not blow up as the system size grows, and so the run time of the algorithm is polynomial.

Conversely, if your quantum mechanical equations defy speed-up on a classical computer, then the beast of the exponential Hilbert space cannot be tamed, and you can claim that the problem is quantum mechanical in nature.

And there you have it. If a classical computer can efficiently simulate the behavior of a system, it is a classical system. If not, then it is quantum.

Nov 10, 2020

The difference between Physics and Chemistry

In a hallway discussion at the University, somebody claimed that atomic and molecular Chemistry is just Physics, because the laws that govern the former are those well-studied by the latter. I have often held similar views, but when somebody else poses the same view, it is easier to assess the idea critically.

There is a whole Wikipedia article on this distinction between the two. The discussion there reflects what is commonly accepted: that the two fields differ in scope and approach, with blurry boundaries at various points. I think this is a valid answer from a non-technical perspective, which will satisfy almost anyone that there are questions that physicists are not interested in, and other questions that chemists are not interested in.

However, I want to give a technical answer to the distinction, one that I think generalizes through the entire hierarchy of sciences, starting from Physics, going through Chemistry and Biology, moving on to Psychology, and then branching into Economics and Sociology, etc. My observation is not about why these fields diverged in the first place; rather, it is a simple rule that can be used post hoc to find the division between fields. The rule is not super strong, and I don't think it helps to darken blurry boundaries.

The insight comes from computational complexity, applied to the mathematical models associated with the theories used in these fields. For example, the fundamental equations of physics are those of classical and quantum physics, from which most physical phenomena that physicists are usually interested in can be inferred. However, these equations can be analytically solved only for a set of relatively simple systems, and numerically solved for a set of slightly more complex systems. In the case of quantum mechanics, you can basically calculate the behavior of simple atoms or of crystals. For anything more complex, or a system that is too large, the computational complexity of the equations blows up and moves out of the reach of current computers.

At this point, to be able to model the system, you have to make some serious approximations and write some simpler equations. These are the equations of applied Physics, or with further approximations, the equations of Chemistry. And there is your divide. Physics is at the top of the staircase. Computational complexity has forced you down one step into Chemistry. At this point, it becomes difficult to answer the questions Physics asks, because the approximations you made threw away information from the equations. Your new equations can now answer a different set of questions, and therefore both your scope and approach have changed.

In certain systems, the system size can gradually be increased, or you can make a series of ever more serious approximations. In such systems, the step between Physics and Chemistry will look more like a slope, or a blurry boundary.

The above applies to any other pair of adjacent fields. At some point, the equations of Chemistry become too complicated and you have to step down to the even simpler equations of Biology, and then you make a big jump to behavioral psychology, and then to Economics. While humans have different reasons for making those jumps, the background reason is that the computational complexity of the underlying phenomena makes it easy to make that choice.

One final point. While I have characterized the difference as computational complexity, you can just as well make similar arguments with experimental complexity - the resources required to do an experiment. Note that Physics labs usually have the most complex equipment, while near the bottom of the staircase, Economics experiments require basically no specialized equipment.

May 08, 2020

Scientific experts and authority

Somebody posted on Hacker News the Baloney Detection Kit by Carl Sagan. Wonderful reading, as is the full book, Demon Haunted World: Science as the Candle in the Dark.

One person, perhaps trolling, asked

'Arguments from authority carry little weight — "authorities" have made mistakes in the past. They will do so again in the future. Perhaps a better way to say it is that in science there are no authorities; at most, there are experts.'

It'd be interesting to see what is left after applying this consistently to scientific announcements. E.g. much of the pro climate change arguments I see are: all the scientists think it is happening, so it must be happening. Not saying they are wrong, but why should only science get a pass when it comes to argument from authority?

I like to answer such questions seriously, for the benefit of other readers interested in a serious answer. This person, in particular, was confused about the meaning of "authority" as used by Carl Sagan. They also forget that people have limited time in their lives and can't do everything. You can read my responses to them in the link.

Here, I pose and answer a slightly different question. Suppose you have some scientific question that you want answered, such as "Are humans causing the climate of Earth to substantially change?" You observe that people have different answers to this question. Whom should you believe?

The answer depends, to a high degree, on what you know. Here I present a simplified algorithm.

  1. If you are an expert in the topic itself: you have both the skills and the knowledge, gained over many years of hard work, to answer this question. You should form an answer. Next, compare your answer with those of other experts in the field, and debate them if your answer differs.

  2. If you are a semi-expert: usually, this means you are an expert in some other field. You have some skills to roughly evaluate whether experts in this field are talking sense, but you don't have the knowledge or the full set of skills required to answer the question. If you want to spend time on the question, you should read what various experts have to say, critically evaluate them, and adopt the answer that makes the most sense to you. Otherwise, do what the non-experts should do.

  3. If you are a non-expert: you have neither the skills nor the knowledge to assess which experts are right and which are wrong. You should figure out what the most common answer among the experts is, and just believe it. Your attempts to evaluate the experts will fail. Your attempts to come up with an answer yourself will fail.

In all cases, you should ignore what non-experts, or even semi-experts, think is the answer to the question.

What does this algorithm not cover? First, determining who is an expert can be very difficult if you are a semi- or non-expert. This is where, without the aid of good social institutions, there is no real way out.

Second, some scientific questions are connected to political questions, and experts might be biased. Again, there is no easy way out. Generally, in this case, you need to listen to the semi-experts who don't have a vested interest in answering the question one way or the other, and who have done their best to evaluate the experts.

As with anything to do with humans, there is no method that does not have edge cases or that does not break down in certain conditions.

Later, the commentator said

Let's take the Galileo case. There were a consensus of experts on Aristotelian cosmology, the best rational worldview of that day, and Galileo's model did not match observations at that point. So, his proposal was rejected. Obviously, Galileo was right to not listen to the consensus of experts in that case.

So, what makes current scientific consensus different than the Aristotelian consensus of Galileo's day? Why are dissidents wrong to question modern scientific consensus (of which climate change is only one controversial element), but right to question the medieval consensus? In both cases, the consensus meets the then current standard for rational worldview, so we can't use hindsight.

Clearly, the answer is that Galileo was justified in holding onto his answer because he was an expert. Experts should not defer to anyone, not even other experts, or even a consensus of other experts. Similarly, there probably are climate change dissidents who are actual experts in the field. But unless you are a semi-expert, you should not be paying attention to these people when debating policies. Any climate change dissidents who are not experts go straight out the window.

Feb 02, 2020

Bell's theorem for temporal order

To figure out a unification of quantum theory and gravity, the physics community has recently started pursuing a new direction. The core idea of this direction is rather simple. Suppose an object with mass is placed in a quantum superposition of two locations; then presumably the gravitational interaction of this object with other nearby objects should also be in a quantum superposition. As an example, consider placing a massive object in a superposition of being on the left and on the right of a test mass. The test mass should feel a superposition of being pulled to the left or to the right.

What does it even mean to be in a superposition of being pulled to the left or right? First of all, you can check this with electromagnetism, by imagining that the massive object is a proton and the test mass is an electron. The formalism of quantum theory already tells you how the electron will move in such a scenario. Gravity is different from electromagnetism (or the strong and weak forces), and the mechanics of the test mass's motion can be more complicated when gravity is fully accounted for. The difference is that the gravity of a massive object not only exerts forces on the test mass, but also determines in which causal order other objects interact with this test mass, or even whether they interact with it at all.

According to Einstein's theory of gravity, mass changes the metric of spacetime, which determines how far apart different spacetime points (called events) are. It also asserts that there is a maximum speed at which an object may move in spacetime (the speed of light). Two events are said to be causally connected if information (moving at the speed of light) can travel from one event to the other. If it can, then the earlier event can causally influence physical systems at the later event. Otherwise, it cannot.

Now, by placing a massive object just so, one can change the distance between two events. For example, suppose two events were not causally connected, but the placement of the massive object reduces the distance between them so that now they are causally connected. Or vice versa: two causally connected events can be made causally disconnected. Even more interestingly, consider two causally disconnected events $A$ and $B$, such that $A$ and $B$ are simultaneous in some reference frame. Now, one placement of a massive object forces $A$ to be in the causal past of $B$, and another placement of the massive object forces $B$ to be in the past of $A$.

Imagine that there is some physical system traversing spacetime that passes through the region where $A$ and $B$ are. Now, depending on the placement of the massive object, this physical system will pass through either $A$ or $B$ first. Moreover, there is someone sitting at $A$ and someone sitting at $B$, whose jobs are to transform the state of the physical system by $U_A$ and $U_B$ respectively.
So, the physical system will either be transformed by $U_B U_A$ or $U_A U_B$, depending on the placement of the massive object.

Finally, to complete the picture, imagine the massive object is in a superposition of the two locations, so the physical system is transformed by a superposition of $U_B U_A$ and $U_A U_B$. Such a scenario is called one of indefinite causal order: the two transformations of the physical system cannot be arranged into a definite order. Note that superpositions are different from statistical mixtures of causal orders. Deterministic orders and statistical mixtures of them are termed classical, while the others are termed non-classical.
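
As a toy illustration of such a superposition of orders, here is a minimal numpy sketch of the "quantum switch" idea, with arbitrary choices of $U_A$ and $U_B$ of my own; this is not the gravitational setup described in the paper.

```python
import numpy as np

# A control qubit decides whether the target is transformed by U_B U_A or U_A U_B;
# putting the control in superposition gives a superposition of the two causal orders.
X = np.array([[0, 1], [1, 0]], dtype=complex)                 # U_A (arbitrary choice)
Hd = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # U_B (arbitrary choice)

P0 = np.diag([1, 0]).astype(complex)   # |0><0| on the control
P1 = np.diag([0, 1]).astype(complex)   # |1><1| on the control

# control = 0 branch applies U_B U_A, control = 1 branch applies U_A U_B
W = np.kron(P0, Hd @ X) + np.kron(P1, X @ Hd)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # control in |+>
psi = np.array([1, 0], dtype=complex)                 # target in |0>
out = W @ np.kron(plus, psi)

# Because the two orders differ (X and Hd don't commute), the control and target
# end up entangled: the causal order is not a definite, classical one.
print(out.reshape(2, 2))   # rows: control = 0, 1; columns: target amplitudes
```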

I am reading this fascinating paper, titled Bell's theorem for temporal order, which develops this idea much further. The fundamental question the authors try to tackle is whether there is an experimental test that can distinguish a classical causal order from a non-classical one. Meaning, the experiment should be such that no classically ordered process can reproduce the results of one consisting of superpositions of causal orders.

The regular Bell theorem specifies an experimental test that distinguishes local hidden-variable theories from quantum theory. Two entangled particles are distributed to two distant agents, who subsequently measure these particles in certain pre-agreed-upon ways. The correlations between the results of their measurements are higher than what would be possible in a universe governed by classical laws.
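
For reference, here is a small numerical check of the standard Bell/CHSH test (my own code, with the textbook choice of state and measurement settings): for the entangled state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, the CHSH combination of correlators reaches $2\sqrt{2}$, above the bound of $2$ obeyed by any local hidden-variable model.

```python
import numpy as np

Z = np.diag([1.0, -1.0]).astype(complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>) / sqrt(2)

A0, A1 = Z, X                    # Alice's two measurement settings
B0 = (Z + X) / np.sqrt(2)        # Bob's two settings (eigenvalues +-1)
B1 = (Z - X) / np.sqrt(2)

def corr(A, B, state):
    """Expectation value <A (x) B> in the given two-qubit state."""
    return np.real(state.conj() @ np.kron(A, B) @ state)

chsh = corr(A0, B0, phi_plus) + corr(A0, B1, phi_plus) \
     + corr(A1, B0, phi_plus) - corr(A1, B1, phi_plus)
print(chsh, 2 * np.sqrt(2))      # both ~2.828, exceeding the classical bound of 2
```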

Here, the authors construct a similar test. Instead of starting with an entangled state, they start with a separable state. The separable state cannot be used in the regular Bell test. What they do is use the superposition of causal orders to convert this separable state into an entangled state. Then they perform the regular Bell test on this entangled state. If the state is genuinely entangled, the Bell test will be passed; otherwise it will fail. Since the source of the entanglement is the indefinite causal order, success of the subsequent test is evidence that the indefinite causal order is working as expected.

What I would like to know is what other information processing tasks can be designed whose possibility or difficulty is determined by the laws of the universe we live in. One way forward is to look at the list of currently known information processing tasks in which operations occur in some well-defined order, and modify them so they involve indefinite causal orders.