Jeremy Côté: Mathematical and scientific thinking for the curious.
https://cotejer.github.io//
Evolving Qubits With Bits

<p>As a quantum theorist, my job is to study quantum systems and understand their inner workings. However, since I’m a theoretical physicist and not an experimental physicist, most of my “experiments” come in the form of simulations. My laboratory is my computer, and this means writing numerical experiments.</p>
<p>But wait a second, you tell me. Isn’t the whole point of quantum computers to do things that our regular computers can’t? And aren’t there issues with exponential memory?</p>
<p>These are both very good questions, and we’ll dive into them below. But in short: Yes, these are issues that limit the experiments I can do. And it’s a reason I’d like to get my hands on a good, error-correcting quantum computer!</p>
<!--more-->
<p>In the absence of one<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, I make do with classical simulations. However, depending on the job you want to do, different techniques are preferable. Since beginning my PhD, I’ve learned how to simulate quantum systems using a variety of techniques. Each has its own quirks, and for someone as code-averse as I was, I’m happy to say that I’ve slowly found my groove.</p>
<p>The thing I’ve learned through all of these methods though is that simulating a system (any kind of system) is often <em>very</em> different from the actual system itself. When I write a simulation, what I’m trying to do is map the system I want to study to objects in my code which I can handle more easily. To put it concretely: While I imagine my “simulation” of qubits to be some strange quantum system, the reality is that it’s a bunch of lists and arrays being transformed using the tools of linear algebra.</p>
<p>At first, this really bugged me. After all, I want to simulate my quantum system, but what I’m really spending my time on is converting it into a language my computer can understand. These layers of abstraction between the system that I want to simulate and its <em>representation</em> in a computer are something I’ve learned to deal with. In a way, writing a good simulation is an art form in and of itself, since you have to figure out an equivalent way to represent your system as code.</p>
<p>As you will see, there are many ways to simulate quantum systems now, and each one has its own representation. Therefore, if you want maximal control over your simulation, it’s a good idea to learn the techniques required. There <em>is</em> a point in which you should just say, “Okay, I don’t need to know how <em>this</em> underlying thing works,” but I think this point is further than I used to believe.</p>
<p>Finally, working with numerical simulations has taught me something very important about mathematics: It’s one thing to see an equation in a paper. It’s a whole other thing to implement it in code and see it give you results. Often, an equation hides a lot of numerical baggage which needs to be implemented before you get a result. As my friend said in a presentation, “You don’t understand X unless you’ve coded it yourself.” This is something I’ve heard from others as well<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and I think there’s a lot of truth to that statement. Which is why I try not to shy away from learning new coding practices when it comes to my research.</p>
<p>This essay will be broken up into three categories, which roughly describe the kinds of methods for simulating quantum systems that I’ve worked on. I’ll mention the specific libraries and tools I’ve used, but the point here is to discuss the broad categories, since those are more important than the specific tool.</p>
<h2 id="state-vector-simulators">State Vector Simulators</h2>
<p>This is probably the “closest” simulator to the kind you would imagine in reality<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. What I mean by this is that you start with your quantum state $\vert\psi\rangle$ and it evolves according to various quantum gates that apply on your system.</p>
<p>Schematically, the evolution looks like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/QuantumCircuit.png" alt="An example quantum circuit" /></p>
<p>In other words, you start with an initial quantum state, and each gate you apply is a unitary matrix which multiplies your initial state. So if I apply some quantum gate $U$ on my system, the resulting state is $U\vert\psi\rangle$.</p>
<p>One thing you might notice is that a circuit which is drawn left-to-right (in the sense that the initial state is on the left) is written in the reverse way when seen as matrix multiplication.</p>
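To see this ordering concretely, here's a quick sketch with NumPy (the specific gates are just for illustration):

```python
import numpy as np

# Two single-qubit gates: Hadamard and Pauli X.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])

psi = np.array([1.0, 0.0])  # |0>

# Reading the circuit left to right: first H, then X.
# As matrix multiplication, the gate applied first sits
# closest to the state, so the product reads right to left.
out = X @ H @ psi

# Same result as applying the gates one at a time.
step = H @ psi
step = X @ step
assert np.allclose(out, step)
print(out)  # both amplitudes equal 1/sqrt(2) here
```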
<p>For me, the state vector picture is the easiest way to work with a quantum experiment. I just have to define my gates (the usual suspects like the Pauli operators and the CNOT gates are already defined), and then specify how they apply to a specific qubit. The libraries which do this sort of simulation then take care of actually building the right matrix $U$ to apply to the system.</p>
<p>The main library I’ve used here is <a href="https://qiskit.org/">Qiskit</a>, because of the above-mentioned partnership we have with IBM.</p>
<p>A <em>very</em> important point I want to make here is how this underlying matrix $U$ is built. Let’s say we have three qubits, and we want to apply an <em>X</em> gate on the second qubit. As an operator, it would look like this:</p>
<p>\(U = 1 \otimes X \otimes 1.\)
This is fine as a notation, but as a matrix, it becomes 8×8, which is starting to get big. In fact, if you have N qubits, then each individual qubit matrix is 2×2, so in total the size of the resulting matrix $U$ is 2<sup>N</sup>×2<sup>N</sup>. As you can probably imagine, this doesn’t scale too well for large system sizes.</p>
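As a sketch, this is how a library might build $U$ under the hood, using NumPy's Kronecker product. (The qubit-ordering convention here is my own assumption; different libraries order the tensor factors differently.)

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])

# X on the second of three qubits: U = 1 (x) X (x) 1.
U = np.kron(np.kron(I2, X), I2)
print(U.shape)  # (8, 8), i.e. 2^3 x 2^3

# Flipping the middle qubit of |000> should give |010>.
# With this kron ordering, the first factor is the most
# significant bit, so |q0 q1 q2> sits at index 4*q0 + 2*q1 + q2.
psi = np.zeros(8)
psi[0b000] = 1.0
out = U @ psi

ket_010 = np.zeros(8)
ket_010[0b010] = 1.0
assert np.allclose(out, ket_010)
```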
<p>At the end of the day, the reason this isn’t sustainable is that the vector needed to describe a quantum state has 2<sup>N</sup> components for N qubits. That’s fine for small systems, but if we want to start making claims in the “thermodynamic limit” of some quantum system when N→∞, we quickly get stuck.</p>
<p>So the state vector picture is a nice starting point because there’s a very direct connection to what’s going on. The elements you see in the circuit are matrices, and they keep on multiplying the initial quantum state until you get to the end. There are a few more subtleties when it comes to measurements, but that’s the gist.</p>
<p>The drawback is that you can’t scale up very high in the circuit picture. To give you a rough estimate, the online versions of Qiskit’s statevector simulator (which deals directly with the quantum state) and QASM simulator (more for dealing with measurement results) have an upper limit of 32 qubits. I think you can potentially go higher on your own hardware if you have enough RAM, but when I say “higher” I mean something like two extra qubits. Again, this has to do with the exponential cost of storing a quantum state. Each new qubit requires <em>double</em> the information to store. This is the crux of the bottleneck, since doing linear algebra with a big object like that is not easy.</p>
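To put some numbers on that exponential cost, here's a little back-of-the-envelope sketch, assuming one 16-byte complex amplitude per component:

```python
# Memory needed just to store the state vector, assuming one
# complex128 (16-byte) amplitude per component.
for n in (20, 32, 34, 40):
    gib = 16 * 2**n / 2**30
    print(f"{n} qubits: {gib:,.3f} GiB")
```

At 32 qubits you already need around 64 GiB just for the state, before doing any linear algebra on it.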
<p>But what if you didn’t need all those components? What if some of them you knew were always zero, or some were fixed by a property of your system? Then, is it possible to shrink the amount of space needed to simulate a circuit?</p>
<p>The answer is <em>yes</em>, and this brings us to tensor networks.</p>
<h2 id="tensor-network-simulators">Tensor Network Simulators</h2>
<p>The term “tensor network” feels like a buzzword to me in a similar manner that “machine learning” does. To me, these two words just sound cool, and make it seem like a lot of neat research can be done with them.</p>
<p>In fact, saying this section is about tensor networks is like saying this essay is about quantum theory. <em>Sure</em>, but that’s pretty vague. There are a bunch of different areas of tensor network research, but the one I’ll be focusing on here deals with a representation of quantum states called “matrix product states” (MPS).</p>
<p>People are really good at naming things, so it turns out that the main object of an MPS is <em>not</em> a matrix, but a tensor<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. I plan on getting into tensor networks much more in future essays (because this is related to a lot of my research), but here are the essential bits.</p>
<p>When you have your quantum state $\vert \psi\rangle$, you have to keep track of all 2<sup>N</sup> components. In fact, if we want to write things out fully, we can write down our quantum state (over three qubits, for example) as:
\(\vert \psi\rangle = \sum_{i,j,k} c_{ijk} \vert ijk\rangle.\)
The coefficients $c_{ijk}$ are precisely all of the different entries of the state, and the vector $\vert ijk\rangle$ is the basis state, which tells us <em>where</em> to put the coefficient in our state $\vert \psi\rangle$. As a diagram, we can think of it like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/BasisState.png" alt="The basis states." /></p>
<p>The MPS form of a quantum state is a way to split the coefficients $c_{ijk}$ such that they are each defined on a <em>single</em> qubit. This doesn’t come for free though, so what we end up doing is creating tensors that can be chained together to reproduce the coefficient $c_{ijk}$ if we multiply everything out. I won’t show you all of the indices in this essay because it would scare you off (a future essay should cover it), but as a diagram, here’s the idea:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/MPS.png" alt="A diagram for a matrix product state." /></p>
<p>With this, we can potentially save space on the number of elements we need to store (see Reference 1 for more). In my case, using the MPS form of a quantum state allows me to study the entanglement entropy of a quantum state while never having the full 2<sup>N</sup> object in memory. This allows for more qubits to be simulated. For example, Qiskit has an MPS backend, and you can go up to 100 qubits with it.</p>
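To make this less abstract, here's a minimal sketch of turning a three-qubit state into MPS form with successive reshapes and singular value decompositions. This is the standard construction, though the index conventions are my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random three-qubit state with 2^3 = 8 components.
c = rng.normal(size=8) + 1j * rng.normal(size=8)
c /= np.linalg.norm(c)

# View the coefficients as a tensor c_{ijk}.
c_ijk = c.reshape(2, 2, 2)

# First split: group (j, k) together and SVD.
u1, s1, v1 = np.linalg.svd(c_ijk.reshape(2, 4), full_matrices=False)
A1 = u1                                       # tensor on qubit 1: (i, bond1)
rest = (np.diag(s1) @ v1).reshape(-1, 2, 2)   # remainder: (bond1, j, k)

# Second split: peel off qubit 2.
m = rest.reshape(rest.shape[0] * 2, 2)
u2, s2, v2 = np.linalg.svd(m, full_matrices=False)
A2 = u2.reshape(rest.shape[0], 2, -1)         # (bond1, j, bond2)
A3 = np.diag(s2) @ v2                         # (bond2, k)

# Contracting the chain reproduces every coefficient c_{ijk}.
rebuilt = np.einsum('ia,ajb,bk->ijk', A1, A2, A3)
assert np.allclose(rebuilt, c_ijk)
```

For a generic state this saves nothing, but for weakly entangled states the bond dimensions can be truncated, which is where the savings come from.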
<p>The downside is that you can’t do anything that builds the whole state vector. For example, you can’t check what the full quantum state is, only what it is in its “decomposed” form. You can think of it like this: We’ve broken up a huge quantum state into a bunch of small packages which we can handle. What we <em>can’t</em> handle is putting them all back together.</p>
<p>Applying gates to an MPS is conceptually straightforward but I found it was difficult to get it working in code. That’s because you need to take care of so many indices. In a sense, it’s “just” multi-dimensional linear algebra, but that “just” is doing a lot of work.</p>
<p>What’s nice though about applying gates is that you can apply them without the operator becoming too “big”. They only apply to a single site (or perhaps a few sites) at a time, which reduces their size.</p>
<p>There are costs to this method though. The main one is that you need to do a ton of singular value decompositions, and this becomes expensive as you have more and more qubits arranged in a line. There are alternative ways to deal with this, but I’m not as familiar with them.</p>
<h2 id="clifford-circuit-simulators">Clifford Circuit Simulators</h2>
<p>The final method I’ve learned how to do involves very particular gates called “Clifford” gates. Quantum circuits that are built up of only these gates, which are the CNOT, SWAP, Pauli X, Y, Z, the phase gate P, and the Hadamard gate H, are special. That’s because we can <em>drastically</em> reduce the number of objects we need to store in memory when simulating these circuits.</p>
<p>Usually, we want to keep track of 2<sup>N</sup> components (or perhaps somewhat less if we’re using tensor networks), which grows quickly. But in the case of Clifford circuits, the number of objects we need to track only grows linearly with N.</p>
<p>How can this be?</p>
<p>It has to do with the fact that we can represent quantum states in Clifford circuits using what are called “stabilizer generators”. If you’re a long-time reader, you may remember I covered stabilizers in <a href="https://cotejer.github.io/game-of-loops">“A Game of Loops”</a>, where the stabilizers were the plaquette operators acting on my surface code. Using this, I was able to think of my quantum state’s evolution mainly through these operators (though in that case I also looked at the full 2<sup>N</sup> objects).</p>
<p>I won’t go through all the technical details of the stabilizer formalism here, but the gist is that we can represent quantum states using a set of operators which “stabilize” the system, and do evolution by evolving these objects. See Reference 2 for more on this formalism.</p>
<p>This means that if we have a stabilizer G acting on the quantum state, then G essentially acts as the identity. It’s <em>not</em> the identity, but the combined action of the operator on the quantum state looks like an identity operation.</p>
<p>To give you a small example, imagine we start the quantum state in the computational basis state $\vert \psi\rangle = \vert 00110\rangle$. It turns out that there are precisely five stabilizer generators which stabilize the state (apart from the identity). They are:
\(Z\otimes1\otimes1\otimes1\otimes1, \\
1\otimes Z\otimes1\otimes1\otimes1, \\
1\otimes1\otimes -Z\otimes1\otimes1, \\
1\otimes1\otimes1\otimes -Z\otimes1, \\
1\otimes1\otimes1\otimes1\otimes Z.\)
You can check that applying these operators to the quantum state results in no net change. And using some principles of group theory, it turns out that these can generate any sort of other stabilizer we might dream up, such as $1\otimes1\otimes Z\otimes Z\otimes1$, which is a combination of the third and fourth generators above.</p>
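For five qubits the full state is only 32 components, so you can verify all of this directly; here's a sketch:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def chain(ops):
    """Tensor product of a list of single-qubit operators."""
    return reduce(np.kron, ops)

# |00110>: with the first kron factor most significant,
# this basis state sits at index 0b00110 = 6.
psi = np.zeros(32)
psi[0b00110] = 1.0

# The five generators stabilizing |00110>.
generators = [
    chain([Z, I2, I2, I2, I2]),
    chain([I2, Z, I2, I2, I2]),
    chain([I2, I2, -Z, I2, I2]),
    chain([I2, I2, I2, -Z, I2]),
    chain([I2, I2, I2, I2, Z]),
]

# Each one acts like the identity on |psi>.
for G in generators:
    assert np.allclose(G @ psi, psi)

# A product of generators is also a stabilizer, e.g. the
# combination of the third and fourth: 1 (x) 1 (x) Z (x) Z (x) 1.
G34 = chain([I2, I2, Z, Z, I2])
assert np.allclose(G34 @ psi, psi)
```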
<p>So this is great for representing a quantum state, but the real magic is that this representation gives us a way to simulate the <em>evolution</em> of a quantum state.</p>
<p>That’s because a stabilizer state will remain a stabilizer state as you evolve it in a Clifford circuit. The only difference is that the generators will change. To evolve a quantum state then, all we have to do is track how these generators change (and remember, there are only N of them).</p>
<p>For example, suppose we apply a Hadamard gate on the first qubit. This gives us the transformation $H\vert 0\rangle = \vert +\rangle$, so our new quantum state is $\vert +0110\rangle$. We then change the stabilizer generators to account for this, and we end up finding:
\(X\otimes1\otimes1\otimes1\otimes1, \\
1\otimes Z\otimes1\otimes1\otimes1, \\
1\otimes1\otimes -Z\otimes1\otimes1, \\
1\otimes1\otimes1\otimes -Z\otimes1, \\
1\otimes1\otimes1\otimes1\otimes Z.\)
The rules for changing the generators can be found in Reference 2. I won’t go through all of them here, but there are tools that track these changes automatically, and all you have to do is supply the quantum circuit. The idea is that these generators can be represented by a binary matrix of size N×2N, where N is the number of qubits. There are 2N columns because we keep track of the X and Z parts of each generator separately, so each part gets its own N×N block. Within this binary matrix, a 1 tells us that there’s a Pauli X or Z at the given site. Then, simulating the circuit amounts to updating this matrix.</p>
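As a sketch of what such a tracker does, here's the Hadamard update in this binary picture: conjugating by H swaps the roles of X and Z on its qubit. (A real simulator also updates the signs as it goes; I've omitted that bookkeeping here.)

```python
import numpy as np

N = 5

# One row per generator; the first N columns flag an X on each
# site, the last N columns flag a Z. Signs live in a separate vector.
tableau = np.zeros((N, 2 * N), dtype=np.uint8)
signs = np.zeros(N, dtype=np.uint8)

# Stabilizers of |00110>: a Z on every site, with minus signs
# on sites 2 and 3 (0-indexed).
for i in range(N):
    tableau[i, N + i] = 1
signs[2] = signs[3] = 1

def hadamard(tableau, qubit):
    """H swaps the X and Z columns for its qubit (the phase update
    a real simulator performs is omitted in this sketch)."""
    x_col = tableau[:, qubit].copy()
    tableau[:, qubit] = tableau[:, N + qubit]
    tableau[:, N + qubit] = x_col

hadamard(tableau, 0)

# The first generator is now an X on qubit 0, matching Z -> X
# under conjugation by H.
assert tableau[0, 0] == 1 and tableau[0, N] == 0
```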
<p>Personally, I haven’t created my own simulator. The one I’ve used is called <a href="https://github.com/quantumlib/Stim">Stim</a>, by <a href="https://algassert.com/">Craig Gidney</a>. It’s fast, it works for my purposes, and I don’t have to reinvent the wheel. There’s also a Clifford circuit simulator in Qiskit, where the online version says it can simulate up to 5000 qubits. I’ve used Stim to simulate circuits with about 3000 qubits, so I can attest that this is achievable (though I was using a supercomputer). Now that I think of it though, the longest part of my computation was probably what I did <em>after</em> simulating the circuit and getting this matrix. The linear algebra on a matrix that size takes quite a while.</p>
<p>Using this formalism is fantastic if you want to simulate really big system sizes. Unfortunately, it also depends on what you want to study. For example, it’s possible to use these Clifford circuits to get the entanglement entropy<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, but it’s not possible (as far as I know) to get the entanglement spectrum (the eigenvalues of the reduced density matrix for a state) using this method. So depending on what you want to study, this might not be helpful.</p>
<h2 id="quantum-toolbox">Quantum Toolbox</h2>
<p>During this first year of my PhD, I’ve been in a sort of transition mode. I’ve had to learn the ropes both in this field of quantum systems and in working with code and numerical tools. The learning curve has been steep at times, but I’ve found an appreciation for building my own tools that are perfect for my needs.</p>
<p>I came across a <a href="https://4gravitons.com/2021/04/23/building-ones-technology/">blog post</a> on exactly this by Matthew von Hippel. As a lazy person, I love being able to just grab some code off the shelf that works for my needs. But as a scientist, I know that it’s probably a good idea to build up a set of tools that I’m confident will work. This isn’t easy, and it often involves a lot of “wasted” time where research isn’t moving forward as quickly as I want. But once I have the tools built, I can then tweak things to my heart’s content.</p>
<p>In my case, this has been a year of learning techniques on how to simulate quantum circuits. Learning the ropes of the three methods above has been a big part of my research, and I’ve reached the point where I have a decent enough handle on these aspects that I can use them to start answering questions.</p>
<p>I hope to write more about how these tools are used in my research in future essays. For now though, I’ve built my toolbox, and now I can start digging into the research.</p>
<h2 id="references">References</h2>
<ol>
<li>I really like <a href="https://tensornetwork.org/">this site</a>, which explains the various ways to use tensor networks to simulate quantum systems. In particular, the <a href="https://tensornetwork.org/mps/#toc_2">MPS article</a> is well done, and the link I’ve provided here will give you an idea of how many parameters you need to describe your MPS.</li>
<li>The Clifford stabilizer formalism for simulating quantum circuits was developed (to my knowledge) by <a href="https://www.scottaaronson.com/blog/">Scott Aaronson</a> and <a href="https://www2.perimeterinstitute.ca/personal/dgottesman/">Daniel Gottesman</a> in <a href="https://www.scottaaronson.com/papers/chp6.pdf">this paper</a>. There are others as well around the same time using slightly different approaches, but this is the one I’ve based my code on.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The university which I’m doing my PhD at has a partnership with IBM, so I can actually use their quantum devices, including their largest available quantum computer: the <code class="language-plaintext highlighter-rouge">ibmq_manhattan</code>, which has 65 qubits. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’m thinking of Philip Moriarty’s <a href="https://www.numberphile.com/videos/philip-moriarty">podcast episode</a> with Brady Haran on Numberphile. Philip Moriarty also has a <a href="https://muircheartblog.wpcomstaging.com/">great blog</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Though, I imagine experimental quantum physicists would disagree with me and think that even this is oversimplifying things. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This reminds me of the Standard Model class I took during my master’s. When people write down the Standard Model Lagrangian (think: the expression governing how a system evolves), there are a bunch of indices present. However, what you later learn is that there are <em>hidden</em> indices which are implicit in the notation, because adding those would make the expression even messier. I suspect a similar thing is going on here, where an index is omitted from consideration because it’s not relevant to the discussion, and so the tensor is demoted to a matrix. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>I spent a long time trying to figure out how to do this. To save every curious person some time in the future, I asked and answered <a href="https://quantumcomputing.stackexchange.com/q/16718/">my own question on the Quantum Computing Stack Exchange</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 25 Apr 2021 00:00:00 +0000
https://cotejer.github.io//evolving-qubits-with-bits
https://cotejer.github.io//evolving-qubits-with-bits

The Shattering of SAT

<p>If condensed matter theorists have the Ising model, gravitational physicists have the Schwarzschild solution, and quantum foundations theorists have the Bell inequalities, then theoretical computer scientists have satisfiability, or SAT. In the world of computer science (and particularly computational complexity), many discussions inevitably circle back to SAT. In fact, SAT isn’t just something that theoretical computer scientists study. Satisfiability has a rich history with statistical physics, a field which wields powerful tools to probe the properties of SAT. As such, SAT is a problem that touches several fields, which makes it a breeding ground for cross-disciplinary ideas.</p>
<p>But what is satisfiability?</p>
<p>At its core, satisfiability is the search for solutions that respect constraints. When you’re trying to solve a problem, you often can’t give <em>any</em> sort of solution. You have to work within the limits of possibility, for starters. (This is why we have engineers instead of theoretical physicists building devices. The latter might get you something that <em>technically</em> works, but usually at the cost of neglecting air friction or imagining point masses.) You also have to take into account the wishes of the person giving you the problem. Roughly, the constraints you end up finding will be a combination of the laws of physics and the whims of the person.</p>
<p>Satisfiability is what you get when you abstract all the messiness of the real world and go to the land of computer science. We only care about the core idea, which is that you have variables for your system (think of possibilities for action) and constraints these variables must satisfy. From there, we can ask a variety of questions:</p>
<ul>
<li>Given a set of variables and constraints, can I set the variables such that they obey the constraints?</li>
<li>How many solutions are there for a given set of constraints?</li>
<li>If there is no solution, which configuration of the variables agrees with as many constraints as possible?</li>
</ul>
<p>Each of these questions is a different way to tackle satisfiability. They are called SAT, #SAT, and MAXSAT respectively<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>So what makes SAT such a magnet for research?</p>
<p>There are multiple reasons for this. First, SAT problems belong to the complexity class NP, and most (but not all, see Reference 1) are actually NP-complete. Without going into the details too much, because many SAT variants are NP-complete, they are <strong>as difficult as any other problem in NP</strong>. If you have another NP problem and you suddenly find a way to solve such a SAT variant in polynomial time, then all of the other problems in NP will also be solvable in polynomial time. As such, if you want to study the complexity class NP, you won’t do any better than studying SAT.</p>
<p>Second, SAT problems aren’t just of theoretical interest, but have a lot of practical applications.</p>
<p>One of those ideas from statistical physics that has been ported to SAT problems is the notion of phase transitions. At its core, a phase transition is the sudden radical change of a quantity that describes the system. The more radical the change, the “sharper” the transition. A classic example is in the Ising model, where temperature is the quantity you vary and the magnetization is the quantity that undergoes a sudden change (This is Figure 4 of <a href="http://personal.ph.surrey.ac.uk/~phs1rs/teaching/ising.pdf">these lecture notes</a>).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616265738/Blog/Phase.png" alt="Magnetization phase transition in the Ising model" /></p>
<p>Phase transitions indicate a drastic change in the system, and the toolkit of statistical physics is what lets us understand the way they work.</p>
<p>The reason I got hung up on phase transitions and SAT is because I came across the same diagram over and over again in SAT papers. Being physicists, they had the habit of making diagrams that only give you the idea of the concept instead of actually showing something from an experiment. In this case, the diagram looked something like this (see the References):</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616265738/Blog/Droplets.png" alt="Droplets of SAT" /></p>
<p>If you read a few of these papers, you get the sense that the big blob on the left is the “puddle” of solutions when there aren’t many constraints, and as you add constraints, the puddle dries up and becomes just a collection of droplets. This is the “shattering” of SAT, and it means the solutions aren’t clustered close to each other (in terms of their bit strings).</p>
<p>That’s a fine mental model, but I wanted more. In particular, I wanted to understand what was going on to call the puddle one “thing”, while later on the puddle broke up into individual droplets. What was going on here?</p>
<p>In this essay, we’re going to look at some flavours of satisfiability, and understand where the critical thresholds for phase transitions are. The goal will be simple: <strong>To understand how the solution space of a SAT problem “shatters” or “clusters” as we increase the constraint density.</strong> To get there, we will have to venture into what a hypercube is, how to visualize SAT problems on graphs, and what it means to uncover phase transitions.</p>
<h2 id="anatomy-of-a-sat-problem">Anatomy of a SAT problem</h2>
<p>A SAT problem has a few parts. First, you have a set of variables. These variables can take values 0 or 1. From those variables, you build constraints. These constraints come in the form of tuples (think: lists) which specify which variables are included, and whether they are negated.</p>
<p>For example, if I have V = 5 variables (x<sub>0</sub>, x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>, x<sub>4</sub>), then a clause might be written as: (x<sub>0</sub>, x<sub>3</sub>, x<sub>4</sub>). This would tell me to include these three variables in a constraint. A SAT formula is just a bunch of constraints together.</p>
<p>The objective of a SAT problem varies depending on the question you ask, but if we take the first question I listed above, we simply want to know if there exists <em>one</em> way to set the variables so that all the constraints are satisfied. (In the jargon, the formula is a logical AND of all the constraints.)</p>
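As a concrete sketch, here's a (positive) SAT formula as a list of clauses in Python, with a brute-force search over all assignments. The particular clauses are made up for illustration:

```python
from itertools import product

V = 5
# Each clause lists the variables it contains; a positive clause
# is satisfied as long as at least one of its variables is 1.
clauses = [(0, 3, 4), (1, 2, 3), (0, 1, 4)]

def satisfies(assignment, clauses):
    """The formula is the logical AND of all the clauses."""
    return all(any(assignment[v] for v in clause) for clause in clauses)

# Brute force over all 2^V assignments -- fine for small V,
# hopeless for large V.
solutions = [a for a in product((0, 1), repeat=V) if satisfies(a, clauses)]
print(len(solutions), "of", 2**V, "assignments satisfy the formula")
```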
<p>As to the actual contents of each constraint, this depends on the flavour of SAT you consider.</p>
<h2 id="flavours-of-sat">Flavours of SAT</h2>
<p>There are many different SAT variants, each with their own quirks. For this essay, I’m going to give you just a few variants so we can get a feel for what they are. For each variant, I’m going to use diagrams to give you an idea of how these constraints look. I’m also going to consider just k = 3 SAT problems here, though you could go higher (or lower to k = 2, but those end up being easier problems).</p>
<p>Before we move on, there’s an important point to make.</p>
<p>The two things we control for a given SAT variant are the number of variables V and the number of constraints C. It turns out that there are phase transitions when the ratio α = C / V hits a certain value (depending on the SAT variant). You can think of α as telling you how many constraints a given variable participates in on average. The idea of the phase transition is that below this threshold value of α, the probability that you will find at least one satisfying assignment is 1. Above the threshold, this probability drops to 0 (as you go to larger system sizes).</p>
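Here's a small numerical sketch of that behaviour for random 3SAT (the standard ensemble with negated literals). At V = 10 the transition is smeared out by finite-size effects, but the trend is visible:

```python
import random
from itertools import product

def random_3sat(V, C, rng):
    """C random clauses, each over 3 distinct variables,
    each literal independently negated or not."""
    formula = []
    for _ in range(C):
        chosen = rng.sample(range(V), 3)
        formula.append([(v, rng.choice((True, False))) for v in chosen])
    return formula

def satisfiable(V, formula):
    """Brute force over all 2^V assignments: only sensible for small V."""
    for a in product((0, 1), repeat=V):
        if all(any((a[v] == 1) == positive for v, positive in clause)
               for clause in formula):
            return True
    return False

rng = random.Random(0)
V, trials = 10, 10
probs = {}
for alpha in (1.0, 3.0, 5.0, 7.0):
    C = int(alpha * V)
    hits = sum(satisfiable(V, random_3sat(V, C, rng)) for _ in range(trials))
    probs[alpha] = hits / trials
    print(f"alpha = {alpha}: P(sat) ~ {probs[alpha]:.1f}")
```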
<p>For us, we will see this a bit in the animations, but this threshold isn’t quite what we’re looking for in this essay. Instead, we want to see how quickly the solution space shatters, which by definition has to happen before reaching the threshold (since beyond it there are no solutions at all). We won’t go too deeply into the numerical values here. Instead, I just want to visualize what’s happening with these SAT variants.</p>
<h3 id="3sat">3SAT</h3>
<p>Plain old vanilla SAT is pretty simple. Everything’s a solution except for the all-zero configuration. If you have three variables, then (0,0,0) is the only solution that gets thrown out when you apply a clause.</p>
<p>As a diagram, it looks something like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/SATtensor.png" alt="SAT tensor" /></p>
<p>Because a SAT clause only chucks out one of the eight possible assignments (in the case of three variables), you could imagine that you need a lot of clauses before there are no solutions left for your formula F. And you would be right! If we’re looking at 3SAT, then the threshold for satisfiability is at about α = C / V ~ 4.2 (see the References). I won’t show it in the animations below because of this high number of clauses needed to “break apart” the solution space, but I had to mention it.</p>
<h3 id="3xorsat">3XORSAT</h3>
<p>The “XOR” part of the name means “exclusive or”. It turns out that the XOR operation has a nice interpretation in terms of equations: Boolean addition, with a constraint added. This constraint is the <em>parity</em> of the clause, and can be even (0) or odd (1).</p>
<p>In other words, if we have three variables x, y, and z, then an even parity XOR clause would look like this: x ⊕ y ⊕ z = 0.</p>
<p>This means the only solutions to satisfy the clause over three bits would be:</p>
<ul>
<li>(0,0,0)</li>
<li>(0,1,1)</li>
<li>(1,0,1)</li>
<li>(1,1,0)</li>
</ul>
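A one-liner confirms these are exactly the even-parity assignments:

```python
from itertools import product

# Assignments of (x, y, z) with x XOR y XOR z == 0.
even = [a for a in product((0, 1), repeat=3) if a[0] ^ a[1] ^ a[2] == 0]
print(even)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```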
<p>Diagrammatically, we could represent it like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/XORSATtensor.png" alt="3XORSAT tensor" /></p>
<p>There are two things to mention here. First, XORSAT is actually an example of a SAT problem which is <em>not</em> NP-complete (assuming P ≠ NP). It’s in P, and this has to do with the very special property that XORSAT can be recast as a system of Boolean linear equations. These can be solved by Gauss-Jordan elimination (that thing you did in linear algebra class a long time ago, with row reductions and writing matrices out over and over), which is a polynomial-time algorithm, so XORSAT is in P. But XORSAT is still a problem that people study a lot<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
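Here's a minimal sketch of that reduction: each XOR clause becomes one row of a linear system over GF(2), where XOR takes the place of subtraction, and forward elimination runs in polynomial time. (The example system and the solver are my own toy versions; a real implementation would also report inconsistency and handle free variables properly.)

```python
import numpy as np

# A toy XORSAT instance over (x, y, z, w):
#   x ^ y ^ z     = 0
#       y ^ z ^ w = 1
#   x ^     z ^ w = 1
A = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=np.uint8)
b = np.array([0, 1, 1], dtype=np.uint8)

def gf2_eliminate(A, b):
    """Gaussian elimination mod 2 on a copy of (A, b)."""
    A, b = A.copy(), b.copy()
    row = 0
    for col in range(A.shape[1]):
        pivot = next((r for r in range(row, A.shape[0]) if A[r, col]), None)
        if pivot is None:
            continue  # no pivot in this column: the variable is free
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(A.shape[0]):
            if r != row and A[r, col]:
                A[r] ^= A[row]  # XOR plays the role of subtraction
                b[r] ^= b[row]
        row += 1
    return A, b

R, c = gf2_eliminate(A, b)
# A consistent system has no row reading 0 = 1.
assert all(R[r].any() or c[r] == 0 for r in range(len(c)))

# Reading off a solution: (x, y, z, w) = (1, 1, 0, 0) works.
x = np.array([1, 1, 0, 0], dtype=np.uint8)
assert np.all((A @ x) % 2 == b)
```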
<p>Second, the threshold for XORSAT (at least, a specific “constrained” version of it<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>) is at α = C / V = 1.</p>
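<p>Since the Gauss-Jordan point is the crux of why XORSAT is easy, here’s a minimal sketch of deciding a XORSAT system over GF(2). This is my own illustration (the function name and bitmask encoding are assumptions, not from the original post):</p>

```python
def xorsat_solvable(clauses, parities):
    """Decide a XORSAT system by Gaussian elimination over GF(2).
    Each clause is a tuple of variable indices; each parity is 0 or 1.
    Equations are stored as integer bitmasks over the variables."""
    basis = {}  # pivot bit -> (mask, parity); mask's highest set bit = pivot
    for clause, p in zip(clauses, parities):
        mask = 0
        for v in clause:
            mask ^= 1 << v
        # Reduce the new equation against the basis, top bit first.
        while mask:
            col = mask.bit_length() - 1
            if col in basis:
                bmask, bp = basis[col]
                mask ^= bmask  # clears bit `col`, so mask strictly shrinks
                p ^= bp
            else:
                basis[col] = (mask, p)
                break
        else:
            # The equation reduced to 0 = p: a contradiction iff p is 1.
            if p == 1:
                return False
    return True

# x+y+z = 0 is satisfiable; adding x+y+z = 1 on the same triple is not.
print(xorsat_solvable([(0, 1, 2)], [0]))                 # True
print(xorsat_solvable([(0, 1, 2), (0, 1, 2)], [0, 1]))   # False
```

<p>Each new equation either joins the basis, turns out to be redundant, or exposes a contradiction. With bitmask rows this runs in polynomial time, which is exactly why XORSAT sits in P.</p>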
<h3 id="3naesat">3NAESAT</h3>
<p>The name for this one is “Not-All-Equal SAT”. I like to think of this as “double” the regular SAT problem. For regular SAT, you throw out the all-zeros configuration for the variables in a clause. For NAESAT, you throw out both the all-zeros and all-ones configurations.</p>
<p>As a diagram, it looks like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/NAESATtensor.png" alt="3NAESAT tensor" /></p>
<p>In terms of the threshold, it’s located at α = C / V ~ 2.2.</p>
<h3 id="1-in-3-sat">1-in-3 SAT</h3>
<p>Finally, we have 1-in-3 SAT. If we have three variables x, y, and z, this problem means the <em>only</em> accepted configurations are those that have exactly one 1 in their configuration. So the allowed configurations would be (1,0,0), (0,1,0), and (0,0,1).</p>
<p>As a diagram, it looks like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/1in3SATtensor.png" alt="1-in-3 SAT tensor" /></p>
<p>Moreover, this variant is what you might call the most restrictive of all the variants. Whereas XORSAT filters out exactly half the configurations each time (fixing the parity keeps four of the eight), 1-in-3 SAT allows only 3/8 configurations. (For the general 1-in-k SAT problem, this becomes k / 2<sup>k</sup> configurations which are allowed.)</p>
<p>If you look at just positive 1-in-3SAT (see the next paragraph), the threshold is at α = C / V ~ 2/3.</p>
<hr />
<p>This is just a quick look at the variants, since we’ll be seeing them in the experiments I’ve run below. I should add that things get a bit more complicated when clauses contain negated variables. Think of something like (~x, y, z), where the “~” symbol means NOT. This can happen, but for now, we’ll focus our attention on the case where there are no negations. This is sometimes called “positive” SAT.</p>
<h2 id="sat-solutions-on-a-hypercube">SAT solutions on a hypercube</h2>
<p>We’ve seen that different SAT variants have a threshold, so now let’s try to visualize how they evolve as we apply clauses.</p>
<p>To begin with, how do we visualize the set of solutions? There are many ways to do this, but I want to take the one that is perhaps the simplest to think about.</p>
<p>If we have V variables, then there are precisely 2<sup>V</sup> configurations of bit strings. A nice way to see this is to think about building the bit string bit by bit. For the first bit, you have two choices: 1 or 0. For the second bit, you have two choices again: 1 or 0. Using the rule of multiplication, this means you have 2 × 2 = 2<sup>2</sup> = 4 choices. Continuing on for each variable gives the desired 2<sup>V</sup>.</p>
<p>We can represent these configurations in binary notation. If we have V = 3, then we get the following:</p>
<ul>
<li>000</li>
<li>001</li>
<li>010</li>
<li>011</li>
<li>100</li>
<li>101</li>
<li>110</li>
<li>111</li>
</ul>
<p>What’s really cool here is that this gives us a way to map bit strings to coordinates. If we imagine a 3D space (because we are dealing with 3 variables), then the bit strings can act as coordinates for our nodes.</p>
<p>That’s great, but if we are going to see the <em>shattering</em> of SAT, don’t the nodes have to cluster in some way?</p>
<p>Yes, here’s how.</p>
<p>We’re going to put an edge between nodes if we can “reach” one node by modifying the bit string of another in exactly one place.</p>
<p>Here’s an example. Say we start with the bit string 010. We have exactly three options for how to change it.</p>
<ul>
<li>Changing the first bit gives 010 &rightarrow; 110.</li>
<li>Changing the second bit gives 010 &rightarrow; 000.</li>
<li>Changing the third bit gives 010 &rightarrow; 011.</li>
</ul>
<p>So we would connect the bit string 010 to the nodes at 110, 000, and 011.</p>
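<p>This neighbor rule fits in a couple of lines. As a sketch (my own helper, using strings for readability):</p>

```python
def neighbors(bits):
    """All bit strings at Hamming distance 1 from `bits`."""
    return [bits[:i] + ('1' if b == '0' else '0') + bits[i + 1:]
            for i, b in enumerate(bits)]

print(neighbors('010'))  # ['110', '000', '011'] -- the three edges above
```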
<p>What you will eventually find if you do this for the V = 3 case is something like this.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/Hypercube.png" alt="Hypercube" /></p>
<p>Looks like a cube, doesn’t it?</p>
<p>In fact, that’s precisely what this structure is. When you flip one bit in a bit string, you’re connecting corners of what we call a <em>hypercube</em>, which is just the generalization of the cube to any number of dimensions.</p>
<p>This gives us a nice way of visualizing all of the potential SAT solutions: They are the corners of a hypercube.</p>
<p>Now, it’s important to note two things here. First, the graph I’m showing here is different from the one we started with, which dealt with variables and constraints. This is a graph of the space of possible configurations for V variables. Nowhere here do we see anything about clauses. Instead, it’s just the solution space.</p>
<p>Second, the form of the graph is particular to our choice of connections. I told you that an edge means the bit strings are separated by one bit flip. This isn’t the only way to connect solutions. If I wanted, I could connect every node to any node that I can reach by <em>two</em> bit flips. That’s perfectly allowed.</p>
<p>So why am I limiting myself to one bit flip?</p>
<p>One reason is that the visualizations we’re going to see are done on a small number of variables. As such, if you connect solutions that are separated by multiple bit flips, your graph becomes <em>really</em> connected. And so you don’t end up seeing the shattering as effectively.</p>
<p>The other has to do with the premise of why these shattered spaces are of interest. Many algorithms for solving SAT problems are local, meaning they search by flipping one variable at a time. The idea is that if the solution space gets shattered, it will be very difficult for a search algorithm sitting in one cluster to branch out and “find” another, since the clusters are separated by multiple bit flips. You could argue that two bit flips isn’t a lot, but for now, we’ll stick with one.</p>
<p>With this set up, the game becomes the following. Start with a hypercube in V dimensions. Label the nodes as bit strings corresponding to possible configurations for your SAT problem. Then, apply clauses. For each clause, figure out which configurations are incompatible with that clause, and remove those nodes from the graph (removing the nodes also removes the edges attached to them). At the end, if you still have nodes left in your graph, then those are precisely the solutions to your SAT problem.</p>
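<p>The game described above fits in a few lines. Here is a sketch using plain Python sets (the function name is mine; the <code>allowed</code> predicate is the only piece that changes between SAT variants):</p>

```python
from itertools import product

def surviving_solutions(n_vars, clauses, allowed):
    """Prune the hypercube: start from all 2**n_vars configurations and,
    clause by clause, drop every configuration the clause rules out.
    `allowed(bits)` says whether the pattern a clause sees is accepted."""
    nodes = set(product((0, 1), repeat=n_vars))
    for clause in clauses:
        nodes = {cfg for cfg in nodes
                 if allowed(tuple(cfg[v] for v in clause))}
    return nodes

# Positive 3SAT: a clause only rules out the all-zeros pattern,
# so `any` is exactly the right predicate.
sols = surviving_solutions(3, [(0, 1, 2)], any)
print(len(sols))  # 7 -- only 000 is thrown out
```

<p>Whatever survives at the end is, node for node, the set of solutions; redrawing the remaining nodes and their one-bit-flip edges after each clause is what produces the animations further down.</p>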
<p>To me, this is an elegant way of thinking about SAT problems. We’re pruning the hypercube of solutions, and seeing what’s left over.</p>
<h2 id="seeing-the-droplets">Seeing the droplets</h2>
<p>Modern SAT solvers can handle quite a few variables (hundreds of thousands, even millions, for some industrial instances). In our case, that would be really bad when looking at the graph of solutions, since we would have to start with 2<sup>V</sup> nodes. Putting V = 1000 would be a <em>really</em> bad idea.</p>
<p>To keep things easy (and so I can take the shortcut of solving this through brute force), I’m going to limit our exploration to V = 12. This gives us 2<sup>12</sup> = 4096 configurations, which I think is reasonable for graphing.</p>
<p>The best way to implement this would be a SAT solver which can give you all solutions to a given SAT formula. In my case, I’m going to simply enumerate all the possible bit strings and filter the list as I include more clauses. Then, I will plot the remaining nodes using the graph library <a href="https://networkx.org">NetworkX</a>. It’s as simple as that.</p>
<p>To generate the clauses, there are a few methods that can be used. Perhaps the easiest is the following: Using a random number generator, choose three random numbers (without replacement!) between 0 and V-1, and this tuple will become your clause. Repeat this for the number of clauses you want to use, and that’s it. If you want to be fancy, you can also include negations for the literals you choose with probability 1/2, but I’m not going to do that here. The only real difference is in 1-in-k SAT, where the threshold doubles if you avoid negations.</p>
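<p>For completeness, here’s how that clause-drawing recipe might look in code (a sketch; the seed and function name are my own choices):</p>

```python
import random

def random_clauses(n_clauses, n_vars, k=3, seed=0):
    """Positive random k-SAT clauses: k distinct variable indices drawn
    uniformly from 0..n_vars-1. `sample` draws without replacement,
    and we skip negations, as in the text."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n_vars), k)) for _ in range(n_clauses)]

clauses = random_clauses(8, 12)
print(len(clauses), all(len(set(c)) == 3 for c in clauses))  # 8 True
```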
<p>So that’s the setup. Let’s see some droplets!</p>
<p>For 3XORSAT:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/XORSATshattering.gif" alt="XORSAT shattering" /></p>
<p>For 3NAESAT:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/NAESATshattering.gif" alt="NAESAT shattering" /></p>
<p>For 1-in-3 SAT:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto/v1616331985/Blog/1in3SATshattering.gif" alt="1-in-3 SAT shattering" /></p>
<hr />
<p>I’m happy this visualization actually maps onto the simple diagram that I saw in all of those SAT papers and drew at the top of this essay. There are still many questions that could be asked about the exact locations of the thresholds for shattering, but that’s not the point of this essay. Instead, I wanted to give you a feel for what’s going on in the solution space.</p>
<p>One thing to note is that the visualization would change if we use a looser definition of when to draw edges (meaning a Hamming distance of greater than one).</p>
<hr />
<p>The thing I find neatest about all of this is how a very general and abstract computer science problem can worm its way into the minds of physicists and become relevant to them. To me, this signals that the toolkit of each scientist is useful, and you never know when it will be fruitfully applied to a new problem. The language of phase transitions isn’t just for the realm of materials and chemicals; we can form strong analogies to more abstract problems.</p>
<p>And if <em>that</em> doesn’t satisfy you, I don’t know what will.</p>
<h2 id="references">References</h2>
<ol>
<li>I’m not super familiar with this paper, but Schaefer’s dichotomy theorem seems to categorize the different types of SAT problems in terms of their complexity. See the <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem#Schaefer's_dichotomy_theorem">Wikipedia page for Boolean satisfiability</a> for more on this.</li>
<li>If you look at slide 15 of <a href="http://artax.karlin.mff.cuni.cz/~zdebl9am/presentations/Hard_Problems_Zdeborova.pdf">this presentation</a> by <a href="http://artax.karlin.mff.cuni.cz/~zdebl9am/">Lenka Zdeborová</a>, you can see the droplets in action. Figure 1 of <a href="https://arxiv.org/abs/0901.2130">“Hiding Quiet Solutions in Random Constraint Satisfaction Problems”</a>, by <a href="https://florentkrzakala.com/">Florent Krzakala</a> and Lenka Zdeborová also has it. I’ve seen it floating around in other places too.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>SAT just refers to the usual satisfiability problem of answering the question, “Is there a configuration for the variables such that the constraints are satisfied?” #SAT doesn’t just ask for the existence of a solution, but the <em>number</em> of solutions. Finally, MAXSAT asks for the solutions that have the maximum number of satisfied constraints. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>There are a few reasons for this, but from what I can tell, the main reason is that XORSAT is actually a difficult problem once you prohibit Gauss-Jordan elimination. It’s like we wield a magical sword that is capable of beating XORSAT because of some coincidences, so we ban it. Another reason is that XORSAT can be formulated as an Ising-like model with three-spin interactions. Concretely, if you take a Hamiltonian of the form H = -J ∑<sub>ijk</sub> s<sub>i</sub>s<sub>j</sub>s<sub>k</sub>, with your spin variables taking values of ±1, then if you define s<sub>i</sub> = (-1)<sup>x<sub>i</sub></sup>, with these new x<sub>i</sub> variables being 0 or 1, your Hamiltonian becomes H = -J ∑<sub>ijk</sub> (-1)<sup>x<sub>i</sub> ⊕ x<sub>j</sub> ⊕ x<sub>k</sub></sup> = -J ∑<sub>ijk</sub> (1 - 2(x<sub>i</sub> ⊕ x<sub>j</sub> ⊕ x<sub>k</sub>)), so up to a constant, the energy counts the clauses with odd parity. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This has to do with how many constraints each variable is connected to. See <a href="https://arxiv.org/abs/1212.3822">this paper</a> by <a href="https://math.osu.edu/people/pittel.1">Boris Pittel</a> and <a href="https://personal.lse.ac.uk/sorkin/">Gregory B. Sorkin</a> (note that I haven’t read this paper closely, but the abstract contains the information I’m referring to). <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 25 Mar 2021 00:00:00 +0000
https://cotejer.github.io//shattering-of-sat
https://cotejer.github.io//shattering-of-satAbsorbers and Explorers<p>To be a scientist means to explore. You need to start from what is known and jump out into the void, investigating new ideas. In this regard, the scientist is an explorer, a person searching for new truths in a world without a map. To be more precise, a scientist <em>uncovers</em> the new map as they learn.</p>
<p>When we talk about science, we like to emphasize the new discoveries. Whether that’s the <a href="https://ligo.org/detections/GW150914.php">discovery of gravitational waves</a>, a breakthrough in <a href="https://science.sciencemag.org/content/337/6096/816.long">gene-editing</a>, or <a href="https://deepmind.com/research/case-studies/alphafold">advances in machine learning</a>, scientists get excited when a new truth about the world is unearthed.</p>
<p>In fact, I would argue that it’s the joy of <em>discovery</em> which drives a lot of scientists. It’s the chase for something new and unknown that excites them. So, if a scientist discovers a new pathway to superconductivity, they will be excited, but only until the next project. Then, the previous discovery is all but a nostalgic memory.</p>
<p>We could imagine humanity’s scientific knowledge as some sort of uneven boundary:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1613918774/Blog/ScientificBoundary.png" alt="The uneven boundary of scientific knowledge." /></p>
<p>It’s uneven because some areas of knowledge have advanced more quickly than others. If I were to animate the boundary, it would expand, but not uniformly. It would resemble something closer to a bubble in the midst of formation, its surface undulating.</p>
<p>I like this metaphor because the bubble’s surface is what scientists are working on expanding. We want our horizon to reach further, and so to do this, we push on the surface of the bubble. This is what doing research looks like<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1613918774/Blog/Research.png" alt="Pushing on the boundary." /></p>
<p>The goal of scientists is to push on this boundary as much as possible.</p>
<p>But there’s actually another bubble, and it’s nestled inside the first. This bubble is what I would call “scientists’ knowledge”.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1613918774/Blog/TwoBoundaries.png" alt="Two boundaries." /></p>
<p>You might be wondering what the difference is between the two. The difference is that the inner bubble refers to the knowledge of <em>scientists</em>, while the outer one refers to the knowledge of <em>science</em>. The inner bubble is tied to humans, while the outer one is a product of humans.</p>
<p>Depending on your awareness of the scientific literature, the following may come as a shock: It’s messy.</p>
<p>We might like to think that the scientific literature is a nice, curated space with everything neatly organized and filed away properly, but that’s woefully optimistic. Instead, I think of the scientific literature more like a student’s messy desk, with papers and books all over the place. Finding things can either be super easy or downright impossible.</p>
<p>It’s true that a lot of the material is organized, but what I fear though is the fracturing of nearby scientific fields. If everyone is working in their own “mini sphere” of science, then it can be difficult to propagate a discovery from one part of the boundary to another (let alone in different fields!). This suggests researchers will “waste time” coming up with ideas that other scientists have already discovered, but haven’t reached their field yet.</p>
<p>Plus, if scientists could <em>know</em> this information, then chances are they could make new connections and generate new knowledge. This is what the inner bubble is referencing. We have a lot of scientific knowledge, but at any one time, we aren’t necessarily aware of it all. So while our potential is the blue boundary, our current reality is the black one.</p>
<hr />
<p>That’s one problem. There’s another one though: As time goes on, the horizon of the field (the outer bubble) gets further and further away. This has led me to <a href="https://handwaving.net/143">joke</a> that there will eventually be an “event horizon” of sorts, where a new scientist won’t be able to contribute to science because the boundary is further away than they can ever get to within their lifetime.</p>
<p>Fortunately, we have one way to manage this: Education. By training budding youngsters in school, we are able to give them a fast ride to a place much closer to the boundary of the bubble.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1613918774/Blog/Education.png" alt="Education provides a boost towards the boundary." /></p>
<p>But this gap between where you end up after your education and where you want to go as a scientist is difficult to traverse. The main reason is that the resources available dwindle to none as you make your way to the boundary (See Reference 1). That’s to be expected. After all, there are fewer people there! The problem though is that the burden for creating resources falls on the experts, the ones who are already at the bubble’s boundary. But many of them don’t want to spend time on this. Instead, they are drawn to the next adventure. What they leave behind is then a relic of their progress, but it’s often rough, and left to others to make sense of it.</p>
<p>It’s this roughness that concerns me as a scientist. If we want to ensure steady progress in science, we need to make sure people are able to get to the boundary. Unfortunately, few take the time to put all of the work together into one big piece that summarizes what has been done.</p>
<p>In essence, we’re all poking at the bubble in the same region, unaware of (or at least, not fully understanding) the other work that has been done.</p>
<p>To fix this, I want to propose an entirely new class of scientist. Currently, being a scientist is more or less synonymous with being a researcher. That’s fine, but I think the accumulation of scientific knowledge is going to force us to develop another type of scientist.</p>
<p>If the traditional scientist is an <em>explorer</em>, then this new type of scientist is an <em>absorber</em>.</p>
<h2 id="absorbers">Absorbers</h2>
<p>What does an absorber do, and how do they contribute to science?</p>
<p>An absorber is someone who isn’t satisfied with the state of our scientific knowledge. In particular, they feel like there are too many papers and not enough <em>synthesis</em> of ideas. If everyone goes in their own direction, we might make some progress, but imagine how much more we could make if we made a more directed effort as a collective. This is the motivation of the absorber.</p>
<p>To get there, an absorber isn’t driven by discovering new things. Instead, they are driven by understanding the knowledge we have now. On the face of it, this sounds kind of silly. If someone has figured out a result, then surely we understand it? While that’s true to a certain degree, the absorber sees the opportunity in connecting that result to others<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Remember, the fields we have in science are a human construct to aid categorization. In reality, things are much more connected and entangled than we give them credit for. An absorber sees this as an opportunity for generating new knowledge from the knowledge we already have.</p>
<p>But more than this, an absorber seeks to understand a field and its material. This is the crucial difference between an explorer and an absorber. Once the explorer discovers something, they move on to the next thing. The absorber is the one that makes the second pass on the material the explorer has moved on from and synthesizes it so that others can come along and get up to speed more quickly.</p>
<p>This benefits scientists in two ways:</p>
<p>First, it makes getting to the boundary of scientific knowledge easier. If an absorber spends time really understanding something, they are able to write about it, producing the resources necessary for the next generation of explorers to march towards the boundary. This would relieve the issue of not having enough resources near the boundary. An absorber’s main job would be taking these discoveries, absorbing them, and then sharing that understanding to the wider scientific community<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>Second, humanity’s knowledge of the scientific literature grows as a result of having absorbers. That’s because of the simple fact that the scientific literature isn’t composed of neatly-stacked rows of books and information, but is more a <em>pêle-mêle</em> of observations, anecdotes, and curiosities. By having absorbers whose job is to comb through the literature, understand large swaths of it, and share that knowledge to other scientists, the possibility for discovery among the ideas we already know grows.</p>
<p>Just think about this tantalizing possibility: How many scientific insights are hiding within the literature right now, simply waiting for someone to come across two disparate pieces of information and connect the dots? My guess is that there are many such discoveries waiting to happen, and an absorber would be primed for helping make them.</p>
<h2 id="the-status-of-an-absorber">The status of an absorber</h2>
<p>To take this idea of a second type of scientist seriously, there needs to be some incentive structure for them. We give grants to explorers to uncover new insights, but what can we do to incentivize the absorber?</p>
<p>Here, I want to connect another problem in science that an absorber could solve: peer review. Simply put, peer review in science has lots of problems, the first of which is that scientists do this on a more-or-less volunteer basis (see Reference 2). That’s workable, but it does introduce plenty of work for scientists who are really explorers. They don’t want to spend time reviewing papers when the next discovery could be made.</p>
<p>An absorber, on the other hand, would be the perfect person for the job. Because their role is to read a lot of the scientific literature and understand the cutting-edge science on a deep level, they would be well-suited for reviewing papers. An absorber knows the work that has been done before, and can situate the space the new work fills.</p>
<p>This is how I imagine a lot of the funding for an absorber would come from. They would be (mainly) paid as professional peer reviewers<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, and this would allow them to spend time absorbing the material of a field.</p>
<p>To be clear, an absorber is a very different type of scientist than an explorer. Both deserve to be called scientists. I’m arguing that we need to let go of the notion that a scientist will be both a master absorber and explorer. As our bubble of scientific knowledge grows, it becomes more difficult to wrap your head around even a small bit of the literature while still making discoveries. Absorbers would relieve this burden.</p>
<p>I think it would take time for the status of an absorber to match that of an explorer. But it’s crucial that we don’t shovel people we think “aren’t good enough” to be an explorer into an absorber role. That would be the wrong approach. I envision it as something quite different. If you’re set on being an explorer and it turns out that the career doesn’t work out, being an absorber is not a “fall back” option. Instead, it’s an entirely different track.</p>
<p>That being said, I still think there are some scientists who could do both. But professionally, this distinction highlights the growing needs of scientists in an age where the amount of stuff we know is expanding at a lightning-fast rate.</p>
<h2 id="tools-of-an-absorber">Tools of an absorber</h2>
<p>My vision of an absorber is a cross between a librarian and a teacher. An absorber is a librarian in the sense that they will have a good idea of the work being done in a specific research field. If you come to them and ask for recommendations on what to read, they will be able to give you several directions to start.</p>
<p>An absorber is a teacher in the sense that they are able to explain and teach the main ideas of cutting-edge research to others who want to learn. This requires the difficult work of reading a paper, understanding the new ideas, and then explaining them to others in a pedagogical way. The word “pedagogical” is crucial here. If we want an absorber to succeed as a teacher, they need to transform the rough insights of an explorer into steps for the curious learner to follow.</p>
<p>For this latter part, I think tools other than the PDF/paper will be necessary. To be concrete, I’m thinking of examples like <a href="https://distill.pub/about/">Distill</a>, which focuses on machine learning research and explains it using web technologies that really allow the curious individual to grasp the concepts.</p>
<p>An absorber would be well-versed in programming/designing these experiences. Whether it’s in a <a href="https://jupyter.org/">Jupyter</a> or <a href="https://colab.research.google.com/">Colab</a> notebook, whether it’s using animation software like <a href="https://github.com/3b1b/manim">manim</a>, or whether it’s using <a href="https://www.math3ma.com/blog/understanding-entanglement-with-svd">hand-drawn sketches and words</a> (this is more old-school, like I use here), the goal is to facilitate sharing research ideas to those looking to learn.</p>
<hr />
<p>I do have a bias here: As a PhD student, I feel this chasm between the resources found in textbooks for “established” ideas and the lack of resources for anything else. But I don’t think that graduate students are the only people who would benefit from this. Heck, how are we supposed to encourage explorers to be interdisciplinary if they can’t easily jump from one boundary to another? Sure, there’s a need to know the fundamentals, but I think we also have to acknowledge that educational resources on the cutting-edge are lacking.</p>
<h2 id="does-anyone-really-want-to-be-an-absorber">Does anyone really want to be an absorber?</h2>
<p>And here’s the question that matters. If nobody is interested in becoming an absorber, then this idea is a non-starter.</p>
<p>I don’t know if it’s clear from reading this (and the rest of my essays), but being an absorber is something I can get behind. Being an explorer is great, but explaining research in a clear way is an art and a skill. One that I’m always working on.</p>
<p>I also think having these separate roles would help build scientific groups that have a variety of expertise. From my experience, collaboration is the way science is done now. As such, a group can really benefit from having people with specialized skillsets.</p>
<p>Where would we get the funding for this whole new type of scientist? I don’t know. At some level, it would take away from the funding given to explorers. But I think time is only going to make this a more pressing need: The literature is growing so fast that in order to “master” a field, you need to define it more and more narrowly. This means an expert won’t know X, but a sub-sub-sub-sub-component of X. What an absorber would do is bring a higher-level perspective to a scientific community, lightening the load that explorers must shoulder now.</p>
<hr />
<p>Science is changing. There are more scientists than ever before, and we are all poking at the boundary of scientific knowledge. To make sure we do it in a way that allows us to <em>retain</em> that knowledge (and not have it be buried in the literature), absorbers would be the main players to read the literature, understand things, and then help point the explorers in a promising direction.</p>
<p>Plus, absorbers would take the role of teachers of the cutting-edge. Using our best tools available, they would bring the insights of explorers to the other scientists, giving everyone a boost towards the boundary.</p>
<p>And finally, absorbers would be the new gatekeepers in peer review. Instead of relying on explorers acting on a sense of duty, absorbers would take on the task of peer review, since this will help them build up a sense of a field at the same time.</p>
<p>My proposal here is only a rough view of what <em>could</em> work. The point isn’t to follow this idea exactly. Instead, it’s about highlighting a broader truth: <strong>Contributing to science doesn’t have to only be done by explorers.</strong> Instead, there’s a large role to be played by the absorbers, those who delight in understanding science and sharing it with others.</p>
<p>The explorers may be the first to a topic, but the absorbers are the ones who make it understandable to the rest of us scientists.</p>
<h2 id="references">References</h2>
<ol>
<li>I wrote about this idea of the resources available to curious people dwindling as you go to the cutting-edge in a post for <a href="https://errantscience.com/blog/2018/09/24/taming-the-literature/">ErrantScience</a>. I described the scientific literature as a “jungle”, and I still think this is quite apt.</li>
<li>I won’t pretend I know all of the intricacies of peer review and its problems. But from what I’ve understood through reading other scientists, volunteering to review papers can take a lot of time that is not spent on building up your career. See <a href="https://theconversation.com/peer-review-has-some-problems-but-the-science-community-is-working-on-it-99596">this article</a> by <a href="https://research.monash.edu/en/persons/jessica-borger">Jessica Borger</a> (specifically, the section “Peer review relies on volunteers”) and <a href="http://blog.ametsoc.org/ams/the-volunteer-power-behind-peer-review/">this one</a> by <a href="https://people.envsci.rutgers.edu/broccoli/">Tony Broccoli</a>.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I first came across this metaphor from <a href="http://matt.might.net/articles/phd-school-in-pictures/">Matt Might</a>, who described doing a PhD in a similar way. I don’t think this is limited to the PhD, so I adapted it for this illustration. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’d put it this way: If only one scientist understands a result, the net gain to science is not very big. But if we can get a <em>lot</em> of scientists to understand a result, we get a boost in potential new directions to search. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>My thinking on this has a lot of parallels to scientific outreach. Except in this case, it’s more like scientific <em>in</em>reach. The goal here is to produce more resources on cutting-edge science than the papers which announce a discovery. Because, let’s face it, those are not often the kind of material you want to dive into when first learning about the field. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>There is a discussion about how being paid for doing peer review means you won’t be unbiased (for example, since your identity as a peer reviewer will probably be known). I think there are valid points we can bring up regarding issues with payment and how the incentives could spur peer reviewers to reject/accept more papers than we would expect, but I also think we could overcome these with suitable protocols. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 25 Feb 2021 00:00:00 +0000
https://cotejer.github.io//absorbers-and-explorers
https://cotejer.github.io//absorbers-and-explorersVirtual CNWW<p>If you want to learn a topic today, the resources are much more plentiful than even a few decades ago. The internet has given us wonderful resources to learn from, including some which leverage internet technologies to provide animations and teach topics in a <a href="https://distill.pub/2020/communicating-with-interactive-articles/">much more interactive way</a>. This is particularly true for mathematics and physics, which have been entrenched in dry textbooks that are a chore to read<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> for much too long.</p>
<p>The tools are there, and while access isn’t universal, the barrier to entry has been lowered. However, when it comes to doing science and mathematics, there’s more than just the sheer knowledge. If you want to enjoy your time as a scientist, building up a community of people you can throw ideas around with and trust is key. As I wrote back in my <a href="https://cotejer.github.io/psion">“PSIon” essay</a>, the most important thing I got from my experience as a master’s student at Perimeter Institute was the cohort of people. Sure, I learned a lot of physics, but the friends I made were worth more than any other part of the experience.</p>
<hr />
<p>I’ve written about my experience as a physicist, as well as different ideas within the <a href="https://cotejer.github.io/pick-a-state">world of quantum</a>. But I’m not married to physics. While it’s the discipline I’ve chosen to focus on for now, I’m fascinated by many questions in science. The toolkit of physics is one I like, but I thought it would be a shame to go through my PhD focusing exclusively on quantum theory and never venturing into any other areas. Even if they don’t serve my “primary” direction, dipping my toes into other fields provides a way to gain a new perspective, learn of new big things happening in science, and frankly get out of the thought bubble of a field. When I was at Perimeter Institute, it was a lovely place to think about physics, but this also meant a lot of the topics were the same. I’d hear constant talk about quantum theory, condensed matter, mathematical physics, cosmology, and astrophysics. All fascinating topics, but clearly not all there is to know about science.</p>
<p>And so it was with this motivation that I decided to apply to the <a href="https://vermontcomplexsystems.org/events/cnww/">Complex Networks Winter Workshop</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, which is a joint effort by the <a href="https://vermontcomplexsystems.org/">University of Vermont Complex Systems Center</a> and the <a href="https://sentinelnorth.ulaval.ca/en">Sentinel North program at Université Laval</a>. For this installment, the workshop was online, but the theme was the same: two weeks of exploratory projects on anything to do with networks and complex systems.</p>
<p><img src="https://vermontcomplexsystems.org/events/cnww/files/stacks-image-c8a0f4c-450x388@2x.png" alt="" /></p>
<p>If you’re reading this and wondering, “What on Earth is a quantum theory student like yourself doing in a workshop like <em>that</em>?”, well, I asked myself the same question. I wasn’t sure if I would be able to even contribute anything. The experience I had with anything networks-related was limited to two things: a graph theory course I took during my undergraduate degree, and a bit of work on tensor networks that I began last semester in my research. But apart from that, I had basically no experience.</p>
<p>In fact, I suspect I felt like my friend did when we started studying at Perimeter Institute together last year. It was a theoretical physics program, but he was coming into it with an engineering background. I imagine he must have had similar thoughts, though probably amplified since the program was a year long instead of only two weeks. (And if he could do it, I told myself that I could too.)</p>
<p>Nonetheless, if I want to say that I’m a scientist who keeps an open mind and looks at a bunch of areas, I couldn’t exactly say no to this chance. So, while feeling a bit out of place, I signed up for the workshop and got myself ready for two weeks of a lot of learning.</p>
<h2 id="projects">Projects</h2>
<p>The main part of CNWW is the projects. This year, the projects included many themes: renormalization of networks, animal networks, spread of contagion, roads, skiing (this one was mine), financial networks, soccer networks, political networks, collaboration networks, and social networks. Seriously, the sky was the limit with the projects.</p>
<p>The first week was full of brainstorming to land on an idea and form a group. This meant hopping on calls with many people to get a flavour of what was going on, and finding something that struck your fancy.</p>
<p>As a runner and general outdoor enthusiast myself, I was drawn to the sport-oriented projects. My goal was also to do something quite different than what I’ve been looking at during my PhD, so choosing to look at skiing was a great fit.</p>
<p>Because there isn’t a ton of time, you can’t expect to get a super-ambitious project done. I went in without too many expectations. In fact, it was the same attitude I entered the Winter School during my year at PSI with: learn some new things, enjoy the time, and don’t care too much about turning a project into an instant research paper. I find that this is a healthy attitude for a short workshop like this one, particularly since it was way outside of my comfort zone.</p>
<p>Then, two weeks after the start of CNWW, we had a marathon Zoom session on Saturday to listen to everyone present their projects.</p>
<p>What I was struck by was the energy that accompanied each project. Instead of focusing on having perfect analyses, the projects were more exploratory, trying to tease apart questions that came up during the two weeks. There was an air of “lightness” to the day that was much different than giving a presentation where it feels like everyone is judging you. Here, it really was a chance to get back into doing science for the reason most of us started out with: because it’s fun. This was something the mentors<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> for CNWW repeated to us, and I definitely noticed it throughout the two weeks.</p>
<p>It was also very collaborative. Since people weren’t all experts in their projects, it was easy to offer feedback and ask questions that poked at new directions. Even I found myself asking questions at times, despite having no underlying baggage of knowledge to lean on. I’d say that the atmosphere was always inviting, and despite being an outsider to the field, I didn’t feel like I was falling behind and completely confused at all moments. That in itself was great.</p>
<h2 id="lessons">Lessons</h2>
<p>What did I learn at CNWW?</p>
<p>The most prominent thought I was left with is that <em>getting</em> data to answer a question is so tricky. It left me with an appreciation of collecting data and cleaning it. It’s great to just say, “Let’s use some data on X and build a network to analyze!”, but doing the work of massaging that data into forms you can work with (and that make sense for the problem!) takes a lot of time.</p>
<p>The other aspect is linking data to your questions. During CNWW, I had so many questions posed by and to me, and this left me with a dizzying array of directions to pursue. The catch is that actually <em>answering</em> a question with data is not an easy matter! In the “real world”, data doesn’t point only towards answering a particular question, and so tradeoffs have to be made.</p>
<p>I think of this mostly in the context of how I work as a theoretical physicist. I work with simulations, which is a roundabout way of saying that I craft models that conform <em>specifically</em> to the questions that I want to answer. This means there’s little ambiguity between the data I receive and the analysis I can do. The data is built to answer my question.</p>
<p>But during CNWW, I developed an appreciation for those who tried to take a networks approach to systems in the world. Because a network is an abstract object, you have to find a way to layer it on top of the system you’re interested in. But just like you can’t take a sphere and flatten it out on the ground without creating bumps, you lose something when you go from your system to a network. The goal is always to minimize that loss, but you have to be aware that it rarely goes to zero. Losing something in translation is common.</p>
<p>This question comes to the fore as soon as you start asking what the network representation of your system even <em>is</em>. A graph is a fairly simple mathematical object: a set of edges and nodes. But this simplicity means you’re making tradeoffs each time you model a system in the world as a network. Unless you’re working on abstract computer science problems like myself where the graph <em>is</em> the system, you need to worry about how this embedding affects your ability to answer the questions you want.</p>
<p>The other lesson I learned was that there are a <em>lot</em> of questions you can ask, but really honing in on them and making them specific is tricky. This is a variant of something I encountered while working on some machine learning projects. When you train your model, there are various knobs that the algorithm doesn’t adjust on its own. These are called “hyperparameters”, and they are set by you, the person designing the model. So how do you decide? Well… it’s not always transparent. You can try an exhaustive search over the possible values, but because they are often continuous and there are several of them, the multiplication rule of combinatorics isn’t in your favour. And that’s not even looking at strategies where some of these hyperparameters <em>change</em> as a function of the training time!</p>
<p>I encountered this with the people studying real-life networks. Yes, they could map their system onto a network and answer questions about the dynamics, but who’s to say that there aren’t outside factors which the network doesn’t consider? And what about causality? These are all thorny questions, but nobody at CNWW hid from them. These are limitations of the field, and I did like the candor of admitting this.</p>
<h2 id="lectures">Lectures</h2>
<p>In addition to the projects, the mentors at CNWW gave lectures throughout the two weeks. These were on a bunch of topics, since the mentors all study different niches of network science. These ranged from animal populations to infectious disease to human relationships to what it means to even <em>have</em> a network. The diversity was something I really enjoyed, and it meant that I could get a taste of many different areas of network science.</p>
<p>I probably enjoyed those on the theory of networks the most, given by <a href="https://bagrow.com/pdf/bagrow_working-with-network-data-2021.pdf">James Bagrow</a>, <a href="https://www.jgyoung.ca/">Jean-Gabriel Young</a>, and <a href="https://larremorelab.github.io/">Dan Larremore</a>. These really got to the heart of networks for me. The applied networks presentations were really good too, though admittedly some hit the mark of my interests more than others.</p>
<h2 id="being-virtual-and-doing-science">Being Virtual and Doing Science</h2>
<p>Like so many things today, CNWW was virtual. This meant we had many calls, and long sessions working with others on our projects.</p>
<p>Not only was this my first workshop/conference, it was also one of the few online events I’ve done (since I don’t take classes anymore as a PhD student). So I didn’t know what to expect going into this, but on the whole I think it went well. There’s an adjustment to be made with virtual events, but I think the work done can be just as good.</p>
<p>My one point of comparison is with the Winter School I participated in during PSI. There, we spent a week at a lodge doing intense work during the day. There were similar group projects, and I found that virtual CNWW was really good for getting people to work together. Despite some awkwardness of meeting people online, everyone was excited to learn and do some network science.</p>
<p>I think the biggest thing you need in <em>any</em> sort of workshop is enrollment: How invested are people? If you have the investment from everyone, it doesn’t matter whether the workshop is online or in-person. The key is to get people to show up willing to put in the work, and that was something I felt for all of CNWW.</p>
<p>The effort each person put into the workshop really shone on the last day, where each group presented their work. It was nice to get a feel for what everyone else was doing. In an online setting, you don’t have the ease of bumping into others, so I was more or less in the dark about the other projects until the end.</p>
<p>Moving forward, I really think this kind of format can work well. Like I said, it requires investment on the part of everyone. If people don’t show up and put in the effort, any workshop will fail. But with CNWW, everyone was bursting with energy to go.</p>
<p>CNWW could have been devoid of energy. This is, unfortunately, the feeling I get during so many online meetings. We try to shoehorn an in-person interaction into the online realm and things go badly. But here, people were always energetic and ready to talk science. Each time I logged on to a call, instead of feeling a sense of, “This is going to be a boring meeting”, I felt excited for what’s next. I think part of this is because <em>everything</em> was basically new to me, but I also think the organizers of CNWW did a great job of building this atmosphere directly into the two weeks. I’m fairly sure this wouldn’t just happen organically.</p>
<p>It also helped that there were several cultural activities throughout the two weeks. Each morning and evening had a happy hour where different themes were presented. There was also a presentation on ice canoeing, as well as a few other challenges to add to the workshop. I thought these were nice additions, and they made it feel like much more than “just” a science workshop.</p>
<hr />
<p>Despite this being my first time learning about network science, I’m really hoping it won’t be the last. The experience of the workshop itself was great, and it’s one I’ll remember. In particular, I want to thank the two organizers who made the workshop happen: <a href="https://juniperlovato.com/">Juniper Lovato</a> and <a href="https://twitter.com/MFGevry">Marie-France Gévry</a>. Another thank you to those who showed up every day to lead the meetings, including <a href="https://antoineallard.github.io/">Antoine Allard</a>, <a href="https://laurenthebertdufresne.github.io/">Laurent Hébert-Dufresne</a>, and <a href="https://www.jgyoung.ca/">Jean-Gabriel Young</a>. The workshop went so well <em>because</em> of the time you all put into it before CNWW even started, and throughout the two weeks. I think I can speak for all of the participants in saying we really appreciated it.</p>
<p>When you’re online, it’s easy to get lost and bogged down by the tools and the various platforms needed to connect to everyone. For CNWW, the technologies used were Zoom, Slack, and Whereby, as well as a central website that had everything presented in an easy-to-see way. It sounds silly, but these details really do matter, and setting them up takes time. I think the team here did a great job at making sure that everything went on without much difficulty (even when Slack went down worldwide on Day One of CNWW!).</p>
<p>As for myself, I’m hoping to continue working on the project I started with my team. Two weeks is only enough to start figuring out what kind of questions to ask, so there’s a lot more to go. However, like I mentioned at the beginning, the point of the workshop for me was always about gaining new perspectives in science and meeting other scientists. And in that regard, I absolutely succeeded.</p>
<h2 id="references">References</h2>
<ol>
<li>The team behind CNWW was no stranger to virtual conferences either. In fact, if you want to read more, Juniper Lovato has <a href="https://medium.com/@juniper.lovato/a-how-to-reflections-on-planning-virtual-science-conferences-eeb754ed404b">written a piece on organizing virtual science conferences</a>, so you may want to check out that essay if you’re interested in the details.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Though there <em>are</em> some books which are a delight to read. I’m not against the medium, but I am against limiting ourselves to it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The workshop is called CNWW, but because of the Québec/Canada connection (and, let’s face it, the tendency of scientists to give groan-inducing acronyms and names to their projects), it’s pronounced “canoe”. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>While there were mentors and participants at CNWW, the distinction is meant to be blurry. What I mean by this is that the mentors would often help (and present) projects, giving their time just like the participants to work on them. In that regard, the workshop was very collaborative. There wasn’t much of a top-down structure, but the mentors did provide times to chat on ideas, and they gave lectures. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 25 Jan 2021 00:00:00 +0000
https://cotejer.github.io//virtual-cnww
https://cotejer.github.io//virtual-cnwwPick a State, Any State<p>Hilbert space is big.</p>
<p>No, not <em>big</em> like how the Earth is big compared to you. Rather, Hilbert space is <em>astronomically</em> big. Actually, that’s not quite right either. It’s bigger than that. I guess the best adverb I can use is that it’s <em>mathematically</em> big. In a Hilbert space, you tend to have a lot of room to maneuver. (To read more about that, check out my essay, <a href="https://cotejer.github.io/curse-of-dimensionality">“The Curse of Dimensionality”</a>.)</p>
<p>In the <a href="https://cotejer.github.io/all-in-the-corners">last essay</a>, we saw how to pick a random unit vector in an <em>N</em>-dimensional space. This wasn’t for nothing, and now we are going to put that knowledge to use to ask the following question:</p>
<p>If I have a line of <em>N</em> qubits and randomly pick a state from the Hilbert space, what kind of entanglement will it have?</p>
<!--more-->
<p>The essence of this question revolves around the idea of trying to capture what a quantum state is like. The reason we care is that some quantum states are much easier to engineer than others. For example, if we have the state $|0\rangle |0\rangle$, it’s much easier to prepare than the state $\frac{1}{\sqrt{2}} \left(|0\rangle|0\rangle + |1\rangle|1\rangle \right)$. This is because the latter state is <em>entangled</em>, and generating purposeful entanglement can be tricky.</p>
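<p>To see the difference concretely, here’s a minimal sketch in Python with NumPy (one of many ways to represent these states as arrays of amplitudes; the variable names are my own):</p>

```python
import numpy as np

# Computational basis states for a single qubit.
zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])

# The product state |0>|0>: easy to prepare, no entanglement.
product = np.kron(zero, zero)

# The entangled state (|0>|0> + |1>|1>) / sqrt(2). The 1/sqrt(2) is what
# makes the sum of squared amplitudes equal to 1.
bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)

print(np.linalg.norm(product))  # 1.0
print(np.linalg.norm(bell))     # 1.0 (up to floating-point error)
```

<p>Both are perfectly valid quantum states; the difference between them only shows up once we start asking about entanglement, which is what the rest of this essay does.</p>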
<p>Plus, simulating quantum systems accurately means you have to track the exponential number of states it can be in, and entanglement is partly the reason why you can’t only consider a few of those exponentially many states. As such, it’s worth thinking about what the average case looks like.</p>
<p>Not only that, but this is the kind of question a physicist likes to ask. We don’t tend to study quantum states but quantum <em>systems</em>, which effectively means we are studying at the Hilbert space level. For the purposes of this essay, think of Hilbert space as a bag which contains a bunch of balls, each one a quantum state. Then, our question from before becomes: If I pick a ball from the Hilbert space bag, how much entanglement can I expect it to have?</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1608660324/Blog/HilbertBagShadow.png" alt="" /></p>
<p>This question will make us cover a lot of ground. We’ll see what to use as a measure of entanglement, we will talk about one of the most useful tools in linear algebra that quantum physicists use all the time, and we will (naturally) make a few approximations to help smooth our way. It promises to be a fun ride, filled with plenty of mathematical morsels to ponder.</p>
<p>Let’s go!</p>
<h2 id="a-quantum-cut">A Quantum Cut</h2>
<p>If we want to quantify entanglement, we need to first talk about how to partition quantum states. In particular, we’re going to look at the simplest partitioning scheme: cutting a quantum state into two parts (called a <em>bipartition</em>).</p>
<p>The idea is straightforward. Imagine we have a quantum state over four qubits:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1608660324/Blog/4Qubits.png" alt="" /></p>
<p>Then, a bipartition of the quantum state corresponds to inserting a “cut” so that the qubits are separated into two groups (I’ve shown multiple ways here):</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1608660324/Blog/4QubitsCut.png" alt="" /></p>
<p>Do we have to cut the states in the middle? No! But in our case, we will because it seems conceptually the easiest choice when we’re talking about average cases. In principle though, you could cut the state anywhere you want, and this would give you a different bipartition of the system into parts A and B. You could even go crazy and group qubits that aren’t adjacent, though that’s beyond what we will do.</p>
<p>How does this get reflected in the mathematics?</p>
<p>Let’s start with our quantum state over four qubits. Then, if I write the state of each qubit in the computational basis, the whole state can be written as:
\(|\psi\rangle = \sum_{i,j,k,l} c_{ijkl} |i\rangle |j\rangle |k\rangle |l\rangle.\)
The tensor $c_{ijkl}$ holds all of the coefficients we need for the quantum system. If you’re not used to this notation, it just means that there’s a coefficient for every combination of $i,j,k,l$. But the problem is that, written this way, we <em>can’t</em> cut things in a clean way such that the $i$ and $j$ part become one factor, and the $k$ and $l$ part become another. We would need more information about the quantum state to do this. In other words, we can’t perform the following “factoring” of the sums:
\(\sum_{i,j,k,l} c_{ijkl} \nrightarrow \sum_{i,j} a_{ij} \sum_{k,l} b_{kl}.\)
For example, if we had the state $|\psi\rangle = |0\rangle |1\rangle |1\rangle |0\rangle$, then the only nonzero coefficient would be $c_{0110} = 1$, so we could then actually break up the system into $|\psi\rangle = |A\rangle |B\rangle$, where $|A\rangle = |0\rangle |1\rangle$ (over the first two qubits) and $|B\rangle = |1\rangle |0\rangle$ (over the final two qubits).</p>
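<p>To make this concrete in code, here’s a small sketch in Python with NumPy (the variable names are mine): build $|\psi\rangle = |0\rangle|1\rangle|1\rangle|0\rangle$, view its amplitudes as the tensor $c_{ijkl}$, and check that grouping the indices across the middle cut gives a rank-one matrix, which is exactly what “factoring” into parts A and B means:</p>

```python
import numpy as np

zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])

# |psi> = |0>|1>|1>|0> as a vector of 2^4 = 16 amplitudes.
psi = np.kron(np.kron(zero, one), np.kron(one, zero))

# The same amplitudes viewed as the tensor c_{ijkl}.
c = psi.reshape(2, 2, 2, 2)
print(c[0, 1, 1, 0])  # 1.0 -- the only nonzero coefficient

# Group (i, j) against (k, l): for a product state across the middle
# cut, this 4x4 matrix has rank 1, i.e. it factors into a_{ij} b_{kl}.
mat = psi.reshape(4, 4)
print(np.linalg.matrix_rank(mat))  # 1
```

<p>For an entangled state, the same reshaped matrix would have rank greater than one, and that is the thread the next sections pull on.</p>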
<p>The point here is that we can cut our system into two parts, A and B, and there’s only <em>one</em> term. Notice that I did not write a sum for the state in terms of A and B. That’s key, and it’s what will let us define entanglement.</p>
<p>But let’s back up. If we have the state given by the coefficient tensor $c_{ijkl}$, then can we still write the sum in terms of a system A and a system B?</p>
<p>The answer is yes! The price we pay though is that we might not get only one term for our result.</p>
<p>The tool we need to cut our system into two parts is called the Schmidt decomposition. This is the tool I go back to again and again in my research, because it forms the bedrock of how I classify quantum states. We won’t prove the Schmidt decomposition in the essay (perhaps another one in the future), but instead I’ll give you the idea. If you want to know more, I <em>highly</em> recommend looking at Tai-Danae Bradley’s work in Reference 1.</p>
<p>The Schmidt decomposition gives us a way to “split” a quantum system into two parts, in such a way that the coefficients only depend on a shared index. If we’re dealing with a bipartite system (two parts), then this means the coefficient matrix is <em>diagonal</em>.</p>
<p>Take a state on two qubits, which we can write as:
\(|\psi\rangle = \sum_{i,j} c_{ij} |i\rangle |j\rangle.\)
Notice that the coefficients depend on both $i$ and $j$. Moreover, the whole state involves two sums. What the Schmidt decomposition will do is reduce our state to something that looks like this:
\(|\psi\rangle = \sum_{a} c_{aa} |a\rangle_1 |a\rangle_2.\)
Now, the sum has only one index to go over ($a$), and the coefficient matrix is diagonal.</p>
<p>Why is this useful? It has to do with the fact that we can quantify entanglement with these values. They are the singular values of the coefficient matrix (their squares are the eigenvalues of the reduced density matrix), and they are called the Schmidt values.</p>
<p>In fact, the idea of calling a quantum state entangled rests precisely on this summation over $a$. If the index only takes one value, then we say the state is <em>not</em> entangled. However, if the index takes more than one value, the state is then <em>entangled</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>We often don’t write something like $c_{aa}$. Instead, we just give it one index, so it would look something like $c_a$. This emphasizes that it’s not an index of either system in particular, but in fact something that characterizes both.</p>
<p>In terms of <em>performing</em> the Schmidt decomposition, the proof is actually constructive. We follow an algorithm that amounts to performing a <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition</a> of the coefficient matrix (of the form $udv$, which you can also look at Reference 1 to learn about), and then rotating the quantum state basis vectors with the matrices $u$ and $v$, while the diagonal matrix $d$ holds the Schmidt values.</p>
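<p>Numerically, this whole construction collapses to a few lines. Here’s a minimal sketch in Python with NumPy (the function name is my own): reshape the amplitude vector into the coefficient matrix and ask for its singular values, which are exactly the Schmidt values:</p>

```python
import numpy as np

def schmidt_values(psi, dim_a, dim_b):
    """Schmidt values of a pure state |psi> on a bipartite system A x B.

    Reshape the amplitude vector into the coefficient matrix c_{ij},
    then take its singular values: the diagonal of d in c = u d v.
    """
    c = psi.reshape(dim_a, dim_b)
    return np.linalg.svd(c, compute_uv=False)

# The Bell state has two equal Schmidt values of 1/sqrt(2), so the
# Schmidt sum has more than one term: the state is entangled.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
print(schmidt_values(bell, 2, 2))      # [0.7071... 0.7071...]

# A product state |0>|0> has a single nonzero Schmidt value.
product = np.array([1.0, 0.0, 0.0, 0.0])
print(schmidt_values(product, 2, 2))   # [1. 0.]
```

<p>Counting the nonzero Schmidt values is precisely the entangled-or-not criterion from above: one value means a product state, more than one means entanglement.</p>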
<p>Once we decompose our system like this, we now have an easy way to calculate the amount of entanglement in the state.</p>
<h2 id="quantifying-entanglement">Quantifying entanglement</h2>
<p>Before we even start solving our problem, it’s worth thinking about what I mean when I say “the entanglement of a quantum state”. After all, quantum states don’t come with little price tags that announce how much entanglement they have! Rather, we need to come up with a notion for how entangled a quantum state is.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1608661754/Blog/PriceTag.png" alt="" /></p>
<p>There’s a lot of literature on this question (and a bunch of associated measures), but we will just look at one of the simplest ones: the Schmidt values.</p>
<p>As we saw above, if we have a quantum system we can partition it into two systems, perform the Schmidt decomposition, and end up with something that looks like (where we split the system into parts A and B):
\(|\psi\rangle_{AB} = \sum_{a} \lambda_{a} |a\rangle_A |a\rangle_B.\)
One of the reasons we like this decomposition is that it gives us a way to easily calculate what’s called the “entanglement entropy” (or “<a href="https://en.wikipedia.org/wiki/Entropy_of_entanglement">entropy of entanglement</a>”). It’s a measure that works for a pure state, which is what we have above. If we then consider the density matrix given by $\rho_{AB} = |\psi\rangle \langle \psi|$, This quantity is calculated using the following equation:
\(S(\rho_A) = -\text{Tr}\left( \rho_A \ln \rho_A \right).\)
Here, $\rho_A$ is the density matrix you get from doing an operation called “tracing out” system B. This gives us an idea of the state only on A, and if there was any entanglement in the original AB system, it will show up here.</p>
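<p>As a numerical sketch of this definition (in Python with NumPy; the function name is mine), we can use the standard fact that for a pure state with coefficient matrix $c$, tracing out system B gives $\rho_A = c c^\dagger$, and then feed the eigenvalues of $\rho_A$ into the entropy formula:</p>

```python
import numpy as np

def entanglement_entropy(psi, dim_a, dim_b):
    """S(rho_A) = -Tr(rho_A ln rho_A) for a pure state on A x B.

    For a pure state with coefficient matrix c, tracing out B
    gives rho_A = c c^dagger.
    """
    c = psi.reshape(dim_a, dim_b)
    rho_a = c @ c.conj().T
    evals = np.linalg.eigvalsh(rho_a)
    evals = evals[evals > 1e-12]   # drop numerical zeros (0 ln 0 -> 0)
    return -np.sum(evals * np.log(evals))

# The Bell state: rho_A is maximally mixed, so S = ln 2.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
print(entanglement_entropy(bell, 2, 2))  # ln 2, about 0.6931
```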
<p>You might be thinking that it’s awfully weird to calculate the entanglement entropy of $|\psi\rangle_{AB}$ using only half of the system, but the above definition is actually symmetric<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. In other words, $S(\rho_A) = S(\rho_B)$. In fact, because we have our state in the Schmidt decomposed form, we can do a bit of algebraic magic to eventually end up with:
\(S(\rho_A) = - \sum_a \lambda_a^2 \ln(\lambda_a^2).\)</p>
<p>In a future essay, we’re going to look at a different kind of entanglement that can arise. In this case, the quantum state’s entanglement entropy isn’t related to its volume, but its area. This is usually called the “area law”, but as we will see later on, it’s probably better to call it a “boundary law”. In either case, this will let us study systems whose entanglement doesn’t grow with system size, giving us hope for studying them in more depth.</p>
<p>But that’s for another essay! For us, we will see that picking a <em>random</em> quantum state gives us entanglement that grows with the system size.</p>
<h2 id="volume-law">Volume law</h2>
<p>When we look at the entanglement entropy of one of these random quantum states (where the bipartition gives two Hilbert spaces of dimensions $m$ and $n$, with $m\leq n$), the average entropy scales as:
\(\langle S(\rho_A) \rangle \approx \ln m - \frac{m}{2n}.\)
I’ve included some references at the end that go into this result (see References 2 and 3). To show the full-blown calculation will take us <em>way</em> outside what I want us to explore today. Instead, I want to give you a taste for why an expression like this makes sense. To do that, we’ll play the usual game of a physicist who wants to show something without a care for total rigour: handwaving and appeals to idealizations. Don’t worry though, the steps will be instructive.</p>
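<p>Before the handwaving, we can sanity-check this formula numerically. Here’s a rough Monte Carlo sketch in Python with NumPy (the seed, sample count, and dimensions are arbitrary choices of mine, not from the references): sample random states, compute their entanglement entropy from the Schmidt values, and compare the average against $\ln m - \frac{m}{2n}$.</p>

```python
import numpy as np

rng = np.random.default_rng(1234)  # arbitrary seed

def random_state(dim):
    # Uniformly random pure state: complex Gaussians, then normalize.
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

def entanglement_entropy(psi, m, n):
    # Schmidt values via SVD; their squares go into the entropy.
    lam = np.linalg.svd(psi.reshape(m, n), compute_uv=False)
    p = lam**2
    p = p[p > 1e-12]
    return -np.sum(p * np.log(p))

m, n = 4, 8  # subsystem dimensions, with m <= n
samples = [entanglement_entropy(random_state(m * n), m, n)
           for _ in range(2000)]

# The sample mean and the estimate should agree to a few percent.
print(np.mean(samples))         # average over random states
print(np.log(m) - m / (2 * n))  # ln 4 - 1/4, about 1.136
```

<p>The average sits close to the maximum possible value of $\ln m$, which is the “volume law” behaviour this section is named after: a typical random state is nearly maximally entangled.</p>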
<p>If we start by writing out our random quantum state, it will look like what we had before:
\(|\psi\rangle = \sum_{i}^m \sum_{j}^n c_{ij} |i\rangle |j\rangle.\)
Since we’re picking random unit vectors, we know from <a href="https://cotejer.github.io/all-in-the-corners">“All in the Corners”</a> that we want to sample from a unit hypersphere. As a probability distribution, this means we want our coefficients to follow (where <em>c</em> is all of our coefficients):
\(P(c) \sim \delta \left( \sum_i^m \sum_j^n \lvert c_{ij} \rvert^2 - 1 \right).\)
Okay, this might not be the most transparent of expressions, so let’s walk through what it means.</p>
<p>The term inside of the parentheses measures the difference between the sum of the squares of the coefficients from 1. Remember that this double sum is just computing the “radius” of the vector <em>c</em> which describes our coefficients. This is exactly what we looked at in the previous essay, where we wanted the sums of squares to be equal to 1. So the expression in the parentheses is measuring how close we are to 1.</p>
<p>Then, the $\delta$ is the symbol used for the Dirac delta distribution. Sweeping a bunch of details under the rug, the Dirac delta distribution is nonzero only when the argument in the parentheses is 0, and is zero everywhere else. This effectively means we “pick out” the vectors <em>c</em> whose coefficients are normalized to one. Everything else is not allowed<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>Finally, we don’t have a strict equality because there may be a normalization factor. However, we won’t worry about it because we will switch to a different distribution soon.</p>
<p>While compact, this expression is only nice to work with under certain circumstances. This isn’t one of those times.</p>
<p>Remember, the end goal is to estimate the average entanglement entropy of our two-system state, $\langle S(\rho_A) \rangle$. This means we need a probability distribution for the states to follow, and this is what our Dirac delta distribution above gives us.</p>
<p>If we use the expression for the entanglement entropy I wrote above, the average is computed as:
\(\langle S(\rho_A) \rangle
= - \left\langle \sum_i^m \lambda_i^2 \ln \lambda_i^2 \right\rangle.\)
This expression is fine, but it hides something tricky. The angular brackets mean we have to take the expression within and evaluate it against a probability distribution. In terms of notation, we have:
\(\left\langle \cdot \right\rangle \equiv \int \left( \cdot \right) P(c) \, dc.\)
Here, $P$ is our probability distribution. Also, while I’ve written an integral for our expression here, that’s not the only possibility. It could be a sum if we’re dealing with discrete objects. Since the possible coefficients are continuous though, the integral is appropriate.</p>
<p>So what is our probability distribution? It looks like we could just use our equation for $P(c)$, but the problem is that our entropy is expressed in terms of the Schmidt values <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> $\lambda_i$, while our probability distribution is expressed in terms of $c_{ij}$. We will have to deal with this in some manner, but let’s leave it be for now.</p>
<p>We can start by writing the Schmidt values as:
\(\lambda_i^2 = \frac{1}{m} + \delta_i, \,\,\, \delta_i \in \mathbb{R}.\)
This might not look really useful, but there are two things to note here. First, this decomposition is of the form “constant” + “fluctuation”. In fact, the constant part $1/m$ is simply the contribution we would get if the state was uniform (which means maximum entropy). The upshot is that in our expression for the logarithm, we will be able to expand things to get a constant part and a fluctuating part.</p>
<p>Second, because our quantum state needs to be normalized, we have the following implication:
\(\sum_i^m \lambda_i^2 = 1 \rightarrow \sum_i^m\frac{1}{m} + \sum_i^m \delta_i = 1 \rightarrow \sum_i^m \delta_i = 0.\)
This will help us out later, so let’s keep it in our back pocket.</p>
<p>We’re now in a position to expand the logarithm using a Taylor series, which gives us:
\(\ln\left( \frac{1}{m} + \delta_i \right) = - \ln m + \sum_{k=1}^\infty \frac{(-1)^{k-1}}{k} (m\delta_i)^k.\)
This looks complicated, but it will allow us to take in each piece of the problem separately. If we write out what the integral looks like, it will be (following the notation of Reference 3):
\(\langle S(\rho_A) \rangle = \ln m - \frac{m}{2} \left\langle \sum_i \delta_i^2 \right\rangle + \frac{m^2}{6} \left\langle \sum_i \delta_i^3 \right\rangle - \ldots\)
Where did all of these terms come from? They all come from this big multiplication:
\(\langle S(\rho_A) \rangle = - \left\langle \sum_i^m \left( \frac{1}{m} + \delta_i \right) \left( - \ln m + \sum_{k=1}^\infty \frac{(-1)^{k-1}}{k} (m\delta_i)^k \right) \right\rangle.\)
If we multiply the first two terms of each expression, we see that we get:
\(-\left\langle \sum_i^m \left( \frac{1}{m} \right) \left( - \ln m \right) \right\rangle = \frac{\ln m}{m} \sum_i^m 1 = \ln m.\)
The other terms follow this exact method, though you need to keep the angular brackets because the terms with $\delta_i$ do depend on the various values of $\lambda_i$.</p>
<p>For our purposes, we will only look at the first two terms (there should be an argument about the higher terms being negligible when $n$ becomes large). That means our last step is to evaluate the $\delta_i^2$ term.</p>
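<p>If you want to see that the truncation is harmless when $n$ is large, here’s a quick numerical sketch (Python with NumPy assumed, as before): for a single random state, the exact entropy and the two-term expansion $\ln m - (m/2)\sum_i \delta_i^2$ nearly coincide.</p>

```python
# Compare the exact entropy to the truncated expansion for n >> m.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 256
c = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
c /= np.linalg.norm(c)

lam2 = np.linalg.svd(c, compute_uv=False) ** 2   # eigenvalues of rho_A
delta = lam2 - 1.0 / m                           # fluctuations around 1/m

exact = -np.sum(lam2 * np.log(lam2))
truncated = np.log(m) - (m / 2) * np.sum(delta**2)
print(exact, truncated)   # nearly equal for these sizes
```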
<p>We will go about this in an oblique way. Remember that this term is related to the eigenvalues $\lambda_i$. The probability distribution for them is not available to us (at least, without a lot of work), but we do have the probability of picking our coefficients $P(c)$.</p>
<p>To get there, we will first notice that we can calculate the trace of our density matrix pretty easily when we have the Schmidt decomposed form (the one with $\lambda_i$).</p>
<p>Because the matrix is diagonal, the trace is just the sum of those components. If we take the trace of a density matrix, it’s always one (that’s the normalization). It’s also why the average of the sum of $\delta_i$ is zero. But unless we’re dealing with a pure state, the trace of $\rho^2$ isn’t one. In fact, this quantity is called the <em>purity</em> of a state because it’s one if and only if the state is pure.</p>
<p>The good news is that we can still compute the trace easily:
\(\text{Tr}\rho_A^2 \equiv \sum_i^m \lambda_i^4 = \sum_i^m \left( \delta_i + \frac{1}{m} \right)^2 \\
= \sum_i^m \delta_i^2 + \frac{2}{m}\sum_i^m \delta_i + \sum_i^m\frac{1}{m^2} = \sum_i^m \delta_i^2 + \frac{1}{m}.\)
The point is that we can now substitute our average over $\delta_i^2$ for an average over the trace of our reduced state, which we will be able to calculate. In other words:
\(\left\langle \sum_i^m \delta_i^2 \right\rangle = \left\langle \text{Tr}\rho_A^2 - \frac{1}{m} \right\rangle.\)
Putting everything together, our average entropy looks like:
\(\left\langle S(\rho_A) \right\rangle = \ln m - \frac{m}{2} \left\langle \text{Tr}\rho_A^2 - \frac{1}{m} \right\rangle.\)
So we’ve kicked our problem down the road. We’re <em>almost</em> at an expression for the entropy, but we have this pesky trace over the square of the density matrix. This isn’t exactly forthcoming with insight, so we’d like to deal with this term.</p>
<h2 id="relaxing-the-delta">Relaxing the Delta</h2>
<p>To do this, we will now make use of our probability distribution from the beginning. Remember that it looks like this:
\(P(c) \sim \delta \left( \sum_i^m \sum_j^n \lvert c_{ij} \rvert^2 - 1 \right).\)
But dealing with a Dirac delta distribution isn’t fun unless you’re in very specific scenarios. Therefore, we will relax this distribution to one which is much easier to do calculations. We will end up getting the same asymptotic result at the end, and save ourselves some algebra.</p>
<p>Instead of sampling from this distribution, we’re going to use a modified distribution that will approximate $P(c)$. I’ll give it the name $Q(c)$, and it will look like:
\(Q(c) = \prod_{i,j} \frac{nm}{\pi} \exp{\left( -nm |c_{ij}|^2 \right)}.\)
Those sharp-eyed readers will recognize this as nothing other than a product of Gaussian distributions for the coefficients $c_{ij}$ with the mean being zero ($\langle c_{ij}\rangle = 0$) and variance $\langle |c_{ij}|^2\rangle = 1 /nm $.</p>
<p>Why does this distribution make sense?</p>
<p>First, the mean being zero just tells us that we aren’t biasing the distribution towards one direction or another. Second, the variance being $1/nm$ encodes the fact that we want the state to be normalized. Remember that the trace of the density matrix has to be one. But with this Gaussian distribution, it’s not <em>quite</em> equal to one, since we aren’t using the Dirac delta.</p>
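<p>We can see this “not <em>quite</em> one” behaviour directly. A sketch (Python with NumPy assumed): sample the coefficients from $Q(c)$ as independent complex Gaussians with variance $1/nm$, and watch the squared norm concentrate around 1 as $nm$ grows.</p>

```python
# Norms of states sampled from Q(c) concentrate at 1 as nm grows.
import numpy as np

def sample_norms(m, n, samples, rng):
    # Re and Im parts each have variance 1/(2nm), so <|c_ij|^2> = 1/(nm).
    sigma = np.sqrt(1.0 / (2 * n * m))
    c = sigma * (rng.standard_normal((samples, m, n))
                 + 1j * rng.standard_normal((samples, m, n)))
    return np.sum(np.abs(c) ** 2, axis=(1, 2))

rng = np.random.default_rng(2)
small = sample_norms(2, 2, 2000, rng)    # nm = 4: broad spread around 1
large = sample_norms(16, 64, 2000, rng)  # nm = 1024: sharply peaked at 1
print(small.mean(), small.std())
print(large.mean(), large.std())
```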
<p>Those in physics are used to this substitution though. A Dirac delta distribution can be approximated using a Gaussian distribution, and this is exactly what we are doing here. Graphically, you can see this in the following animation that starts with a Gaussian distribution and sends $nm\rightarrow \infty$.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1608654029/Blog/DiracGaussian.gif" style="zoom:50%;" /></p>
<p>From the animation above, you see that the peak gets more sharply defined and narrower as time goes on. This shows us that our Gaussian function can approximate a Dirac delta distribution better and better.</p>
<p>We can now compute some of the traces we need. First though, we calculate the reduced density matrix $\rho_A$, which is defined as $\text{Tr}_B \rho_{AB}$.
\(\rho_{AB} = |\psi\rangle \langle\psi| = \sum_{i}^m \sum_{j}^n c_{ij} |i\rangle |j\rangle \sum_{k}^m \sum_{l}^n c^*_{kl} \langle k| \langle l| \\
= \sum_{i,k}^m \sum_{j,l}^n c_{ij} c^*_{kl} |i\rangle\langle k| \otimes |j\rangle \langle l|.\)
A bit of a monster expression, but the terms are grouped up for us to easily take the partial trace over subsystem B.
\(\rho_A =\sum_{i,k}^m \sum_{j,l}^n c_{ij} c^*_{kl} |i\rangle\langle k| \text{Tr} \left( |j\rangle \langle l| \right) = \sum_{i,k}^m \sum_{j,l}^n c_{ij} c^*_{kl} |i\rangle\langle k| \delta_{jl} \\
= \sum_{i,k}^m \sum_j^n c_{ij} c^*_{kj} |i\rangle\langle k|\)
Then, we can look at the trace of $\rho_A$, which we know should be one if we’re following our Dirac distribution, but might be off a bit here.
\(\text{Tr}\rho_A = \sum_{i,k}^m \sum_j^n c_{ij} c^*_{kj} \text{Tr}\left(|i\rangle\langle k| \right) = \sum_{i,k}^m \sum_j^n c_{ij} c^*_{kj} \delta_{ik} \\
= \sum_{i}^m \sum_j^n |c_{ij}|^2.\)
The problem is that the Gaussian distribution doesn’t have this being equal to one. But, on <em>average</em> this will be true, because of the requirements we set on the coefficients from before:
\(\left\langle \text{Tr}\rho_A \right\rangle = \left\langle \sum_{i,j}|c_{ij}|^2 \right\rangle = \sum_{i,j} \left\langle|c_{ij}|^2 \right\rangle = \sum_{i,j} \frac{1}{mn} = 1.\)
This means that on average, our state is normalized (which we are certainly hoping for!).</p>
<p>We now need to deal with the other term, which involves $\rho_A^2$. So first, we take a deep breath, dig into the algebra, and calculate it:
\(\rho_A^2 = \sum_{i,k}^m \sum_j^n c_{ij} c^*_{kj} |i\rangle\langle k| \sum_{a,d}^m \sum_b^n c_{ab} c^*_{db} |a\rangle\langle d| \\
= \sum_{i,k, a,d}^m \sum_{j,b}^n c_{ij} c^*_{kj} c_{ab} c^*_{db} |i\rangle\langle k| a\rangle\langle d| \\
= \sum_{i,k, a,d}^m \sum_{j,b}^n c_{ij} c^*_{kj} c_{ab} c^*_{db} \delta_{ka} |i\rangle\langle d| \\
= \sum_{i, a,d}^m \sum_{j,b}^n c_{ij} c^*_{aj} c_{ab} c^*_{db} |i\rangle\langle d|.\)
Yes, there are a lot of indices to keep track of! Now, we’re in a position to trace over this to find the purity.
\(\text{Tr}\rho_A^2 = \sum_{i, a,d}^m \sum_{j,b}^n c_{ij} c^*_{aj} c_{ab} c^*_{db} \text{Tr} \left(|i\rangle\langle d|\right) \\
= \sum_{i, a,d}^m \sum_{j,b}^n c_{ij} c^*_{aj} c_{ab} c^*_{db} \delta_{id} \\
= \sum_{i, a}^m \sum_{j,b}^n c_{ij} c^*_{ib} c_{ab} c^*_{aj}.\)
Okay, still not the most comprehensible. What we’re going to do though is exploit the properties that we were choosing our states with. Namely, that the coefficients have mean zero and variance $1/nm$.</p>
<p>How does this help?</p>
<p>Let’s split our sums into parts. If you look at the coefficients $c_{ij}c^*_{ib}$, they are just begging to have $j = b$ so we can combine them. That’s going to be our strategy here. In fact, we’re going to do a decomposition of the form:
\(\sum_{i, a}^m \sum_{j,b}^n = \sum_{i, a}^m \sum_{j=b}^n + \left( \sum_{i = a}^m \sum_{j\neq b}^n + \sum_{i\neq a}^m \sum_{j \neq b}^n \right).\)
This covers all of the different possibilities for the sum. The first term is when $j = b$, and the second term in parentheses is when $j\neq b$ (just split up even further). We can then use our properties from the sampling.</p>
<p>Let’s look at the first term. Since $j = b$, we will have:
\(\sum_{i, a}^m \sum_{j=b}^n |c_{ij}|^2 |c_{aj}|^2.\)
By itself, not that interesting. But if we look at the average (which is our goal), we get:
\(\sum_{i, a}^m \sum_{j=b}^n \left\langle|c_{ij}|^2 \right\rangle \left\langle|c_{aj}|^2\right\rangle = \sum_{i, a}^m \sum_{j=b}^n \frac{1}{\left(mn\right)^2} = \frac{m^2n}{m^2n^2} = \frac{1}{n}.\)
The reason we can just take the product of the averages has to do (I think) with the fact that the coefficients are sampled independently, so there shouldn’t be an “cross-talk” between them.</p>
<p>The next piece is when $i=a$ and $j\neq b$. This gives us another nice grouping of the coefficients:
\(\sum_{i = a}^m \sum_{j\neq b}^n |c_{ij}|^2 |c_{ib}|^2.\)
It’s a similar story to the previous one when we take the average. The only difference is that now $j\neq b$, so there are slightly less terms in the sum:
\(\sum_{i=a}^m \sum_{j \neq b}^n \left\langle|c_{ij}|^2 \right\rangle \left\langle|c_{ib}|^2\right\rangle = \sum_{i= a}^m \sum_{j\neq b}^n \frac{1}{\left(mn\right)^2} = \frac{mn(n-1)}{m^2n^2} \\
\approx \frac{1}{m} \,\,\, \text{for} \,\,\, n \rightarrow\infty.\)
The last term is the easiest. Here, we have $i\neq a$ and $j \neq b$, which means that none of the coefficients will group up. Instead, when we take the average, we will be able to invoke the property of our coefficients having mean zero, so this term won’t contribute.</p>
<p>All told, our end result (in the asymptotic limit of large $n$) is this:
\(\left\langle \text{Tr}\rho_A^2 \right\rangle \approx \frac{1}{n} + \frac{1}{m} = \frac{m+n}{mn}.\)
How does this compare with the exact expression? It turns out that the only difference (see Reference 3) is that the denominator becomes $mn+1$, which we know won’t make a ton of difference when the size of our Hilbert spaces are very big.</p>
<p>So, going <em>all</em> the way back to our calculation for the average entropy, we find that we get:
\(\left\langle S(\rho_A) \right\rangle \approx \ln m - \frac{m}{2} \left\langle \text{Tr}\rho_A^2 - \frac{1}{m} \right\rangle \\
\approx \ln m - \frac{m}{2} \left( \frac{1}{m} + \frac{1}{n} - \frac{1}{m} \right) \\
= \ln m - \frac{m}{2n}.\)
I don’t know about you, but isn’t it nice we could get to a simple result in the end without too many simplifications?</p>
<p>This result is sometimes called the Page limit, and it shows us that the average entropy of a quantum state in a Hilbert space is almost maximal. The maximal part is from the $\ln m$, which is what you get when you do an analysis for the maximum entropy a system can have (basically, when the probability distribution for your states is uniform).</p>
<p>But why is this important?</p>
<p>Remember that $m$ and $n$ are the dimensions of our Hilbert spaces. So if we have $A$ qubits in the first system and $B$ qubits in the second system, then $m = 2^A$ and $n = 2^B$. If we imagine our system has the same number of qubits in both systems (so $A=B$), then our average entropy becomes:
\(\left\langle S(\rho_A) \right\rangle \approx \ln 2^A - \frac{1}{2} \\
= A\ln 2 - \frac{1}{2}.\)
What this means is that the entropy scales with the size of the system. Because we are in one dimension (just a line of qubits), you might even say the entropy is scaling with the <em>volume</em>. This is why physicists sometimes refer to a “volume law of entanglement”. They are talking about precisely this notion. When the entanglement entropy of the system scales with the system size, we call it a volume law.</p>
<hr />
<p>So if I reach into my Hilbert space bag and pull a state out at random, not only will it be entangled, but it will be nearly <em>maximally</em> entangled.</p>
<p>The thing is, a lot of the states we’re used to seeing in physics are <em>not</em> maximally entangled. In other words, they don’t follow a volume law. These include product states that we use to start off computations in a quantum computer and the ground states of many Hamiltonians. So what’s the deal with them?</p>
<p>This leads to both a hypothesis, and a different type of entropy law. Whereas this essay showed you that grabbing a random state out of Hilbert space will net you one that follows a volume law, it turns out that a lot of states that are relevant for us physicists do <em>not</em> follow a volume law. In fact, many follow an <em>area</em> law<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, and this lack of growth in the entropy gives us hope that we can simulate a lot of these systems without even needing to use powerful quantum computers.</p>
<p>But that’s for a future essay. Here, just remember that you might not know the <em>specific</em> price tag of a particular quantum state, but you can be fairly sure that if you pick randomly, it will scale with the size of the system.</p>
<h2 id="references">References</h2>
<ol>
<li><a href="https://www.math3ma.com/blog/understanding-entanglement-with-svd">“Understanding Entanglement With SVD”</a>, Tai-Danae Bradley’s post on Math3ma for the Schmidt decomposition and looking at quantum states. She also does other interesting work on the crossroads of tensors, categories, and quantum theory. It’s worth reading the posts just for the lovely diagrams!</li>
<li><a href="https://arxiv.org/abs/gr-qc/9305007">“Average Entropy of a Subsystem”</a>, by Don Page. This paper gives the result (Equation 10 of the paper) we get in this essay, with some additional corrections. I haven’t read through this super carefully, but it’s a derivation from the physicist’s perspective.</li>
<li><a href="https://arxiv.org/abs/1003.3153">“Many-body physics from a quantum information perspective”</a>, by Remigiusz Augusiak, Fernando Cucchietti, and Maciej Lewenstein. This paper is the basis for this post, and it includes the suggestion of using Gaussian functions instead of discontinuous Dirac deltas for the probability distribution of being on a unit hypersphere. The relevant part is section 3.1.
I haven’t gone into this much, but it seems like the authors cite Lubkin’s <a href="https://aip.scitation.org/doi/10.1063/1.523763">paper</a> which goes into more of the mathematical details, if you’re so inclined (warning: I didn’t read this paper myself and it might be behind a paywall).</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Okay, this is only part of the story. Things get more complicated when you consider the difference between pure and mixed states. A mixed state is basically a way to deal with both quantum probabilities (amplitudes) as well as classical probabilities (I make <em>this</em> state with probability p, and this other state with probability 1-p). In that case, trying to disentangle the notion of entanglement (what we’re thinking of as “quantum”) from plain ol’ classical probability gets tricky. For our purposes though, we’re dealing with a pure state. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>One way to see this is to look at the Schmidt decomposition itself. Because the eigenvalues $\lambda_a$ are the same for system A or B, they will get “carried through” in the computation and will lead to the same result. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>I like to think of the Dirac delta distribution as a laser pointer. It can pinpoint <em>exactly</em> what you want, while setting everything else to zero. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>One annoying thing about the literature on the Schmidt decomposition is that it uses different conventions for these Schmidt values $\lambda_i$. It all has to do with if you’re using the vector representation of a quantum state or its density matrix. Some authors like to have the $\lambda_i$ be associated with the quantum state $| \psi \rangle$, so when you calculate the density matrix $\rho = | \psi \rangle \langle \psi |$, it has coefficients $\lambda_i^2$. Others (like the paper I referenced on this calculation) prefer to have $\rho$ with the $\lambda_i$, so that means the vector has to have $\sqrt{\lambda_i}$. Confusing, I know. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Like I said before, I prefer the term “boundary law”, because “area” becomes confusing when you’re in one dimension, for example. The boundary is a bit easier for me to think about. At any rate, the idea is that it scales with one less dimension than where your system lives. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 25 Dec 2020 00:00:00 +0000
https://cotejer.github.io//pick-a-state
All in the Corners<p>As a quantum theorist, I spend a lot of time thinking about high-dimensional spaces. These are the playgrounds for quantum many-body systems, and they are vast. The technical name is a Hilbert space, and it’s the space of complex vectors with the additional structure of a way to put vectors together (called an inner product).</p>
<p>Hilbert space is big (see <a href="https://cotejer.github.io/curse-of-dimensionality">“The Curse of Dimensionality”</a>), but the usable area for quantum theory is often much smaller. This means we are stuck finding corners of a high-dimensional space that describe physically-relevant phenomena.</p>
<p>A very important tool in my field is entropy. It can characterize the amount of entanglement between quantum states, which lets us talk about systems that produce behaviour far away from our usual ideas.</p>
<p>Recently, my supervisor and I were preparing homework for a class I help teach, and we wanted students to investigate the amount of entanglement<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> present in a random quantum state of <em>N</em> qubits. This was a numerical exercise, and to complete it they needed to produce random unit vectors (because quantum states are normalized). This got me thinking about how to actually choose random unit vectors. <!--more--></p>
<p>In this essay, we’ll explore the various ways you can choose random vectors, and why some methods won’t give you truly random unit vectors. To get there, we’ll need to think about high-dimensional spaces, why certain sampling procedures can lead to “clumping”, and why the requirement of being a unit vector changes the geometry of sampling we should be considering. To keep things from veering off into me telling you to visualize things in 35-dimensional space, we will build off our knowledge of two and three dimensions, and also note where things can go wrong.</p>
<p>By the end, we will see that the question of picking a random unit vector is more complicated than it seems at first glance.</p>
<h2 id="what-is-random">What is random?</h2>
<p>When we use the word “random”, we need to be careful that we are talking about the same thing. That’s because random is always with respect to a probability distribution. Picking from a uniform distribution (each outcome is weighted equally) versus a normal distribution (outcomes near the mean are favoured, then taper off as you go further out) is a very different affair. You shouldn’t be surprised to get values near the endpoints of your interval in the uniform case, but you <em>should</em> be surprised if you’re picking from a normal distribution.</p>
<p>All this to say: random needs context.</p>
<p>We want to pick random unit vectors. As such, it’s probably worth defining exactly what we mean. A <em>vector</em> for our purposes will just be a list of numbers, with the number of entries corresponding to our desired dimension. For three dimensions, a vector could look like (a, b, c), where <em>a</em>, <em>b</em>, and <em>c</em> are real numbers. A <em>unit</em> vector just means that the norm of the vector is one. In particular, the norm we’re using is the regular L<sup>2</sup> norm, which means we look at the sum of squares of the components, so our vector would have norm (a<sup>2</sup>+b<sup>2</sup>+c<sup>2</sup>)<sup>1/2</sup>. To transform this into a unit vector, we just need to divide each component by the vector norm.</p>
<p>Finally, we get to our lovely word: <em>random</em>. Here, I want random to mean that, out of the possible unit vectors, each one is equally likely to be chosen. In other words: I want to pick a unit vector uniformly at random.</p>
<hr />
<p>With that out of the way, let’s dive into strategies to implement this. If you were to ask me, my first response would be a procedure like this:</p>
<ol>
<li>For each coordinate of your vector, choose a random number in some interval [-a,a] (it doesn’t matter what <em>a</em> is).</li>
<li>Rescale the vector you get by its norm, producing a unit vector.</li>
<li>Repeat for the desired number of random unit vectors.</li>
</ol>
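<p>The three steps above can be sketched in a few lines (Python with NumPy here, which is my assumption rather than anything fixed by the essay):</p>

```python
# The naive procedure: uniform box samples, rescaled to unit length.
import numpy as np

def naive_unit_vectors(dim, count, a=1.0, rng=None):
    rng = rng or np.random.default_rng()
    v = rng.uniform(-a, a, size=(count, dim))   # step 1: uniform in a box
    # step 2: rescale each vector by its norm
    return v / np.linalg.norm(v, axis=1, keepdims=True)

vecs = naive_unit_vectors(2, 5000, rng=np.random.default_rng(5))
print(np.allclose(np.linalg.norm(vecs, axis=1), 1.0))   # True: all unit length
```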
<p>Simple, right?</p>
<p>The problem is that it doesn’t work.</p>
<p>Oh, you will get unit vectors. But they won’t be uniformly sampled. Instead, you will get “clumps” in certain regions of the possible unit vectors.</p>
<p>To check my claim, I’ll run some simulations that generate vectors at random, and then plot the results. To keep things visible, I’ll start by doing this in two dimensions. This will give us an easy in to the problem.</p>
<p>The following plot is what I get when I generate two lists: one with vectors whose components are picked uniformly from the interval [-1,1] (the red dots), and the other where I pick the vectors in the same way and then rescale them to be of unit length. (As a quick note, the blue points aren’t derived from the red ones. I generated two separate lists, but when you’re taking 5000 points, it shouldn’t be a big deal.)</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605468864/Blog/Clumping.png" alt="" /></p>
<p>Do you notice the clumping?</p>
<p>Okay, maybe it’s not so obvious. I’ve made the dots somewhat transparent so that overlap would show up as darker regions. If you look at the four “diagonal” directions (North-West, North-East, South-West, and South-East), you will notice the blue is more prominent. That’s because there’s clumping in those regions. Those areas are <em>denser</em> than the regions at the regular cardinal directions (North, East, South, West).</p>
<p>Why does this happen?</p>
<p>To get a handle on this question, think about what we’re sampling when we use the uniform probability distribution. Any given coordinate is being chosen in an interval [-1,1], so in total our vector is being sampled from a <em>box</em> whose side length is 1−(−1)=2. For the plot above, this box is just a square, and you can see that the red dots do indeed sample the box at a pretty uniform rate (there’s no clumping that I can see).</p>
<p><strong>When we sample uniformly from the coordinates, the space we sample is a <em>box</em>.</strong></p>
<p>But what are we really trying to do when we want to sample unit vectors?</p>
<p>Not only do we want to pick the coordinates at random, we want the norm of the vector to be one. This modifies the geometry of our space. If the vector must have unit length, the space we want to sample should be <em>spherical</em>. Looking at any other space will introduce bias in our results.</p>
<p>Okay, let’s unpack this a bit.</p>
<p>A unit vector is a vector with length one. This means that if your vector has coordinates x<sub>i</sub>, then the coordinates satisfy:</p>
<p>∑<sub>i</sub> x<sub>i</sub><sup>2</sup> = 1.</p>
<p>If we’re in two dimensions, this is just a fancy way of writing out the Pythagorean theorem with hypotenuse of length 1. If you vary the coordinates of your two-dimensional unit vector, you will trace out a circle. In higher dimensions, we trace out spheres, which is why we call such an equation spherically symmetric.</p>
<p>This means that choosing a random unit vector with uniform probability requires us to sample from a spherically symmetric distribution. Or at the very least, if you don’t use a spherical geometry, whatever you do to rescale things should counter the bias you introduced. We’ll just focus on making our distribution symmetric to start with, and see why picking uniform Cartesian coordinates doesn’t work.</p>
<p>Sampling our Cartesian coordinates as indicated above gives us a square in two dimensions, which we can plot relative to the circle (the spherically symmetric unit vectors).</p>
<p>Imagine you generate a bunch of random points within the square. This is what generating uniformly random Cartesian coordinates gives us (with the blue circle being the points that are unit vectors):</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/CircleDots.png" alt="" /></p>
<p>Now, we want to modify those dots so they all become unit vectors. How do we do this? We rescale! Each vector has a certain distance from the origin, and it’s likely that this distance isn’t one, like so.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/CircleDotsWithPoint.png" alt="" /></p>
<p>Using the coordinates of the point, we can calculate its length from the origin, then <em>divide</em> each coordinate by that length. Doing so will give you a unit vector by definition.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/CircleDotsRescaled.png" alt="" /></p>
<p>But what does this rescaling look like on our square of points?</p>
<p>We know that rescaling will either <em>push</em> a vector with length less than one or <em>pull</em> a vector with length greater than one onto the circle.</p>
<p>Here is the fun part. Look at the difference between the North-East and North directions. In the North, <em>almost all</em> the points are closer to the origin than the circle. In fact, if you’re looking exactly North, then the only points that aren’t closer to the origin are those that already lie on the circle at the point (0,1). Everything else is pushed upwards.</p>
<p>But look at what happens in the North-East. You still have the same amount of points that are closer to the origin than the circle (because the circle is spherically symmetric, this is true in any direction you look). However, now you have a <em>ton</em> of extra points that are further from the origin than the circle!</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/ExtraPoints.png" alt="" /></p>
<p>I like to think of these as “extra” points that get pulled in because of the geometry of the box. In essence, the box puts “more” points in the corners, so those get rescaled down and populate the diagonal directions with more points when we rescale.</p>
<p>This is exactly what I was referring to when I talked about “clumping”. Because there are a bunch of extra points in the corners, your sampling will end up producing more points in those directions than the regular cardinal directions.</p>
<p>Here’s another way to think about it. If you look in the North direction, you can go at most 1 unit away before you hit the box. This means that you have one unit of “space” to generate points on that line. But in the diagonal directions, you now have √2 units for your points to land on! Since your line is longer, more points will accumulate on it, making those directions favoured (biased).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/ExtraLength.png" alt="" /></p>
<p><strong>When you use Cartesian coordinates that are sampled uniformly, you end up generating many more “corner samples” than you would want.</strong></p>
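<p>You can count the corner samples directly. A sketch (Python with NumPy assumed): bin the rescaled points by angle and compare a window around the diagonal (45°) with one around the cardinal direction (90°) — the diagonal wins by roughly a factor of two.</p>

```python
# Quantify the clumping: more rescaled box samples land near 45° than 90°.
import numpy as np

rng = np.random.default_rng(6)
v = rng.uniform(-1, 1, size=(200000, 2))
v /= np.linalg.norm(v, axis=1, keepdims=True)
angles = np.degrees(np.arctan2(v[:, 1], v[:, 0]))

# Count samples within ±5 degrees of each direction.
near_diag = np.sum(np.abs(angles - 45) < 5)
near_card = np.sum(np.abs(angles - 90) < 5)
print(near_diag, near_card)   # near_diag is noticeably larger
```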
<p>And it gets worse. How much “extra space” is there in a box compared to a sphere? In two dimensions, the area of a unit circle is π, and the box has area 2<sup>2</sup> = 4, so the ratio is π/4≈0.785. This means that there’s about 21.5% “extra space” between them.</p>
<p>In three dimensions, the volume of a unit sphere is (4/3)π, and the box’s volume is 2<sup>3</sup> = 8, so the ratio is (4/3)π/8≈0.524. As we can see, we have a lot more extra space!</p>
<p>If we look at the ratio in n dimensions, we need to know the sphere’s volume. Using Stirling’s formula (an approximation for factorials), the volume of a unit n-sphere is:</p>
<p>V<sub>s</sub>(n) ~ (1/nπ)<sup>1/2</sup>(2πe/n)<sup>n/2</sup>.</p>
<p>We won’t dive into the reasoning here (perhaps in another essay). The volume of the box is much simpler. Each side has a length of 2, so the volume is 2<sup>n</sup>. Therefore, the ratio looks like:</p>
<p>V<sub>s</sub>(n)/2<sup>n </sup>~ (1/nπ)<sup>1/2</sup>(πe/2n)<sup>n/2</sup> → 0 as n → ∞.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/Ratio.png" alt="" /></p>
<p>Put more colorfully: As you go to higher dimensions, your box becomes almost <em>entirely</em> filled in the corners<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. This means the bias we’ve seen in two dimensions will only get worse.</p>
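<p>As a sanity check, here’s a short Python snippet (mine, not from the original) that computes the exact ratio using the closed-form volume of the unit n-ball, π<sup>n/2</sup>/Γ(n/2 + 1):</p>

```python
from math import pi, gamma

def sphere_box_ratio(n):
    """Ratio of the unit n-ball's volume, pi^(n/2) / Gamma(n/2 + 1),
    to the volume 2^n of the enclosing box [-1, 1]^n."""
    return pi ** (n / 2) / gamma(n / 2 + 1) / 2 ** n

for n in (2, 3, 10, 20):
    print(n, sphere_box_ratio(n))
```

<p>For n = 2 and n = 3 this reproduces the π/4 ≈ 0.785 and π/6 ≈ 0.524 values from above, and by n = 20 the ratio has already dropped below 10<sup>-7</sup>.</p>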
<p>When you’re in high dimensions, everything is in the corners.</p>
<h2 id="the-normal-way">The normal way</h2>
<p>So what’s a <em>good</em> way to generate these unit vectors then?</p>
<p>One answer lies in taking a detour to see a certain kind of distribution, called a <em>Gaussian</em> or <em>normal</em> distribution. This distribution is one you see all over probability theory. In fact, I’d wager that saying “probability theory” will generate thoughts of the normal distribution in the minds of mathematicians.</p>
<p>What is the normal distribution? Briefly, the idea is that most of the values the distribution produces will be centered at a mean μ, and the probability of getting a value away from the mean tapers off (controlled by a parameter σ called the <em>standard deviation</em>; its square, σ<sup>2</sup>, is the variance). Mathematically, it looks like:</p>
<p>\(f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp{\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma}\right)^2 \right]}.\)
This may look like an intimidating function, but when you plot it, the reality is much simpler.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1605475363/Blog/Normal.png" alt="" /></p>
<p>The wonderful thing about a function like the exponential is that it transforms <em>multiplication</em> into <em>addition</em>. This is one of the laws that get taught to secondary students early on:</p>
<p>e<sup>a</sup>e<sup>b</sup>=e<sup>a+b</sup>, where a and b are numbers.</p>
<p>This is going to be hugely helpful to us.</p>
<p>We’ve seen graphically that a uniform distribution doesn’t give us a spherically symmetric distribution. What makes the normal distribution different? After all, we’re still going to sample each <em>Cartesian</em> coordinate independently, so what’s the big deal with the normal?</p>
<p>It all has to do with the <em>joint</em> probability distribution.</p>
<p>Let’s go back to our two-dimensional case. Here, we have our x and y coordinates that we want to sample. We don’t really care about the rescaling we do afterwards, but the sampling itself needs to be done in a way that everything stays spherically symmetric.</p>
<p>The joint probability distribution simply tells us how to link together multiple probability distributions. Because we’re looking at independent coordinates, our joint probability distribution will satisfy:</p>
<p>\(f(x,y) = f(x)f(y).\)
In the uniform case, we have f(x) = f(y) = 1/2, where the 2 comes in because it’s the length of the interval. This nets us a “square” distribution for the coordinates, which doesn’t respect spherical symmetry.</p>
<p>On the other hand, if we take a normal distribution with mean zero (μ=0) and variance one (σ=1), the joint distribution will look like this:</p>
<p>\(f(x,y) = f(x)f(y) = \frac{1}{2\pi} \exp{\left[-\frac{1}{2} \left( x^2 + y^2 \right) \right]}.\)
Look at the argument of the exponential. Does the expression x<sup>2</sup>+y<sup>2</sup> look familiar?</p>
<p>This is precisely the <em>square</em> of the radius r in polar coordinates! If we make the substitution, we end up with:</p>
<p>\(f(x,y) = f(r).\)
<em>This</em> is the reason that sampling from a normal distribution works. The structure of the exponential transforms a multiplication from independent distributions to an addition in the exponential. Furthermore, the fact that we have the <em>square</em> of the variable lets us employ the trick of going to polar coordinates.</p>
<p>Now that our distribution is a function of r, we’re guaranteed that it’s spherically symmetric. This means we’ve successfully constructed a way to sample unit vectors! All we have to do is the following.</p>
<ol>
<li>For each Cartesian coordinate, sample a number from the probability distribution N(0,1) (the normal distribution with μ=0 and σ=1).</li>
<li>Rescale the vector you generate by its magnitude.</li>
</ol>
<p>And that’s it! By simply changing the distribution we sample from, we can encode the fact that we want a spherically symmetric distribution.</p>
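<p>In Python, the two steps above are only a few lines (a sketch of the recipe using only the standard library; with NumPy you could equivalently draw the whole vector with <code>np.random.normal</code> and divide by its norm):</p>

```python
import math
import random

def random_unit_vector(n):
    """Sample a uniformly distributed unit vector in n dimensions:
    draw each Cartesian coordinate from N(0, 1), then rescale by
    the vector's magnitude."""
    v = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

v = random_unit_vector(5)
print(sum(c * c for c in v))  # 1.0 up to floating-point error
```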
<hr />
<p>Probability is a subtle mathematical subject, with the answer you get often hinging on the assumptions you baked into your approach (which are often implicit!). We saw a small example here, but this shows up everywhere in mathematics, and what it teaches me is that we need to be careful when tackling a problem. It’s worth spelling out your assumptions, checking to see if there are unintended consequences, and mending them if necessary.</p>
<p>This essay showed us that asking a simple question (How can I generate random unit vectors?) can lead us astray if we don’t question our assumptions. The probability distribution we used on the first try “fills up the corners” of our hypercube, bringing with it unwanted clumpiness. It also showed us that high-dimensional objects have properties that might run against our initial guess.</p>
<p>So the next time you start looking at hyperspheres and hypercubes, remember: the volume all goes to the corners.</p>
<h2 id="references">References</h2>
<ul>
<li>For more on the n-sphere and its geometrical properties, see <a href="https://en.wikipedia.org/wiki/N-sphere">this Wikipedia article</a>. There are plenty of interesting nuggets in higher dimensions, and many of them are collected here.</li>
<li>This <a href="https://stackoverflow.com/questions/6283080/random-unit-vector-in-multi-dimensional-space">StackOverflow question</a> is how I got started when I wanted to implement this in Python.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The measure we were interested in is called the <em>entanglement entropy</em>. It works like this. If you have a quantum state over a composite system (like many particles), you first divide it into two parts. Then, you can compute the reduced density matrix on one of those states, which is just a way to figure out the quantum state on one part by itself. Then, if you calculate the von Neumann entropy of <em>that</em> state, you get the entanglement entropy of the original quantum state. It also turns out that once you separate the composite system into two parts, it doesn’t matter which one you use to calculate the entropy. The result will be the same. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>If you think it’s weird that the volume of the sphere takes up less and less space in the box as you increase the number of dimensions, welcome to the oddities of high-dimensional spaces! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 25 Nov 2020 00:00:00 +0000
https://cotejer.github.io//all-in-the-corners
https://cotejer.github.io//all-in-the-cornersThe Curse of Dimensionality<p>If there’s one field of mathematics that everyone encounters in their daily life, I would argue that it’s combinatorics (with perhaps geometry being the other one). The rules of combinatorics cast a shadow over our lives. They affect how we make decisions and form the scaffolding for how options in our lives are displayed to us.</p>
<p>In this essay, I want to explore the idea which is known as <em>the curse of dimensionality</em>. <!--more-->It’s a lovely name, and the core of the concept is something that comes up without even straying too far into physics or mathematics (but we’ll get there!). To illustrate this, let me paint a little story<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>Imagine you decide to start running. Of course, you need to purchase some running shoes, so you go to the store to look at the options. At first, you don’t know anything about running shoes. They look more or less the same, and you judge them based on their looks. However, after talking with one of the salespeople, you learn that there are different <em>kinds</em> of running shoes. In particular, road running shoes and trail running shoes form two broad categories. However, there are then options <em>within</em> those two categories, such as racing/training shoes, minimalist/maximalist, zero drop or non-zero drop, weight, price, and so much more. Soon, you realize that if you want to explore all of those different options, there will be a lot of shopping to do.</p>
<p>Let’s see how the combinatorics work. If there are just road/trail shoes, then there are two options. However, if you then add in the sub-category of racing/training, we get 2×2 = 4 options. If we add in the other options (zero/non-zero drop and minimalist/maximalist) and split up weight and price into five sectors each, we find that there are now 2×2×2×2×5×5 = 400 options. That’s for only six categories! If you had no inkling of which shoe would be best for you, paying for 400 different configurations of running shoes would be a difficult sell, even for the most passionate runner.</p>
<p>How did we get so many options? It comes from the fact that each new option you add introduces a <em>multiplicative</em> effect to the total number of configurations. This is known as the “multiplication rule” in combinatorics, and it illustrates the curse of dimensionality. As soon as you start adding in more options, the number of possible configurations skyrockets.</p>
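<p>The multiplication rule is one line of Python (the category sizes below are the ones from the shoe example):</p>

```python
from math import prod

# road/trail, racing/training, zero/non-zero drop,
# minimalist/maximalist, 5 weight brackets, 5 price brackets
options = [2, 2, 2, 2, 5, 5]
total = prod(options)
print(total)  # 400
```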
<p>Here, the idea of “dimension” isn’t just a reference to spatial lengths. When mathematicians and physicists use the word “dimension”, we tend to be thinking about something slightly different. Putting aside the discussion of fractals and non-integer dimensions, the idea of a dimension is simply a coordinate you need to describe the state of your system. In the example of the running shoes, at first we only needed one coordinate: road or trail. This choice specified the shoe for us. Each time we added a new category, this became an extra dimension to consider. By the end, we had six categories, so the problem of choosing a good running shoe became a six-dimensional problem.</p>
<p>This may seem disconcerting, since we’re used to having two or three dimensions when we think of space. However, the “space” that the story refers to can be thought of as the state space of possible running shoes. The first four dimensions have two possibilities each (corresponding to the twos in the product above), while the last two dimensions have five possibilities each.</p>
<p>Okay, but we might not need to explore all 400 options to get an idea of what kind of shoe you like. Perhaps if we choose a few good points, we can get an overall feel for which shoes are better.</p>
<p>That’s a good idea, and it’s what we try to do all the time when searching through higher-dimensional space in physics and mathematics. The curse of dimensionality reflects the fact that these spaces are just too damn big to go wading through all the options. Instead, we need to be clever about how we’re going to learn something about the space without wasting more time than necessary exploring it.</p>
<p>For the rest of this essay, I want to highlight a few areas within physics, machine learning, and statistics where the curse of dimensionality sneaks into the foreground.</p>
<h2 id="quantum-systems-and-the-vastness-of-hilbert-space">Quantum systems and the vastness of Hilbert space</h2>
<p>An example in physics where this problem pops up all the time is in quantum systems, particularly many-body systems.</p>
<p>In <a href="https://cotejer.github.io/game-of-loops"><em>A Game of Loops</em></a>, we explored the surface code, a quantum system that can help us do quantum error correction. There, we saw that the complete quantum state is described by an N×N lattice that has a qubit at each site. Because of this, there are N<sup>2</sup> physical qubits needed for the system.</p>
<p>How big is the state space of this system? Well, we first have to figure out how many degrees of freedom a single qubit has. Remember that we can write the quantum state of a qubit in the following way: ψ = (a,b), where <em>a</em> and <em>b</em> are just complex numbers whose squared magnitudes sum to one. This could make it seem like a qubit has four degrees of freedom (since a complex number has two degrees of freedom and there are two here), but the normalization constraint will eliminate one of them. Furthermore, it turns out that we can eliminate a second degree of freedom, which comes from the fact that the overall phase of a quantum state doesn’t play a role<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> in the physics. Taking both of these into account, the total number of degrees of freedom in a qubit is two.</p>
<p>That seems rather harmless, but as we’ll soon see, this leads to devastating consequences when trying to tackle quantum many-body problems.</p>
<p>Back to our surface code. We have N<sup>2</sup> physical qubits, and each one (independently) contributes a factor of two. How many states do we then have in total? Well, this is simply the multiplication rule in action (you can think of specifying a basis state as a series of choices for each physical qubit). This means in total the state space will have 2<sup>N<sup>2</sup></sup> basis states.</p>
<p>If there’s one thing you should be worried about when you see exponents, it’s a <em>tower</em> of exponents.</p>
<p>To get an idea of how this grows, look at the following table:</p>
<table>
<thead>
<tr>
<th style="text-align: center">N</th>
<th style="text-align: center">2<sup>N<sup>2</sup></sup></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">1</td>
<td style="text-align: center">2</td>
</tr>
<tr>
<td style="text-align: center">2</td>
<td style="text-align: center">16</td>
</tr>
<tr>
<td style="text-align: center">3</td>
<td style="text-align: center">512</td>
</tr>
<tr>
<td style="text-align: center">4</td>
<td style="text-align: center">65536</td>
</tr>
<tr>
<td style="text-align: center">5</td>
<td style="text-align: center">33554432</td>
</tr>
<tr>
<td style="text-align: center">6</td>
<td style="text-align: center">≈ 6.87×10<sup>10</sup></td>
</tr>
<tr>
<td style="text-align: center">7</td>
<td style="text-align: center">≈ 5.63×10<sup>14</sup></td>
</tr>
</tbody>
</table>
<p>As you can see, this is really some explosive growth. Things start off fine, grow rather quickly, and are even manageable at N = 5. However, as soon as we increase the lattice by one more unit length, the state space becomes enormous.</p>
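<p>The table above is straightforward to reproduce (a quick sketch):</p>

```python
# Hilbert-space dimension of an N x N lattice with one qubit per site
dims = {N: 2 ** (N * N) for N in range(1, 8)}
for N, d in dims.items():
    print(N, d)
```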
<p>No wonder my machine learning project during <a href="https://cotejer.github.io/psion">PSI</a> struggled when dealing with larger lattices.</p>
<p>And remember, this is only for N < 10. We aren’t exactly in the huge numbers territory in terms of lattice sizes, but the state space has its own scaling behaviour that makes things very difficult. This is only exacerbated for quantum many-body systems, which can have huge values of N. That’s why a lot of these problems are intractable at the moment, since we simply do not have the computational resources to get to lattices of that size<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>This brings us to machine learning. If you know anything about machine learning, it’s probably that this approach <em>loves</em> trudging through data (sometimes called “big data”). While this is true, there are also limits to what we can do. As the number of dimensions (degrees of freedom) grows, the amount of work it takes to search through the possible options becomes worse than finding a needle in a haystack.</p>
<h2 id="neural-networks-and-hyperparameter-tuning">Neural networks and hyperparameter tuning</h2>
<p>For my PSI project, the goal was to create an RNN decoder for the surface code (I will write about this more in a later essay). While the specifics aren’t super important, the key point is that neural networks aren’t magical. They take work to get right, and there are <em>way</em> more things you have to manually fiddle with than what you might expect.</p>
<p>Before I got into machine learning, I had the impression that building neural networks involved coding the model, feeding it training samples, and waiting for it to learn the data. While neural networks do this, the process is much more nuanced. In particular, there are knobs called <em>hyperparameters</em> that need to be set, and this takes a significant amount of time.</p>
<p>What is a hyperparameter? It’s an adjustable part of your neural network which isn’t automatically tuned as the neural network is trained. These are knobs which you have to set, and they play a big role in how well your network performs. That’s why adjusting them is so important.</p>
<p>In terms of the possible hyperparameters, here is a list of some:</p>
<ul>
<li>Number of training samples</li>
<li>Number of neurons/hidden units</li>
<li>Number of layers</li>
<li>Choice of activation function</li>
<li>Choice of loss function (and sometimes, added regularization)</li>
<li>Choice of optimizer</li>
<li>Learning rate</li>
<li>Learning rate decay</li>
<li>Mini-batch size</li>
</ul>
<p>Those are just the hyperparameters that immediately come to mind. The thing about hyperparameters is that they are a little bit like weeds in a garden: the more you look, the more you find.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/c_scale,q_auto:best/v1535842782/Handwaving/Published/Weeding.png" alt="" /></p>
<p>Some of these have a discrete number of options, while others (like the learning rate), can basically be any positive number. I think you can imagine where this is going. If you’re trying to find the optimal hyperparameters for your neural network, there are going to be a lot of combinations you can try. Because of the multiplication rule, the curse of dimensionality strikes again.</p>
<p>If we just look at the above hyperparameters, we’re already exploring a nine-dimensional space! Nine dimensions is a lot, which means <em>searching</em> through this space can be worse than looking for that needle in a haystack. At least with the haystack, we’ve limited ourselves to three dimensions.</p>
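<p>To see how quickly a search over this space blows up, here’s a sketch of a hypothetical, coarsely discretized hyperparameter grid (the knob names and values are illustrative, not the ones from my project):</p>

```python
from itertools import product

# Hypothetical grid: even a handful of coarse choices per knob
# multiplies out to hundreds of configurations.
grid = {
    "hidden_units": [32, 64, 128],
    "layers": [1, 2, 3],
    "activation": ["relu", "tanh"],
    "optimizer": ["sgd", "adam"],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
}
configs = list(product(*grid.values()))
print(len(configs))  # 3 * 3 * 2 * 2 * 4 * 3 = 432
```

<p>And this is with only six of the knobs, each cut down to a few values. Training a network for every one of those 432 configurations is already a serious undertaking.</p>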
<p>In fact, while I thought the bulk of my project would involve coding the neural network, the real work was fiddling with the various hyperparameters, trying to see if I could squeeze out a bit more accuracy. This can be a frustrating endeavor, particularly when you’re working with a system whose performance you aren’t sure of. When should I give up with tuning and call the performance “good enough”? That’s the type of question I would ask myself. By the end, I wondered if I was doing more work tuning the network than the network was doing itself!</p>
<p>It’s tempting to think that the neural network will do all the work for you, but that’s not true. The curse of dimensionality rears its ugly head again even for something as simple as getting a few model parameters adjusted.</p>
<p>Maybe you then have the clever idea of teaching another neural network to adjust those hyperparameters for you. While I imagine that could work, you’ve only kicked the problem down the road: How do you adjust the hyperparameters for that <em>new</em> neural network?</p>
<p>It’s turtles all the way down.</p>
<h2 id="a-flashlight-in-the-dark">A flashlight in the dark</h2>
<p>I once worked for a week on a problem at the intersection of quantum computing and machine learning. It was a project during my master’s degree that I did with two other students and some people from the company <a href="https://1qbit.com/">1QBit</a>. The idea was to explore the “loss landscape” of a quantum system to learn more about it (See Reference 4).</p>
<p>The loss landscape is an evocative name for something rather simple. When using some sort of gradient descent approach to a problem, there’s always a function you want to minimize. That function is called a “cost” or “loss” function. Minimizing the loss function means trying to find the local and global minima.</p>
<p>If we have a regular graph that we can visualize, seeing how we navigate to the minimum isn’t too difficult, like in the following example (note here that this means we only have <em>one</em> parameter, since the other is used to plot the cost).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1603283451/Blog/2D_cost_function.png" alt="" /></p>
<p>We can even go up one dimension, where now we are looking at a <em>surface</em> and trying to find its minima. This is where the name “landscape” comes from. However, since these are difficult to draw, I will only show you an example of one from Pennylane’s post on barren landscapes (See Reference 5).</p>
<p><img src="https://pennylane.ai/qml/_images/sphx_glr_tutorial_local_cost_functions_002.png" alt="" /></p>
<p>The tricky part is when you’re trying to minimize functions of many variables. The tools are the same (gradient descent or a similar procedure, find where the derivative vanishes, and check that it’s an actual minimum and not a saddle point), but the problem becomes more difficult. In higher dimensions, the landscape isn’t something we can visualize<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Furthermore, there’s the added complication that higher-dimensional spaces are vast, which means finding solutions to your problem can be fraught with issues.</p>
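<p>The basic minimization loop is the same regardless of dimension. Here’s a bare-bones sketch of gradient descent on a toy convex loss (purely illustrative; real loss landscapes are far less forgiving than this one):</p>

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step against the gradient of the loss function."""
    x = list(x0)
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x

# Toy loss f(x) = sum(x_i^2) in nine dimensions; its gradient is 2x,
# so every coordinate shrinks toward the unique minimum at the origin.
x_min = gradient_descent(lambda x: [2 * xi for xi in x], [1.0] * 9)
print(max(abs(xi) for xi in x_min))  # tiny: all coordinates near 0
```

<p>With a convex bowl like this, the loop converges from any starting point. The trouble in high dimensions is that real landscapes are riddled with saddle points and plateaus, so the same loop can wander for a very long time.</p>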
<p>During the week I worked on this problem, I saw firsthand how having many parameters in a loss function can make it difficult to really understand the loss landscape of your model.</p>
<h2 id="a-blessing-in-disguise">A blessing in disguise</h2>
<p>From the examples I’ve talked about in this essay, it’s easy to take the mindset that the curse of dimensionality is all downside. However, it turns out that there’s a surprise waiting for us in higher dimensions.</p>
<p>Despite the difficulties we’ve seen, there’s also a complement to the curse of dimensionality: it’s sometimes called the “blessing of dimensionality”. To cap things off on a cheerful note, I want to dip our toes into how higher dimensions can work in our favour.</p>
<p>The rough idea is that going to higher dimensions can allow you to learn more about your data. This is relevant in statistics, where you want to do inference from the data. By accumulating more data, the different features (dimensions) you are considering become intertwined. This lets you “piggyback” on a connection between variables A and B to find out something about variable C, since it has a connection to B (see Reference 1 for the full example described above, and Reference 2 for more perspective on the blessing).</p>
<hr />
<p>Befitting the month of October, the curse of dimensionality has the air of being a little scary. And for physicists like myself who study quantum many-body systems, it can definitely be a nuisance. But at the same time, encountering the complexity of the world can sometimes help, if we are willing to loosen the magnification of our lens and look at things a bit more fuzzily.</p>
<p>Just watch out for those high-dimensional loss landscapes.</p>
<h2 id="references">References</h2>
<ol>
<li><a href="https://simplystatistics.org/2015/04/09/a-blessing-of-dimensionality-often-observed-in-high-dimensional-data-sets/">“A blessing of dimensionality often observed in high dimensional data sets”</a>, by Jeff Leek.
This blog post gives a few examples of how doing statistics on many dimensions of data can be useful. I won’t pretend to know all of the details, but the point seems to be that high-dimensional data is helpful.</li>
<li><a href="https://statmodeling.stat.columbia.edu/2004/10/27/the_blessing_of/">“The blessing of dimensionality”</a>, by Andrew Gelman.
This post talks about how not everything is bad about high-dimensional data. He makes the point that it can be a good thing as well, explaining the title.</li>
<li><a href="https://en.wikipedia.org/wiki/Curse_of_dimensionality">“Curse of Dimensionality”</a>, Wikipedia. This page gives an overview of both the curse and the blessing, as well as some examples that are more mathematically-oriented.</li>
<li><a href="https://pennylane.ai/qml/glossary/variational_circuit.html">“Variational Circuits”</a>, Xanadu. This post gives a nice idea to what I’m talking about. Basically, a quantum circuit with tunable parameters is built, and then you do machine learning to train the parameters of the circuit to fit the output you want (for us, it was the ground state of a Hamiltonian). The loss landscape is then the cost function you come up with to encourage the model to be tuned well.</li>
<li><a href="https://pennylane.ai/qml/demos/tutorial_local_cost_functions.html">“Barren Plateaus”</a>, Xanadu. This post shows an example of a barren plateau, which is very difficult for doing gradient descent. That’s because gradient descent and a lot of machine learning is based on the idea of taking derivatives and moving in the direction of most change (hopefully towards a minimum). A barren plateau is a region where the derivatives are almost zero everywhere, making it difficult to pick a direction.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This story is based on some great answers to the question from this <a href="https://stats.stackexchange.com/questions/169156/explain-curse-of-dimensionality-to-a-child">Cross Validated StackExchange post</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This comes from the fact that quantum states aren’t vectors, but <em>rays</em> in the Hilbert space that they live in. This is just a fancy way of saying that any vector which is related to another by an overall global phase belongs to the same “class” of states. They are all identified with each other. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Of course, research is being done to make this more tractable. It’s also why we’re interested in quantum computers: they can store the states of quantum systems in a much more efficient manner, which means we don’t hit the curse of dimensionality as fast. If we have N qubits, there are 2<sup>N</sup> numbers we need to specify in order to simulate these on a classical computer (the complete state space). On the other hand, a quantum computer only needs the N qubits to do the same thing. The growth is linear (N) instead of exponential (2<sup>N</sup>). <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>There are tools to help visualize higher dimensions. For example, a technique called <a href="https://en.wikipedia.org/wiki/Principal_component_analysis"><em>Principal Component Analysis</em></a> (PCA) projects dimensions out so that we can “view” slices of the problem on a graph. It’s not perfect, but it does allow us to visualize things we wouldn’t have otherwise been able to. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 25 Oct 2020 00:00:00 +0000
https://cotejer.github.io//curse-of-dimensionality
https://cotejer.github.io//curse-of-dimensionalityPhysics On A Cube<p>One of my favourite mathematical pieces of writing is <a href="https://en.wikipedia.org/wiki/Flatland"><em>Flatland</em>, by Edwin Abbott Abbott</a> (the book is in the public domain, so you can download it from Wikipedia). Published over a century ago, it’s a story<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> involving residents (Flatlanders) who live in a two-dimensional world. Without giving too much of the story away (because you should seriously read it!), the inhabitants find themselves shocked when a strange shape dips into their world. That other “shape” is a sphere, which we know lives in a three-dimensional space. This confuses the residents to no end, and only a brave soul dares to push their mind further to explore the possibility of there being another dimension available.</p>
<p>What I take away from this tale is that there can be hidden dimensions available to us when we look more closely.</p>
<p>Okay, I’m not talking about the kind of extra dimensions from ideas like string theory. Instead, I’m referring to paradigm shifts that have occurred in physics, and how they carve out new dimensions in the space of possible theories for physicists to explore.</p>
<p>There is also a historical aspect to think about, since our theoretical frameworks for the universe have evolved along with our capability to actually <em>probe</em> the world. It’s difficult (or perhaps, too easy) to think about possible worlds when there are no constraints. There’s no feedback to guide you. That’s what experiments give us; ways of saying, “Okay, the world looks like <em>this</em> and not like <em>that</em>.”</p>
<p>Much like the Flatlanders, we began our story in a state of relative ignorance. The physics we knew (and learn as students) begins with Newtonian mechanics. This involves projectile motion, collisions, notions of energy, work, and momentum, and can be applied to all sorts of systems you would see in everyday life.</p>
<p>Here’s an example. If you see two things coming at each other, from the perspective of one of those things, the other is moving at the <em>sum</em> of the two speeds. This is the standard “addition” rule for velocities, and is something we grasp as kids playing sports. It’s also reinforced with every encounter we have, so it makes sense to assume that this is how the universe operates. Velocities add. Nice and simple.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593785738/Blog/ReferenceFrames.png" style="zoom:30%;" /></p>
<p>But nature has some tricks up its sleeve. Namely, it is hiding the full story from us, just as the Flatlanders have no inkling of the third dimension.</p>
<p>In this case, there is more than one dimension. We will start with what captured the minds of many scientists in the 17th century, which is the study of gravitation and celestial mechanics. Here, the star of the show is Newton (with, like all advances in science, a slew of other characters that get a lot less limelight), but the specifics aren’t necessary. Instead, there was a problem. Celestial objects moved along the backdrop of the sky, and cannonballs rose and then fell back to Earth. There was all this motion, but how did it all work? Was there even any connection between these phenomena?</p>
<p>It was Newton who put forth his theory of universal gravitation, capturing everything into one compact equation (for the magnitude of the force):</p>
<p>F<sub>g</sub> = G m<sub>1</sub> m<sub>2</sub> / r<sup>2</sup>.</p>
<p>Here, F<sub>g</sub> is the gravitational force exerted by Object 1 on Object 2 (and vice versa), while m<sub>1</sub> and m<sub>2</sub> are the two masses and r is the distance of separation. For our story though, the important player is G, the universal gravitational constant. Now, if we wanted to talk about systems with some sort of gravitational nature, the constant G would be present.</p>
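<p>As a quick numerical illustration of the formula (the masses and separation below are rough textbook values for the Earth and Moon, not figures from the essay):</p>

```python
G = 6.674e-11  # universal gravitational constant, m^3 kg^-1 s^-2

def gravitational_force(m1, m2, r):
    """Magnitude of Newton's gravitational force between two masses
    separated by a distance r."""
    return G * m1 * m2 / r ** 2

# Earth-Moon attraction with approximate values (kg, kg, m):
f = gravitational_force(5.97e24, 7.35e22, 3.84e8)
print(f)  # roughly 2e20 newtons
```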
<p>This gave us a new dimension to think about. We started with looking at plain mechanics, and now we added gravity to the mix. Diagrammatically, we can illustrate our new space of frameworks as such:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593958440/Blog/TheoryLine.png" style="zoom:40%;" /></p>
<p>Alright, so this is great. But like I said, this is far from the whole story. There were other surprises on the horizon, and they would change how we view time and space.</p>
<h2 id="moving-fast">Moving fast</h2>
<p>Let’s think about our addition law for velocities again. In Newtonian mechanics, velocity vectors add in the usual way. This is done by defining something called an “inertial” frame, which is basically a way of setting up a coordinate system such that the system being analyzed is stationary, with everything else moving relative to it. To get an intuitive feel for this, just imagine being in a vehicle that is moving quickly. You know that you are the one moving relative to the trees. But if you let your mind relax, is it not equivalent to saying that the trees are rushing past you? This is what we mean by reference frames, and it was a huge help in analyzing systems using Newtonian mechanics.</p>
<p>There was a problem though. Not obvious at first, but cracks were beginning to show in physics. The main culprit was light. What would it look like to be in the rest frame (inertial frame) of a beam of light?</p>
<p>This is a question that Einstein pondered, and it led to his postulate that the speed of light was actually <em>constant</em> in all reference frames. Not only that, but this implied that the notions of space and time had to bend to accommodate the fixed speed of light <em>c</em>.</p>
<p>Earlier in the history of physics, the speed of light was assumed to be infinite. In fact, it was only in 1676 that Ole Rømer gave convincing quantitative data that indicated the speed of light was finite<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. This set the stage for Einstein and his theory of special relativity. Space and time weren’t two separate things. They were intrinsically tied together through the speed of light to create spacetime. Furthermore, our notions of distance and time were dependent on our relative velocity with other systems.</p>
<p>But how could this be missed? Why didn’t we see any of these effects before? People certainly don’t move at the same speeds, so shouldn’t we have noticed relativistic effects?</p>
<p>This brings us to the new dimension that Einstein gave us for our space of theoretical frameworks. It turns out that the range of speeds we use as humans doesn’t really come close to unveiling relativistic effects. That’s because the key ratio is v/c (sometimes denoted β), where v is your speed relative to another system, and c is the speed of light. For most situations, this ratio is super-duper small. Small enough that equations such as the velocity addition law hold up. It’s only when we get to a very fast speed that relativity kicks in.</p>
<p>To picture this, a quantity that shows up in relativistic effects like time dilation and length contraction is the <em>gamma factor</em>, which is defined as γ = (1 − v<sup>2</sup>/c<sup>2</sup>)<sup>−1/2</sup>. If we plot this curve for various values of the speed v (and take units for which c = 1), we get the following plot.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593959832/Blog/GammaFactor.png" style="zoom:40%;" /></p>
<p>I was even being generous with the region I labeled as the speeds we inhabit. To give you a bit of perspective, the speed limit on the highways where I live is 100 km/h. If we express that in terms of the speed of light, the ratio is a huge 9.266×10<sup>−8</sup>. As you can imagine, this isn’t exactly easy to squeeze onto my graph.</p>
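We can check both numbers with a few lines of Python (a sketch of my own, not code from the essay):

```python
import math

def gamma(beta):
    """Lorentz factor for beta = v/c; blows up as beta -> 1."""
    return 1.0 / math.sqrt(1.0 - beta**2)

c = 299_792_458.0            # speed of light, m/s
v = 100 / 3.6                # 100 km/h converted to m/s
beta = v / c
print(f"beta  = {beta:.3e}")          # about 9.27e-08
print(f"gamma = {gamma(beta):.15f}")  # indistinguishable from 1 at highway speeds
print(f"gamma at 0.9c = {gamma(0.9):.4f}")  # about 2.29: relativity in full force
```

At everyday speeds, γ differs from 1 only far past the decimal places any stopwatch can resolve, which is why the velocity addition law seems exact to us.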
<p>The key though is really that the speed of light c is finite. The exact value relative to the speeds we explore in our everyday lives would probably have affected how quickly we realized relativity was a thing, but the main idea is acknowledging that c isn’t infinite.</p>
<p>Because of this, our new dimension is an axis describing 1/c:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593958440/Blog/TheorySquare.png" style="zoom:40%;" /></p>
<p>We now have two axes, each one with a “switch” we can flip. We can turn the gravitational constant G on or off, and we can do the same<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> for 1/c. Note here that having 1/c = 0 implies that c &rarr; ∞. The origin (0,0) gives us Newtonian mechanics, (0,G) gives us Newtonian gravity, and (c,0) gives us Einstein’s special relativity.</p>
<p>But what happens when we switch on both gravity and special relativity? Well, that gives us the final corner of the square, which is general relativity. It occupies the coordinate (c,G), and finishes off our square.</p>
<p>So that’s great, and gives us two dimensions for physicists to explore frameworks. However, you can probably guess what we’re missing: the quantum.</p>
<h2 id="bring-in-hbar">Bring in &hbar;</h2>
<p>I did call this essay <em>Physics on a Cube</em>, so it’s probably not surprising that we have another dimension to add. This comes through the addition of quantum theory, one of our most sophisticated frameworks which was developed in large part throughout the 20th century. There’s a lot to say about quantum theory as well, but we’re just going to explore the vertices of the cube, so we’ll save in-depth explorations for another essay.</p>
<p>The reduced Planck constant &hbar; is a very small quantity in units that we care about. As such, you might hear that quantum effects take place on the smallest levels of atoms, nothing close to the macroscopic scale of humans. While perhaps true, this hides the fact that quantum theory is the best description of the universe we have. And in principle, quantum effects <em>could</em> be seen on large scales, it’s just that quantum effects tend to be delicate, and so get washed out in our daily lives.</p>
<p>For our discussion though, the net result of discovering quantum effects is that our square of options becomes a <em>cube</em>. The new dimension corresponds to “turning on” the quantum effects by having &hbar; move away from zero. Therefore, our space now looks like this (I’ve rotated the direction of some of the axes).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593958440/Blog/TheoryCube.png" style="zoom:40%;" /></p>
<p>On the bottom, we have our original square. But now we can go up a level and make everything quantum. This gives us some new vertices to explore.</p>
<p>First, we have the vertex (0,0,&hbar;), which describes the quantum theory physics students begin with: non-relativistic quantum mechanics. This is where you study the Schrödinger equation. No effects from relativity are added. What you usually study here are harmonic oscillators and a few simple “well” potentials. Even though we only have one framework “turned on”, there are still a bunch of phenomena that we get to explore. For example, this is where students learn about superpositions, tunneling, probability distributions, and much more.</p>
<p>Next, we can turn on relativity, which brings us to (c,0,&hbar;). This is what happens when you take quantum mechanics and make it relativistic. <em>Everything</em> becomes a quantum field, and so the corresponding theory is called <em>quantum field theory</em>, or QFT. This is something I learned about during PSI, and it involves a lot of advanced mathematics and somewhat-sketchy-at-times prescriptions. I know there’s a lot of work being done in quantum field theory to give it a solid mathematical footing, and it seems to be where mathematical physicists often end up. I don’t have a ton to say about quantum field theory because I haven’t had much experience working with it. Our best theories of physics are quantum field theories, and these encompass the Standard Model of particle physics, which is our theory for the zoo of particles that we know of in nature.</p>
<p>This vertex doesn’t include gravity, so the idea is that we are looking at quantum field theory in flat spacetime. That’s a fancy way of saying that our spacetime isn’t curved very much, so we can ignore gravitational effects as a good approximation. When you look at the mathematical guts of these theories, the Dirac equation and the formalism of an action principle pop up all over the place. These are sophisticated techniques that build in the invariance required by relativity, which the Schrödinger equation lacks.</p>
<p>One other interesting thing to note is that there isn’t much being done to explore the (0,G,&hbar;) vertex (the question mark on my diagram). From what I could find, this is kind of an ignored corner. This shows that the cube is more of a useful construct than something fundamental about reality.</p>
<p>It’s important to realize that this cube serves as a nice illustration, but you shouldn’t take it too seriously. There are some important questions that the cube leaves unanswered. As always, one of my favourite physics writers <a href="https://backreaction.blogspot.com/2011/05/cube-of-physical-theories.html">Sabine Hossenfelder wrote</a> about the cube of theoretical physics almost a decade ago. I would definitely recommend looking at her post, since she gives some good critiques of this illustration. For example, can you “traverse” the cube in any way, or are there differences that accumulate? Also, are the choices of axes suitable for this kind of discussion? Reading her article will help answer those questions. I prefer to look at the cube as a way to hold information in my mind, rather than an actual manifestation of how the physical theories interconnect.</p>
<p>But now, we have one more corner to visit.</p>
<h2 id="the-far-corner">The far corner</h2>
<p>It’s time to turn on all three frameworks: relativity, gravitation, and the quantum. Doing so brings us to (c,G,&hbar;), the realm of quantum gravity.</p>
<p>Well, almost. First, it might be useful to talk about QFT in curved spacetimes. As the name suggests, we take QFT and go past using flat spacetimes. This might lead you to ask: Doesn’t this give us a theory of quantum gravity?</p>
<p>Not exactly. That’s because QFT in curved spacetime doesn’t incorporate <em>all</em> of the features of a dynamic spacetime. As such, I like to think of it as doing perturbation theory, where you take a theory you know and change it a little bit in order to get a more accurate description of your physical system. However, at the end of the day this is still not the full theory of quantum gravity. There are various reasons for this, but I won’t get into them for now. As such, you might think of QFT in curved spacetime as a marker along the edge to quantum gravity.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1593959832/Blog/QFTCurvedSpacetime.png" style="zoom:40%;" /></p>
<p>At the (c,G,&hbar;) vertex, a full quantum theory of gravity emerges. At the time of writing this essay, no such theory exists. Physicists are working on ideas, but these are nothing close to being complete. As such, while we know this vertex is there, we don’t have any way to actually get there.</p>
<p>I find this to be a fascinating state of affairs. We have this corner to reach, but we don’t know how to get there because we don’t have the mathematical tools to move along the edges between the corresponding vertices.</p>
<p>This is worth repeating. Moving from one vertex to another isn’t a matter of just throwing in a new constant. There’s often a conceptual breakthrough (like Einstein with relativity and the existence of a maximum speed) as well as mathematical techniques used to move along the cube (like path integrals in quantum field theory). A question on my mind is then: What kind of new breakthrough will be needed for a theory of quantum gravity?</p>
<p>Before we get there though, there’s some unfinished business left for us with our cube.</p>
<h2 id="another-dimension">Another dimension?</h2>
<p>So far, we’ve talked about relativity, gravitation, and quantum theory. These are huge pillars within theoretical physics. But they aren’t the only ones. In fact, I’m sure that I’ve offended many physicists and students with this discussion, because I’ve left out one <em>huge</em> area of physics: statistical physics.</p>
<p>We could talk about the constant of interest being the Boltzmann constant k<sub>B</sub>, but that’s not what’s important here. Instead, when we look at statistical and condensed matter physics, what makes this field special?</p>
<p>The number of particles, N.</p>
<p>It’s not a fundamental constant like the others, but it <em>is</em> crucially important when studying physical systems. That’s because many systems in condensed matter exhibit <em>emergence</em>. You may have heard of emergence as the idea that the whole is greater than the sum of its parts. In physics, the spirit is similar. Emergence means we get phenomena that only occur when enough particles have assembled together in the right state.</p>
<p>The classic example is that of a phase transition. The idea is that you have a bunch of particles (or degrees of freedom in your system), and there’s some parameter that controls how they behave, such as the temperature. Imagine moving that parameter along a slider, changing its value and looking at what the system does. For most small changes, the system will respond a bit, but nothing drastic will happen. However, sometimes this parameter will have one (or more) critical values, where sliding across that point will yield dramatic effects to the system you’re observing. For most physics students, this gets examined with the Ising model, which is a great conceptual tool for thinking about phase transitions, and one I hope to write about in a future essay.</p>
<p>So even though N isn’t a constant of nature, there’s still a very real sense in which the number of particles in your system matters. Furthermore, instead of thinking of N as the number of particles, it can be useful to think about it in terms of taking a <em>continuum limit</em>. This means that N represents the number of degrees of freedom in your system, and going to the large N limit implies that there are an infinite number of degrees of freedom.</p>
<p>The mathematical tool used here is called <em>renormalization</em>, a technique that lets us wrangle with potentially infinite degrees of freedom. To think about this, imagine you’re tallying the votes of a huge population. You could individually count the votes, but what you could also do is group people together. Then, each group gets a <em>single</em> vote which corresponds to the majority of the group. This would reduce the number of votes needed, and you could even do this over and over again at many levels, eventually making it much easier to tally the votes.</p>
<p>Renormalization is a fascinating subject, but for our purposes here it gives us a way to handle a large number of degrees of freedom.</p>
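The voting analogy can be sketched as a toy coarse-graining procedure. This is only an illustration of the grouping idea, not actual renormalization machinery; the function name and the fixed vote pattern are mine:

```python
def coarse_grain(votes, group_size=3):
    """Replace each group of votes by its majority -- a toy 'blocking' step."""
    grouped = [votes[i:i + group_size] for i in range(0, len(votes), group_size)]
    return [1 if sum(g) > len(g) / 2 else 0 for g in grouped]

# 27 voters, two thirds preferring option 1 (a fixed pattern, not real data):
votes = [1, 1, 0] * 9

level = votes
while len(level) > 1:          # 27 -> 9 -> 3 -> 1
    level = coarse_grain(level)
print(level)  # -> [1]: the majority opinion survives repeated coarse-graining
```

Note that each step throws away information: seeing a 1 at the group level tells you the majority voted 1, but not the exact tally. That irreversibility foreshadows the "one-way street" point made later in the essay.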
<p>If we add N to our dimensions, we would get a <em>hypercube</em> in four dimensions. I won’t draw it here because I’m running out of room to annotate things, but you can imagine it as another cube connected to the vertices of the existing one. Basically, you tack on another dimension index for all of the points and cycle through the combinations. For example, single-particle non-relativistic quantum mechanics would occupy the vertex (0,0,&hbar;,1). Note here that, while the previous dimensions can be “on” or “off” (zero or one), for N, the more natural thing is looking at one particle or an infinite number of them (the continuum limit).</p>
<p>Now, a full theory of quantum gravity would also incorporate taking N to be very large. I won’t go into what the other vertices might mean, because I haven’t seen much on them anyway.</p>
<p>In the literature, this is known as the Bronstein cube, after the physicist Matvei Bronstein<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Well, the cube is the one without N. When you include N, it becomes the Bronstein hypercube. The full theory of quantum gravity we are after then lies on the vertex where all of these dimensions are turned on.</p>
<hr />
<p>Again, there are subtleties at play here that I haven’t treated in depth. Do all of these vertices make sense? Can you traverse the cube (or hypercube) in any way, or does the manner in which you traverse it matter? This is a tricky question of commutation of limits, which often is not as straightforward as we would like.</p>
<p>One aspect that seems clear to me is that going from a small number of degrees of freedom to a high one (low to high N) requires renormalization. By design, that “erases” some information. Think about the voting example again. If the groups are made up of nine people and the two choices are 0 or 1, seeing a 1 at the group level just means there were at least five people who voted for a 1. How many people exactly remains a mystery. As such, I would speculate that taking the continuum limit is a one-way street.</p>
<p>Finally, there’s the possibility that something more is waiting for us to discover it. We might not know how to traverse the hypercube at the moment, but maybe there are directions we don’t yet know how to take. Of course, this is speculative in the extreme, but it also makes me hopeful. Even if physics is often seen as a reductionist science, finding a whole new dimension (framework of physical theories) would be exciting.</p>
<p>In the end though, I like to think of the cube of physics as a conceptual tool. In fact, when you have a bunch of binary options, organizing things along a cube makes a lot of sense<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. Much like it’s worth taking the time to organize your possessions so that they are easy to find, organizing the frameworks of theoretical physics helps me keep the main pillars ordered. I don’t want to take the cube <em>too</em> seriously. Instead, as <a href="https://backreaction.blogspot.com/2011/05/cube-of-physical-theories.html">Sabine Hossenfelder tells us</a>, “All together, the “cube of theories” is a very appealing representation. But do not wonder if it confuses you – it has to be taken with a large grain of salt.”</p>
<p>Next month: The Curse of Dimensionality.</p>
<h2 id="endnotes">Endnotes</h2>
<h2 id="references">References</h2>
<ol>
<li>For a nice introduction on the idea of phase transitions geared towards a general audience, see <a href="https://www.quantamagazine.org/the-cartoon-picture-of-magnets-that-has-transformed-science-20200624/">this recent Quanta Magazine article by Charlie Wood</a>. You won’t get all the details of the system in question, but it does illustrate what a phase transition is.</li>
<li>The Bronstein hypercube of quantum gravity by Danielle Oriti. <a href="https://arxiv.org/abs/1803.02577">arXiv:1803.02577</a>. Section V starts talking about using N as another dimension for the hypercube.</li>
<li><a href="https://backreaction.blogspot.com/2011/05/cube-of-physical-theories.html">“The cube of physical theories”</a>, by Sabine Hossenfelder. I would always recommend checking out her blog, Backreaction. It’s a compendium of useful scientific information with a no-nonsense attitude. I love her writing because she doesn’t shy away from pointing out when things are wrong or misleading.</li>
</ol>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Which, of course, is also a product of its time. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>A translation of the original can be found on the <a href="https://royalsocietypublishing.org/doi/10.1098/rstl.1677.0024">Royal Society’s webpage</a>. If you want to read the document and don’t have access, Archive.org has it <a href="https://archive.org/stream/philosophicaltra02royarich#page/397/mode/1up">here</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Just to clarify, different conventions are used here for the axis. The idea is that being at the point labeled (c,…) means we are in the realm of special relativity. On the other hand, going to the point (0,…) means we are taking the speed of light to go to infinity. This is why the axis is labeled 1/c. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>I could not locate the original article by Bronstein. As is the case with a lot of old papers, they have citations from other resources, but there is no copy to be found. It took enough work to find Bronstein’s first name, which I did with the help of Reference 2. <a href="http://people.bu.edu/gorelik/MPBronstein_100/MPBronstein_100.htm">Here</a> is a page dedicated to Bronstein and his work. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Just as I was writing this essay, Grant Sanderson of 3Blue1Brown <a href="https://youtu.be/wTJI_WuZSwE">posted a video</a> on using the notion of a cube to solve a puzzle involving a chessboard. As usual, he includes some nice visual animations to show just how using a cube (or hypercube) lets you visualize a bunch of binary choices. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 25 Sep 2020 00:00:00 +0000
https://cotejer.github.io//physics-on-a-cube
https://cotejer.github.io//physics-on-a-cube

A Game of Loops

<p>When I hear the word “quantum”, I think of all the misconceptions and crazy ideas people associate with it in a lot of popular media. Physicists are great (and terrible) at coming up with names, and the word “quantum” is one such example of a word with a lot of baggage attached. Pair it with the word “computer”, however, and the misconceptions skyrocket, sometimes turning into full-blown hype. The reality (at the time of this writing) is much more modest: quantum computing presents an <em>opportunity</em> for thinking of computation differently, and the subsequent years will see how this plays out when theory meets experiment and engineering.</p>
<p>There’s a ton to talk about when it comes to quantum computing, but in this essay, I want to share with you something called a <em>quantum error correcting code</em>. It does what it says on the tin, and corrects errors that can accumulate during a computation. There are many such proposals, but one of the most popular is called the <em>surface code</em>, whose name will make sense as we dive into the details. The surface code is a proposal for how we can build a quantum computer that is robust to errors, but is only one step in the process. This essay is devoted to the surface code, how it works, and the challenges it faces when it comes to implementation.</p>
<!--more-->
<h2 id="quantum-information-in-a-nutshell">Quantum information in a nutshell</h2>
<p>To really get why we even care about quantum computers, you need to know about quantum information. I would write an essay about it, but thankfully, the awesome Michael Nielsen has done this (and a few more) on his website <a href="https://quantum.country">Quantum Country</a>. There, he goes into a great amount of detail and writes beautifully, so I would recommend you start with his first essay there if you need more details.</p>
<p>For our purposes, we will only need the basics. In quantum theory, a <em>qubit</em> (a portmanteau of “quantum bit”) is just a system that can have two levels. If you’re trying to implement a qubit in a laboratory, you might want to use an atom which has a ground state and an excited state. This gives you the two levels required for a qubit.</p>
<p>The way physicists write this is as such:</p>
<p>|ψ〉 = a|0〉 + b|1〉, |a|<sup>2</sup> + |b|<sup>2</sup> = 1.</p>
<p>The symbol |⋅〉 is used to denote vectors<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, with |0〉 and |1〉 being the basis vectors. The coefficients are complex numbers whose squared magnitudes sum to one. That’s a qubit, for our purposes.</p>
<p>Notice how similar this is to the bit, the classical counterpart to the qubit. A bit, as you might know, is just a 0 or a 1. There are two states, and that’s it. The same is true for our qubit, except we can <em>also</em> have combinations of the basis vectors. This is different from the bit, and it’s part of the reason as to why people are excited for the possibilities of quantum computing.</p>
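As a concrete sketch, a qubit can be represented as a length-2 complex vector (this NumPy representation and the particular coefficients are my own choices for illustration):

```python
import numpy as np

# Basis vectors |0> and |1>, with the ordering (|0>, |1>) assumed throughout.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# An example superposition |psi> = a|0> + b|1>.
a, b = 1 / np.sqrt(2), 1j / np.sqrt(2)
psi = a * ket0 + b * ket1

# Normalization check: |a|^2 + |b|^2 = 1, i.e. <psi|psi> = 1.
print(np.vdot(psi, psi).real)  # 1.0 (up to floating-point error)
```

A classical bit would only ever be `ket0` or `ket1`; the superposition `psi` is exactly the extra freedom the qubit has.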
<p>Another structure that’s important is the <em>tensor product</em>. This is what lets us combine multiple qubits together into a system. If we have two qubits, the way to combine them is: |ψ〉 ⊗ |φ〉.</p>
<p>The details of the tensor product aren’t necessary for us here, except for the fact that it lets us treat the combined system as a sort of “concatenation” of the individual components. When we will later have operators acting on our qubits, we can basically act the operator on specific qubits, while doing nothing to the others.</p>
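In the vector representation, the tensor product is the Kronecker product, and "acting on one qubit while doing nothing to the others" is exactly an operator like X ⊗ I. A small NumPy sketch (my own illustration, assuming the basis ordering |00〉, |01〉, |10〉, |11〉):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Two qubits combine via the tensor (Kronecker) product: |0> ⊗ |1> = |01>.
psi = np.kron(ket0, ket1)
print(psi.real)  # [0. 1. 0. 0.] -- the |01> basis vector in the 4-dim space

# X ⊗ I flips the first qubit and leaves the second alone.
X = np.array([[0, 1], [1, 0]], dtype=complex)
I = np.eye(2, dtype=complex)
print((np.kron(X, I) @ psi).real)  # [0. 0. 0. 1.] -- the state |11>
```

This "concatenation" picture is all we need later: operators on specific qubits are built by tensoring the interesting matrix with identities everywhere else.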
<p>So in a nutshell, that’s what quantum information is. We get states that are combinations (in the physics lingo, we say “superpositions”) of the basis states, which means we have more possibilities than with bits. This is what enables protocols like <a href="https://quantum.country/teleportation">quantum teleportation</a> and <a href="https://quantum.country/search">quantum search</a> (seriously, Michael Nielsen is awesome, and it’s worth checking out his work).</p>
<p>With these tools in hand, let’s get to the difficulties of quantum computing, which will lead us to the surface code.</p>
<h2 id="challenges-of-quantum-computing">Challenges of quantum computing</h2>
<p>As we’ve seen, quantum information is different enough from classical information that we cannot use all of the tricks we’ve developed for classical computers when trying to make a quantum computer work.</p>
<p>No matter what system you use though (classical or quantum), your device will have errors that occur. If anything, we can appeal to the law of experiments: anything a theorist thinks up will <em>always</em> be imperfect in the laboratory. Despite what theoretical physicists are taught in school, you can’t simply wish away every source of noise or error in a system. In the real world, we need to take them into account. Moreover, if we didn’t design our computers with noise in mind, they would be incredibly fragile to perturbations. Not something you want when manipulating sensitive information like in your bank account.</p>
<p>In classical computing, correcting errors is straightforward. Let’s use the simplest example in the book: the three-bit repetition code.</p>
<p>Imagine your friend asks you over the internet if you want to play squash or go running later, with the instructions of replying 0 for squash and 1 for running. As a runner, you don’t want to be stuck within four walls for the next hour, so you send the message 1. Unfortunately, your device is getting old and prone to messing up, so it sends your friend a 0 instead. Your friend then shows up to your place with their racquet while you are lacing up your running shoes. Not a great situation.</p>
<p>What went wrong? Well, if your friend receives a 0 or a 1, how are they supposed to know this is what you sent? They need to be reasonably confident that there’s no way for the information to get corrupted as it passes from you to them. But if there’s some probability <em>p</em> that the bit gets flipped, it’s difficult to say what the real message was (unless <em>p</em> is super small).</p>
<p>It would be much better if there was some sort of redundancy built into the message that could increase your friend’s confidence that the received message is what you sent. At the very least, it would be nice if they could tell if something went wrong, and knew how to fix it. This is precisely the idea of error correction.</p>
<p>Here’s a strategy that works better. Let’s assume <em>p</em> is small, but not super small. Instead of replying with 0 or 1, you’re going to encode those two messages within larger messages. In particular, you will use 000 to represent the message “0”, and 111 to represent the message “1”.</p>
<p>Now, even though the probability of flipping a bit is <em>p</em>, it’s going to take a lot more to scramble the message! If we want the wrong message to be sent, each of the individual bits has to be flipped. Sending 000 requires each of them to flip to become 111 (the wrong message). Assuming that errors are independent (unlikely in real life, but it illustrates the point), this will occur with probability <em>p×p×p = p<sup>3</sup></em>. If <em>p</em> is already small, this new value will be much smaller. For example, if <em>p = 0.01</em>, then <em>p<sup>3</sup> = 0.000001</em>, which is very small. By repeating the message three times, you’re able to suppress the chance that the wrong message gets sent.</p>
<p>But what if you send 000 and your friend receives 001? Well, if the two of you agreed that the only messages you could send are 000 and 111, then there are two cases. If you sent 000, getting to 001 requires flipping the last bit, which happens with probability <em>p</em>. On the other hand, if you sent 111, getting 001 requires flipping the first and second bits, which occurs with probability <em>p<sup>2</sup></em>. Since <em>p</em> is small, <em>p<sup>2</sup></em> is much smaller than <em>p</em>, so your friend can be confident that you sent 000. It won’t work <em>all</em> the time, but it will work often enough (to the extent that you make sure your equipment has a low error rate!). This is called “majority rules”, since the intended message is the one that appears the most in the string of bits. In the error correction jargon, the code we just came up with is able to correct single-bit errors (remember, if two bits get flipped, then we will mistakenly “correct” to the opposite message).</p>
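The whole repetition-code protocol fits in a few lines of Python. This is a sketch of my own (function names and the simulation are not from the essay), under the same independent-error assumption made above. A logical error needs at least two of the three bits to flip, with probability 3p²(1 − p) + p³:

```python
import random

def encode(bit):
    return [bit] * 3                      # 0 -> 000, 1 -> 111

def decode(bits):
    return 1 if sum(bits) >= 2 else 0     # majority rules

def transmit(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

p = 0.01
print(3 * p**2 * (1 - p) + p**3)          # ≈ 0.000298, down from the bare 0.01

# Empirical check over many noisy transmissions of the message "1":
rng = random.Random(1)
errors = sum(decode(transmit(encode(1), p, rng)) != 1 for _ in range(100_000))
print(errors / 100_000)                   # close to 3e-4
```

The suppression from 10⁻² to roughly 3×10⁻⁴ is the entire point of the code: redundancy buys reliability, provided the physical error rate starts out small.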
<p>Look at what we have here. By increasing from one bit to three bits, we’ve come up with a way to ensure (probabilistically) that the message which gets sent is the one that is eventually received. The recipient might have to think a little bit when the message is garbled with 0s and 1s, but they can do it.</p>
<p>The snag is that this doesn’t quite work in the quantum case. In particular, there’s a key issue which plagues a lot of work in quantum theory: measuring systems often collapses their superpositions. Collapse the superposition, and you’re just dealing with essentially classical information all over again.</p>
<p>Where did we use measurement in the protocol I’ve described above? It’s in the way that the recipient looked at the message and made a judgment based on that. When they go from 001 back to 000, they need to physically <em>look</em> at the message. This act of looking is the equivalent of performing a measurement on a quantum system, and that’s bad news when it comes to error correction. After all, how are we supposed to know what to do to corrupted information if we can’t look at it?</p>
<p>There are other problems too. We can’t just implement the repetition code stated above, because of something called the “No-Cloning theorem”. This means we can’t just copy our messages three times and let the recipient use the majority rules selection. We’ll have to be more clever than this. To make matters worse, the kinds of errors you can have on a qubit are much more numerous than with a bit. That’s because a bit can be flipped from 0 to 1 or vice versa. With a qubit, we can do a bunch of things to the coefficients <em>a</em> and <em>b</em> of the quantum state, as long as we keep the qubit normalized. Since the normalization constraint still leaves a continuous range of values for <em>a</em> and <em>b</em>, we have a <em>continuum</em> of possible errors. We crafted the repetition code above to detect a bit flip error, but how are we going to correct all of these possible errors?</p>
<p>These both make things look bad. And we haven’t even discussed the engineering challenges of making a qubit! The good news is that these challenges <em>are</em> surmountable (well, the engineering question is still open). I won’t go into all of the details for how we solve these problems, but I will touch on them briefly here.</p>
<p>For the issue of cloning states, the problem is that there’s no quantum operation that performs ψ⊗φ → ψ⊗ψ for an <em>arbitrary</em> state ψ. In particular, this means you can’t encode your message by just sending three copies of the same state (unless it’s a very particular state). I won’t do the No-Cloning theorem justice here, so I will refer you to this <a href="https://www.physics.umd.edu/studinfo/courses/Phys402/AnlageSpring09/TheNoCloningTheoremWoottersPhysicsTodayFeb2009p76.pdf">short and sweet proof</a>. So a straightforward repetition code is out of the question. Thankfully, there are similar constructions that work, and they are used to great effect.</p>
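<p>To see the failure concretely, here’s a small numerical sketch (not part of the theorem’s proof, just an illustration in NumPy): a CNOT gate copies the basis states perfectly, but applied to a superposition it produces an entangled state rather than two independent copies.</p>

```python
import numpy as np

# A "copier" that works on basis states: CNOT with the message qubit as
# control and a blank |0> qubit as target. It maps |0>|0> -> |0>|0> and
# |1>|0> -> |1>|1>, so basis states really do get duplicated.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = (zero + one) / np.sqrt(2)  # an equal superposition

# Copying a basis state works:
out_basis = CNOT @ np.kron(one, zero)
print(np.allclose(out_basis, np.kron(one, one)))  # True

# But the same circuit on |+>|0> yields a Bell state, NOT the product
# |+>|+> that a true cloner would have to produce.
out_super = CNOT @ np.kron(plus, zero)
bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)
print(np.allclose(out_super, bell))                 # True: entangled pair
print(np.allclose(out_super, np.kron(plus, plus)))  # False: no clone
```

Since the circuit is linear, fixing its action on the basis states fixes its action on every superposition, which is exactly why no single circuit can clone them all.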
<p>The discretization of errors is one that will be relevant to us, so I’ll talk about this a bit more. The big idea is that quantum states evolve like this: ψ’ = Uψ, where <em>U</em> is a unitary matrix describing, in our case, an error that has occurred. In the single-qubit case (this also generalizes to multi-qubit states), the matrix is 2×2. What we can do is perform a Taylor expansion of <em>U</em> around the identity in some small parameter. Once you do that, you can use the fact that the set <em>{I, X, Y, Z}</em> (the Pauli matrices along with the identity) forms a basis for 2×2 matrices, which means we can describe the error as a linear combination of them (to first order). If the perturbation is small (which makes sense for error correction, since we want to get back to our original state, and a state pushed too far away will likely get mixed up with another one), we need only keep the first terms in the expansion. We’ll have something that looks like:</p>
<p>ψ’ = (I + ε(sum of Pauli errors))ψ.</p>
<p>To get back to our original state, all we have to do is apply the same Pauli operators again. Since they each square to the identity, we can recover the state and correct errors. Plus, since quantum mechanics is a linear theory, everything works out perfectly.</p>
<p>It’s worth pausing to appreciate what this means. We’ve essentially chopped up infinity into a finite number of slices. The matrix <em>U</em> could have described <em>any</em> error, and the fact that the Pauli matrices can describe all 2×2 matrices came to our rescue. This means we’ve turned a problem of continuous errors into a discrete set: <em>{I,X,Y,Z}</em>, a pretty cool feat.</p>
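<p>As a quick sanity check of the claim that <em>{I, X, Y, Z}</em> is a basis, here’s a sketch in NumPy: the coefficient of each Pauli matrix in any 2×2 matrix can be extracted with the trace inner product, Tr(PU)/2, and the pieces reassemble to the original matrix.</p>

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_coefficients(U):
    """Expand a 2x2 matrix in the {I, X, Y, Z} basis: c_P = Tr(P @ U) / 2."""
    return {name: np.trace(P @ U) / 2
            for name, P in [("I", I), ("X", X), ("Y", Y), ("Z", Z)]}

# A small, otherwise arbitrary error: a rotation by eps about the x-axis.
eps = 0.01
U = np.cos(eps) * I - 1j * np.sin(eps) * X

coeffs = pauli_coefficients(U)
# To first order, U ≈ I - i*eps*X, matching the expansion in the text.
print(np.allclose(coeffs["I"], np.cos(eps)))        # True
print(np.allclose(coeffs["X"], -1j * np.sin(eps)))  # True

# Reassembling the coefficients recovers U exactly; this works for ANY
# 2x2 matrix, which is what "forms a basis" means.
rebuilt = sum(c * P for c, P in zip(coeffs.values(), [I, X, Y, Z]))
print(np.allclose(rebuilt, U))  # True
```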
<p>Now that we’ve talked about some of the challenges inherent with quantum computing, we’re ready to dive into the lovely quantum error correction structure which is the surface code.</p>
<h2 id="parities-and-the-surface-code">Parities and the Surface Code</h2>
<p>One of the easiest algebraic systems to get a feel for is that of a switch. There are two modes: on and off, one or zero. It’s simple, it’s straightforward, and even though it might be unfamiliar to some at the beginning, it doesn’t take a whole lot of practice before this system becomes ingrained.</p>
<p>Computers are built on bits, and we all know how powerful computers can be. Even with such a simple algebraic system, bits can give us a lot of flexibility when it comes to computation. In terms of an addition and multiplication table, the binary system works as follows:</p>
<table>
<thead>
<tr>
<th style="text-align: center">+</th>
<th style="text-align: center">0</th>
<th style="text-align: center">1</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><strong>0</strong></td>
<td style="text-align: center">0</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td style="text-align: center"><strong>1</strong></td>
<td style="text-align: center">1</td>
<td style="text-align: center">0</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th style="text-align: center">×</th>
<th style="text-align: center">0</th>
<th style="text-align: center">1</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><strong>0</strong></td>
<td style="text-align: center">0</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: center"><strong>1</strong></td>
<td style="text-align: center">0</td>
<td style="text-align: center">1</td>
</tr>
</tbody>
</table>
<p>What’s nice about using bits is that they can encode anything that can take one of two states. In particular, bits can encode the <em>parity</em> of a whole number, which is just a fancy way of saying if it’s even (we use the bit 0) or odd (we use the bit 1).</p>
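<p>In code, the two tables above are nothing more than XOR and AND, and parity is just a number’s last bit. A tiny Python check:</p>

```python
# The addition table is XOR (addition mod 2); multiplication is AND.
for a in (0, 1):
    for b in (0, 1):
        assert (a + b) % 2 == a ^ b
        assert (a * b) == (a & b)

# Parity of a whole number: even -> 0, odd -> 1.
parities = [n % 2 for n in (14, 27)]
print(parities)  # [0, 1]
```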
<p>What I want to show you in this essay is how this idea of parity can let us build codes that can be used for quantum computers. It’s all about asking the right kind of question, and the notion of parity will let us do so.</p>
<h2 id="stabilizers-and-measurement">Stabilizers and measurement</h2>
<p>There are many challenges for quantum computing. As I mentioned above, these include being unable to copy arbitrary quantum states and avoiding measurement because this collapses the quantum state (destroying the superpositions we want). We will build on these considerations in this essay.</p>
<p>With the surface code, the idea is simple: find a way to encode the information of a qubit while also allowing us to detect errors that occur.</p>
<p>Because of the discretization of errors, the possible errors on our lattice can be anything composed of <em>I</em>, <em>X</em>, <em>Y</em>, and <em>Z</em>. This includes multiple-qubit errors. For example, if we had three qubits, an error operator could look like <em>X<sub>1</sub>Y<sub>2</sub>I<sub>3</sub></em>, where the structure linking them together is the tensor product. This simply means the first qubit would have an error <em>X</em> being applied to it, the second qubit would have <em>Y</em>, and the final qubit would have <em>I</em>. Sometimes the <em>I</em> operator is omitted in order to keep things compact, particularly when there are a lot of qubits.</p>
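<p>In a simulation, this tensor product structure is literally a Kronecker product of the single-qubit matrices. A sketch in NumPy for the <em>X<sub>1</sub>Y<sub>2</sub>I<sub>3</sub></em> example:</p>

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

# The three-qubit error X_1 Y_2 I_3: the Kronecker product chains the
# single-qubit operators together, qubit 1 leftmost.
E = np.kron(np.kron(X, Y), I)
print(E.shape)  # (8, 8): an operator on the 2^3 = 8 dimensional space
```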
<p>Let’s start building up our surface code. The first ingredient is to arrange our qubits on a lattice, with each physical qubit on a lattice site. We will use <em>N</em> to count the number of physical qubits per lattice side, so there will be a total of <em>N<sup>2</sup></em> physical qubits. Also, due to some technicalities with boundary conditions, <em>N</em> has to be an odd integer greater than one (we’ll see why soon).</p>
<p>Doing this gives us a lattice that looks like so:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/SurfaceCode.png" class="centre-image" /></p>
<p>We now want to define our stabilizers (the checkerboard pattern). The stabilizers are a way to encode multiple states of the physical qubits into one <em>logical</em> state, giving us a more robust way to do a computation. The important part here is that the stabilizers are a set of operators which commute with each other and preserve the quantum state. (Note that, in the mathematical terminology, the <em>group</em> of operators is called the stabilizer, and the operators that generate it are called stabilizer generators. This gets to be a mouthful though, so I’ll simply refer to the operators as stabilizers.) In particular, if ψ is our quantum state, then any stabilizer <em>M</em> we take has the property that Mψ = ψ. This tells us that the stabilizers don’t do anything to the underlying quantum state. They “stabilize” it. In more technical terms, the state <em>ψ</em> is an eigenvector of the stabilizer <em>M</em>, with eigenvalue +1. The surface code is then defined as the quantum state of the lattice of N<sup>2</sup> physical qubits where the stabilizers are all in their +1 eigenstate.</p>
<p>I don’t know about you, but I like this topic just for the beautiful diagrams! It isn’t only visually appealing; it also encodes the stabilizers themselves. The great thing is that you can understand it in both a visual and mathematical sense. It’s worth taking the time to do both.</p>
<p>Graphically, the stabilizers are defined as such:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/PlaquettesDefinition.png" class="centre-image" /></p>
<p>The squares and half-moon shapes are what we call <em>plaquettes</em>, and the colours represent either an <em>X</em> stabilizer (green) or a <em>Z</em> stabilizer (blue). They work by having an <em>X</em> or <em>Z</em> operator on each lattice site, in the tensor product structure I mentioned before. So a square <em>X</em> stabilizer would look like <em>XXXX</em>, where the operators act on the qubits touching that square. The same is true for the <em>Z</em> stabilizers. (Why aren’t there any stabilizers associated with the Y Pauli operator? See here<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.)</p>
<p>What’s up with these weird half-moon plaquettes? It’s because of boundary conditions. The surface code has open boundary conditions, which means we have to modify things a bit. Having these operators along the edge ensures that everything works correctly.</p>
<p>It’s also not a coincidence that the stabilizers are either four Pauli operators or two. We’ll get to this soon.</p>
<p>Mathematically, the stabilizers are defined as:</p>
<p>X<sub>p</sub> = ⊗<sub>v∈G<sub>p</sub></sub>σ<sup>x</sup><sub>v</sub> and Z<sub>p</sub> = ⊗<sub>v∈B<sub>p</sub></sub>σ<sup>z</sup><sub>v</sub>.</p>
<p>The symbol <em>p</em> represents the plaquettes, <em>v</em> are the lattice sites, and <em>σ<sup>x</sup></em> and <em>σ<sup>z</sup></em> are the Pauli <em>X</em> and <em>Z</em> matrices (I’m using a slightly different notation here to clarify the difference between <em>X<sub>p</sub></em>, <em>Z<sub>p</sub></em> and the Pauli matrices themselves).</p>
<p>Let’s break down the notation, which can be a little intimidating the first time you see it. The <em>G</em> and <em>B</em> simply stand for the green and blue plaquettes that make up the lattice, and the subscript <em>p</em> is a way of enumerating them. It’s always good to have a way of listing things, and the indices do just that. Next, the symbol ⊗ is just the tensor product, which I mentioned above. This tells us that there are going to be multiple operators “concatenated” together, which is exactly what I wrote above when I gave the example of a stabilizer being <em>XXXX</em>. The subscript on the tensor product symbol tells us which vertices we are considering when building our stabilizer. For any plaquette <em>p</em>, there will be either two or four vertices which touch it. The <em>v ∈ p</em> notation tells us to take <em>all</em> of those vertices when building the stabilizer. Finally, we have the Pauli operators <em>X</em> and <em>Z</em>, with indices that tell us which qubit they act on. The following handy diagram summarizes all of this. (There are “hats” on the symbols because they are technically operators. I don’t have them in the text here because they are tricky to format.)</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/StabilizerEquations.png" class="centre-image" /></p>
<p>We should check that these stabilizers actually commute with one another, as a good stabilizer code should (because we want to measure their eigenvalues simultaneously to detect errors). We can go through a couple of cases and make sure that things work out. Before we do, it’s important to remember the rules of Pauli matrices: they anticommute with each other, which is a fancy way of saying that XZ = −ZX.</p>
<p>Suppose we’re looking at two stabilizers that are far away from each other, like in the following diagram:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/FarAwayPlaquettes.png" class="centre-image" /></p>
<p>Then, it shouldn’t be too surprising that nothing goes wrong here. Since they are separated, it doesn’t matter what stabilizer we apply first. That’s because the two stabilizers won’t “interact”. Any Pauli operator on the first stabilizer will meet an identity from the second one, and since the identity commutes with everything, there’s no issue.</p>
<p>The more interesting case is when two stabilizers are adjacent, which means some of their operators <em>do</em> overlap:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/CommutingPlaquettes.png" alt="" class="centre-image" /></p>
<p>Here, the situation looks more complicated, but not by much. First, because of the checkerboard pattern of the surface code, two adjacent stabilizers are of different colour. This means we have <em>XXXX</em> and <em>ZZZZ</em> (I’m dealing with the squares here, but you should convince yourself that the half-moon plaquettes work the same). These stabilizers are squares, which means <em>exactly</em> two operators will overlap. It’s always two, because the only alternative is one (where the squares are diagonal from each other), but this cannot happen because diagonal arrangements always involve plaquettes of the same colour. The other operators can be freely moved like in the previous case. For the ones which overlap, the anticommutation relations will bring in a negative sign when we swap them.</p>
<p>But notice how many of these we have! There will always be <em>two</em> such pairs, which means we’ll have two negative signs coming out of the calculation. And since we all know that two negatives make a positive, the stabilizers <em>will</em> commute. So in fact, the entire group of stabilizers commutes. In the diagram above, this corresponds to swapping the overlap between <em>X</em> and <em>Z</em> circles (each swap gives a minus sign).</p>
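<p>We can check this sign-counting numerically. Below is a sketch with six qubits on a line (a made-up layout, not the actual lattice): two square stabilizers sharing two sites commute, while an artificial single-site overlap would anticommute, exactly as the argument predicts.</p>

```python
import numpy as np
from functools import reduce

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, sites, n):
    """Tensor `op` onto the given sites of an n-qubit register, identity elsewhere."""
    return reduce(np.kron, [op if q in sites else I for q in range(n)])

n = 6
# Two adjacent square plaquettes sharing exactly two qubits (sites 2 and 3):
Sx = embed(X, {0, 1, 2, 3}, n)
Sz = embed(Z, {2, 3, 4, 5}, n)
print(np.allclose(Sx @ Sz, Sz @ Sx))  # True: two sign flips cancel

# A hypothetical single-site overlap (site 3 only) anticommutes instead:
Sz_bad = embed(Z, {3, 4, 5}, n)
print(np.allclose(Sx @ Sz_bad, -(Sz_bad @ Sx)))  # True: one sign flip survives
```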
<p>To finish things off, we define our quantum state to be in the ground state of some fictitious Hamiltonian <em>H</em> (see here<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>). Don’t worry if you aren’t familiar with the physics jargon. In our case, it just means that we want our quantum state to be the one in which all of the stabilizers have a +1 eigenvalue. Enforcing this will lead us to detecting errors, which are excitations or deviations from the ground state.</p>
<h2 id="parities-and-errors">Parities and errors</h2>
<p>I just finished saying that we want all of our stabilizers to have eigenvalue +1. What does that mean?</p>
<p>For our purposes, an eigenvalue is just a measurement outcome. These are implemented using helper qubits<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> which essentially measure the eigenvalues of Pauli operators from the stabilizers. As an example, if I had a stabilizer of the form <em>XXXX</em>, the measurement would return me a value of <em>(±1)(±1)(±1)(±1) = ±1</em>, depending on the eigenvalue for each physical qubit it acts on.</p>
<p>You can see the binary arithmetic playing out here. Because of the way the stabilizer measures eigenvalues, we are always going to have a value of <em>±1</em> at the end. Furthermore, we can play a counting game on the plaquettes. If there are an even number of errors on a plaquette (0, 2, or 4), then the stabilizer’s eigenvalue when measured will be +1. On the other hand, an <em>odd</em> number of errors will mean −1 shows up an odd number of times in the product, giving an eigenvalue of −1. Since we are assuming that the surface code begins in its ground state with stabilizer eigenvalues of +1, we know that a measured eigenvalue of −1 means there are an odd number of errors that appeared on that plaquette. Sometimes, this is called a “violation” of a stabilizer.</p>
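<p>In code, this counting game is just a product of ±1 values. A minimal sketch for a square <em>XXXX</em> stabilizer:</p>

```python
# A stabilizer like XXXX returns the product of the four single-qubit
# eigenvalues; flipping an odd number of them flips the overall outcome.
def stabilizer_outcome(eigenvalues):
    result = 1
    for e in eigenvalues:
        result *= e
    return result

print(stabilizer_outcome([+1, +1, +1, +1]))  # 1: no errors
print(stabilizer_outcome([-1, +1, +1, +1]))  # -1: one error, a "violation"
print(stabilizer_outcome([-1, -1, +1, +1]))  # 1: two errors hide each other
```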
<p>To get a sense of this, the following are some examples of errors on the lattice, and how they show up with a flipped eigenvalue. The graphical approach is to place a red dot in the middle of a plaquette when there is an odd number of errors present on that plaquette. When there’s an error present, I’ve drawn it with a corresponding yellow circle and an X. Note that there are both X and Z errors that can occur, but to keep the diagrams clear, I’m only looking at X. The situation would be similar with Z errors, except the red dots would now appear on the green plaquettes (because those are defined with X operators, which anticommute with Z).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/ErrorExample.png" class="centre-image" /></p>
<p>This gives us a way of detecting errors on the surface code. The state is protected if there are an <em>even</em> number of errors on a given plaquette, but it gets corrupted if we have an odd number of errors on a plaquette. If we go back to our handy repetition code, this is what happens when we send a message 000 and the recipient reads out 010. The state is corrupted, but it’s clear that something has happened to it (we detected an error). These errors are described using the idea of a <em>syndrome</em>, which is a vector that encodes which plaquettes have eigenvalue −1 (an odd number of errors).</p>
<p>I like to think of this vector as having a component of zero if the eigenvalue remains +1, and a component of one if the eigenvalue flips to −1. It doesn’t really matter what you use to denote it, as long as you can differentiate between a plaquette with a red dot and one without. Furthermore, I find it convenient to split the syndrome up into a part for the green plaquettes and a part for the blue plaquettes. So, if we were looking at X errors only, a syndrome could look like the example below. Again, this is just a convenient way to write it out (particularly when you’re writing code!). Also note here that there are black dots on each lattice site to indicate the physical qubits. This is just an artifact of how I was presenting things with the software. You can pretend like they aren’t there.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/SyndromePlaquettes.gif" alt="" class="centre-image" /></p>
<p>The game now is to figure out how to get our state <em>back</em> into the one that has no red dots (the syndrome goes to the zero vector).</p>
<h2 id="loops-and-states">Loops and states</h2>
<p>Let’s say I give you the following error configuration of the lattice, with the corresponding syndrome.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1598384471/Blog/ErrorExample2.png" alt="" class="centre-image" /></p>
<p>How can you apply Pauli operators to the lattice so that the red dots vanish?</p>
<p>If you’re familiar with the Pauli operators, you might have an answer: just apply the same configuration to itself! This takes advantage of the fact that the Pauli matrices are <em>involutory</em>: they square to the identity. Applying X to a lattice site which already has an X error will give <em>X × X = I</em>. And voilà, correction complete!</p>
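<p>A quick NumPy check of this fact, with an arbitrary (made-up) qubit state:</p>

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Each Pauli matrix squares to the identity, so it is its own inverse.
for P in (X, Y, Z):
    assert np.allclose(P @ P, np.eye(2))

psi = np.array([0.6, 0.8], dtype=complex)  # some normalized qubit state
corrupted = X @ psi
recovered = X @ corrupted                  # apply the same "error" again
print(np.allclose(recovered, psi))  # True: correction complete
```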
<p>That’s great, but there’s a small hitch: the error configuration isn’t available to you. That’s because you can’t directly measure the system (or else you could ruin the superposition it might be in). This is precisely why we introduced the stabilizers. They allow us to extract <em>some</em> information about the system without disturbing it from doing what it needs to do. So if you were a person working on a hardware implementation of the surface code, all you would see when you have errors is this.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1598384471/Blog/BareSyndrome.png" alt="" class="centre-image" /></p>
<p>This is a bit inconvenient, since you don’t know exactly where you should apply the Pauli operators for corrections. Still, we saw above that all it takes to make a red dot disappear is to apply another error to that plaquette. This will change the parity of errors on the plaquette from odd (there’s a red dot) to even (it goes away). The issue is that applying an operator to a qubit on one of these blue plaquettes usually does this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1589982850/Blog/MovingSyndrome.gif" alt="" class="centre-image" /></p>
<p>Sure, we got the error to go away, but we’ve just kicked it somewhere else! You can try to experiment a little, and you will see that this happens almost everywhere you look. Like trying to pick up a slippery object, the red dots will simply move away from you as you attempt to make them vanish.</p>
<p>You might have noticed that I said “almost everywhere”. It turns out that there are a few ways to make a syndrome go away without introducing <em>more</em> red dots in the process.</p>
<p>The first way is to remember our friends: the stabilizers. Applying these in groups can give us a clue about what we should do. Remember, the goal is to make the syndrome vanish (by which I mean the syndrome has all-zero components), and we know that applying stabilizers to the ground state <em>also</em> doesn’t change the syndrome. So if we can take our error configuration and apply more errors to it such that the result is something that we could equally achieve with a bunch of stabilizers, then we’re in business. Remember, we can’t <em>actually</em> see the error configuration in practice, but it’s useful when analyzing the various scenarios.</p>
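<p>In the set-based picture from before (again a toy layout, not the real lattice), applying an X-type stabilizer just toggles errors on its support, since X·X = I on each site, and the syndrome doesn’t budge:</p>

```python
# Applying an X-type stabilizer toggles the errors on its support:
# a symmetric difference of sets, because X*X = I on a site.
def apply_stabilizer(errors, support):
    return errors ^ support

def syndrome(plaquettes, errors):
    return [len(p & errors) % 2 for p in plaquettes]

# Toy strip: two Z-type plaquettes that detect X errors, and an X-type
# stabilizer whose support {2, 3} overlaps each plaquette on two sites.
plaquettes = [{0, 1, 2, 3}, {2, 3, 4, 5}]
errors = {0, 2}

before = syndrome(plaquettes, errors)
errors2 = apply_stabilizer(errors, {2, 3})
after = syndrome(plaquettes, errors2)

print(errors2)        # {0, 3}: a different error configuration...
print(before, after)  # [0, 1] [0, 1] ...but the same syndrome
```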
<p>Let’s take a bunch of stabilizers and apply them in one big clump to the lattice. Look what happens to the error configuration, remembering that applying the same operator on a site twice means it vanishes.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1589918249/Blog/Loops.gif" alt="" class="centre-image" /></p>
<p>Look at those chains form! The overall error configuration becomes a loop, with all of the sites inside having the same operator applied to them twice (giving the identity). This means that one way to correct the errors is to form a closed loop. The syndrome will vanish (because it means that the error + recovery configuration is just a product of a bunch of stabilizers).</p>
<p>This looks promising. In fact, you might exclaim that we’re able to correct every kind of error that can occur, as long as we form loops. One thing to keep in mind though is the following: for the surface code, the syndrome isn’t unique to one kind of error<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Say we have our familiar syndrome.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1598384471/Blog/BareSyndrome.png" alt="" class="centre-image" /></p>
<p>With a bit of work, you might come up with an error configuration whose syndrome matches the one above. The problem is that with a little more work you could find <em>another</em> one (and many more). Here are just two examples.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1589982850/Blog/SameSyndrome.gif" alt="" class="centre-image" /></p>
<p>This means applying errors to get a loop depends on the error configuration you’re dealing with and not just the syndrome. But you only have access to the syndrome!</p>
<p>“Big deal,” you might say. “Who cares if I identify the right error? As long as the syndrome goes away, I’m good to go.”</p>
<p>And that is where we run into more trouble.</p>
<h2 id="logical-operators">Logical operators</h2>
<p>Despite the neat parity argument for detecting errors, it turns out that this isn’t foolproof. That’s because the stabilizer group doesn’t account for every operator that produces no syndrome. There are some operators which aren’t a combination of stabilizers (or else they would belong to the group) and yet don’t show up when we check for errors.</p>
<p>This is both a good and a bad thing. It’s good because having these other operators lets us manipulate our quantum state to do a computation. If we didn’t have them, we would be stuck with simply storing a quantum state. On the other hand, it’s bad because suddenly there are ways to return to a ground state that is different from the one we started with. While that’s good if we want to do a computation, it’s bad if we accidentally go into that state without knowing it. Worse, there’s no way we <em>can</em> know it, since we would have to know the original underlying state.</p>
<p>The idea is as follows. When you build a stabilizer code, the stabilizers are there to detect errors. But there will be operators that are not written as a product of stabilizers yet <em>still commute</em> with the stabilizers. They will go undetected if they get applied to your system, where “undetected” means the syndrome will be zero.</p>
<p>But remember our discussion above. When we were trying to build loops to make the syndrome go away, I mentioned that there are multiple underlying error configurations that give rise to the same syndrome (it’s not a one-to-one pairing). The problem is that applying the same recovery sequence to two different error configurations which have the same syndrome can make these other operators appear.</p>
<p>Let’s see this in action. Below, I’ve shown a syndrome, as well as the errors we applied to make the syndrome go away.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1589982850/Blog/ApplyErrors.gif" alt="" class="centre-image" /></p>
<p>Now here are two potential underlying error configurations, and the result of applying the recovery configuration.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1589982850/Blog/LogicalRecoveryFailure.gif" alt="" class="centre-image" /></p>
<p>On the left, we see the error + recovery configuration forming a loop, just as we expect. But on the right, something strange happens. Instead of closing in on itself, the chain stretches across the entire lattice. I like to think of it as still being a loop, except now it wraps <em>around</em> the surface code (think of a rubber band holding a deck of cards together).</p>
<p>The point is that this is a very different kind of loop. There are fancy mathematical names for this<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, but the key idea is that this loop <em>cannot</em> be written as a product of the stabilizers, and it is exactly the kind of object which commutes with them. This means these loops won’t get detected by our quantum error correction scheme of returning the syndrome to a bunch of zeros, and they bring our quantum information into a new state.</p>
<p>Here’s how.</p>
<p>The tool we use to calculate if a loop has been placed sometimes goes under the name of a “Wilson Loop”. The idea is that this operator can detect if a chain of errors has crossed the lattice, bringing us into a new state.</p>
<p>I’ll focus on the X errors, but the same is true for the Z errors (though they are oriented along the other direction of the lattice). The boundary conditions of the lattice ensure that a chain of X errors going from the bottom to the top will not cause a syndrome. It’s worth pausing for a moment to digest this. Any time we place an X error, syndromes will pop up. But, because the half-moon plaquettes along a given boundary are only one colour, placing an X error at the top, for example, won’t cause any of the half-moon plaquettes to get a red dot. The same is true for placing X errors at the bottom. Therefore, if we <em>connect</em> those two errors by making sure each blue plaquette is covered by two X errors, we’ll be fine. The Z case is the same, except that now we have to go <em>horizontally</em> across the lattice.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1590929735/Blog/LogicalOperators.gif" alt="" class="centre-image" /></p>
<p>It doesn’t matter if you have the loop going straight along a row, or if you have it zigzagging through the lattice. Almost any loop will do (provided it commutes with the stabilizers), and that’s because you can actually “move” these loops by adding stabilizers to the mix. Doing so will just shift the loop, like in the following diagram.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1590009187/Blog/LogicalOperatorShift.gif" alt="" class="centre-image" /></p>
<p>But how do we detect these loops being applied? After all, they don’t show up in our syndrome, which was the tool we were using to detect errors.</p>
<p>The key is that we can implement another operator, which can detect the presence of a loop that goes across the boundary. It turns out that these operators are complementary: to detect if an X loop across the boundaries occurred, we use a Z loop, and vice versa.</p>
<p>Why does this work?</p>
<p>Let’s recall the two kinds of loops we can have (again, I’ll focus on X errors). They can either form loops <em>within</em> the lattice, or they can stretch across the boundaries (the rubber band on a deck of cards image). How does a horizontal Z loop detect the difference?</p>
<p>Again, parity comes to the rescue. Suppose we have a loop of X errors that looks like this.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1598384929/Blog/LoopExample.png" alt="" class="centre-image" /></p>
<p>The insight is to look at the <em>rows</em> of the lattice. Do you notice anything in particular about how many X errors there are? It’s an <em>even</em> number! This turns out to be a general feature of these kinds of loops<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. The only way to get an odd number is to have the loop stretch across the lattice. This provides a nice way to cleanly distinguish between the two loops.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1590009187/Blog/Homology.gif" alt="" class="centre-image" /></p>
<p>One thing to keep in mind is that this is all assuming that we <em>know</em> the error + recovery configuration that was applied. We won’t actually know that while running the surface code on hardware. The reason I’m telling you about this then is because these techniques are used to build good software implementations for correcting errors (we’ll get to that shortly).</p>
<p>If we are just working things out with pencil and paper, we don’t need to say anything about using Z loops to detect X loops. Instead, we can just <em>look</em> at the X errors and count along the rows of the lattice. Just keep in mind that what we need to actually do is measure a Z loop in the horizontal direction, which will give us a product of eigenvalues that will be ±1, telling us which “sector” we are in with respect to the Z loop. The same story is true for using X loops to detect Z loops. They are intertwined. This means that the quantum information encoded in our entire lattice can be thought of as living in the two dimensional space carved out by the two loop operators.</p>
<p>These loop operators are what we call <em>logical</em> operators for the surface code. They manipulate the quantum information that is hidden within the <em>entire</em> lattice, not the states of the individual physical qubits. We call them logical X and logical Z because, like the usual Pauli operators, they have the same sort of algebra (they anticommute).</p>
<p>It turns out that the quantum information that is distributed across the whole lattice describes a qubit, which we call the logical qubit. A general feature of surface codes is that they have N<sup>2</sup> physical qubits, and only <em>one</em> logical qubit.</p>
<p>Here’s how it works. The lattice has N<sup>2</sup> physical qubits, and each one has two degrees of freedom (you can think of them as being spin-1/2 particles). This gives us a total of 2N<sup>2</sup> degrees of freedom. But then we have the stabilizers. How many are there? Well, if you look at the lattice, you can break the number of plaquettes up into bulk and boundary plaquettes. Let’s start with the bulk. If the lattice has length N, then there will be N–1 plaquettes on each side (the bulk ones). This means that there are (N–1)<sup>2</sup> bulk plaquettes in total.</p>
<p>Next, notice that each boundary has exactly 1/2 the number of half-moon plaquettes compared to the bulk plaquettes making up the row beside them. This means we have (N–1)/2 plaquettes per side, for a total of 4×(N–1)/2 = 2(N–1) half-moon plaquettes.</p>
<p>In total, we then have (N–1)<sup>2</sup> + 2(N–1) = N<sup>2</sup> – 1 plaquettes. Each one imposes a constraint on the system, since they must have values of ±1. This means two degrees of freedom are lost with each stabilizer plaquette. Putting this all together, the <em>actual</em> degrees of freedom of the surface code are:</p>
<p>d.o.f. = 2N<sup>2</sup> – 2(N<sup>2</sup> – 1) = 2.</p>
<p>In animated form:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/video/upload/q_auto:best/v1590009187/Blog/DegreesOfFreedom.gif" alt="" class="centre-image" /></p>
<p>This is exactly the degrees of freedom for a qubit! What this means is that we’ve managed to encode a qubit’s worth of information from N<sup>2</sup> physical qubits. We call this one the <em>logical</em> qubit, because it’s the one which is actually used to carry out a quantum computation. This whole technique of detecting errors using stabilizer measurements is for the purpose of protecting this qubit.</p>
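The plaquette counting above is easy to check numerically. Here’s a small sketch (the function name is mine) that tallies the stabilizers and remaining degrees of freedom for any lattice size:

```python
def surface_code_dof(N):
    """Degrees of freedom left in an N x N surface code lattice."""
    bulk = (N - 1) ** 2                 # bulk plaquettes
    boundary = 2 * (N - 1)              # half-moon boundary plaquettes
    stabilizers = bulk + boundary       # works out to N**2 - 1
    # Each physical qubit contributes 2 d.o.f.; each stabilizer removes 2.
    return 2 * N**2 - 2 * stabilizers

for N in (3, 5, 7):
    print(N, surface_code_dof(N))  # always 2: one logical qubit
```

No matter how large you make the lattice, the answer stays 2, which is exactly the counting argument above in executable form.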
<p>And how do we access this logical qubit? After all, it’s not a physical object on the lattice. Instead, it’s sort of “smeared out” across the whole lattice. This is why people sometimes call these kinds of codes “topological”. It reflects the fact that the logical qubit we’re manipulating isn’t located at a specific place on the lattice. The operators used for manipulating this logical qubit are the logical X and Z operators we looked at above, which is why it’s good that we had <em>something</em> which didn’t get detected by the stabilizers.</p>
<p>After a moment’s thought, you might be confused as to why the surface code gets any attention. After all, being only able to create <em>one</em> logical qubit<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> out of that many physical qubits looks like a waste. And it only gets trickier as you increase N, with no corresponding increase in the number of logical qubits. So why is the surface code popular?</p>
<p>The answer lies in <em>fault-tolerance</em>. This is a research topic within quantum error correction that concerns itself with making sure a quantum computer can do <em>all</em> of its computation correctly. In what I described above, we assumed a bunch of things about the errors. Namely, we assumed that errors occur on the lattice during the computation, but <em>not</em> when implementing the syndrome measurement or applying the corrections. But these are physical processes just like any other! There’s no reason they should be magically immune to errors, except that theorists such as myself like simplicity. Unfortunately, the world is far from how a theoretical physicist imagines it, so being able to deal with errors during all steps of a computation is crucial. Answering questions like, “What happens if the syndrome is measured incorrectly?” or “What if a correction is applied with an error?” is the focus of fault-tolerance research. This would take us far afield from the discussion I want to have in this essay, but it’s worth pointing out that these ideas are the next steps that need to be taken if you ever want to go from this nice theoretical idea to a real implementation in the laboratory.</p>
<h2 id="main-ingredients">Main ingredients</h2>
<p>This essay has been focused on building up the necessary ideas required to understand the surface code and how it actually works. There are several ingredients involved, so it’s worth recapping what we’ve seen.</p>
<p>The surface code is built up from a square lattice of N<sup>2</sup> physical qubits, and they encode one <em>logical</em> qubit. The checkerboard pattern on the surface code comes from two kinds of stabilizers: plaquettes that measure the parity of errors (bit flips for X errors and phase flips for Z errors) on the lattice. This gives us an indirect way of detecting errors by constructing their <em>syndrome</em>. In an actual quantum computer that uses the surface code, only the syndrome is seen, not the underlying error configuration.</p>
<p>Correcting the errors ultimately involves constructing loops. These loops are formed from the error + recovery configuration put together, but since we cannot see the error configuration, this can lead to good loops (within the bulk of the surface code) or bad loops (stretching across the surface code’s boundaries). A bad loop <em>can</em> be good if you implement it on purpose (because this lets us manipulate our logical qubit), but it’s bad if you apply it by mistake (logical failure). Being able to form good loops reliably is what leads us to the idea of decoding.</p>
<h2 id="decoding">Decoding</h2>
<p>We’ve seen that it isn’t easy to do error correction with the surface code. Despite the fact that we have syndromes, they aren’t one-to-one with the possible errors. This wouldn’t be a disaster if only errors were possible on our state, but when the wrong corrections are applied, this can lead to a state with no syndrome yet fundamentally different characteristics. This is what happens when a logical operator is mistakenly applied. And because we don’t have access to the underlying configuration, there’s no way of knowing if we manipulated the quantum state into what we call <em>logical failure</em>.</p>
<p>Here’s what the ideal scenario would look like when doing quantum error correction with the surface code.</p>
<ol>
<li>The helper qubits measure all of the stabilizers, giving us the syndrome.</li>
<li>Software takes in the syndrome and outputs a recovery configuration to apply to the lattice.</li>
<li>The errors are eliminated, the surface code has undergone recovery, and the computation continues.</li>
</ol>
<p>Ignoring the fact that errors can show up while measuring the syndrome and applying a recovery configuration<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>, Step 2 is very tricky. Like I just mentioned, the relationship between syndromes and errors isn’t one-to-one, so there’s always a chance that a proposed recovery configuration leads to logical failure (implementing a logical operator by mistake). A good decoder will then minimize this outcome when looking at a given syndrome.</p>
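As a rough sketch of what the software in Step 2 might do, here’s a toy decoder that greedily pairs syndrome defects by Manhattan distance. This is a stand-in for a real algorithm like Minimum Weight Perfect Matching, not an implementation of it, and the function name and coordinate convention are my own:

```python
from itertools import combinations

def toy_decoder(defects):
    """Greedily pair up syndrome defects by Manhattan distance.

    `defects` is a list of (row, col) plaquette coordinates that reported -1.
    Returns a list of defect pairs; a real decoder would then turn each pair
    into a chain of physical corrections along a shortest path between them,
    hoping to close a "good" loop rather than a logical operator.
    """
    unpaired = list(defects)
    pairs = []
    while len(unpaired) > 1:
        # Pick the closest remaining pair of defects.
        a, b = min(combinations(unpaired, 2),
                   key=lambda p: abs(p[0][0] - p[1][0]) + abs(p[0][1] - p[1][1]))
        pairs.append((a, b))
        unpaired.remove(a)
        unpaired.remove(b)
    return pairs

# Two clusters of nearby defects get paired up locally.
print(toy_decoder([(0, 0), (0, 2), (3, 1), (3, 3)]))
```

Greedy pairing is fast but suboptimal: it can commit to a short pair early that forces a long (and riskier) pair later, which is exactly why the minimum-weight matching formulation exists.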
<p>One thing to keep in mind is that this will depend on the physical error rate, which is the probability that a given lattice site will produce an error during the computation. A discussion of error rates will get us way further into the weeds than we’ve already gotten, so I will be brief on this topic. There are many error models, ranging from the simplistic but unrealistic (error rates are independent) to complicated (including correlations between lattice sites and their errors, as well as time-dependence).</p>
<p>One key plot that shows up in work on the surface code is that of the error threshold. As far as I know, this is done for the case in which errors are independent. With your software implementation, you can calculate the logical failure rate <em>P<sub>fail</sub></em> as a function of the physical error rate <em>p<sub>error</sub></em>. For a given lattice of side length <em>N</em>, you get a different curve. Then, there’s a threshold physical error rate at which all of these curves cross: <em>p<sub>threshold</sub> ≈ 0.109</em>. Such a plot might look like this.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1591972040/Blog/FinalLogicalFailureN_5_7.png" alt="" class="centre-image" /></p>
<p>What does the threshold represent? It’s the critical point at which, when you take the lattice size to infinity (otherwise known as the thermodynamic limit), errors can be corrected<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. In that limit, <em>P<sub>fail</sub></em> becomes a step function centred at this point: for physical error rates below the threshold, recovery can be done reliably, and above it, recovery is impossible. By plotting the <em>P<sub>fail</sub></em> vs <em>p<sub>error</sub></em> curves for various values of N, the hope is that they will all cross at this threshold rate and resemble a step function more and more closely as N increases.</p>
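Computing the surface code’s threshold needs a full decoder simulation, but the crossing behaviour itself can be illustrated with a much simpler toy: a distance-N repetition code under independent bit flips, whose majority vote fails when more than half the bits flip. Its <em>P<sub>fail</sub></em> curves cross at p = 0.5 (the surface code’s curves cross near 0.109 instead), so this is an analogy for the shape of the plot, not the surface code itself:

```python
from math import comb

def p_fail_repetition(p, N):
    """Logical failure rate of a distance-N repetition code under
    independent bit flips: majority vote fails when more than half flip.
    A toy stand-in for the surface code's P_fail vs p_error curves."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(N // 2 + 1, N + 1))

# Below the threshold (here p = 0.5), bigger codes do better;
# above it, bigger codes do worse -- the curves cross at the threshold.
for p in (0.1, 0.4, 0.6):
    print(p, [round(p_fail_repetition(p, N), 4) for N in (3, 5, 9)])
```

The same qualitative picture holds for the surface code: below threshold, increasing N suppresses <em>P<sub>fail</sub></em>; above it, adding more physical qubits only makes things worse.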
<p>But how can we construct this plot if we don’t know the underlying error configuration? The idea is that the software implementation gets evaluated on a test set of syndromes <em>and</em> errors. This means we know the underlying error configuration while building the model. The hope is that if we can build a model which recovers the state in an accurate way (independent of the test set), it will work well in various scenarios.</p>
<p>A bunch of work has been done to see which decoder is best<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>. Proposals include neural network implementations, the standard “Minimum Weight Perfect Matching” algorithm, and many more. Some are more sophisticated than others, but they all play the same game: take in a syndrome, and propose a recovery configuration to apply, while minimizing the logical error rate.</p>
<p>For my essay project as part of <a href="/psion">Perimeter Scholars International</a>, I worked on building such a good decoder (the plot you see above is from this project). It uses machine learning, and although I am biased, I have to say that it has some pretty neat ideas baked into it. I’ll tell you about it in a future essay.</p>
<p>Next time: Physics on a Cube.</p>
<h2 id="references">References</h2>
<ol>
<li>
<p>Surface codes: Towards practical large-scale quantum computation. A. G. Fowler, M. Mariantoni, J. M. Martinis, A. N. Cleland, <a href="https://arxiv.org/abs/1208.0928">arXiv:1208.0928</a> [quant-ph], 2012.</p>
<p>I would say that this reference has almost everything you need to know about the surface code. The lattice they use is slightly different than mine though, which is why the diagrams don’t perfectly match. I’m pretty sure that the surface code I’ve drawn is called the “rotated” surface code because it takes the one from their article and rotates it by 45 degrees. Still, a lot of the mathematics is worked out here, and it makes for a great place to find a lot of technical details. </p>
</li>
<li>
<p>Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation. R. Sweke, M.S. Kesselring, E.P.L. van Nieuwenburg, J. Eisert, <a href="https://arxiv.org/abs/1810.07207">arXiv:1810.07207</a> [quant-ph], 2018.</p>
<p>This is the reference that I first saw with the kind of diagrams that I used. I loved the design, and that’s why I’ve used this kind of surface code ever since.</p>
</li>
<li>
<p>Topological quantum memory. E. Dennis, A. Kitaev, A. Landahl, J. Preskill, <a href="https://arxiv.org/abs/quant-ph/0110143">arXiv:quant-ph/0110143</a>, 2002.</p>
<p>The beginning of all this surface code business (as far as I’m aware). I’m putting it in the references for historical purposes. It does have some interesting parts, but I’m not a fan of the presentation. I think a lot of the diagrams are difficult to follow, and the construction they use for the surface code is different than what I used here. My biased opinion is that this graphical approach is much easier to understand, but perhaps you want some more mathematical meat to your surface code. If so, you can go here.</p>
</li>
<li>
<p>A game of surface codes: Large-scale quantum computing with lattice surgery. D. Litinski, <em>Quantum</em>, <em>3</em>, 128, 2019. <a href="https://arxiv.org/abs/1808.02892">arXiv:1808.02892</a></p>
<p>This is the paper that inspired the name of my project and this essay, which in turn is from <em>A Game of Thrones</em>.</p>
</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Technically, a state is something called a <em>ray</em> in the Hilbert space of the system. This is a fancy way of saying that we actually have a whole group of states that are effectively the “same”, even if they don’t look the same. They are related by something called an overall phase factor, which doesn’t change the underlying quantum state since the main thing we care about is having it be normalized to 1. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This comes from the fact that the Pauli matrices aren’t independent. If you actually perform the multiplication, you will find that XZ = −iY, where <em>i</em> is the imaginary unit. This means two things. First, we don’t need to have Y stabilizers since X and Z “cover” the space. And second, when we consider errors, we just have to look at X and Z because both happening on the same site introduces a Y error. In particular, a Y error on the lattice would generate a red dot (syndrome) on <em>all</em> of the plaquettes it touches. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>A Hamiltonian is what physicists use to describe the dynamics of their system. In the game of quantum theory, the Hamiltonian is the object that goes into the Schrödinger equation iℏ ∂<sub>t</sub>❘ψ(t)❭ = H❘ψ(t)❭. In the case of the surface code, the Hamiltonian is defined in terms of the stabilizers: H = −J ∑<sub>p</sub> ( Z<sub>p</sub> + X<sub>p</sub> ), with J > 0. This enforces the stabilizers to all have +1 as their eigenvalues, which will yield the lowest energy to the system. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Helper or “ancilla” qubits are qubits which don’t form the main part of the algorithm, but serve some purpose. For our case, the helper qubits serve as measuring devices for our stabilizers. This means that each stabilizer has a helper qubit associated with it, apart from the <em>N<sup>2</sup></em> physical qubits we are already using. These are not used for anything other than building the syndrome. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Syndromes are a feature of all stabilizer codes, with some codes doing better than others in mapping one specific error to a certain syndrome. If you have a code that can do this, then you can correct errors, since you just have to make a “translator” that tells you that <em>this</em> syndrome corresponds to <em>this</em> error, and act accordingly. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>These are called <em>homology</em> classes, and for our purposes, they roughly classify what kind of loop we have. There’s a lot more to unpack here mathematically, but I won’t get into it. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>The reason each loop has an even number of X errors when looking at them by row has to do with the fact that they are built from stabilizers, and any loop has to “wrap back” on itself. This means you have to cross a row an even number of times before returning to your starting point. This won’t be obvious at first, but once you play around with a few examples, you will see the principle at work. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>For the surface code, we were able to create a logical X and Z operator by taking advantage of how each boundary only has one plaquette colour. This means we only get one logical X and one logical Z operator. However, if we used the surface code’s close cousin, the toric code, we can upgrade the number of logical qubits we have from one to two. The reason is that the periodic boundary conditions of the toric code allow you to define a logical X operator along both the horizontal and vertical directions (the same is true for Z). Analyzing the degrees of freedom leads to the result that you get <em>four</em> degrees of freedom, or two logical qubits. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>This is called <em>fault-tolerance</em>. The idea is that errors will accumulate during all steps of a computation, and being able to stand robust against all of them is critical to getting any computation to work. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>The threshold rate was found by establishing a mapping between the surface code and a random-bond Ising model, which has a critical point at this threshold. The details aren’t important for this essay, but if you want to know more, Reference 3 has some discussion about it in Section IV, part F. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>How do we define “best”? There are two aspects at play. On the one hand, we want a decoder which makes the fewest mistakes possible, which means we want to minimize <em>P<sub>fail</sub></em>. On the other hand, I don’t care how good your decoder is if it’s too slow. After all, we’re hoping to implement these during a computation, so time is of the essence. If you’re slower than the rate at which errors occur, the decoder will become overwhelmed and be of no use. So there’s a tradeoff to be made between accuracy and efficiency. For an actual quantum computer, we need the decoder to be good at both. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 25 Aug 2020 00:00:00 +0000
https://cotejer.github.io//game-of-loops

PSIon

<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1594560377/Blog/PSI2020.jpg" alt="Perimeter Scholars International Class of 2019-2020" class="centre-image" /></p>
<p>In my final year of undergrad, I had a plan: go to the university near my house, begin my master’s degree, and eventually do a PhD. It was nice, simple, and straightforward. Not having a ton of people around me applying for graduate school, I wasn’t aware of how big a deal the choice of institution was, nor the fact that some people apply to ten or more schools (often for those looking to go in the US). In my case, I had someone at the local university agree to supervise me, and that was that.</p>
<p>Oh, and as a long-shot chance, I applied to a theoretical physics program in Waterloo, Ontario. I knew I would most certainly not get in, but it was free to apply so I wrote up my application quickly and sent it off, not thinking much about it.</p>
<p>Which is why I was <em>very</em> surprised to hear back from them at the beginning of March, asking for an interview.</p>
<!--more-->
<p>I wasn’t expecting much. I still didn’t think I had a shot. I showed up to the interview, did my best, but didn’t let myself get too hopeful. It was only when I received the letter of acceptance a few days later that things started to sink in.</p>
<p><a href="https://perimeterinstitute.ca/training/about-psi">Perimeter Scholars International</a> (what we call PSI) is a yearlong course-based master’s program in theoretical physics. About thirty students are selected each year, and they get taught by researchers at Perimeter Institute. The program is different from others in many ways. First, everything is free. From the tuition fees to food and housing, Perimeter provides everything you need. Second, there are no grades. Courses are pass/fail, with no exams. Instead, interviews at the end with the instructors serve as a final assessment. Third, the program is all about collaboration rather than grades (as you can imagine by the previous item). This means students are encouraged to work together, rather than compete.</p>
<p>Okay, so that’s the marketing fluff. But that’s not why you’re here. I’m hoping this essay can achieve two things. First, I want it to give an unfiltered look at what the program is like from the eyes of someone who attended. Despite knowing this was a good program, the number of posts online from past students when I looked was woefully short (<a href="https://saurabhmadaan.wordpress.com/2009/12/22/perimeter-scholars-international-a-student-experience/">this</a> is the only one I found). I won’t be able to make up for the lack of many perspectives, but I will give you a sense of what the PSI program is like. Second, I want to chronicle my own growth from the start of the program to the end. PSI is all about growth, so I figure it’s worth sharing my own.</p>
<h2 id="pre-psi-interview">Pre-PSI interview</h2>
<p>The researchers that run the program are called PSI Fellows. They teach most of the courses in the fall semester and are the main contacts for the program. They also choose the incoming cohort each year. From my understanding, they whittle down the applicant pool several times before finally getting to around fifty or so students. How much whittling do they do? In my class, we ended up being 28 students (though only 26 attended in the end), and there were roughly 700 applicants. So yes, a lot of whittling.</p>
<p>At this point, interviews begin. I received an email to schedule an interview during the first week of March. The email said that I would be asked basic questions about fields like special relativity, statistical physics, and electromagnetism. I guess this is an opportunity for you to study up, but since I didn’t think my chances were that great, I figured that I would simply do my best in the interview and not study beforehand. Probably not something I would recommend, but it did keep me from feeling too much pressure.</p>
<p>The interview is done with two PSI fellows. It doesn’t last long, perhaps twenty minutes. I imagine that’s because there are many interviews to do.</p>
<p>I also want to note that interviews differ from person to person. I know that one of the students in my class didn’t really have an interview at all. It was more like a formality, and he was basically already accepted into the program. I don’t know everything about how this works, but I just wanted to mention it for completeness.</p>
<p>After the interview was done, I heard back with my offer a few days later. You then get two weeks to decide if you will attend.</p>
<p>Once I was accepted, I didn’t get a whole lot of information from the program for most of the summer. There were a few brief emails, though nothing significant. I think my year was a bit exceptional because there were some ongoing staff changes on the administrative side of PSI. From talking with the other students, a lot of us felt like we were given radio silence for a good part of the months preceding PSI. It’s not like we couldn’t ask for information, but it was on us to reach out.</p>
<p>The program asked us to be there at the beginning of August, though this was flexible depending on a student’s situation, and it goes until the end of June.</p>
<h2 id="accommodations-and-logistics">Accommodations and logistics</h2>
<p>PSI students live in the University of Waterloo residences. These are quite close to Perimeter, about 800 metres away (and through a park). The apartments have three bedrooms each.</p>
<p>The apartments are stocked with appliances and things you will need. That means there’s bedding, pots, pans, glasses, utensils, and all other miscellaneous things. The reason I list these out in boring detail is because the email we got simply said something along the lines of, “You will have everything you need, so you only have to bring personal belongings.” For a future PSI student wondering what they will get, the answer really is “basically everything”.</p>
<p>PSI students each get a living stipend, which isn’t a lot but serves to cover meals on the weekends and any other small personal costs you might have. I found it sufficient for myself, but this could vary drastically depending on your circumstance. Since I drove here (I’m from Quebec, the province beside Ontario), I was able to bring winter clothing and everything I needed. Others had to buy all of this, which means budgeting this living stipend appropriately.</p>
<p>Meals during the week are provided at the Black Hole Bistro, which is the restaurant within Perimeter itself. For PSI students, most items are free, and we get three meals a day from Monday to Friday.</p>
<p>Perimeter provides students with computing hardware and software (laptops, Mathematica, Maple, and so on). Also, since the bread and butter of theoretical physicists is working through equations and ideas, Perimeter has a basically unlimited supply of paper, pencils, pens, and other office gear. So yes, when the email said, “Only bring your personal belongings,” they weren’t joking. This came in handy throughout the year, and made hunting for supplies much easier.</p>
<p>Finally, the PSI students (we’re also called “PSIons”) have our own room in Perimeter. It’s aptly called the PSI Room, and is where tutorials are done, people hang out to work, and basically is a space just for us. I preferred working in the library, but many worked in the PSI Room all the time.</p>
<p>Those are the logistical details in a nutshell. Again, not the most exciting of stuff, but I know that <em>I</em> would have liked to know this coming into the program, so I’ve done my best to satisfy that curiosity here.</p>
<h2 id="front-end">Front End</h2>
<p>PSI is broken up into roughly three sections (though this is already outdated because I hear that there won’t be any Front End starting in 2020). The first is called the Front End. During the month of August, students take a few review classes. The goal here is to help students acclimate to the new environment and set everyone on the same page. The courses here are even shorter than later on, and there is no homework (only tutorials).</p>
<p>In my year, the following classes were offered:</p>
<ul>
<li>Lie Groups and Lie Algebras</li>
<li>Programming in Python</li>
<li>Mathematica</li>
<li>Classical Mechanics Review</li>
<li>Math for QFT</li>
</ul>
<p>I won’t go into each one in detail, because that will get dreadfully boring. I’ll just note that I didn’t think these were super useful (except for the Classical Mechanics Review). Everything else was so new to me that I had trouble absorbing the details. It made for a rough start, to say the least. Even though there was no pressure in terms of homework or tests, I still felt like I was much worse than a lot of other students. Or rather, that I was behind because I hadn’t ever seen these topics. Combine that with the fact that these courses were sometimes just a few lectures, and you can imagine how intimidating it feels when you don’t have a good grasp on things.</p>
<p>The Front End courses are taught by the PSI fellows, and as with most things in life, some are better than others. I had my personal selection of courses I enjoyed or not, but I think this will vary wildly by student.</p>
<p>To me, the main point of the Front End is to give students a buffer to mentally get ready for the next phase of the program: the Core Courses. In that regard, it was successful.</p>
<h2 id="core-courses">Core Courses</h2>
<p>As the name suggests, these are the courses that every PSI student needs to take. There are six courses, and they are spread out over the fall semester (from September to December):</p>
<ul>
<li>Relativity Theory</li>
<li>Quantum Theory</li>
<li>Quantum Field Theory I</li>
<li>Statistical Physics</li>
<li>Quantum Field Theory II</li>
<li>Condensed Matter</li>
</ul>
<p>I’ve listed them in the order that they occur. The PSI program has experimented with different schedules, but the way it usually works is that two courses are given at the same time (so group them up in the list above). The courses last for either three or four weeks (these are the two variants we tried). When the course lasts for three weeks, that means you have class every single day. In particular, the schedule looks something like this:</p>
<ul>
<li>90 minutes for first class (beginning at 09:00)</li>
<li>15 minutes for a break</li>
<li>90 minutes for second class</li>
<li>Two hours for lunch</li>
<li>90 minutes for tutorial from first class</li>
<li>90 minutes for tutorial from second class</li>
</ul>
<p>If you’re looking at this and think it’s a lot, you’re right. And this is repeated every weekday (sometimes with fewer tutorials in the afternoon) for three weeks. I remember taking Relativity and Quantum Theory under this schedule and thinking that there was just <em>so much</em> to absorb every day. It was quite the onslaught. You can definitely do it, but it takes a lot out of you.</p>
<p>At the end of three weeks, we had a week off. This sounds nice, but what it’s really for is preparing for the interviews. You can think of these as “final exams”, but they aren’t quite that.</p>
<p>The interview works as follows. For about 30-40 minutes, you go into a room and talk with two of the PSI fellows about concepts you saw in the course. They might ask you to prove something, or sketch out the argument on a blackboard. It’s not as high-stakes as an exam, but I’d be lying if I said I didn’t prepare for these a lot. In fact, I would almost say that I prepared <em>more</em> than I did for exams when I was in undergrad. I guess I was just worried about this different format than what I was used to. At the end, they tell you how you did and if you passed.</p>
<p>Honestly, they aren’t so bad. You do your best, explain what you know, and that tends to be enough. If for some reason you’re lacking in a certain area and the fellows aren’t satisfied, you will be asked to retake the interview. I know several of my friends who did exactly that. Luckily, I passed on the first go for each course, but it’s not a huge deal to retake an interview.</p>
<p>The other variant of the schedule was having courses for four weeks, with each Wednesday off. This was really nice, because it gave us a break during the week in which we could do homework. The downside is that it took up our study week at the end of the courses. This meant we had to prepare for the interviews during the weekend and during the interview week itself.</p>
<p>So in total, the schedule looked like this:</p>
<ul>
<li>3 weeks for the courses</li>
<li>1 study week</li>
<li>1 interview week</li>
</ul>
<p>Or:</p>
<ul>
<li>4 weeks for the courses with Wednesday off</li>
<li>1 interview week</li>
</ul>
<p>It turned out that people were torn about which format they enjoyed. I think when we were polled it was roughly split in half. I actually enjoyed having the longer format, because it meant the week didn’t feel like as much of a slog.</p>
<p>I’ll also note here that my particular year didn’t have an interview for the Statistical Physics course, due to some external issues. We instead had to give a small presentation, which I thought was easier. This shows you the advantage of being in a small program like PSI. The flexibility offered is great.</p>
<p>I would say that I learned a lot from these interviews. It gave me a chance to work at explaining ideas aloud, instead of on the page. I think it was a useful way to do the assessments, and I am glad I didn’t have to write exams.</p>
<p>Homework tended to be lengthy. It depended on the course, but I remember taking a ton of time for some of the assignments. If you’re smart about things though, you learn to do them in collaboration with others. You still write your own copy, but there’s no reason you should go about it all alone. Heck, we’re <em>encouraged</em> to talk about it with others.</p>
<p>From what I’ve gathered, the most difficult courses are QFT II, QFT I, and Statistical Physics. These are for different reasons, such as teaching style and content. Some of these ideas are quite advanced, so it does take time to wrap your head around them. Your mileage will also vary, depending on your background.</p>
<p>I have to say though: Once I finished my last interview in December, I was filled with joy knowing that I got through many courses that I had <em>never</em> seen before in my life. Out of the six, the only one whose content I saw in part was the Relativity course (Statistical Physics at PSI is different from what you’ve probably done). Even then, I didn’t have a general relativity course during my undergrad, and learned things through summer research with my supervisor. I say this because I think many of the students had seen some of the content before.</p>
<p>This is where I want to interject with some of my own thoughts. During the Core Courses, I did often feel like I was one of the “weaker” students in terms of my background. While this was probably true, it helped to realize that this didn’t mean I was incapable of doing as well as anyone else. I just had to accept that I would move along at my own pace. It was difficult for me to do so at the beginning, but I got more comfortable with this as the semester wore on.</p>
<p>After the Core Courses are over, we get a holiday break (I and several other students returned home), and then start back up in early January for the Elective Courses.</p>
<h2 id="elective-courses">Elective Courses</h2>
<p>To get the PSI certificate (which isn’t the Master’s degree, but a separate thing from Perimeter), we needed to take at least six of these electives. However, because 2020 was the year of COVID-19, things ended up changing. They are supposed to be half the work of a Core course, though that didn’t always happen. Even if they go on for the same number of weeks, the density is significantly less. This meant fewer tutorials, which was great for me because I find tutorials take up a lot of energy.</p>
<p>To give you an idea of the available courses, here’s what my year had:</p>
<ul>
<li>Quantum Matter (Part I and II)</li>
<li>Standard Model and Beyond (Part I and II)</li>
<li>Gravitational Physics</li>
<li>Quantum Field Theory III</li>
<li>Machine Learning for Many-Body Physics</li>
<li>Chern-Simons Theory (Part I and II)</li>
<li>Quantum Gravity (Part I and II)</li>
<li>String Theory</li>
<li>Relativistic Quantum Information (Part I and II)</li>
<li>Computational Physics</li>
<li>Quantum Information</li>
<li>Quantum Foundations</li>
<li>Cosmology (Part I and II)</li>
</ul>
<p>As you can see, there was quite a lot on offer (though a few were cancelled due to COVID-19). A difference in my year was the addition of Part II courses. The goal was for Perimeter to offer courses for the PhD students, while letting PSI students also take them. The idea was to have Part I be accessible for any PSI student, while Part II was more research-oriented. I only took Part I courses, and I thought they were fine.</p>
<p>I won’t go into the precise details of each course, but the basic format was this: classes, a few tutorials, and some homework assignments. No interviews at the end.</p>
<p>I enjoyed some of my classes, while disliking others. Mainly though, I kept a simple mentality: explore the classes, and don’t worry about understanding everything. I had the mindset of a taster: try and see what was interesting, while being okay with not knowing everything. It worked well for me, and allowed me to get my courses done without too much stress. That’s important, since the winter semester is when research begins.</p>
<h2 id="winter-school">Winter School</h2>
<p>One tradition of PSI is the Winter School. This is a week-long trip to a remote area where we got to work on research projects and do a bunch of winter activities. It’s fun, giving everyone a chance to kick back and enjoy the winter as well as learn something with new people.</p>
<p>The way it works is that there are several research projects available. We filled out a survey, and then were placed in groups of three or four PSI students to tackle these projects. Each project also had someone supervising it, and sometimes there were multiple people (including professors and post-docs). The projects ranged from quantum foundations, to quantum information theory, to quantum computing and simulations, to gravitational theory, to mathematical physics and particle physics. The theme for our year was definitely “quantum”, but there was enough variety to satisfy most people.</p>
<p>Winter School was one of the highlights of the program, and that’s saying something because I was sick the whole week! Even then, we got to do a bunch of fun winter activities. It wasn’t exactly the coldest of winters (we had a mild season), but thankfully I got to bring out my Canadian and Québécois roots while playing hockey. There was also indoor roller skating, games, rock climbing, XC skiing, and archery.</p>
<p>The place is basically a retreat, and it has a staff of employees who show up each day and do everything from setting up the activities to preparing every meal and snack (which there are many of!). I thought they were great, and it definitely made the week easy, allowing us to just focus on our work and the activities.</p>
<p>We would work in the morning from 9:00-12:00, break for lunch, have afternoon activities from 14:00-17:00, have dinner, and then go for one more session between 18:30-21:00. It was a relaxed atmosphere, with each group getting their own workspace to ponder<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>My group was actually composed of people from academia and industry. We worked with <a href="https://1qbit.com">1QBit</a>, a company doing research on quantum computing and software development. Our problem was looking at the loss landscape for something called a variational quantum eigensolver. The idea is that we want to simulate a quantum Hamiltonian using quantum circuits, and the loss landscape is a multidimensional space where we want to find global minima (this lets us find the ground state, which is important when trying to understand a quantum system). For most of the week, we learned about visualizing these landscapes, building the necessary code for the visualization, and also coding the actual Hamiltonian in the language we were using (<a href="https://pennylane.ai/">PennyLane</a> from <a href="https://www.xanadu.ai/">Xanadu</a>, for those who are interested).</p>
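<p>To give a flavour of what “scanning a loss landscape” means here: our actual code used PennyLane, but the core idea can be sketched with plain NumPy. Below, a <em>hypothetical</em> single-qubit Hamiltonian (not the one we studied) stands in for the real thing, and a two-parameter trial state plays the role of the variational circuit:</p>

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# A toy single-qubit Hamiltonian (a hypothetical stand-in for the real one)
H = Z + 0.5 * X

def energy(theta, phi):
    """Expectation value <psi|H|psi> for the Bloch-sphere state |psi(theta, phi)>."""
    psi = np.array([np.cos(theta / 2),
                    np.exp(1j * phi) * np.sin(theta / 2)])
    return np.real(psi.conj() @ H @ psi)

# Scan the two-parameter loss landscape on a grid
thetas = np.linspace(0, np.pi, 101)
phis = np.linspace(0, 2 * np.pi, 101)
landscape = np.array([[energy(t, p) for p in phis] for t in thetas])

# The global minimum of the landscape is the variational estimate of the
# ground-state energy; for this tiny example it matches the exact lowest
# eigenvalue of H (up to grid resolution).
e_min = landscape.min()
e_exact = np.linalg.eigvalsh(H).min()
print(f"landscape minimum: {e_min:.4f}, exact ground state: {e_exact:.4f}")
```

<p>In the real problem the parameters are circuit angles and the landscape is high-dimensional, which is exactly why visualizing it (and avoiding bad local minima) is an interesting question.</p>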
<p>As someone who never really got into coding, I was excited but also anxious, not knowing if I would really be able to contribute. Thankfully, my two friends on the project with me were more than happy to help. I might not have contributed a lot to the actual coding that week, but I did learn a ton.</p>
<p>Some projects during Winter School result in published research. Others don’t. At the moment, ours hasn’t been published. We have plans to do more work to investigate, but we were really learning for the most part. I didn’t feel much pressure to actually get a publication out of this, so I just focused on enjoying myself. Since the Winter School, a few groups have put work out on the arXiv, but it’s not the norm.</p>
<p>On the final day, we all gathered together and presented our work. It was nice to see what everyone had spent the week doing. I did not understand everything, but I enjoyed listening and trying to take nuggets of information from them. It made for good practice in sharing our research.</p>
<p>The Winter School was a welcome change of pace from everything else that was happening during PSI, and it was something I could even see myself doing again (as a project leader). Whereas a lot of PSI can feel fast-paced, that week was more of a stroll.</p>
<h2 id="essay">Essay</h2>
<p>The PSI essay is one of the big requirements for the program. It can sound scary in the abstract, but it’s not bad.</p>
<p>The essay is a “mini” research project. It doesn’t have to include original research, though some do. The idea is to produce a written work at the level expected of someone who does research. It’s only thirty pages, which is quite short when you think about it. Combined with this is a presentation, given at the end of the year and examined by three people: one PSI fellow, your supervisor, and an external examiner.</p>
<p>Because you need a supervisor for this project, you have to start looking for one during the fall. As someone who is keen to start things early, I began contacting people about potentially doing a project during the very first weeks of PSI. (That’s <em>much</em> earlier than needed.) That being said, I only chose my supervisor in December. I also didn’t start working on my essay in earnest until February, which I think is the norm for many PSI students. During the Core Courses, I felt too occupied by homework and preparing for interviews to do any research on the side. For some of my friends, applications for graduate programs in the U.S. are due then, so that occupied a lot of time.</p>
<p>After the Winter School though, things picked up for me. I met with my supervisor more frequently, and we worked on a plan of attack for my project. Different supervisors have different styles, so it’s important to find someone who you like working with. The person I chose to work with was <a href="https://rgmelko.github.io/">Roger Melko</a>, who works on the intersection between quantum many-body physics and machine learning. These were two areas that I wanted to become more familiar with, so working with him was a straightforward choice. Moreover, Roger is simply someone who is very easy to get along with, which helped my choice.</p>
<p>If you’re curious about what I actually worked on, I will be writing about it in a future essay. The topic is quantum error correction on the surface code with recurrent neural networks, but I’ll save the full explanation for the essay dedicated to it.</p>
<p>In terms of time spent working on things, I would say that my PSI essay became my main focus by March. I was almost done classes by then, which left me with more free time. A large chunk of my essay was background, so I began with that. It let me get familiar with the conceptual aspects of the problem, and this was probably where I made the quickest progress. As evidenced by this blog, writing isn’t something I fear, so sitting down to write was fun.</p>
<p>Another thing that’s great about working with Roger is that he has a large group of people who can help answer questions. His group is called the <a href="https://www.perimeterinstitute.ca/research/research-initiatives/perimeter-institute-quantum-intelligence-lab-piquil">PIQuIL</a>, and the people there helped me a bunch with my essay. In particular, <a href="https://github.com/mhibatallah">Mohamed Hibat Allah</a> and <a href="https://dansehayek.github.io/">Dan Sehayek</a> were key in helping me get my code working, so I am super grateful to them.</p>
<p>Even though I had written an Honour’s thesis for my undergraduate degree, this was a different sort of project. It might have been shorter in terms of page count, but I think it offered me the chance to dig deep and really understand a topic. Plus, it’s short enough that even if I ended up hating it, I wouldn’t be attached to this work forever.</p>
<p>Because the project I was working on had the potential to become original research, I wasn’t worried about making the essay itself a masterpiece. That’s not to say I didn’t put a lot of time into presenting it well. It’s just that I kept my efforts modest. I wanted to do good work with this idea, which meant covering all the bases for a future publication, and my essay was a work-in-progress towards achieving that. This eased the pressure of treating the essay as a huge thing (it’s not). My recommendation for writing the essay is to think of it as a chance to work on how you communicate your science. Try to make it the best you can, and treat it as a learning opportunity where you can test your ideas.</p>
<p>I also had a lot of time to think about my project because of COVID-19. The lockdown meant that I had many hours per day where I could just think about things that were interesting to me. I put many of those hours into my project, and I think that definitely helped get things moving along.</p>
<h2 id="presentation-defense-and-graduation">Presentation, defense, and graduation</h2>
<p>In addition to the essay, we each gave a presentation of our work. This consisted of twenty minutes for us to talk, then twenty minutes for the examiners to ask questions and for discussion. The lockdown meant that this was entirely online, which was an interesting experience.</p>
<p>As I mentioned before, I like starting things early. So you won’t be surprised to hear that I began working on my presentation in April, over two months before presenting. For contrast, some of the PSIons were talking about starting their presentations <em>after</em> submitting their essay, giving them one or two weeks to prepare.</p>
<p>I started early for a reason: I never felt like I had given a <em>good</em> presentation before. Yes, my presentations were fine, but they didn’t <em>sing</em>. They didn’t inspire people, or make them excited about my work. I decided that this presentation would be different. I would put in a lot of work to make the presentation look good.</p>
<p>This turned out to be a great decision, since the impact of COVID-19 meant that our presentations were done entirely online. Having a nice set of slides became even more important.</p>
<p>I worked on my slides for longer than I would almost care to admit. I knew that the ideas I wanted to get across for my work could be encoded graphically. The question was: How much time was I willing to spend to make these look good?</p>
<p>I don’t think I’m exaggerating when I say I spent hours on the slides, getting the animations and visuals just right. I even spent a bunch of time making visuals that didn’t make the final cut for the presentation (but will make it into that future essay I mentioned). My goal was clear. If I had a picture inside my head, I had to make it a reality on screen.</p>
<p>Getting the visuals was good, but I also wanted to <em>sound</em> good. That meant knowing my material inside and out. Furthermore, it meant eliminating needless “ums” and “uhs” while speaking, and pausing at the appropriate places. By starting early, I became intimately familiar with the content.</p>
<p>The presentation itself wasn’t a big affair. I showed up to the call, I talked for twenty minutes about my project on surface codes and machine learning, and then I got questions for about twenty minutes. It was fun, and I must admit, the stakes felt lower. Since you aren’t actually facing a room full of people, it’s easier to focus on giving a good presentation. I think I would have done alright anyway, but I note this because I think it will have implications for how schools do presentations in the future.</p>
<p>And with my presentation completed, I was finally done. It was the last thing I had to finish during PSI. A few weeks later, we had an online graduation ceremony to replace the one that we couldn’t do in person. It was a nice way to cap off the year, and there was one part in particular that I thought was brilliant. During a physical ceremony, each person would get called up and be handed their PSI certificate. For the online ceremony, students were able to send in a video of one minute talking about their favourite moments during PSI, and each one was played during the ceremony. You can view those videos <a href="https://insidetheperimeter.ca/a-global-ceremony-for-perimeter-grads/">here</a>, as well as the video messages that were sent by special guests.</p>
<p>I was struck by how many talked about the same idea: friendship. Despite this whole year being nominally about physics, most students focused on meeting new people and thanking them for the wonderful year. This was what my video was about, and it seems like others had similar thoughts. It made for a wonderful series of videos, and a nice way to finish the program.</p>
<p>Before I wrap up, there are a few more items to talk about concerning PSI.</p>
<h2 id="mentors-and-buddies">Mentors and buddies</h2>
<p>Each PSI student is assigned to a PSI fellow as a mentor. This gives everyone a person they can go to throughout the year to ask questions, share problems, and so on. Depending on the PSI fellow, you might have more or less contact with them. They can also help you for reference letters, and school applications.</p>
<p>I thought this was a really good idea. During my undergraduate degree, I never had anything like this. Sure, I talked with my supervisor, but there was never anything set up explicitly for mentorship. More places should have this structure in place.</p>
<p>Then, there’s the PhD buddy system. In addition to your mentor, a bunch of Perimeter PhD students sign up to be a point of contact for the PSIons. Since we were master’s students, talking with PhD students was easy because they’re close enough to us on the journey through academia. Even better, a lot (maybe most?) of the PhD students are former PSI students, so they know the program intimately as well.</p>
<p>Overall, I thought both systems were good, though I personally didn’t use them a ton.</p>
<h2 id="covid-19-and-the-shutdown-of-perimeter">COVID-19 and the shutdown of Perimeter</h2>
<p>I planned to write an essay about my time at PSI <em>before</em> even starting the program. It was something I knew I wanted to do. But I had no idea that I would be including a section like this. I want to take a few paragraphs to explain how the PSI program morphed during the pandemic, and how we all coped with it.</p>
<p>Like most places, Perimeter was shut down in mid-March. Just before, there were a bunch of things happening. One of the main concerns was travel. Since the PSI group was mostly comprised of international students, being able to return home was a constant thought for most of them.</p>
<p>Because of the border shutdowns in various countries, many of my friends left (in various states of hurry). This was, to say the least, an emotionally trying time for a lot of our group. We thought that we still had another three months to be together, and suddenly I was helping my friends pack and leave the country. In particular, it often crossed my mind that I was maybe saying goodbye to someone who I would never see in person again. Or, at least for a long while.</p>
<p>Staying in a university residence when all other public places were closing down was also stressful. Even though we were assured that we would be taken care of, the truth was that <em>everyone</em> was reacting to this new situation, and confident statements didn’t mean too much. The end of March was filled with a lot of uncertainty.</p>
<p>However, I began to adjust to the new normal. I gave myself permission to do things that I wanted to do, like reading, writing, and drawing. I had neglected these a lot during PSI (something I knew I would have to do but wasn’t thrilled about). Being able to pick them back up with more regularity during the shutdown was great.</p>
<p>Perimeter was also in a good position to make the transition online. Some of our winter courses were cancelled, but some continued online. It helps that Perimeter records all past lectures, so we were able to watch the lectures from previous PSI years. Not ideal, but better than nothing.</p>
<p>I wasn’t a fan of the online courses, but I think that was also because I was tired of taking classes. These final months are when you’re balancing class work and your research project. I was getting into the bulk of my project, so I wasn’t motivated for the classes. We still had video calls in order to discuss the material twice a week, but my mind wasn’t in it. I’m thankful that the lockdown happened near the end of classes, because I think taking them all online would have driven me crazy.</p>
<p>Overall, the lockdown was difficult in terms of society having to change its regular functions, but for myself, I was able to spend more time alone with my thoughts, and I enjoyed that. I know that I am one of the luckier ones when it comes to being able to do my work. As long as I have books to read, pencils to draw, and a computer for writing, I’m set. I took this as a time to reorient myself, and while it was far from easy at times, I’m glad I was able to make a bit of progress.</p>
<p>I’m impressed by what Perimeter did though in reaction to the shutdown. There are many seminars and talks each week, and a bunch of them moved online. Even the Friday social events got moved into virtual calls for people to gather. I didn’t participate in these even when they were held in Perimeter, but I thought it was good to continue the tradition online.</p>
<p>I also want to acknowledge that not all of the PSI students adjusted quite like I did. Some had very legitimate concerns about their future (PhD and so on) that caused a lot of anxiety when the shutdown happened. I’m lucky that the university I’ll be attending for my PhD studies is close to home and still in Canada, so I was one of the least affected. I just wanted to point out that not everyone was in the same boat as I was. This was also complicated by the various messages we received concerning the evolving situation. I think I can be confident in saying that the sense of instability during these months was felt throughout the PSI group.</p>
<h2 id="my-thoughts-on-psi">My Thoughts on PSI</h2>
<p>In the last several thousand words, I’ve talked mostly about what the program <em>is</em> and what you can expect if you join. Here though, I want to invite you into the window of my experience.</p>
<p>When I started PSI, I didn’t think much of my abilities. I saw people who had taken more courses than I, who knew more advanced mathematics and physics, and generally seemed smarter. It made me very reluctant to share my own thoughts on things, since I felt like everyone around me had probably thought what I had before.</p>
<p>But as I got through more of the program, I realized that, while the PSIons are on a shared journey, we aren’t at the same <em>point</em> in that journey. Some of us are further along than others. This doesn’t mean that anyone is <em>better</em> or <em>worse</em>. It simply means that there’s diversity.</p>
<p>This was different from my usual experience in school, where everyone in a class has basically the same background and knowledge. It’s what made me uncomfortable in PSI at the beginning, but as I learned to embrace where I was in my journey through academia and physics, I didn’t worry about it as much.</p>
<p>PSI makes you grow. That’s the most succinct way I can put it. I’m not even talking about growing as a physicist in particular. Throughout my time, I grew in other ways, too. I made a variety of friends who made coming to PSI worth it on their own, I learned about managing my time, and I learned to let go of the ridiculous notion that you can absorb <em>everything</em>.</p>
<p>The year was stressful at times. I’m thinking particularly of the Core Courses, where homework and preparing for my interviews consumed the bulk of my energy. Still, I would absolutely say that PSI was a positive experience for me.</p>
<p>PSI was an opportunity for me to do something different than I had done in my undergrad. I wanted to branch out from the research I had done (modified gravity) and into quantum computing. I studied quantum error correction with machine learning, a blend of two fields that was fun to discover. Being able to jump into new fields was something that PSI allowed me to do without committing for years.</p>
<p>If there’s one thing that I will treasure the most from PSI, it was meeting the other PSIons. Being able to form relationships with others who are on a similar journey is so rewarding, and it gave me a chance to really connect with people (more than I ever had with people I went to school with<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>). There are so many memories I can think of: a Thanksgiving dinner with fifteen of us squeezed into my apartment, walks around Waterloo Park, playing hockey and other games at the Winter School, driving five hours to go 2km underground to visit SNOLab, and many more.</p>
<p>I would be lying if I said the shutdown due to COVID-19 didn’t somewhat take away from the experience, but there’s nothing that could be done there. It was a stressful transition, but we made it through.</p>
<p>Throughout the program, I felt like I was supported by the PSI Fellows. They were always very kind to me, and it was clear they wanted the best for me. And because the program is so tight-knit, there’s a feeling of community that might not be present in larger programs. We did have some tension when things weren’t communicated to us in the best way during the health crisis, but I knew that Perimeter had our best interests at heart.</p>
<p>I also met great people outside of my program. In particular, I’m thinking of those from the PIQuIL, who were always kind and took the time to answer many questions I had when it came to my project. I also made friends outside of PSI who were at Perimeter, and that was great, too.</p>
<p>Being at Perimeter gave me a better sense of what I wanted to do in physics. That journey of self-discovery isn’t complete (and I hope it won’t be anytime soon), but seeing a huge swath of physics in more depth than I had as an undergraduate was eye-opening to me. As an undergraduate, my university had a limited number of physics classes. For example, the only courses on condensed matter that I’ve taken were during PSI (these were Condensed Matter, Quantum Matter I, and Machine Learning for Quantum Many-Body Physics). I’m glad that I was able to be exposed to these ideas, because they revealed a whole sector of physics that I had never encountered beforehand. Likewise, I’m happy that I got opportunities to play with code and do more computational work. As someone who used to swear by only using pencil and paper and tried his best to get away from computers, I wanted my time at Perimeter to include coding so I had a chance to build my skills. Through courses, the Winter School, and my PSI essay, I was able to do exactly this.</p>
<p>PSI was a great time for me. There were ups and downs (as is the case with anything in life), but I don’t regret joining the program at all. In fact, I’m so thankful that I was chosen. I didn’t go to a prestigious school, I was a good but not necessarily fantastic student, and I am happy that the PSI fellows gave me a chance to partake in this program. It was quite a ride, and it will be a year I never forget.</p>
<p>I’m proud to be a PSIon.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Full disclosure: The workspaces were housed in two different buildings. One was winterized, meaning it had heating and plumbing, while the other was not. I lucked out and got the good building. I suspect it wasn’t quite as fun for the others. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This is mostly my own fault. During my undergraduate years, I did not make an effort to connect with my classmates outside of school. This is something I regret, and I made small steps towards rectifying that this year. It does help that everyone is tossed into the same apartment building! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 27 Jul 2020 00:00:00 +0000
https://cotejer.github.io//psion