Jeremy Côté
Mathematical and scientific thinking for the curious.
https://cotejer.github.io//
The Journey to Completion

<p>As a scientist, the usual “proof of work” is the paper<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, a small artifact that encapsulates the essence of a problem and its resolution (or at least, the progress the scientist has made). But the process of going from an initial idea to a finished paper is less clear to outsiders.</p>
<p>My goal here is to shed some light on projects, and all the work that goes into a paper. This is similar in spirit to the gaps writers have between books in a series. Between publication dates, they need to plan, write, and edit their next book! It’s a lot of work, and it makes for the long stretch between when a writer begins working on a book and when it goes out on shelves.</p>
<!--more-->
<p>There are several stages to projects, and the time spent on each stage varies from project to project. Predicting these times isn’t the goal of this essay. Instead, I want to give future graduate students an idea of my own experience in completing a research project.</p>
<p><strong>First, there’s the initial idea.</strong> It comes from a variety of places, including your supervisor, a paper you’ve read, discussions with others, or even just a burst of inspiration. At this stage, you start to poke at it, testing its boundaries.</p>
<p>This tends to be one of the most exciting times, because you’re still in big-picture mode, imagining how this will change the world. I’m often wrong about this, but your experience may vary.</p>
<p><strong>Second, there’s the exploration.</strong> If you’re working on a mathematical result, this could mean scouring the literature for papers on the topic to see if there’s a place you can add insight. If you do experiments, my guess (not being an experimentalist myself) is that you start looking into whether measuring this change or quantity is even feasible. If you’re like me and work in theory, then you might try some initial numerical simulations or play with the equations of your system.</p>
<p>In this stage, you will discover more about your idea. In fact, this is how you know if the initial idea is any good. Sometimes, nothing comes of it, and it dies at the exploration stage. In other cases, exploring triggers new ideas, and maybe even brings along a few surprises.</p>
<p>Exploring takes time. In my case, this often means writing code to test hypotheses I have before moving on to working with mathematics. I like the rapid prototyping of building a small numerical experiment and seeing if what I’m thinking even works. If it does, I’m encouraged to keep digging.</p>
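<p>To give a flavour of what such a throwaway prototype looks like (this is my own hypothetical example, not one from my research), here is a few-line Monte Carlo check of a guess. Suppose you suspect that the expected value of the maximum of two independent uniform random numbers on [0, 1] is 2/3. Before proving anything, you can sanity-check it numerically:</p>

```python
import random

def estimate_expected_max(samples=100_000, seed=0):
    """Monte Carlo estimate of E[max(U1, U2)] for independent uniforms on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(rng.random(), rng.random())
    return total / samples

# The exact value is 2/3; if the estimate lands nearby, the guess survives
# and it's worth sitting down to prove it properly.
print(abs(estimate_expected_max() - 2/3) < 0.01)  # → True
```

<p>If the numbers disagree with the guess, you’ve saved yourself days of chasing a false conjecture; if they agree, you’re encouraged to keep digging.</p>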
<p>Dead ends often occur here. That’s because you’re exploring. The key is to not get frustrated by things not working out. That’s research, and it’s often confusing. <a href="https://handwaving.github.io/458">I liken it to stumbling around in the dark with only a flashlight that emits a very thin beam.</a> Sometimes you’re lucky and make progress right away, but often you’re forced to stumble in a few directions before anything looks promising.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/c_scale,q_auto:best/v1535842782/Handwaving/Published/Wandering.png" alt="A scientist wanders in the dark with a flashlight and says, “I thought I would be making frequent breakthroughs.”. Caption: Science: Mostly just wandering in the dark." /></p>
<p>I’ll repeat: Failure is <em>normal</em> here. Don’t give up! Use your knowledge of your idea to gauge if you should keep on exploring. If it was a wacky idea, then maybe you give up on it quickly. But if the idea makes sense from multiple viewpoints, you may work on it for longer.</p>
<p>During the exploration stage, you will notice things. Perhaps that numerical experiment didn’t pan out, or this one found something strange you can’t explain. Maybe you found an earlier result in the literature that has some relevance to what you’re doing, though you don’t quite understand it. I collect all of these, because they are what I use to judge whether to continue to the next stage.</p>
<p><strong>The third stage is refinement.</strong> If you succeeded in finding several useful bits of information, then you want to take these further. To give the example of numerical experiments, you might want to scale up to larger system sizes to demonstrate a result in the thermodynamic limit. Maybe you want more samples so your results are more robust. Maybe you found out how to extend a minor result in another paper, and have hope that this can translate into a major result.</p>
<p>The purpose of this stage is to take the information you collected while exploring and refine it. This is what takes you from the realm of a few scattered results to a publication.</p>
<p>A key consideration here is <a href="https://cotejer.github.io/finding-the-story">the story you want to tell</a>. Since you’re getting closer to turning these results into a finished project, knowing your story is important. I wrote about <a href="https://cotejer.github.io/language-of-science">this last month</a>, and the important part is to know how to pitch your story to an audience. This will shape which bits of information you emphasize, and what you refine.</p>
<p>One caveat is that you often don’t go through these stages linearly. You will often hop back and forth between them as you find out more about your project. This makes a project take much longer than you would think. Maybe a result you were refining didn’t work out, so you have to try another path. Maybe your numerical results aren’t as strong as you hoped, so you have to rethink your approach. These things happen, and it’s why you need to be resilient as you continue your project.</p>
<p>In my own experience, this can be the low point of a project. You thought you had exciting results, but now that you have to polish them as much as possible, they don’t appear as nice. I’d say this is a normal feeling, and it’s good to keep in mind the overarching purpose of the project during this time. Not only will it help you craft your story, but it will reinvigorate your work.</p>
<p><strong>The fourth and final stage is writing</strong>. This also includes revising, and it’s what you do once you have a story, some results, and are confident that they are important enough to warrant publication<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. This might be the last stage, but it’s also the trickiest if your goal is to truly serve your reader.</p>
<p>At this stage, there’s no hiding from unclear concepts. If your work relies on something you don’t understand well, it will come up again and again as a point of confusion. That means you want to understand every aspect of your project (or, if you’re working in a large collaboration, to make sure there is always someone who understands each piece). In my own experience, this means I have to refine my results even more, to the point that I am confident I can answer most questions about them.</p>
<p>As you write, you will notice places where your understanding isn’t clear. Writing (and rewriting!) clarifies this. And yes, even for someone like myself who loves to write, this can be a painful stage. You will look at the same words and ideas over and over again, until they are seared in your brain. So make sure you take frequent breaks to make this rigorous process bearable.</p>
<p>At some point, you will declare victory on the manuscript, which will not only be polished, but every <em>result</em> within will be polished as well. A finished manuscript doesn’t just mean you found the right words; it means you found the right story, the best way to present the data or results, and the connections back to other ideas your audience knows.</p>
<p>This all takes work, much more than I expected when I wrote the first draft of my first PhD project. The “fun” part of being a scientist may be the exploration stage, but as professionals, we need to turn those explorations into pieces of knowledge.</p>
<p>The journey is nonlinear, confusing, and prone to restarts.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/c_scale,q_auto:best/v1535842782/Handwaving/Published/Journey.png" alt="The journey of a project is often a big mess, and we only show the smooth version." /></p>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I’m not necessarily a fan of having this be the unit of work that scientists judge each other by, but that’s another essay. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This is not to say that “little” results aren’t important. They are! In fact, I kind of wish we made more little results known. I’m writing this essay more from the perspective of someone wanting to publish a paper. In that case, many journals will want a certain level of novelty in the work. But I think it’s great if you want to package your result as a blog post or a code repository. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 25 Jan 2022 00:00:00 +0000
https://cotejer.github.io//journey-to-completion
The Language of Science

<p>When communicating a new idea to someone, use the language they’re familiar with.</p>
<p>Seems reasonable, but as soon as you go out into the world, you see plenty of examples where this doesn’t happen. In this essay, I want to describe what happens when we commit this error in the realm of science.</p>
<p>Grab a physicist, and force them to sit through a mathematics seminar. Assuming they know the basics of the topic to follow the equations, I bet they will tell you, “I don’t know what this person was going on about! The way they presented things was just…weird.”</p>
<!--more-->
<p>Why does this happen? It’s not an issue of being able to understand the technical details, since most researchers can pick up the basics. One hypothesis is that the problems are so remote that the listener has no desire to understand. For example, if you’re a statistical physicist working on phase transitions, you may not care too much about developments in number theory.</p>
<p>While this plays a role, I don’t think it’s enough to explain the entire phenomenon. Instead, I offer a second hypothesis: the languages mathematicians and physicists use are different.</p>
<p>In fact, it’s worse than this. Even within a given scientific arena such as physics, we have different specializations, each with its own particular terminology. It doesn’t take long for jargon to sprout up, which stops outsiders from understanding your field.</p>
<p>Jargon is useful, until you have to present your work. Then, it’s critical to find the right jargon to fit your audience. If you’re talking to specialists within your field, using the technical jargon is fine. If you’re talking to scientists outside of your field, then use the language they are familiar with. And if your audience doesn’t have any scientific background, avoid technical terms as much as possible.</p>
<p>Don’t try to convert them to use your language unless it’s necessary. Forcing the audience to understand a new language means you have two battles to fight: getting the person to learn your way of seeing the world <em>and</em> accepting the new idea. The former is an unnecessary failure mode.</p>
<p>I learned this last month while giving my first PhD seminar. The physics department at the Université de Sherbrooke is a mix of quantum physicists, statistical physicists, and condensed matter physicists. There’s a whole wardrobe of words and concepts that statistical physicists expect to see when they hear a new idea. These include “ensemble”, “averaging”, “correlations”, “order/disorder”, “phase transitions”, “couplings”, “configurations”, and “spin systems”. If you forgo these words when speaking to them, it makes communicating much more difficult.</p>
<p>I made the mistake of not fully speaking their language. My project<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> is at the intersection of quantum theory and statistical physics, but instead of speaking to my audience with their language, I retreated to the more familiar language of linear algebra and Boolean satisfiability. This was comfortable for me but not to them, which meant many did not get everything they could have out of my presentation.</p>
<p>Next time, I’ll make sure I keep the audience in mind as I craft my presentation. Not by trying to get them to understand <em>my</em> language, but meeting them where they are. That way, I can maximize the chance that the audience understands what I’m saying.</p>
<p>The second place I learned this lesson was while writing my paper for the project. I went through multiple drafts, significant revisions, and rewrites. <a href="https://arxiv.org/abs/2112.06939">The paper is out now.</a> It’s only four pages, but the effort put into those four pages was immense.</p>
<p>A big reason why it took so long to write is that my supervisor kept insisting we use the language of statistical physics. At first, this irritated me. I had written an early first draft that used the language of linear algebra, mentioned Boolean satisfiability, and had all the details. Why did I suddenly have to change everything and twist it into this other language?</p>
<p>But as time went on, I saw the wisdom in his suggestion. Our target audience all along was statistical and quantum physicists. We weren’t writing a computer science paper. I couldn’t have it both ways. Either I used the language of computer science and wrote for computer scientists, or I used the language of statistical physics and wrote for statistical physicists. In the end, I chose the latter.</p>
<p>Instead of using the word “solution”, I’ll often use “ground state”. Instead of using “parity”, I try to use “couplings”. We wrote a Hamiltonian for the system instead of just an abstract matrix equation. These are all little details, but they make a difference to the reader.</p>
<p>Thinking of my own experience reading papers, it’s usually a process of attrition. If I feel as though I can’t understand every second sentence, it’s frustrating. Every new stumbling block increases the chance I will just give up on a paper. This is why it’s important to know who your work is for. If you know your audience in advance, use the words they know. Make it easy for them to follow along.</p>
<hr />
<p>Communication is about transplanting an idea you have in your head to the mind of someone else. That’s all there is to it, but that doesn’t mean it’s easy. Every scientific field has their own conventions, norms, and language. If you want someone to understand you, it’s worth taking the time to stack the deck in your favour. Use words they know. Speak their language. Give examples that connect to concepts they are familiar with. Know your audience.</p>
<p>The words you use dress up your projects. Scientists are like any other humans: they respond to ideas they are familiar with. By definition, research is about breaking new ground. If you want others to be receptive to your work, it’s probably worth using the language they are familiar with to bring them your exciting breakthroughs.</p>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I will write an essay about the project. There are some neat connections I want to share, and I’ll use the essay to expand on the details that didn’t make it into the paper itself. So stay tuned for this. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sat, 25 Dec 2021 00:00:00 +0000
https://cotejer.github.io//language-of-science
Good Questions

<p>I’ve been thinking a lot about how I present my work. When a family member or friend asks what I do, the interaction tends to look like this:</p>
<p>Them: “So, what do you do?”</p>
<p>Me, looking away and mumbling: “Research.”</p>
<p>A long pause, and then them: “Teacher?”</p>
<p>Wrong, but close enough. “Sure, in a way.”</p>
<p>But I’m a scientist, not just a teacher. Working on puzzles is what I do. So I want to take a crack at describing what I do, and what I don’t do.</p>
<!--more-->
<p>What’s the most important part of the job?</p>
<p>In my eyes, it’s chasing the questions. <em>Not</em> the answers. It’s easy to pitch scientific research as a machine whose output is an answer to a question. Think of technology that evolves from scientific discoveries. But that’s not the <em>only</em> part of science. I believe the questions are often just as important (if not more) than the answers.</p>
<p>Questions inspire. They captivate us, and bring us on a journey. Does the shape of DNA reflect something special out of all possible configurations? Are there “critical” nodes in a network which will affect a large portion of the network if they were to go offline? How do snowflakes form? Why don’t trees grow forever? How long is the coast of Britain? (See Endnote<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>)</p>
<p>These are all questions that prompt you to reflect. They make you think about energy landscapes, criticality, symmetry, forces, and measurement. Each one is an entry point into a rich mine, one in which you can keep finding treasure.</p>
<p>The power of a good question is threefold.</p>
<p>First, it provides a provocative framing which excites you about the question. This is of course subjective, but a good question makes you at least pause and ponder. It should be jarring, demanding attention and then sucking you in. The consequence of a good question is that it becomes the storyline pitch for your work. Sometimes, a great question even breaks through the scientific jargon of a field and becomes something anybody can understand. This is the mark of a great question: Simple, yet hiding great depths.</p>
<p>Second, a good question opens up a line of inquiry that might not be obvious, but makes sense as soon as you reflect on it. What is the time required<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> for a quantum measurement? For anyone who has taken a quantum theory course, you know that measuring a quantum system collapses its superposition in that basis to a definitive state. But how quickly does this happen? Is a quantum measurement instantaneous? Once you start reflecting on these questions, you realize they are thornier than they appear at first glance.</p>
<p>Third, a good question frames the issue in its most stark form. A good question might not capture all the subtleties of an issue, but it does highlight the main issue. Why is it sometimes difficult to count the solutions to easy questions? This question<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> gets at the heart of computational complexity, the difference between decision problems and counting problems, and the algorithms humans come up with for tackling them. All from a question which stems from an observation.</p>
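<p>To make the contrast between deciding and counting concrete, here is a minimal brute-force sketch (the formula is a hypothetical example of mine, not from the endnotes). For 2-SAT, a polynomial-time algorithm exists for the decision problem, but counting solutions is #P-complete; the naive route to the count is plain enumeration, which scales exponentially in the number of variables:</p>

```python
from itertools import product

# A 2-SAT formula as clauses of two literals; a literal is (variable, is_negated).
# Hypothetical formula: (x0 or x1) and (not x0 or x2) and (not x1 or not x2).
clauses = [((0, False), (1, False)),
           ((0, True),  (2, False)),
           ((1, True),  (2, True))]

def satisfies(assignment, clauses):
    """Check whether a truth assignment satisfies every clause."""
    def lit(var, neg):
        return not assignment[var] if neg else assignment[var]
    return all(lit(*a) or lit(*b) for a, b in clauses)

n_vars = 3
solutions = [a for a in product([False, True], repeat=n_vars)
             if satisfies(a, clauses)]
print(len(solutions) > 0)  # decision: satisfiable? → True
print(len(solutions))      # counting: how many solutions? → 2
```

<p>Deciding satisfiability here can be done in polynomial time via implication graphs; no comparably efficient trick is known for the count.</p>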
<hr />
<p>Finding good questions is challenging. They need to be aspirational, while still offering direction. At this point in my PhD, I’m spending more time thinking about questions which provoke me, which force me to pause and ponder.</p>
<p>Having friends to bounce ideas off is a great way to generate more good questions. Look at how the questions excite others. This offers a great litmus test for what strikes a chord for people.</p>
<p>The unfortunate reality is that, a year into my PhD, I haven’t seen much evidence of anyone pushing me to work on asking good questions. It’s something that, apparently, you need to do on your own.</p>
<p>But asking good questions is a skill. Yes, you might stumble upon a question with a lot of rich results by accident. But can you do this on <em>average</em>? Working on that skill, of knowing where your lack of understanding is and asking sharp, illuminating questions, is something we should value as scientists<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
<p>I’ve begun keeping a list of questions that fascinate me. Some are simple, some are more complex, but I keep this list to remind myself about what draws me into science. I’m not here to dig into one subfield and stay there my whole life. Instead, I’m here to think about the possible questions that are out there, and make my own contributions.</p>
<p>I want to chase the questions. And I argue that you should, too.</p>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This question is the title of Benoit Mandelbrot’s <a href="https://science.sciencemag.org/content/156/3775/636.abstract">famous paper</a>, which brought about the study of fractals. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>You can read about this from a nice <a href="https://www.quantamagazine.org/quantum-leaps-long-assumed-to-be-instantaneous-take-time-20190605/">Quanta Magazine article</a> by Philip Ball. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>You can find an example of this with the <a href="https://en.wikipedia.org/wiki/2-satisfiability#Complexity_and_extensions">problem “2-SAT”</a>, which involves logical formulas of the form (x or y). Deciding if a given set of constraints (a formula) is “easy” (it’s in P), but counting the number of solutions is “hard”. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>There are people who can be great scientists who don’t come up with their own problems. Someone gives them a problem, and then they use creativity to find the answer. This scientist still makes a great contribution. It’s just not the mode of operating that I want to spend my whole life in. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 25 Nov 2021 00:00:00 +0000
https://cotejer.github.io//good-questions
Finding the Story

<p>Science is a game of exploration, but it’s also a game of communication. If you make a groundbreaking discovery but you lack the skills to communicate it, your discovery won’t amount to much. I suspect most scientists will agree with this sentiment. Communication is important if we want our results to diffuse into the broader community. We also need to communicate when applying for grants and scholarships, or otherwise selling the value of our work. Each time we answer the question “Why does this matter?”, we’re communicating our values.</p>
<p>And yet, we don’t teach the art of scientific communication. Instead, we rely on students figuring things out as they go. We expect graduate students to just “level up” from writing laboratory reports and essays as undergraduates to writing papers for the scientific community. Not only is this a tall expectation, I think we also miss out on communicating what we should value when writing.</p>
<p>I’m thinking about this within the context of my own PhD. I’ve finished my first project, and I’m in the writing stage. Despite my love for writing, there’s no doubt this process has been the most tedious and exhausting part of the project. Through multiple drafts, rewrites, and so much editing, I want to discuss the topic of <em>finding your story</em>.</p>
<!--more-->
<hr />
<p>Imagine you’ve spent many months in the thick of a problem. You ran simulations, studied the equations of your model, and made discoveries while analyzing your data. You’ve written tons of code, have a folder filled with a zillion figures and data, and even have a few derivations written down. You have all this <em>stuff</em>, under the label of your Project.</p>
<p>What’s next?</p>
<p>The instinct might be to dump everything onto the page. To explain all you know, perhaps in the order you discovered it, down to every tiny detail. You want to give everything its place, and make sure that whoever sees your project will appreciate just how much work you’ve put into it.</p>
<p>This approach might work, but I want to point out that it will involve a lot of chipping away. Think of a sculptor working in stone, chipping away at the material to reveal the masterpiece. In this scenario, creation is a process of subtraction: removing elements from the giant collection of data points, equations, and figures that you have.</p>
<p>Going this way is fine, but you will often put in a lot of work, only to strip it away later. Why? Not everything belongs in the finished scientific artifact. If you want to make something that isn’t just a record but an <em>artifact</em>, you need to take care in what you include.</p>
<p>In my mind, there are three “levels” at which a person approaches a scientific paper:</p>
<p>Story -> Methods -> Technical Details</p>
<p>As we go from left to right, we read more deeply. Each is worth its own essay, so I will only focus on the story for now.</p>
<p>Think about the big folder you have with your data and research objects. If I asked you to summarize what’s in that folder, what would you say? What did you learn from doing all this work? What was missing in the community before you came along and built up these objects?</p>
<p>The story level is where you settle on the format for your project. Everything else flows from it. The story dictates how you introduce the project, the way you juxtapose it with the rest of the community, and how you hook the reader into wanting to know what you discovered.</p>
<p>Every good scientific project has a story. If you want to craft the best possible project, I don’t think you can forget about the story. And this should come <em>first</em>, because it’s the roadmap for everything else<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>The key question at this level is: What makes a good story?</p>
<p>A tempting option is the “autobiographical” story. This involves sharing the details of the science as you came across them. This sometimes comes in how you introduce the project, the presentation of a model, and the sequence of figures and results. But almost nobody cares about how you came across your discoveries. Unless it’s a truly spectacular autobiographical story, we tend to care about the results. What did you discover? Why is this important? How does it change our knowledge of the world? Building your story around those questions will likely bring you more success than sticking to the autobiographical narrative.</p>
<p>I don’t have the answers for how to craft the perfect story, but here is what I think about.</p>
<p>First, where is the tension? This comes from the lack of information in your research community. Where’s the gap? What’s the challenge that you will address? Tension is how you introduce the problem of the story. In an abstract, you might spot this as, “X occurs across many fields, but there is no model which captures X.” In the second half of this sentence, there’s a challenge. Nobody created a model that works, and perhaps your paper is addressing this issue.</p>
<p>Second, each supporting explanation and result helps reinforce the tension. It’s one thing to introduce tension in the abstract and introduction, but can you maintain it throughout the story? Think of fiction: we keep reading because there’s an expected payoff. The characters of the story will experience the tension until they have the tools to deal with it. In a scientific artifact, this is where you present your work and how it eases the tension.</p>
<p>Third, a good story is easy to keep in mind. I think of this as having a small amount of RAM in my brain. To understand a complex problem, I need to spend a bunch of time creating elaborate structures in my head that help me understand what I’m looking at in a way that’s effortless. But the reader doesn’t have this in their head. Unless your paper builds this up step by step, it might be worth simplifying your story so that the reader connects with it. The other benefit here is that it’s easy for them to share it with another person.</p>
<hr />
<p>After spending weeks and months looking at a text editor, manipulating the raw data and equations in your project, and learning the shape of your work, you have a micro view of the project. Maybe you’ve grown fond of a particular part of the code you’ve written, or you think this neat mathematical trick is worth sharing. And it is, but not as the story. By spending so much time in the trenches of your project, you can’t see the whole thing. You forget why you were doing this in the first place. In other words, you’ve forgotten the story that will captivate the newcomer.</p>
<p>The newcomer doesn’t ask about the nitty gritty details. Rather, they ask story-level questions. What is missing in the scientific community? What does this project highlight? What do we learn? Why do we care?</p>
<p>It’s only once you convey those to the newcomer that they will even ask for the details. But if they have no connection to the story, why in the world would they even care about the technical details?</p>
<p>Remember, the newcomer doesn’t have your months of experience. Just by working on a problem you become attached to it. The newcomer has none of that, and plenty of other things they could do with their time. Why is your work worth pondering? Why is your story any good?</p>
<hr />
<p>As I said at the outset, I’m learning these lessons as I go. Whether I’m writing a paper, writing an essay for the blog, or crafting a presentation, I think of the story.</p>
<p>Science is about discovery, yes. But people don’t know about discoveries unless they are excited by them. And excitement comes from finding a great story to tell.</p>
<p>You’re a scientist. But you’re also a communicator, and it’s worth spending the time doing the work of a writer when crafting your scientific artifact.</p>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The story will sometimes change as you craft your scientific artifact. It happened in my own project, and the painful truth is that this often requires a significant reworking of the artifact. The more objective parts will stay the same (for example, the introduction of a model), but the parts which lead the reader into the story will change. This is why it’s worth spending a lot of time on the story first, so everyone on the project is happy and on board with where the project is heading. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 25 Oct 2021 00:00:00 +0000
https://cotejer.github.io//finding-the-story
A Quantum Summer

<p>After spending a long time on my PhD projects during the past year, I wanted to break away from them and work on something new. That’s why I applied to the <a href="https://www.lanl.gov/projects/national-security-education-center/information-science-technology/summer-schools/quantumcomputing/index.php">Los Alamos Quantum Computer Summer School</a> (I’ll call it QCSS from now on). The summer school was in its fourth edition, and was virtual like the one last year.</p>
<!--more-->
<p>First, some preliminaries. Despite the name, QCSS is not <em>really</em> a summer school. It’s closer to a research internship. It’s ten weeks long, and during that time you work on projects with the other students and your mentor. The goal of QCSS is to get a paper out of the project. In fact, the goal is often to finish the project <em>during</em> those ten weeks, so QCSS can be quite intense. There are a few lectures in the first weeks, but QCSS isn’t for teaching quantum computing. The lectures are by researchers on the latest in quantum research, so people are expected to know the basics.</p>
<p>QCSS is composed of undergraduates, graduate students, postdocs, professors, and other researchers. It’s also paid, which is a nice bonus.</p>
<p>The program is competitive. I first applied to QCSS just after finishing my master’s degree last year, and got rejected. I reapplied this year and got in, so for those who apply, don’t give up if you’re rejected the first time!</p>
<p>My goal with this essay is much like my <a href="https://cotejer.github.io/psion">PSIon essay</a>: explain the program for students who want to apply. There’s a shortage of blog posts out there for QCSS (particularly since it’s only in its fourth year), so I’ll fill the gap.</p>
<p>My experience was virtual, so I didn’t go to Los Alamos. Keep that in mind when you read.</p>
<p>With that, let’s jump in.</p>
<h2 id="projects">Projects</h2>
<p>QCSS is all about projects. Each person gets one project, but can join multiple if they want. In my case, I chose to focus on one project. I’m already suffering from project overload with my PhD projects, so I didn’t want to add ten more things. I know others who dabbled with a bunch. It’s really up to you.</p>
<p>You can suggest a project, or your mentor will suggest one. I landed in the latter case. I didn’t know what I wanted to do this summer, and was happy to jump into a new project. The mentors of QCSS choose the students based on their potential <em>and</em> their skills. The latter is important for project choice. I wrote on my application that I knew about tensor networks and how to code them. I’m pretty certain this led to the project I worked on. I had specific expertise that my mentor was looking for.</p>
<p>Some projects are long, while others are shorter. There’s also a big difference in the number of people working on them. Most projects have a few people, though some have more. My project had two mentors and two students, which was nice because I got to ask a bunch of questions and get help while building the code for the project.</p>
<p>In terms of project topics, they span the gamut of quantum computing. Some deal with quantum machine learning, some are more mathematical, some deal with physical systems, and others are on optimization. If you look at the link above for QCSS, you will see some of the projects in previous years.</p>
<h2 id="mentors">Mentors</h2>
<p>Each student has a mentor, assigned at the beginning of the summer school. They make sure you have everything you need, and help advise on the projects.</p>
<p>I didn’t have a lot of mentoring in the sense of one-on-one meetings outside of the project. That was fine with me, since I didn’t ask for it. The extent of mentoring probably depends on who you have as a mentor as well. That being said, my mentor was great in that they answered any question I threw at them, so I was happy with that.</p>
<p>Everyone is happy to discuss all things quantum. If there’s one thing I regret, it’s not discussing <em>more</em> with the mentors during the summer.</p>
<h2 id="students">Students</h2>
<p>This edition of QCSS had twenty-six students. We didn’t have the bonding activities that I imagine occurred during in-person editions, but we all kept in touch in the Slack channels. Every project was open, letting us see what others were working on.</p>
<p>My conversations with the other students were always positive. There were a few “coffee break” sessions where we grouped up to have virtual meetings. These were a little awkward (as I’ve found most social gatherings online are), but everyone I talked to was lovely.</p>
<p>I also loved how diverse our collective expertise was. Some students knew a ton of algebra and mathematics. Others thought a lot about optimization and machine learning. Some liked working on analytical problems, while others (like myself) worked on computational problems.</p>
<p>With QCSS, I appreciated how we all have different and complementary skills. Everything helps in making a project move along. It’s not just about knowing all the theory. That’s why we do research: to learn and build off each other’s strengths.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The ten weeks went by fast. After taking a bit of time to understand what was going on in my project, I spent the rest of QCSS trying to push forward in it. I’ll tell you about what I worked on in a future essay.</p>
<p>I met some great people, both within my project and within the whole school. As a new (now, a year old!) PhD student in quantum computing, I will meet these people over and over again during the upcoming years. It’s good to start mingling within these social circles. After all, science is a human activity.</p>
<p>Because QCSS was virtual, asynchronous work ruled. This was fine for me, because my time zone is only two hours ahead of Los Alamos. However, those in Europe and Asia were in a trickier situation. This also meant learning to work with people from multiple time zones, which is a skill we’re going to need much more going forward, I suspect.</p>
<p>If you’re looking to apply to QCSS, my advice is simple: make your application highlight your specific skills. If you know how to code, say that. If you are amazing with group theory, say that. You never know what will be needed for the next edition of QCSS, and this will help your application.</p>
<p>I want to thank my mentor <a href="https://scholar.google.com/citations?user=opZLj2AAAAAJ">Lukasz Cincio</a>, as well as the others who worked on my project: <a href="https://www.unm.edu/~talbash/index.html">Tameem Albash</a>, <a href="https://www.linkedin.com/in/matiasjonsson/">Matías Jonsson</a>, and <a href="https://scholar.google.com/citations?user=mpQ0hgwAAAAJ">Martin Larocca</a>. Also, thank you to the other leaders of the summer school: <a href="http://patcoles.com/">Patrick Coles</a>, <a href="https://omalled.com/">Daniel O’Malley</a>, and <a href="https://scholar.google.com/citations?user=VUHwzlwAAAAJ">Yigit Subasi</a>.</p>
Sat, 25 Sep 2021 00:00:00 +0000
https://cotejer.github.io//qcss
https://cotejer.github.io//qcssThe SATisfying Physics of Phase Transitions<p>For the past few months, I’ve been thinking about the following equation.</p>
<!--more-->
<p>\(A\vec{x} = \vec{b}.\)
Specifically, I’ve been wondering about the possible configurations $\vec{x}$ that solve this system of equations.</p>
<p>If you’ve taken a linear algebra course, you will remember (okay, <em>maybe</em> you will remember) that this equation can have no solutions, one solution, or even infinitely many solutions. To get a handle on these three cases, think about a system of equations of two variables. We can plot the cases as lines in the plane (let’s imagine we only have two equations for now).</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628438403/Blog/SolutionCount.png" alt="No solutions (the lines are parallel and don’t cross), one solution (the lines cross at one spot), and infinitely-many solutions (the lines are on top of each other)." /></p>
<p>If we want to apply an algorithm to find the number of solutions, the way to go is with Gaussian elimination. This is perhaps the most practical thing you learn in a linear algebra class, but of course it’s not something that <em>you</em> should be doing by hand. Instead, we can write some code so that any matrix we feed in will give us this information.</p>
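<p>To make this concrete, here’s a minimal sketch in Python of the three cases. Rather than writing out the elimination steps by hand, it uses the equivalent rank test (compare the rank of $A$ to the rank of the augmented matrix); the function name is my own, and NumPy does the heavy lifting:</p>

```python
import numpy as np

def count_solutions(A, b):
    """Classify A x = b over the reals by comparing ranks."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A < rank_Ab:
        return "no solutions"              # the system is inconsistent
    if rank_A == A.shape[1]:
        return "one solution"              # full column rank pins x down
    return "infinitely many solutions"     # free variables remain

# The three pictures: parallel lines, crossing lines, coincident lines.
print(count_solutions([[1, 1], [1, 1]], [0, 1]))   # parallel
print(count_solutions([[1, 1], [1, -1]], [1, 0]))  # cross at one spot
print(count_solutions([[1, 1], [2, 2]], [1, 2]))   # same line twice
```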
<hr />
<p>Okay, I didn’t tell you the full story.</p>
<p>While I <em>have</em> been thinking about the above equation, I’ve neglected to append an important part to it. Instead, the equation I’ve been thinking about is:
\(A\vec{x} = \vec{b} \mod{2}.\)
The rules for manipulating equations and matrices are the same, but the difference is that now we’re working over the binary field. That’s a fancy way of saying that algebraic manipulations obey:
\(0 + 0 = 0, \\
0 + 1 = 1, \\
1 + 0 = 1, \\
1 + 1 = 0.\)
Plus, any time we have a number, we’re allowed to divide it by two and take its remainder. So $3 = 1 \mod{2}$ and $18 = 0 \mod{2}$. Essentially, this encodes whether the number you’re dealing with is even or odd.</p>
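<p>If you like to think in code, addition mod 2 is exactly the bitwise XOR operation:</p>

```python
# The four addition rules above are the truth table of XOR.
for a in (0, 1):
    for b in (0, 1):
        assert (a + b) % 2 == a ^ b  # addition mod 2 is bitwise XOR

print(3 % 2, 18 % 2)  # parity of each number: 1 0
```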
<p>In the binary field, we don’t have to worry about negative numbers because these can always be converted to either zero or one. In fact, everything we deal with will either be zero or one. So no fractions, no irrational numbers. Only ones and zeros.</p>
<p>This simplifies things quite a bit, but still leaves us enough structure to have some fun.</p>
<p>Instead of having finitely many or infinitely many solutions, we now only have finitely many (including zero). The nature of modular arithmetic and only using zeros and ones means we’ve reined in that pesky infinity.</p>
<p>The number of possible solutions can be calculated directly. Because we’re dealing with binary entries in all of our objects, our vector $\vec{x}$ will have exactly $2^N$ possible configurations, where $N$ is the number of variables (each component has two choices, so you get $2\times2\times\ldots\times2 = 2^N$).</p>
<p>To recap, we start with the equation:
\(A\vec{x} = \vec{b} \mod{2}.\)
The matrix $A$ is an $M \times N$ binary matrix, $\vec{x}$ is an $N$-component binary vector, and $\vec{b}$ is an $M$-component binary vector.</p>
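<p>Over the binary field, Gaussian elimination is especially pleasant to code, because row subtraction reduces to bitwise XOR. Here’s a self-contained sketch (the function names are my own). When the system is consistent, the number of solutions is $2^{N - \text{rank}}$, since each free variable doubles the count:</p>

```python
def gf2_eliminate(A, b):
    """Row-reduce the augmented system [A | b] over GF(2).

    Returns (rank, consistent). If consistent, the system has
    2**(N - rank) solutions; otherwise it has none.
    """
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    n_rows, n_cols = len(M), len(A[0])
    rank = 0
    for col in range(n_cols):
        # Find a pivot row with a 1 in this column.
        pivot = next((r for r in range(rank, n_rows) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        # Eliminate everywhere else: over GF(2), subtraction is XOR.
        for r in range(n_rows):
            if r != rank and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[rank])]
        rank += 1
    # A row that reads 0 = 1 is a contradiction.
    consistent = all(any(row[:-1]) or not row[-1] for row in M)
    return rank, consistent

def num_solutions(A, b):
    rank, consistent = gf2_eliminate(A, b)
    return 2 ** (len(A[0]) - rank) if consistent else 0

# x1 + x2 = 1, x2 + x3 = 1 (mod 2): rank 2, three variables -> 2 solutions.
print(num_solutions([[1, 1, 0], [0, 1, 1]], [1, 1]))  # 2
```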
<p>Then, we’re going to ask the following question:</p>
<p><strong>What happens to the solution space on average as we increase $M$?</strong></p>
<p>This question will carry us from linear algebra, to theoretical computer science, and finally to statistical physics.</p>
<h2 id="ensembles">Ensembles</h2>
<p>When posed any question that uses the word <em>average</em>, you should reply, “What ensemble are you using?”</p>
<p>Seriously, the ensemble you choose determines everything. Imagine I told you that the average person loves running 100+ kilometres every week. This would probably seem pretty strange. Except the people I asked were all long-distance runners who have been doing this for years.</p>
<p>The ensemble I chose (experienced long-distance runners) informed the sort of average I would then calculate.</p>
<p>In the exact same way, if we want to answer a question probabilistically in mathematics, we should be careful about our assumptions and how we define our ensemble.</p>
<p>Here’s a recipe for drawing a sample from an ensemble:</p>
<ul>
<li>Choose the number of variables $N$.</li>
<li>For each row, choose three columns to contain a one, and set the rest to zero.
<ul>
<li>Repeat for the $M$ rows.</li>
</ul>
</li>
<li>Choose the components of the vector $\vec{b}$ at random from $\lbrace 0, 1 \rbrace$.</li>
<li>If any row is repeated, resample until it’s different.</li>
</ul>
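<p>In code, the recipe might look something like this (a sketch with names of my own choosing; I handle the resampling by collecting rows in a set until there are $M$ distinct ones):</p>

```python
import random

def sample_instance(n_vars, n_rows, k=3, seed=None):
    """Draw (A, b) from the ensemble: each row of A has exactly k ones
    in distinct, randomly chosen columns; b is uniform over {0, 1}.
    Repeated rows are resampled until all rows are distinct."""
    rng = random.Random(seed)
    rows = set()
    while len(rows) < n_rows:
        rows.add(tuple(sorted(rng.sample(range(n_vars), k))))
    A = []
    for cols in rows:
        row = [0] * n_vars
        for c in cols:
            row[c] = 1
        A.append(row)
    b = [rng.randint(0, 1) for _ in range(n_rows)]
    return A, b

A, b = sample_instance(n_vars=10, n_rows=5, seed=0)
print(sum(A[0]))  # each row has exactly 3 ones
```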
<p>After this procedure, you will find yourself with a matrix $A$ and a vector $\vec{b}$. You can then plug this into your favourite tool to solve binary matrix equations, and see what comes out!</p>
<hr />
<p>Before we start averaging, let’s think about what an equation will do to the configuration space.</p>
<p>At first, we have no equations, so every configuration is a solution. As we saw before, there are $2^N$ of them. After we insert one constraint (a row of $A$), this will specify the parity of the sum of three variables. It doesn’t matter what variables we look at, or how many of them. The parity can be odd or even, so half of the configurations will remain, and the other half are tossed out. The diagram below shows how imposing the parity to be zero selects four configurations, leaving the other four to be discarded.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628438403/Blog/TossingSolutions.png" alt="An example of how a parity selects half of the configurations." /></p>
<p>This happens only when $M$ is small. However, as you make $M$ larger, there are more rows of the matrix which can “interact”. In the language of linear algebra, the rows will eventually become linearly dependent, at which point the solution space will stop being chopped in half for each extra row, but will decay more slowly.</p>
<h2 id="different-disguises">Different Disguises</h2>
<p>I began this essay by talking about linear algebra, but this problem pops up in many different fields of science.</p>
<p>In theoretical computer science, this goes under the name of $k$-XORSAT, where $k = 3$ in our case. This is a satisfiability problem, which means you have a bunch of variables (the vector $\vec{x}$ from above) and then you have constraints (the matrix $A$ along with the parity vector $\vec{b}$), and the question is whether you can find a solution to the problem. Answering this is the same as performing Gaussian elimination and determining if a solution exists.</p>
<p>In statistical physics, we call it the $p$-spin model, where $p = 3$ in this case. Instead of binary variables, we use spin variables taking values $\pm 1$. For any variable $x_i$, the spin variable is $s_i = (-1)^{x_i}$. The matrix $A$ tells us how the particles interact (the constraints), and the spin variables $s_i$ give the configuration of the system. We can then define a Hamiltonian (think of this as an energy function), which has its lowest energy when all interactions are satisfied. Finding a solution in the $p$-spin language is about finding a ground state of the system.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628441806/Blog/Disguises.png" alt="The different ways to see the problem. On the left we have the linear algebra view, on the right we have the p-spin model, and in the middle we have the XORSAT view." /></p>
<h2 id="the-phase-transition">The Phase Transition</h2>
<p>Hopefully you’ve thought about what happens to the presence of solutions as we increase $M$. Actually, it’s better to talk about the parameter $\alpha \equiv M/N$, since this takes into account how big the system is (the larger the number of variables, the more equations you should be able to add before completely constraining everything). Once we’ve done this, we can ask what the probability of having a solution looks like as a function of $\alpha$. And when I say the word “probability”, I’m thinking of looking at many samples of a matrix $A$ and parity vector $\vec{b}$, and then averaging.</p>
<p>Here’s what it looks like, as an animation:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628438403/Blog/Transition.gif" alt="The phase transition which happens at a critical value." /></p>
<p>It’s a drastic change. Either you will certainly have a solution, or you won’t. There’s a very brief transition period where the probability goes from one to zero, but this becomes smaller and smaller as $N \rightarrow \infty$.</p>
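<p>If you want to see the transition for yourself, here’s a rough, self-contained sketch that estimates the probability of a solution at a given $\alpha$ by drawing instances from the ensemble and running Gaussian elimination over the binary field. The names and sample sizes are my own choices, and for small $N$ the transition will be broader than in the $N \rightarrow \infty$ limit:</p>

```python
import random

def has_solution(A, b):
    """Check consistency of A x = b (mod 2) by Gaussian elimination."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    rank, n_cols = 0, len(A[0])
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[rank])]
        rank += 1
    # A zero row of A paired with b = 1 is a contradiction.
    return all(any(row[:-1]) or not row[-1] for row in M)

def p_solvable(n_vars, alpha, n_samples=50, rng=random.Random(0)):
    """Estimate P(solution exists) at the ratio alpha = M / N."""
    n_rows = int(alpha * n_vars)
    hits = 0
    for _ in range(n_samples):
        rows = set()
        while len(rows) < n_rows:  # distinct rows with three ones each
            rows.add(tuple(sorted(rng.sample(range(n_vars), 3))))
        A = [[1 if c in cols else 0 for c in range(n_vars)] for cols in rows]
        b = [rng.randint(0, 1) for _ in range(n_rows)]
        hits += has_solution(A, b)
    return hits / n_samples

# Close to 1 well below the threshold, close to 0 well above it.
for alpha in (0.5, 0.918, 1.3):
    print(alpha, p_solvable(n_vars=60, alpha=alpha))
```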
<p>I can’t help but yearn for an explanation of why this is happening.</p>
<hr />
<p>Here’s one perspective. As you add more and more vectors (rows) to your matrix $A$, there will come a point where some of these rows will become linearly dependent. At this point, the values of the parities for these rows will become super important. If they aren’t set the right way, there will be no solution (since there will be a contradiction that can’t be resolved). And since $\vec{b}$ is chosen uniformly at random, there will often be no solution. Average this over a bunch of matrices, and you will get a curve like the animation above.</p>
<p>Where should this happen? An upper bound that seems reasonable to me is $\alpha_c = 1$, since that is where a matrix can become full rank, and therefore adding more rows makes them linearly dependent. However, since we’ll likely start having dependent rows sooner, the threshold will be lower than that.</p>
<p>There’s also a perspective related to graph theory. It has to do with a notion of hyperloops, but this will take us a bit further than I want to go. If you want to learn more about it, see Endnote 1.</p>
<hr />
<p>But here’s the catch: If we change the equation to $A\vec{x} = \vec{0}$, we <em>still</em> get a phase transition like the one above. The difference is that now the number of solutions is always at least one (since $\vec{x} = \vec{0}$ is always a solution), so the transition is picked up with a different measure.</p>
<p>And that measure is found by taking the statistical physics perspective.</p>
<h2 id="who-ordered-that">Who Ordered That?</h2>
<p>Order, symmetries, and large-scale structures are the bread and butter of statistical physics. We want to do away with the pesky details, and instead focus on the big picture.</p>
<p>The concept of magnetization is one way to measure something about the system as a whole. Like a person who just learned about hammers and now sees everything as a nail, magnetization is used in many contexts. But at its heart, magnetization is a way to measure <em>similarity</em>.</p>
<p>Imagine we have a bunch of particles which can take spin values of $s_i = \pm 1$, where $i$ is just the label for a particle.</p>
<p>Many models in physics have rules that ask the spins to align. These often go by the name of Ising models, and they are the iconic models of statistical physics. The Ising model usually involves some system of spins with neighbouring interactions. The spins “want” to align, but when the temperature of the system is high, they have enough energy not to align. In fact, the spins will basically be distributed randomly. In two dimensions, the model undergoes a phase transition as you lower the temperature. The spins go from being randomly distributed to all pointing either up ($s_i = +1$) or down ($s_i = -1$). This happens at a critical temperature called $T_c$.</p>
<p>The measure we can look at here is the magnetization of a sample, which is just the average of the spin values:
\(m = \langle s_i \rangle.\)
If $m = \pm 1$, this means the spins are all aligned (either up or down). Whereas if $m = 0$, then there are an equal number of spins that are up and down.</p>
<p>For an absolutely marvelous discussion about magnetization (with beautiful animations), I highly, highly, <em>highly</em> recommend Brian Hayes’s latest essay, <a href="http://bit-player.org/2021/three-months-in-monte-carlo">“Three Months in Monte Carlo”</a>. If you look at Figure 2, you will see the phase transition as a function of the temperature. I’ve drawn what it looks like below.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628438403/Blog/MagnetizationTransition.png" alt="The magnetization of Ising models." /></p>
<p>We need to be careful when interpreting this diagram. It’s what happens for <em>one</em> sample of an Ising model. As you decrease the temperature, the sample will “decide” to either go towards $m = +1$ or $m = -1$. This happens differently depending on the particular random details of a sample. The problem is that if you start averaging your value of $m$ over many samples, you will find the whole curve is flat at $m = 0$. This happens because the samples going to $m = \pm 1$ as we cool the system will cancel each other out in the averaging.</p>
<p>Combatting this is straightforward: Instead of plotting $m$, we can plot $m^2$ or $\lvert m \rvert$. Just keep this in mind if you’re trying to implement a simulation like this in code and are getting confused as to why you don’t see the transition.</p>
<hr />
<p>Let’s take a look at how the magnetization changes as we increase $\alpha$.</p>
<p>It turns out that for our purposes, plotting just $m$ is fine. However, because of a technical aspect of the sampling, it will be better to plot something slightly different:
\(q = \langle \left[s_i\right]^2\rangle.\)
This is called the spin glass order parameter (see Endnote 2). There are a few layers of abstraction here, so let’s break them down.</p>
<p>The square brackets are used to find the magnetization per site. Concretely, they indicate an average taken over the same site for different solutions to the equation $A\vec{x} = \vec{0}$. This result will always be between $-1$ and $+1$.</p>
<p>If you get a result of $\left[ s_i \right] = \pm 1$, this indicates that <em>all</em> configurations have the same value for this variable. On the other hand, $\left[ s_i \right] = 0$ tells us that there is no tendency for variable $i$ to be either value.</p>
<p>So while there are three values that describe the “extremes”, really there are two extremes: every configuration takes the same value for a given variable, or there is no preference.</p>
<p>To capture this numerically, we can simply square the result. If there is no tendency for a variable to be one value or the other, this will still be true after squaring. But now we won’t discriminate between variables which have been “magnetized” to $+1$ versus $-1$.</p>
<p>The average in angular brackets now tells us to average this overlap quantity over <em>all</em> the variables. This makes it super simple for plotting purposes, since we have just one number. However, you could forego the final average and just plot the average spin-glass order parameter as a probability distribution over the different variables (giving you a histogram-like result).</p>
<p>To recap, there are three averages being done:</p>
<ol>
<li>An average over <em>configurations</em>.</li>
<li>An average over variables.</li>
<li>An average over samples.</li>
</ol>
<p>If you picture these configurations as arrays, it looks like this for a given sample:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/v1628438403/Blog/Averaging.png" alt="The averaging over configurations versus over variables." /></p>
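<p>Here’s a sketch of how the first two averages combine in code (over configurations, then over sites; the third average over samples would wrap calls to this function). The function name is an illustrative choice of my own:</p>

```python
def order_parameter(configs):
    """Spin-glass order parameter q = < [s_i]^2 >.

    configs: list of spin configurations (lists of +1/-1), e.g. sampled
    solutions of A x = 0 mapped via s_i = (-1)**x_i.
    Inner average [s_i]: over configurations, one site at a time.
    Outer average < . >: over the N sites.
    """
    n_sites = len(configs[0])
    n_conf = len(configs)
    q = 0.0
    for i in range(n_sites):
        m_i = sum(c[i] for c in configs) / n_conf  # [s_i], in [-1, +1]
        q += m_i ** 2  # squaring forgets whether the site froze to +1 or -1
    return q / n_sites

# Site 0 is frozen to +1 in every configuration; site 1 is undecided.
configs = [[+1, +1], [+1, -1]]
print(order_parameter(configs))  # (1^2 + 0^2) / 2 = 0.5
```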
<p>When calculating the order parameter, we have to look at many matrices $A$ for the curve to smooth out like in the animation. (I promise we’re now done with the averaging!)</p>
<p>If you carry all these steps out, then you will be rewarded with a beautiful plot like this.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1629042756/Blog/orderParameter.png" alt="The phase transition by looking at the order parameter." /></p>
<p>And there’s the transition! So the order parameter picks this up quite nicely. (Note that this is for $N = 100$.)</p>
<p>After the critical threshold $\alpha_c$, the variables all become magnetized, gravitating towards the solution $\vec{x} = \vec{0}$. The interpretation of the transition is the following: All of the remaining solutions after the threshold “cluster” around the solution $\vec{x} = \vec{0}$.</p>
<p>Where exactly is the transition located? It turns out to be approximately at $\alpha_c \approx 0.918$, which holds in the specific case of $p = 3$ for our model. I won’t go through the details of how to find it here, but check the References for the paper that shows this.</p>
<h2 id="clustering">Clustering</h2>
<p>To get a feel for the transition, I wanted to make an animation focusing on one specific sample. It’s a 900-variable model, and I’ve arranged it in a 30x30 square for convenience. Don’t put too much stock in the arrangement of the variables, which are shown as squares. Remember, the interactions can occur between any three of the variables, so this is just for show. I’m plotting the order parameter $q_i$ at each site, which is the same as the equation for the order parameter $q$ without the final averaging over all the sites (the angular brackets). Watch for the transition, which occurs at $\alpha_c \approx 0.918$.</p>
<p><img src="https://cotejer.github.io/images/phasetransition/magnetization.gif" alt="The order parameter for a single sample." /></p>
<p>Dramatic, no?</p>
<p>Notice how the variables are mostly undecided between values, indicated by their lighter colour. However, as we approach the critical threshold, there is suddenly an influx of darker squares, and it soon takes over the board.</p>
<p>Watching this is very satisfying. It gives me a sense of what’s happening in the transition, with each site being free to do whatever it wants until the threshold is reached. At that point, there is a force pushing all of the variables to match each other between configurations, until we have only one solution left. (There are still multiple solutions in the animation above, since not all squares become dark. This is because I didn’t go to a higher value of $\alpha$.)</p>
<p>When you keep adding more rows to your matrix $A$, the moral of the story is that you will eventually reach a critical point where all of the solutions have the same values for most of the variables.</p>
<h2 id="satisfiability">Satisfiability</h2>
<p>This is just the tip of the iceberg when it comes to <em>satisfiability</em> problems. These are problems with constraints and a vector $\vec{x}$ that attempts to satisfy them.</p>
<p>The setting of $k$-XORSAT is nice because we can do large simulations through Gaussian elimination, which is an efficient algorithm. In fact, the existence of Gaussian elimination is what makes $k$-XORSAT a problem that’s in the complexity class P. Many other satisfiability problems are in NP, but they often show the same sort of phase transition. Since we can analytically wrap our heads around $k$-XORSAT though, I figured this would be a good starting point for the curious learner. Plus, the statistical physics version of the model is easy to think about, making it an attractive starting point.</p>
<hr />
<p>What began as a simple question about solutions to a matrix equation led to a discussion about statistical physics, magnetization, and the nature of Gaussian elimination. The phase transition tells us something really specific about the average behaviour of these systems. You may have thought that all matrices are their own beasts, but it turns out that they can be remarkably similar in their behaviour. The perspective of phase transitions teaches us that abstracting away from the particulars can lead us to simple explanations of complex phenomena.</p>
<p><em>Thank you to Grant Sanderson of <a href="https://www.3blue1brown.com/">3Blue1Brown</a>, James Schloss of <a href="https://www.youtube.com/user/LeiosOS">LeiosOS</a>, and everyone else who made the <a href="https://www.3blue1brown.com/blog/some1">Summer of Math Exposition</a> happen! All of us in the community appreciate what you’ve done.</em></p>
<h2 id="endnotes">Endnotes</h2>
<ol>
<li>The notion of hyperloops and how they affect the $p$-spin model can be found in Section V of <a href="https://arxiv.org/abs/cond-mat/0011181">this paper</a>. I will warn you though: the diagrams are so old that they are <em>very</em> bad. Honestly, understanding the notion of a hyperloop gave me a headache looking at Figure 1. I might have to write an essay on this idea just to give people a better introduction!</li>
<li>The spin glass order parameter is not consistently defined if you scour the literature. The idea is to have something that captures the overlap of different solutions (sometimes called replicas). You can look at only two replicas, or many. Here I used many, but the effect is robust with two (though it takes more samples to get the curve to smooth out for the averaging).</li>
<li>Some technical tidbits. Because the number of solutions is exponential in $N$, I couldn’t do the full averaging over solutions that would be required for the animations and plots. Instead, I made a compromise. I set the number of sampled solutions to be the minimum between 100 and the number of remaining solutions. This meant that I would look at 100 configurations for every matrix at a given $\alpha$, unless the total number of solutions was fewer than this, in which case I took them all. This makes things a bit more memory efficient, and shouldn’t affect the results much.</li>
</ol>
<h2 id="references">References</h2>
<ol>
<li>I already mentioned Brian Hayes’s essay <a href="http://bit-player.org/2021/three-months-in-monte-carlo">“Three Months in Monte Carlo”</a>, but I would recommend all of his essays on <a href="http://bit-player.org">Bit-Player</a> if you’re the type of person who loves reading about computation.</li>
<li>A recent two-part blog post on the theory of replicas can be found <a href="https://windowsontheory.org/2021/08/11/replica-method-for-the-machine-learning-theorist-part-1-of-2/">here</a> on the wonderful blog <a href="https://windowsontheory.org/">Windows on Theory</a>. In the post I’ve linked to, they briefly discuss the satisfiability phase transition, as well as work out some of the pesky integrals needed to analyze the behaviour of these ensembles. The post is about machine learning, so this gives you an idea of how broad these ideas are!</li>
<li>The paper <a href="https://arxiv.org/abs/cond-mat/0207140">“Alternative solutions to diluted p-spin models and XORSAT problems”</a> gives a more theory-based overview of what I covered in this essay. In particular, the paper shows how to derive the threshold $\alpha_c$ by finding the solution to a transcendental equation (in the paper, it’s Equation 42).</li>
</ol>
Wed, 25 Aug 2021 00:00:00 +0000
https://cotejer.github.io//phase-transitions
https://cotejer.github.io//phase-transitionsThe Clarity of Brevity<p>As a writer, I’ve spent a lot of time crafting sentences and paragraphs. Writing a sentence is easy. Writing a sentence that communicates the idea in your mind to another is much more challenging. The translation from mind to words to mind again is lossy.</p>
<p>There’s a fashionable way to write, which suggests using concise language to make a point. This is particularly true in scientific writing. In the more mathematical sciences, I imagine this is the result of dealing with equations that already compress our thoughts. After all, an equation is just a compact representation of an idea. The equations of general relativity are easy to write down, but understanding them takes many hours of learning.</p>
<p>The writing I see in many papers seems to gravitate towards brevity, where everything is said in as dense a form as possible. My favourite example is when scientists use parentheses to fuse two sentences into one. Consider the following sentence (I’ve simply made this up):</p>
<blockquote>
<p>On the right of the figure, we plot the results of extensive simulations using dotted lines. On the left, we plot the experimental results with solid lines.</p>
</blockquote>
<p>Putting the sentence through the “compactifying filter” of scientists, the published result will likely look like this:</p>
<blockquote>
<p>On the right (left) of the figure, we plot the results of the simulation (experiment) using dotted (solid) lines.</p>
</blockquote>
<p>If we’re judging the writing on its brevity, this sentence is great. It packs in the same amount of information in a much smaller space, increasing the information density. Plus, if you’re trying to hit a specific page count, then this is a technique for shortening your paper.</p>
<p>The problem: It’s difficult to parse.</p>
<p>That’s because you’re using parallelism in a medium that is sequential. Because there are effectively two sentences superimposed, I need to read the sentence twice to understand what’s going on. In each pass, I’ll either focus on the parenthetical words or the non-parenthetical words. But because they are part of the same sentence, they clutter up the rest of the words, making it tricky to understand. We aren’t used to “jumping” across a word while reading, but rather read each word sequentially<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. By fusing two sentences into one, you’re giving me a headache.</p>
<p>But the idea of brevity is much more than this one example. In my mind, good scientific writing only requires a moderate amount of brevity. The reason is that brevity will leave things out of the writing, and what’s left out has to be “made up” by the reader. Think of a mathematical textbook or paper that explains that a proof is “trivial”, so it won’t be explained. Or perhaps it’s left as an exercise to the reader. Whenever a piece of information is left out, the reader needs to be capable of grasping it without a heroic effort.</p>
<p>There is a balance between the author giving everything to the reader (leaving no room for interpretation) and having the author simply point in a direction and let the reader fill in the blanks. Good writing will straddle this balance to share what needs to be shared, while also not overloading the reader.</p>
<hr />
<p>As I find myself in the process of writing a scientific paper, I’m thinking more and more about what to leave in and what to leave out. I’ve spent a long time thinking about this project, and the amount of accumulated knowledge I have is substantial. The question is: Does it all need to go into the paper?</p>
<p>I think the answer is “No”. After all, a paper is an artifact of a research question and its answer (or work towards that answer). If I included everything in the paper, it would be very complete and reflect what I did, but it would also incur a cost on the reader. Instead of being greeted with five pages, they might be greeted by twenty. That’s a big difference when deciding if you should invest the time to read a paper.</p>
<p>Brevity plays a role in these considerations. It’s important to say what you need to say concisely. But I would argue that it should <em>always</em> be with the reader in mind. You aren’t trying to lower a word count, nor are you trying to make a paper with the information density of a black hole. Rather, the goal is simple: Communicate an idea from your mind to the minds of others. The transmission will be lossy, so take care in making your point briefly, but not in a way that makes the reader work harder. Give them everything they need, and maybe even a little boost. To use a sports analogy: Make reading feel like cycling down a smooth hill, not up a gnarly one.</p>
<p>Brevity brings clarity, until it doesn’t.</p>
<h2 id="resources">Resources</h2>
<ol>
<li>Stephen B. Heard’s <a href="https://scientistseessquirrel.wordpress.com/2021/07/21/how-long-is-a-manuscript-all-answers-are-wrong/">post on shortening a manuscript</a> is worth reading to see examples of when brevity is <em>not</em> sought after with the reader in mind.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I confess that I’m not up-to-speed on all of the latest research on eye-tracking while reading. I seem to remember something about eyes darting across the page in a manner that’s not sequential, but I’m talking about how we observe the whole sentence. From my own experience, this is usually a left-to-right affair. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 25 Jul 2021 00:00:00 +0000
https://cotejer.github.io//clarity-of-brevity
Building Blocks<p>If I ask you what comes to mind when you think of physics, what would you answer?</p>
<!--more-->
<p>Stars? Galaxies? Black holes? Particle colliders?</p>
<p>For better or worse, some fields in physics have taken up more of the limelight than others. The astrophysics-oriented answers are the ones my family would likely default to if asked about physics. And I don’t blame them: astrophysics is a branding master.</p>
<p>In fact, studying the universe was the first field of physics that I thought I would study. It seemed like the perfect fit for me, and I eagerly did multiple research internships over the summer with my supervisor <a href="https://scholar.google.com/citations?user=wqTvQCkAAAAJ&hl=en&oi=ao">Valerio Faraoni</a> (who is great!). I got to study general relativity, think about extensions to the field equations, and dove into a lot of heavy mathematics.</p>
<p>While it was fascinating, I couldn’t help but feel like I was missing something. The physics I was learning didn’t seem tangible to me. It was difficult to grasp it and wrap my head around the problems I was studying. When <a href="https://cotejer.github.io/psion">I went to Perimeter for my Master’s degree</a>, I made sure to try something new.</p>
<p>That something new was condensed matter and statistical physics. As an undergraduate, I wasn’t exposed to these fields (apart from a few thermodynamics courses). I soon learned that condensed matter offers a whole world of physics that I had never encountered before (see the References).</p>
<p>To me, condensed matter is about studying the behaviour of systems of building blocks. These building blocks vary from scenario to scenario, but physicists like to pin down the sort of collective (or emergent) behaviour that occurs when you have a bunch of these building blocks interacting.</p>
<p>(As a side note, statistical physics is often used in conjunction with condensed matter. The main distinction in my view is that statistical physics can be applied to all sorts of building blocks, while condensed matter tends to use particles as its building blocks. Statistical physicists might study the flow of traffic in a congested city, while condensed matter physicists might study the flow of electrons in a material.)</p>
<p>What I love about condensed matter and statistical physics is the way I can get a handle on what’s happening. To study a system, you first establish the building blocks and the rules of the game. Then, you ask questions about the collective behaviour of the system.</p>
<p>While it’s not easy to picture what’s happening when the number of building blocks climbs into the hundreds and thousands, the rules of the game are often simple enough that thinking about a small example in your mind is doable. Even better, you can program an animation to see things in action! This can give you a glimpse of processes that just won’t jump out at you from the equations.</p>
<p>At the end of the day, numerical simulations and mathematical analysis are how we study these systems. However, I can’t express how critical it is to me to be able to picture the system I’m studying in my mind’s eye. Does this help the analysis? Not really. Does it give me another way to pose questions and think about directions that aren’t obvious in the mathematics? Absolutely.</p>
<p>However, it turns out that if my family would start saying I study condensed matter physics, they would <em>still</em> be slightly off.</p>
<h2 id="graphs-and-optimization">Graphs and Optimization</h2>
<p>The building blocks I play with aren’t particles or cars in a city, but <em>graphs</em>. These are objects which consist of nodes and edges that connect them. Think of a map, and place a node on every city while connecting those that have roads that go from one to the other. The resulting object is a graph. A graph is a representation, so it can be used in a bunch of different situations.</p>
<p>The rules of the game involve the edges, and usually a notion of giving values or colours to the nodes. For example, imagine I give you the following graph and ask you to use as few colours as you need to colour the nodes. The only restriction is that no adjacent nodes (nodes connected by an edge) can have the same colour.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1624547195/Blog/GraphColouring.png" alt="Example of a graph with N = 11 nodes." /></p>
<p>For the above graph, after some trial and error you might find that it can be done with only two colours.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1624547195/Blog/GraphColouringFilled.png" alt="Solution for the previous graph colouring problem." /></p>
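Checking the rule is mechanical, which makes it a nice thing to program. Here’s a minimal sketch in Python; the edge list is illustrative (a 4-cycle), not the exact graph from the figure:

```python
# Check the colouring rule: no edge may join two same-coloured nodes.
# The edge list below is illustrative, not the graph from the figure.
def is_valid_colouring(edges, colouring):
    return all(colouring[u] != colouring[v] for u, v in edges)

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle, which is 2-colourable
colouring = {0: "red", 1: "blue", 2: "red", 3: "blue"}
print(is_valid_colouring(edges, colouring))  # True
```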
<p>Like I mentioned before, the rules of the game are pretty simple. You have a graph, and you need to colour the nodes. Easy.</p>
<p>But if I give you the following graph, things get trickier.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1624547195/Blog/Petersen.png" alt="The Petersen graph, with N = 10 nodes." /></p>
<p>The rules are the same, but the connectivity of the graph makes things more difficult (for those that are curious, this is called the “Petersen graph”, and <a href="https://en.wikipedia.org/wiki/Petersen_graph#/media/File:Petersen_graph_3-coloring.svg">can be coloured using only three colours</a>). Even though the number of nodes is actually <em>less</em> than the previous example, it might be trickier to find a solution. What we need is a more systematic way to tackle this problem. One that can preferably be programmed on a computer.</p>
<p>This problem (and many others like it) is called a combinatorial optimization problem. It’s “combinatorial” because of how the configuration space (in our case, ways to colour the nodes) grows as a function of the system size. If you have ten shirts, three pairs of pants, and five pairs of shoes, the total number of outfits you can rock is: 10 × 3 × 5 = 150. It’s this kind of growth which brings about the name “combinatorial”.</p>
<p>When I write “optimization”, I’m referring to the fact that we want to solve our problem subject to some constraints. In the graph colouring problem I gave above, a way to solve <em>any</em> such problem is to use N colours, where N is the number of nodes. That way, each one will be distinct, and so we’re done.</p>
<p>We want to do better than that. For the first graph I gave, you can certainly colour it with N = 11 colours, but two are enough. In a sense, using two colours is the more “economical” way of doing things.</p>
<p>Optimization tends to mean maximizing or minimizing some quantity. In the graph colouring case, our goal is to colour the graph with our rules, while also using as few colours as possible. That additional constraint is what turns this into an optimization problem, and often it’s the thing that transforms a problem from being straightforward to being fiendishly difficult<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
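To see how the optimization version plays out in code, here’s a brute-force sketch that finds the minimum number of colours by trying every assignment. It’s hopeless for large graphs, since the number of assignments it checks grows like k<sup>N</sup>, which is exactly the combinatorial explosion described above:

```python
from itertools import product

def chromatic_number(n_nodes, edges):
    """Smallest number of colours that properly colours the graph.
    Brute force: for each k, tries all k**n_nodes assignments."""
    for k in range(1, n_nodes + 1):
        for colouring in product(range(k), repeat=n_nodes):
            if all(colouring[u] != colouring[v] for u, v in edges):
                return k

# A 4-cycle needs two colours; a triangle needs three.
print(chromatic_number(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 2
print(chromatic_number(3, [(0, 1), (1, 2), (2, 0)]))          # 3
```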
<h2 id="encodings">Encodings</h2>
<p>While these problems may be juicy to think about, what do they have to do with physics? These just look like problems a computer scientist might grapple with!</p>
<p>The key is in how we encode them. In combinatorial optimization problems, we usually have constraints that need to be enforced. For the graph colouring problem, we don’t want nodes that belong to the same edge to have the same colour. For a problem like <a href="https://cotejer.github.io/shattering-of-sat">satisfiability</a>, the constraints have to do with possible values a subset of nodes can take. You can come up with your own set of constraints as well, but at the end of the day you will end up with a list of constraints: <code class="language-plaintext highlighter-rouge">[C1, C2,…, CM]</code>. The goal is to satisfy all of them.</p>
<p>As physicists, our instinct is to take those constraints and write them as a Hamiltonian, H. This is an object which tells us the energy of a given configuration of the system. If we have a bunch of constraints, then our Hamiltonian will look something like:
\(H = \sum_{i=1}^M C_i,\)
where each $C_i$ is a function of the configuration of nodes, which we can write as <code class="language-plaintext highlighter-rouge">[x1, x2,…, xN]</code>. The value of $C_i$ can be zero or one. If the constraint is satisfied, it contributes nothing to the total energy (it is zero). If the constraint is <em>not</em> satisfied, then it gives a penalty of one unit of energy. (You could also weight the constraints, or add all sorts of fancy modifications, but this illustrates the point).</p>
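As a concrete (if toy) example of this penalty picture, here’s the Hamiltonian for graph colouring written as code, where each violated edge contributes one unit of energy:

```python
def energy(edges, colouring):
    """H = sum_i C_i: each edge contributes 1 when its endpoints share a colour."""
    return sum(1 for u, v in edges if colouring[u] == colouring[v])

edges = [(0, 1), (1, 2), (2, 0)]  # a triangle
print(energy(edges, [0, 0, 0]))   # 3: every edge is violated
print(energy(edges, [0, 1, 2]))   # 0: a ground state, i.e. a proper colouring
```

Solving the problem then becomes a hunt for a zero-energy ground state of H.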
<p>For this essay, I’m thinking of the list <code class="language-plaintext highlighter-rouge">[x1, x2,…, xN]</code> as being Boolean: each element is zero or one. This means every node can have two states. Physicists like this sort of setup because it reminds us of a condensed matter system we know well: the spin-1/2 particle system.</p>
<p>This system is essentially a system of particles which can be in one of two states: up or down. Because this system has two states and a lot of combinatorial problems deal with variables which have two possible states, there’s a nice mapping from the computer science language of Boolean variables to the spin language of condensed matter physics.</p>
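The dictionary between the two languages is a one-line change of variables. One common convention (and the sign choice is just that, a choice) maps a Boolean x to a spin via s = 1 − 2x:

```python
# Map Boolean variables x in {0, 1} to spins s in {+1, -1} via s = 1 - 2x.
# Here 0 maps to spin up (+1); the opposite convention works just as well.
def to_spin(x):
    return 1 - 2 * x

def to_boolean(s):
    return (1 - s) // 2

bits = [0, 1, 1, 0]
spins = [to_spin(x) for x in bits]
print(spins)                           # [1, -1, -1, 1]
print([to_boolean(s) for s in spins])  # [0, 1, 1, 0]
```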
<p>Physicists have spent a long time working on these sorts of models. For example, perhaps the most famous one of all is the <a href="https://en.wikipedia.org/wiki/Ising_model">Ising model</a>, which can be written as a graph. We’ve spent a long time building tools to solve these problems, which is why mapping the computer science problems into our language is something that was probably inevitable.</p>
<hr />
<p>So what is it that I do?</p>
<p>Well, I work at the intersection of statistical physics, condensed matter, and computer science (with a pinch of quantum theory thrown in for good measure!). I think about problems on graphs, ways to describe the collective behaviour of the building blocks I play with, and try to apply the tools I’ve learned from statistical physics to help illuminate these optimization problems.</p>
<p>I’ll admit that this isn’t as easy to point to as the stars, but I find the collective dance of a bunch of small building blocks obeying simple rules to be mesmerizing. They often surprise us, and in science, that’s the kind of thing you’re always looking for.</p>
<p>I’m still trying to get my family to understand what I do, but I suspect I may be at this stage for a while.</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/c_scale,q_auto:best/v1535842782/Handwaving/Published/TheTalk.png" alt="A comic about trying to explain what I do to my parents." /></p>
<h2 id="references">References</h2>
<ol>
<li>For a lovely taste of condensed matter, read John Baez’s <a href="https://nautil.us/issue/97/wonder/the-joy-of-condensed-matter">“The Joy of Condensed Matter”</a>. If you want to know more about his work (which includes climate change, mathematical physics, networks, category theory, and a lot more), check out his blog, <a href="https://johncarlosbaez.wordpress.com/">Azimuth</a>.</li>
<li>Another great blog about condensed matter is Ross H. McKenzie’s <a href="https://condensedconcepts.blogspot.com/">Condensed Concepts</a>. He often writes about condensed matter rather than statistical physics, and I’ve learned a lot from him.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Talking about “difficulty” quickly leads us into the rabbit hole of complexity theory. While I’ll save that for another essay, rest assured that scientists and mathematicians have thought a <em>lot</em> about what it means for a problem to be difficult or easy. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 25 Jun 2021 00:00:00 +0000
https://cotejer.github.io//building-blocks
Code as Knowledge Distilled<p>When I was an undergraduate, I did my best to stay away from a computer to do physics. I saw myself as a theorist, and in my head, a good theorist could get everything they wanted done with pencil and paper. I remember one particularly vivid time that I used a computer algebra system to calculate Riemann tensors for my research in gravitation theory, and I hated dealing with the complexity of the program. It wasn’t easy to work with, the answers were often garbled and not simplified, and it almost felt like I was wasting more time using a computer.</p>
<p>During my time at <a href="https://cotejer.github.io/psion">Perimeter Institute</a>, I got more involved with code and programming physics simulations. I did my project on machine learning with physics, I worked on numerical simulations for a <a href="https://www.mustythoughts.com/variational-quantum-eigensolver-explained">variational quantum eigensolver</a>, and I spent more time in front of a text editor.</p>
<p>Slowly, my day-to-day life as a physicist has been evolving.</p>
<p>I used to spend my days with paper and pencil, working through long equations. I still do that from time to time, but my primary playground has leapt from paper to the text editor. Code has been my primary tool during these first nine months of my PhD. I never would have predicted this at the end of my undergraduate degree, and yet here I am, programming each day, creating my own experiments on my computer to test ideas.</p>
<p>During the many days at the text editor, conceiving of and launching experiments, I’ve learned many lessons about what it means to program in science. There are two main benefits you gain from programming in your research. The first is that you learn how to be <em>explicit</em>, and the second is that you gain a deep understanding of all the parts that go on “under the hood”.</p>
<h2 id="science-made-explicit">Science Made Explicit</h2>
<p><strong>The number one reward for learning how to work with code in your scientific work is that it forces you to be explicit about what you’re doing.</strong> There’s no hiding when you’re programming. Either you understand what you’re doing and the code runs as you expect, or something is still opaque and your code breaks<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>Imagine you want to run a simulation where you generate random samples and then have them evolve. Before you even get to programming the dynamics, you’re faced with a question: What does “random” mean? It’s a word we say often, but a simulation requires more precision. Do you mean uniformly random (think of a fair die)? Do you mean a Gaussian distribution? Poisson? If there are repeated draws for your simulation, do they occur with or without replacement?</p>
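To make that concrete, here’s how those different flavours of “random” look in NumPy’s Generator interface. Each line is a genuinely different choice you’re forced to spell out:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeding makes the "randomness" reproducible

uniform = rng.random(5)                      # uniform on [0, 1)
die = rng.integers(1, 7, size=5)             # a fair six-sided die
gaussian = rng.normal(loc=0.0, scale=1.0, size=5)
poisson = rng.poisson(lam=3.0, size=5)
with_repl = rng.choice(10, size=5, replace=True)       # draws can repeat
without_repl = rng.choice(10, size=5, replace=False)   # no repeated draws
```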
<p>Quickly, the word “random” ends up having a lot of baggage.</p>
<p>This is the nature of programming. When writing code, you cannot afford to be sloppy or vague with your words. Or you can, but then your code often won’t run properly (and <em>knowing</em> that something went wrong is an art in and of itself!).</p>
<p>Programming requires spelling out your assumptions in great detail. This is annoying at first, particularly when you want to test an idea and not necessarily worry about all the details. But this discipline pays off: Once you have your code tested and running, you can be confident in the results.</p>
<p>As scientists, we sometimes get lazy with our hidden assumptions and implicit knowledge. This isn’t good in research, because it means you can miss crucial factors. While programming won’t tell you the right way to go about studying your problem, it <em>will</em> make explicit all of your assumptions. This provides a great opportunity to reflect on the choices you’ve made for the model under study.</p>
<h2 id="you-understand-what-you-can-program">You Understand What You Can Program</h2>
<p>Perhaps the greatest aspect of programming that I’ve found is that it helps build a concrete understanding for the scientific concepts you’re studying.</p>
<p>Reading papers can give you a general sense of the main idea, but science (and mathematics) is an active sport: <strong>You need to engage with the ideas if you want to gain anything more than a surface-level understanding.</strong></p>
<p>This could mean doing exercises from a textbook instead of directly reading the answers. It could also mean going through a research paper and writing down the steps for <em>every</em> equation, making sure that each step is comfortable for you. These techniques force you to jump from passive to active learning, giving you a chance to reason instead of simply following along.</p>
<p>Programming works the same way. I’ve read countless papers now that describe in vague terms (read: English) that a certain simulation was run, with a few equations thrown in to explain the main idea. Maybe that’s fine for a paper, but there’s a world of difference between that and implementing the simulation yourself in code. It’s only when writing code that you find the tricks, techniques, and hidden assumptions that go into calculating Equation 5 of the paper which looked so innocent. Programming makes you go through all the details so that you really understand what you’re doing.</p>
<p>The times I’ve been most confident in my results are when I’ve written the code myself, testing and being aware of each little function that goes into the final result. Doing this also gives me greater leverage in tweaking things for my custom needs, which would be more difficult using a pre-built library<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. When I’ve written my own code, the confidence I have in the results skyrockets.</p>
<p>Writing code to capture a scientific equation feels to me like a clear milestone that tells me I’ve “understood” it. If I can write a simulation or do a calculation in code that does exactly what the equation says, it’s a good sign that I’ve internalized the lessons of that equation. It’s similar to doing textbook exercises or going through proofs, just with code.</p>
<p>As I quoted one of the students in my research group last essay, “you understand a topic when you know how to code it”.</p>
<hr />
<p>Programming won’t turn you into an amazing scientist. Rather, I think code can be used as a tool to level-up your knowledge, even if your main research doesn’t lend itself easily to simulations. The act of writing code forces you to think about all of the assumptions you have baked into your model, which is always good to keep in mind. Plus, writing code helps you internalize an equation better than simply staring at it in a paper a hundred times. Programming makes you <em>play</em> with that equation, checking its assumptions and drawbacks.</p>
<p>However, I will acknowledge that programming can be an intimidating hill to tackle for scientists. This is why I avoided programming for so long. Setting up an environment for your ideas can take a while, and the data structures needed for creating a simulation can pull you out of the creative process<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. This is why we need more resources to bring students up to speed. I’m not a fan of telling <em>every</em> science student to drop what they’re doing and learn how to code, but I do believe having the ability to model phenomena with code can be a great way to really understand a topic.</p>
<p>When you write code, you think about the assumptions that go into the code. You wrestle with unexpected hurdles that are hidden behind equations. You also deal with the fact that the world isn’t continuous, and that we can only access it in discrete chunks. These are all valuable lessons for a scientist. Remember, equations are only one lens in which to view the world. Code provides another, and it forces you to clarify your thinking.</p>
<p>Writing code is an act of knowledge distillation.</p>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I am ignoring the maddening scenario when your code runs, doesn’t output any error, and yet may be hiding some errors that take a long time to get noticed. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’ve gone this route before. I think using existing libraries is great, as long as you know they won’t limit your end goal. For example, when I’m writing code I won’t hesitate to use <a href="https://numpy.org/">NumPy</a>, since it is powerful and more than enough for my needs. But if I’m trying to do a specific quantum simulation, I might not necessarily jump to the most popular tools available, simply because they don’t accomplish what I need them to. Of course, it’s a balance you have to strike in every project you start. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>For a lovely view of this, see Molly Mielke’s essay, <a href="https://www.mollymielke.com/cc">“Computers and Creativity”</a>. It’s a beautiful piece, and captures a feeling I’ve had with all sorts of computer tools for creativity. They often have large hurdles to using them for creative work. Yes, programming can unleash essentially unlimited creative potential, but you need to know how to set these things up. That initial time hinders the creative process. If I want to simulate water molecules, I don’t necessarily want to spend hours and hours thinking about the best data structure for the simulation. Yes, the data structure is vitally important, but I think it’s difficult to deny that it blocks the direct path towards creative work. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 25 May 2021 00:00:00 +0000
https://cotejer.github.io//code-as-knowledge-distilled
Evolving Qubits With Bits<p>As a quantum theorist, my job is to study quantum systems and understand their inner workings. However, since I’m a theoretical physicist and not an experimental physicist, most of my “experiments” come in the form of simulations. My laboratory is my computer, and this means writing numerical experiments.</p>
<p>But wait a second, you tell me. Isn’t the whole point of quantum computers to do things that our regular computers can’t? And aren’t there issues with exponential memory?</p>
<p>These are both very good questions, and we’ll dive into them below. But in short: Yes, these are issues that limit the experiments I can do. And it’s a reason I’d like to get my hands on a good, error-correcting quantum computer!</p>
<!--more-->
<p>In the absence of one<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, I make do with classical simulations. However, depending on the job you want to do, different techniques are preferable. Since I’ve begun my PhD, I’ve learned how to simulate quantum systems using a variety of techniques. Each has its own quirks, and for someone as code-averse as I was, I’m happy to say that I’ve slowly found my groove.</p>
<p>The thing I’ve learned through all of these methods though is that simulating a system (any kind of system) is often <em>very</em> different from the actual system itself. When I write a simulation, what I’m trying to do is map the system I want to study to objects in my code which I can handle more easily. To put it concretely: While I imagine my “simulation” of qubits to be some strange quantum system, the reality is that it’s a bunch of lists and arrays being transformed using the tools of linear algebra.</p>
<p>At first, this really bugged me. After all, I want to simulate my quantum system, but what I’m really spending my time on is converting it into a language my computer can understand. These layers of abstraction between the system that I want to simulate and its <em>representation</em> in a computer are something I’ve learned to deal with. In a way, writing a good simulation is an art form in and of itself, since you have to figure out an equivalent way to represent your system as code.</p>
<p>As you will see, there are many ways to simulate quantum systems now, and each one has its own representation. Therefore, if you want maximal control over your simulation, it’s a good idea to learn the techniques required. There <em>is</em> a point at which you should just say, “Okay, I don’t need to know how <em>this</em> underlying thing works,” but I think this point is further than I used to believe.</p>
<p>Finally, working with numerical simulations has taught me something very important about mathematics: It’s one thing to see an equation in a paper. It’s a whole other thing to implement it in code and see it give you results. Often, an equation hides a lot of numerical baggage which needs to be implemented before you get a result. As my friend said in a presentation, “You don’t understand X unless you’ve coded it yourself.” This is something I’ve heard from others as well<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and I think there’s a lot of truth to that statement. Which is why I try not to shy away from learning new coding practices when it comes to my research.</p>
<p>This essay will be broken up into three categories, which roughly describe the kinds of methods for simulating quantum systems that I’ve worked on. I’ll mention the specific libraries and tools I’ve used, but the point here is to discuss the broad categories, since those are more important than the specific tool.</p>
<h2 id="state-vector-simulators">State Vector Simulators</h2>
<p>This is probably the “closest” simulator to the kind you would imagine in reality<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. What I mean by this is that you start with your quantum state $\vert\psi\rangle$ and it evolves according to various quantum gates that apply on your system.</p>
<p>Schematically, the evolution looks like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/QuantumCircuit.png" alt="An example quantum circuit" /></p>
<p>In other words, you start with an initial quantum state, and each gate you apply is a unitary matrix which multiplies your initial state. So if I apply some quantum gate $U$ on my system, the resulting state is $U\vert\psi\rangle$.</p>
<p>One thing you might notice is that a circuit which is drawn left-to-right (in the sense that the initial state is on the left) is written in the reverse way when seen as matrix multiplication.</p>
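Here’s that ordering in a minimal NumPy sketch (plain linear algebra, not any particular quantum library). The circuit applies H and then X, but in the matrix product the later gate sits on the left:

```python
import numpy as np

# Single-qubit example: start in |0>, apply H, then apply X.
ket0 = np.array([1.0, 0.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# The circuit reads left to right (H first), but as matrices the
# final state is X H |psi>: the last gate multiplies on the left.
psi = X @ H @ ket0
print(psi)  # [0.707..., 0.707...]
```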
<p>For me, the state vector picture is the easiest way to work with a quantum experiment. I just have to define my gates (the usual suspects like the Pauli operators and the CNOT gates are already defined), and then specify how they apply to a specific qubit. The libraries which do this sort of simulation then take care of actually building the right matrix $U$ to apply to the system.</p>
<p>The main library I’ve used here is <a href="https://qiskit.org/">Qiskit</a>, because of the above-mentioned partnership we have with IBM.</p>
<p>A <em>very</em> important point I want to make here is how this underlying matrix $U$ is built. Let’s say we have three qubits, and we want to apply an <em>X</em> gate on the second qubit. As an operator, it would look like this:</p>
<p>\(U = 1 \otimes X \otimes 1.\)
This is fine as a notation, but as a matrix, it becomes 8×8, which is starting to get big. In fact, if you have N qubits, then each individual qubit matrix is 2×2, so in total the size of the resulting matrix $U$ is 2<sup>N</sup>×2<sup>N</sup>. As you can probably imagine, this doesn’t scale too well for large system sizes.</p>
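For a feel of the numbers, here’s a sketch that builds $U = 1 \otimes X \otimes 1$ with Kronecker products and then tallies the memory a bare state vector needs as N grows (assuming 16 bytes per complex amplitude):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# U = 1 (x) X (x) 1 acts with X on the middle of three qubits.
U = np.kron(np.kron(I, X), I)
print(U.shape)  # (8, 8)

# The exponential cost: a state of N qubits has 2**N complex amplitudes.
for N in [10, 20, 30, 40]:
    gib = (2**N * 16) / 2**30  # 16 bytes per complex128 amplitude
    print(f"N = {N}: {gib} GiB")
```

Already at N = 30 the state alone costs 16 GiB, before you even build a single gate matrix.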
<p>At the end of the day, the reason this isn’t sustainable is that the vector needed to describe a quantum state has 2<sup>N</sup> components for N qubits. That’s fine for small systems, but if we want to start making claims in the “thermodynamic limit” of some quantum system when N→∞, we quickly get stuck.</p>
<p>So the state vector picture is a nice starting point because there’s a very direct connection to what’s going on. The elements you see in the circuit are matrices, and they keep on multiplying the initial quantum state until you get to the end. There are a few more subtleties when it comes to measurements, but that’s the gist.</p>
<p>The drawback is that you can’t scale up very high in the circuit picture. To give you a rough estimate, in Qiskit’s online version of their statevector simulator (this directly deals with the quantum state) and the QASM simulator (more for dealing with measurement results), their upper limit is 32 qubits. I think you can potentially go higher on your own hardware if you have enough RAM, but when I say “higher” I mean something like two extra qubits. Again, this has to do with the exponential cost of storing a quantum state. Each new qubit requires <em>double</em> the information to store. This is the crux of the bottleneck, since doing linear algebra with a big object like that is not easy.</p>
<p>But what if you didn’t need all those components? What if some of them you knew were always zero, or some were fixed by a property of your system? Then, is it possible to shrink the amount of space needed to simulate a circuit?</p>
<p>The answer is <em>yes</em>, and this brings us to tensor networks.</p>
<h2 id="tensor-network-simulators">Tensor Network Simulators</h2>
<p>The term “tensor network” feels like a buzzword to me in a similar manner that “machine learning” does. To me, these two words just sound cool, and make it seem like a lot of neat research can be done with them.</p>
<p>In fact, saying this section is about tensor networks is like saying this essay is about quantum theory. <em>Sure</em>, but that’s pretty vague. There are a bunch of different areas of tensor network research, but the one I’ll be focusing on here deals with a representation of quantum states called “matrix product states” (MPS).</p>
<p>People are really good at naming things, so it turns out that the main object of an MPS is <em>not</em> a matrix, but a tensor<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. I plan on getting into tensor networks much more in future essays (because this is related to a lot of my research), but here are the essential bits.</p>
<p>When you have your quantum state $\vert \psi\rangle$, you have to keep track of all 2<sup>N</sup> components. In fact, if we want to write things out fully, we can write down our quantum state (over three qubits, for example) as:
\(\vert \psi\rangle = \sum_{i,j,k} c_{ijk} \vert ijk\rangle.\)
The coefficients $c_{ijk}$ are precisely all of the different entries of the state, and the vector $\vert ijk\rangle$ is the basis state, which tells us <em>where</em> to put the coefficient in our state $\vert \psi\rangle$. As a diagram, we can think of it like this:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/BasisState.png" alt="The basis states." /></p>
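<p>In code, those coefficients $c_{ijk}$ are nothing mysterious: they’re just the state vector reshaped into a rank-3 tensor. A minimal sketch, assuming NumPy:</p>

```python
import numpy as np

# A three-qubit state vector has 8 components...
psi = np.zeros(8)
psi[0b101] = 1.0  # the basis state |101>

# ...which we can view as a rank-3 coefficient tensor c[i, j, k],
# one index per qubit:
c = psi.reshape(2, 2, 2)
print(c[1, 0, 1])  # 1.0
```

<p>The row-major reshape means the basis-state index $ijk$ (read as bits) lands exactly at position $[i, j, k]$ of the tensor.</p>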
<p>The MPS form of a quantum state is a way to split the coefficients $c_{ijk}$ such that they are each defined on a <em>single</em> qubit. This doesn’t come for free though, so what we end up doing is creating tensors that can be chained together to reproduce the coefficient $c_{ijk}$ if we multiply everything out. I won’t show you all of the indices in this essay because it would scare you off (a future essay should cover it), but as a diagram, here’s the idea:</p>
<p><img src="https://res.cloudinary.com/dh3hm8pb7/image/upload/q_auto:best/v1619274773/Blog/MPS.png" alt="A diagram for a matrix product state." /></p>
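<p>One standard way to do this splitting (not necessarily how any particular library implements it) is by repeated singular value decompositions: peel off one physical index at a time, keep the left factor as a site tensor, and push the rest down the chain. A sketch, assuming NumPy, with site tensors stored as (left bond, physical, right bond):</p>

```python
import numpy as np

def to_mps(psi, n):
    """Split an n-qubit state vector into a chain of site tensors via SVDs."""
    tensors = []
    rest = psi.reshape(1, -1)           # (bond, remaining physical dims)
    for _ in range(n - 1):
        bond = rest.shape[0]
        m = rest.reshape(bond * 2, -1)  # pull one physical index to the left
        u, s, vh = np.linalg.svd(m, full_matrices=False)
        tensors.append(u.reshape(bond, 2, -1))  # site tensor (left, phys, right)
        rest = np.diag(s) @ vh                  # push the remainder down the chain
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors

def contract(tensors):
    """Multiply the chain back out to recover the full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)

# GHZ state on 3 qubits: (|000> + |111>) / sqrt(2)
psi = np.zeros(8)
psi[0] = psi[7] = 1 / np.sqrt(2)

mps = to_mps(psi, 3)
print([t.shape for t in mps])            # bond dimensions stay small here
print(np.allclose(contract(mps), psi))   # True
```

<p>For a state like GHZ the bond dimensions stay tiny, which is exactly where the memory savings come from; for a generic highly entangled state they grow until you’ve saved nothing.</p>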
<p>With this, we can potentially save space on the number of elements we need to store (see Reference 1 for more). In my case, using the MPS form of a quantum state allows me to study the entanglement entropy of a quantum state while never holding the full 2<sup>N</sup> object in memory. This allows for more qubits to be simulated. For example, Qiskit has an MPS backend, and you can go up to 100 qubits with it.</p>
<p>The downside is that you can’t do anything that builds the whole state vector. For example, you can’t check what the full quantum state is, only what it is in its “decomposed” form. You can think of it like this: We’ve broken up a huge quantum state into a bunch of small packages which we can handle. What we <em>can’t</em> handle is putting them all back together.</p>
<p>Applying gates to an MPS is conceptually straightforward, but I found it difficult to get working in code, because you need to take care of so many indices. In a sense, it’s “just” multi-dimensional linear algebra, but that “just” is doing a lot of work.</p>
<p>What’s nice though about applying gates is that you can apply them without the operator becoming too “big”. They only apply to a single site (or perhaps a few sites) at a time, which reduces their size.</p>
<p>There are costs to this method though. The main one is that you need to do a ton of singular value decompositions, and this becomes expensive as you have more and more qubits arranged in a line. There are alternative ways to deal with this, but I’m not as familiar with them.</p>
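<p>The single-site case shows why gates stay small: the gate only ever contracts with one site tensor’s physical index, and nothing else in the chain is touched. A minimal sketch, assuming NumPy and the same (left, physical, right) tensor layout as above:</p>

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])

# MPS for the product state |00>: two trivial (1, 2, 1) site tensors.
site0 = np.array([1.0, 0.0]).reshape(1, 2, 1)
site1 = np.array([1.0, 0.0]).reshape(1, 2, 1)

# Apply X to the second qubit: contract the 2x2 gate with that site's
# physical index only -- no other tensor in the chain is touched.
site1 = np.einsum('ab,ibj->iaj', X, site1)

# Contract the two sites back into a state vector: should be |01>.
psi = np.tensordot(site0, site1, axes=([2], [0])).reshape(-1)
print(np.argmax(psi))  # index 1 = |01>
```

<p>It’s the two-site gates that force the expensive SVDs mentioned above, because they temporarily merge two site tensors that then have to be split apart again.</p>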
<h2 id="clifford-circuit-simulators">Clifford Circuit Simulators</h2>
<p>The final method I’ve learned how to do involves very particular gates called “Clifford” gates. Quantum circuits that are built up of only these gates, which are the CNOT, SWAP, Pauli X, Y, Z, the phase gate P, and the Hadamard gate H, are special. That’s because we can <em>drastically</em> reduce the number of objects we need to store in memory when simulating these circuits.</p>
<p>Usually, we want to keep track of 2<sup>N</sup> components (or perhaps somewhat less if we’re using tensor networks), which grows quickly. But in the case of Clifford circuits, the number of objects we need to track only grows linearly with N.</p>
<p>How can this be?</p>
<p>It has to do with the fact that we can represent quantum states in Clifford circuits using what are called “stabilizer generators”. If you’re a long-time reader, you may remember I covered stabilizers in <a href="https://cotejer.github.io/game-of-loops">“A Game of Loops”</a>, where the stabilizers were the plaquette operators acting on my surface code. Using this, I was able to think of my quantum state’s evolution mainly through these operators (though in that case I also looked at the full 2<sup>N</sup> objects).</p>
<p>I won’t go through all the technical details of the stabilizer formalism here, but the gist is that we can represent quantum states using a set of operators which “stabilize” the system, and do evolution by evolving these objects. See Reference 2 for more on this formalism.</p>
<p>This means that if we have a stabilizer G acting on the quantum state, then G essentially acts as the identity. It’s <em>not</em> the identity, but the combined action of the operator on the quantum state looks like an identity operation.</p>
<p>To give you a small example, imagine we start the quantum state in the computational basis state $\vert \psi\rangle = \vert 00110\rangle$. It turns out that this state is specified by precisely five stabilizer generators, one per qubit. They are:
\(Z\otimes1\otimes1\otimes1\otimes1, \\
1\otimes Z\otimes1\otimes1\otimes1, \\
1\otimes1\otimes -Z\otimes1\otimes1, \\
1\otimes1\otimes1\otimes -Z\otimes1, \\
1\otimes1\otimes1\otimes1\otimes Z.\)
You can check that applying these operators to the quantum state results in no net change. And using some principles of group theory, it turns out that these can generate any other stabilizer of this state we might dream up, such as $1\otimes1\otimes Z\otimes Z\otimes1$, which is the product of the third and fourth generators above (the two minus signs cancel).</p>
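<p>That check is easy to run numerically for a system this small. A sketch, assuming NumPy (the full 32-component vector is fine here, even though the whole point of the formalism is to avoid it):</p>

```python
import numpy as np
from functools import reduce

I = np.eye(2)
Z = np.diag([1.0, -1.0])

# |00110> as a 32-component state vector (first qubit most significant).
psi = np.zeros(32)
psi[0b00110] = 1.0

# The five generators: Z on each site, with a minus sign wherever
# that qubit is in state |1> (since Z|1> = -|1>).
signs = [1, 1, -1, -1, 1]
for site, sign in enumerate(signs):
    ops = [Z if k == site else I for k in range(5)]
    G = sign * reduce(np.kron, ops)
    assert np.allclose(G @ psi, psi)  # G acts as the identity on |00110>

print("all five generators stabilize |00110>")
```

<p>The sign pattern is the whole trick: $Z$ picks up a $-1$ on a qubit in state $\vert 1\rangle$, and the generator’s own minus sign cancels it.</p>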
<p>So this is great for representing a quantum state, but the real magic is that this representation gives us a way to simulate the <em>evolution</em> of a quantum state.</p>
<p>That’s because a stabilizer state will remain a stabilizer state as you evolve it in a Clifford circuit. The only difference is that the generators will change. To evolve a quantum state then, all we have to do is track how these generators change (and remember, there are only N of them).</p>
<p>For example, suppose we apply a Hadamard gate on the first qubit. This gives us the transformation $H\vert 0\rangle = \vert +\rangle$, so our new quantum state is $\vert +0110\rangle$. We then change the stabilizer generators to account for this, and we end up finding:
\(X\otimes1\otimes1\otimes1\otimes1, \\
1\otimes Z\otimes1\otimes1\otimes1, \\
1\otimes1\otimes -Z\otimes1\otimes1, \\
1\otimes1\otimes1\otimes -Z\otimes1, \\
1\otimes1\otimes1\otimes1\otimes Z.\)
The rules for changing the generators can be found in Reference 2. I won’t go through all of them here, and there are tools that track these changes automatically: all you have to do is supply the quantum circuit. The key idea is that the generators can be represented by a binary matrix of size N×2N, where N is the number of qubits. There are 2N columns because we keep track of the X and Z operators separately, so each gets its own N×N block. Within this binary matrix, a 1 tells us that there’s a Pauli X or Z at the given site. Simulating the circuit then amounts to updating this matrix.</p>
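<p>As a tiny illustration of that binary matrix, here is the Hadamard update from the example above: on the target qubit, H swaps the X and Z bits (and flips the sign bit when both are set). This is a stripped-down sketch, assuming NumPy and one extra sign bit per generator; the full set of update rules is in Reference 2.</p>

```python
import numpy as np

N = 5
# Stabilizers of |00110>: Z on every site, so the X block is zero
# and the Z block is the identity.
x = np.zeros((N, N), dtype=np.uint8)
z = np.eye(N, dtype=np.uint8)
r = np.array([0, 0, 1, 1, 0], dtype=np.uint8)  # sign bits: 1 means a minus sign

def hadamard(x, z, r, q):
    """H on qubit q: swap the X and Z bits, flipping the sign when both are set."""
    r ^= x[:, q] & z[:, q]
    x[:, q], z[:, q] = z[:, q].copy(), x[:, q].copy()

hadamard(x, z, r, 0)

# The first generator is now X on qubit 0; the others are unchanged.
print(x[0], z[0], r[0])  # [1 0 0 0 0] [0 0 0 0 0] 0
```

<p>Updating a handful of bits per gate is why these simulators scale so far beyond the state vector and MPS approaches.</p>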
<p>Personally, I haven’t created my own simulator. The one I’ve used is called <a href="https://github.com/quantumlib/Stim">Stim</a>, by <a href="https://algassert.com/">Craig Gidney</a>. It’s fast, it works for my purposes, and I don’t have to reinvent the wheel. There’s also a Clifford circuit simulator in Qiskit, where the online version says it can simulate up to 5000 qubits. I’ve used Stim to simulate circuits with about 3000 qubits, so I can attest that this is achievable (though I was using a supercomputer). Now that I think of it, the longest part of my computation was probably what I did <em>after</em> simulating the circuit and obtaining this matrix: the linear algebra on a matrix that size takes quite a while.</p>
<p>Using this formalism is fantastic if you want to simulate really big system sizes, but whether it helps depends on what you want to study. For example, it’s possible to use these Clifford circuits to get the entanglement entropy<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, but it’s not possible (as far as I know) to get the entanglement spectrum (the eigenvalues of the reduced density matrix for a state) using this method.</p>
<h2 id="quantum-toolbox">Quantum Toolbox</h2>
<p>During this first year of my PhD, I’ve been in a sort of transition mode. I’ve had to learn the ropes in this field of quantum systems, as well as how to work with code and numerical tools. The learning curve has been steep at times, but I’ve found an appreciation for building my own tools that are perfect for my needs.</p>
<p>I came across a <a href="https://4gravitons.com/2021/04/23/building-ones-technology/">blog post</a> on exactly this by Matthew von Hippel. As a lazy person, I love being able to just grab some code off the shelf that works for my needs. But as a scientist, I know that it’s probably a good idea to build up a set of tools that I’m confident will work. This isn’t easy, and it often involves a lot of “wasted” time where research isn’t moving forward as quickly as I want. But once I have the tools built, I can then tweak things to my heart’s content.</p>
<p>In my case, this has been a year of learning techniques on how to simulate quantum circuits. Learning the ropes of the three methods above has been a big part of my research, and I’ve reached the point where I have a decent enough handle on these aspects that I can use them to start answering questions.</p>
<p>I hope to write more about how these tools are used in my research in future essays. For now though, I’ve built my toolbox, and I can start digging into the research.</p>
<h2 id="references">References</h2>
<ol>
<li>I really like <a href="https://tensornetwork.org/">this site</a>, which explains the various ways to use tensor networks to simulate quantum systems. In particular, the <a href="https://tensornetwork.org/mps/#toc_2">MPS article</a> is well done, and the link I’ve provided here will give you an idea of how many parameters you need to describe your MPS.</li>
<li>The Clifford stabilizer formalism for simulating quantum circuits was developed (to my knowledge) by <a href="https://www.scottaaronson.com/blog/">Scott Aaronson</a> and <a href="https://www2.perimeterinstitute.ca/personal/dgottesman/">Daniel Gottesman</a> in <a href="https://www.scottaaronson.com/papers/chp6.pdf">this paper</a>. There are other papers from around the same time using slightly different approaches, but this is the one I’ve based my code on.</li>
</ol>
<h2 id="endnotes">Endnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The university where I’m doing my PhD has a partnership with IBM, so I can actually use their quantum devices, including their largest available quantum computer: the <code class="language-plaintext highlighter-rouge">ibmq_manhattan</code>, which has 65 qubits. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’m thinking of Philip Moriarty’s <a href="https://www.numberphile.com/videos/philip-moriarty">podcast episode</a> with Brady Haran on Numberphile. Philip Moriarty also has a <a href="https://muircheartblog.wpcomstaging.com/">great blog</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Though, I imagine experimental quantum physicists would disagree with me and think that even this is oversimplifying things. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This reminds me of the Standard Model class I took during my master’s. When people write down the Standard Model Lagrangian (think: the expression governing how a system evolves), there are a bunch of indices present. However, what you later learn is that there are <em>hidden</em> indices which are implicit in the notation, because adding those would make the expression even messier. I suspect a similar thing is going on here, where an index is omitted from consideration because it’s not relevant to the discussion, and so the tensor is demoted to a matrix. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>I spent a long time trying to figure out how to do this. To save every curious person some time in the future, I asked and answered <a href="https://quantumcomputing.stackexchange.com/q/16718/">my own question on the Quantum Computing Stack Exchange</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
</li>
</ol>
</div>
Sun, 25 Apr 2021 00:00:00 +0000
https://cotejer.github.io//evolving-qubits-with-bits