[{"content":"It is sometimes claimed that\nsufficiently advanced AI will almost certainly (\u0026ldquo;inevitably\u0026rdquo;) kill everyone (\u0026ldquo;doom\u0026rdquo;), and therefore the only rational response is to ban AI completely for a prolonged period of time or forever. I think that this is wrong because the first premise is wrong. In my opinion, AI is roughly a normal science like physics or biology, and is dangerous in the same way those fields are dangerous but perhaps more so. This means that the conclusion, \u0026ldquo;the only rational response is to ban AI completely for a prolonged period or forever\u0026rdquo;, is also wrong.\nAI risk should be mitigated in a similar way to how we mitigate risks from other sciences and engineering projects. There will be some differences due to how the field actually works, in that there is nothing really equivalent to uranium or smallpox samples and so physical controls are less effective or at least very different. There may also be a difference in magnitude. My argument is only that AI is not different in kind from any other science which carries substantial risk.\nIf AI is a roughly normal science, pushing for a complete ban or moratorium on AI is likely to be be counterproductive. If nothing else, such advocacy adds noise into the environment and can make it more difficult to stage other interventions that might be better, like interpretability research, safety evaluations, and release criteria.\nOther people have been arguing about this longer than I have and it\u0026rsquo;s a broad topic covering both AI itself and the broad, societal issue of managing AI. In this case I think I can more productively engage with the subject as a whole by providing, basically, a literature review of who has written what that I think is correct. 
This was originally a thread by deepfates; there was some desire to extend it, and it seemed like this canon perhaps needed a permanent home with the rationale for its existence right up top.\nOrganization is entirely my preference.\nOn Those Undefeatable Arguments for AI Doom by 1a3orn # You can find this essay here: https://1a3orn.com/sub/essays-ai-doom-invincible.html.\nPeople seem to believe in inevitable AI doom because it\u0026rsquo;s a compelling meme more than because they believe in any particular argument. I would like to add that as the actual landscape has changed, team doom has, seemingly, not changed any of its opinions.\nThis post makes a good case that merely having a lot of arguments is not itself a merit. I have therefore endeavored to include here only things which I think do not overlap much (if at all), and each of which, if proven wrong, I think would considerably strengthen the argument for inevitable doom.\nBeren\u0026rsquo;s Entire Blog # It turns out Beren Millidge has essentially written a major work on AI alignment scattered across the last few years of blog posts. I had read maybe half of them, figured they were probably true, and mostly not thought about them after that. 
It was only obvious to me while compiling threads of links that this rose to the level of being a self-contained work that covered the subject pretty well.\nWe can divide this nicely into sections, and pull quote what seems (to me) to most directly address ordinary \u0026ldquo;inevitable doom\u0026rdquo; beliefs and their consequences.\nFundamentals # \u0026ldquo;One of the big updates I have made in the past six months is strongly towards the belief that solving alignment for current LLM-like agents is not only possible, but is actually fairly straightforward and has a good chance of being solved by standard research progress over the next ten years.\u0026rdquo;\nMy path to prosaic alignment and open questions\nNamely, alignment methods to ensure stability during online learning or RSI will require constant dynamic and adaptive adjustments rather than simply an extremely good static alignment initialization (although a good initialization will of course be very helpful). Additionally, the existing field of control theory handles exactly these kinds of problems and has constructed a large set of theoretical tools around the design and verification of controllers that I believe likely have important insights for alignment.\nMaintaining Alignment during RSI as a Feedback Control Problem\nHowever, the bigger problem with the biosingularity is that it does not address the alignment problem also posed by the AI singularity, and arguably makes it worse.\nThe Biosingularity Alignment Problem Seems Harder than AI Alignment\nIn general, it makes sense that, in some sense, specifying our values and a model to judge latent states is simpler than the ability to optimize the world. Values are relatively computationally simple and are learnt as part of a general unsupervised world model where there is ample data to learn them from (humans love to discuss values!). 
Values thus fall out mostly ’for free’ from general unsupervised learning.\nAlignment likely generalizes further than capabilities\nMechanics # While not at all trivial, the coming era of synthetic data promises to give us many more levers for deep alignment of our models as well as methods for detecting deception and misalignment early prior to real deployment.\nAlignment In The Age Of Synthetic Data\nGiven reasonable interpretability and control tooling, this line of thought could lead to methods to try to make an AGI more naturally empathic towards humans. This could include carefully designing the architecture or training data of the reward model to lead it to naturally generalize towards human experiences.\nEmpathy as a natural consequence of learnt reward models\nI think this view is wrong and that the alignment mechanism and the alignment target do not always cleanly decouple. This means we can leverage information about the alignment target to develop better or easier alignment methods 1. If this is the case, we might benefit from better understanding what human values actually are, so we can use information about them to design alignment strategies. However, naively, this is hard. Human values appears to be an almost intentionally nebulous and unspecific term. What are human values? What is their type signature (is this even a meaningful question?). How do they come about?\nThe computational anatomy of human values\nPolicy # Restricting or banning open source AI will severely hamper the ability of this population to do meaningful alignment work and hence significantly slow progress in alignment.\nOpen source AI has been vital for alignment\nI think in general, the current focus should be on preventing the emergence of strong and autonomous agents that can self replicate, the development of robust auditing frameworks for frontier models, and dealing with misuse harms as they crop up without making any strongly decisive moves. 
I broadly do not think that existing generative models pose any significant existential threat since they currently appear to lack any kind of coherent agency or tendency to behave consistently adversarially to humans.\nMy Preliminary Thoughts on AI Safety Regulation\nSpecifically, I think it should only be acceptable to claim something is infohazardous when you have strong empirical evidence that 1.) it substantially advances capabilities (i.e. more than the median NeurIPS paper), 2.) It empirically works on actual ML systems at scale, 3.) it is already not reasonably known within the ML community, and 4.) when there is no reason to expect differential impact on safety vs capabilities i.e. when the idea has no safety implications and is pure capabilities.\nStrong infohazard norms lead to predictable failure modes\nBostrom # However, sound policy analysis must weigh potential benefits alongside the risks of any emerging technology. Yudkowsky and Soares maintain that if anyone builds AGI, everyone dies. One could equally maintain that if nobody builds it, everyone dies. In fact, most people are already dead. The rest of us are on course to follow within a few short decades. For many individuals—such as the elderly and the gravely ill—the end is much closer. Part of the promise of superintelligence is that it might fundamentally change this condition.\nOptimal Timing for Superintelligence\nNick Bostrom is sort of the grandfather of AI Doom as a concept and he seems to want to put the genie at least part-way back in the bottle.\nAI Optimism # This blog lives at optimists.ai, and contains detailed arguments concerning optimizers and evolution. Some highlights:\nIn what follows, we will argue that AI, even superhuman AI, will remain much more controllable than humans for the foreseeable future. 
Since each generation of controllable AIs can help control the next generation, it looks like this process can continue indefinitely, even to very high levels of capability.\nAI is easy to control\nIn this essay, we debunk the counting argument— a central reason to think AIs might become schemers, according to a recent report by AI safety researcher Joe Carlsmith.1 It’s premised on the idea that schemers can have “a wide variety of goals,” while the motivations of a non-schemer must be benign by definition. Since there are “more” possible schemers than non-schemers, the argument goes, we should expect training to produce schemers most of the time.\nCounting arguments provide no evidence for AI doom\nAdrian Leicht on Policy # Is an AI Pause a good idea, even assuming a relatively high level of risk?\nI believe these advocates are mistaken about the politics even if we grant their view of the risks: pauses and moratoria likely sabotage our progress on a narrow path toward beneficial and safe advanced artificial intelligence. And in the likely event of their political failure, they’ll leave behind a much worse environment of AI politics.\nPress Play To Continue: ‘Pausing AI’ is bad policy and worse politics\nMe # I am actually not sure I would include these if I personally had not written them, because they are a little bit redundant with Beren and AI Optimism. I do, however, take a wider, more historical and less technical perspective.\nCan we convey our intent, both what our words mean and what our actual preferences are, to a computer? Ten years ago the answer was no. Currently, in 2026, the answer is yes. This should be recognized as a paradigm shift in the field, an area where we have gone from zero to one.\nAlignment Is Proven To Be Solvable\nThe creationist argument is that you can never find a protein that works, because there are too many proteins that do not work. 
This argument is that you can never find an AI that does not kill everyone, because there are too many AI that do kill everyone. The assumptions are that the space is very large, and we are (or might be!) drawing from it at random. This is much more upsetting than the normal kind of counting argument, which tells you that God exists or that optimizers or autocomplete don’t work, but it is logically the same argument. It is also wrong for the same reasons.\nCounting Arguments and AI\nAn Argument I Haven\u0026rsquo;t Seen Made In Long Form # AI risk discussion anytime before 2022 was often about the idea of FOOM, which, well:\nHumanity is in a FOOM relative to the rest of the biosphere but of course it doesn\u0026rsquo;t seem ridiculously fast to us; the question from our standpoint is whether a brain in a box in a basement can go FOOM relative to human society. Anyone who thinks that because we\u0026rsquo;re already growing at a high rate, the distinction between that and a nanotech-capable superintelligence must not be very important, is being just a little silly. It may not even be wise to call them by the same name, if it tempts you to such folly - and so I would suggest reserving \u0026ldquo;FOOM\u0026rdquo; for things that go very fast relative to you.\nFrom here.\nModern AI is incredibly resource-intensive! You have to pump more and more electricity into the thing to get any result. A brain in a box in a basement cannot exponentially improve itself relative to human society given any technology we currently have. It would have to have some feasible way of acquiring more energy. Physics tells us that systems tend toward minima of free energy, so usable free energy tends to be relatively hard to find!\nIf this possibility was part of the reason anyone believed doom was likely, they should currently believe doom is less likely. 
Unless we see major, paradigm-shiftingly different technology in terms of how physical computers or AI algorithms function, nothing along current lines is likely to do anything like this. LLMs (and all modern AI) are largely scale- and energy-bottlenecked, not design-bottlenecked.\nIf you were worried about FOOM, congratulations, LLMs are power-hungry monsters. You should hope that development continues on these lines, because it can\u0026rsquo;t FOOM from a basement. In fact, you can\u0026rsquo;t fit the training compute in a basement at all. Instead of having thousands of places something could go fantastically wrong, you have maybe a few dozen, and really, since frontier research is only taking place at maybe five companies, you actually have like five places to worry about. This is a vast improvement.\nConclusion # What would convince me I was wrong, or make me more worried? Really, if any of the technical arguments above proved to be very wrong, or to be wrong for systems currently towards the cutting edge. The only one of these I think is sort of shaky is energy efficiency. I think it\u0026rsquo;s perfectly plausible that future algorithms or computers might actually be much more efficient, and then you do in fact have to worry about them growing on short time scales.\nI\u0026rsquo;m also quite concerned with the impact of the technology and its governance. Society seems like it\u0026rsquo;s not doing great at managing itself already, and it\u0026rsquo;s not clear that we are capable of making good collective decisions about AI research. It\u0026rsquo;s also not clear that we are capable of making good decisions surrounding deployment, and mitigating the consequences of deployment on e.g. the job market. However, this is a human problem, a real \u0026ldquo;this is why we can\u0026rsquo;t have nice things\u0026rdquo; sort of issue. 
It\u0026rsquo;s not a fundamental problem with the technology, it\u0026rsquo;s a problem with the societal context in which we develop it.\nOne thing you\u0026rsquo;ll notice, though, is that there\u0026rsquo;s apparently no specific falsifiability criterion for inevitable doom as a thesis. Several things have happened that should have falsified or at least modified the position. Strong AI probably can\u0026rsquo;t arise in a random basement with anything like current technology, and human values are actually relatively easy to convey to an LLM, to name two. We can infer from the lack of change in their position that their position is not actually based on the evidence, and that the goalposts will always move.\n","date":"16 April 2026","externalUrl":null,"permalink":"/posts/against-doom-and-pause-ai/","section":"Posts","summary":"","title":"Against Doom \u0026 Pause AI","type":"posts"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/","section":"SE Gyges","summary":"","title":"SE Gyges","type":"page"},{"content":" Counting Arguments and AI # A \u0026ldquo;counting argument\u0026rdquo; is a style of argument common among creationists, who argue that the theory of evolution cannot be true and therefore humans (and usually animals too) were made in basically their present form by God. These arguments run like so:\n1. Count the number of possible states of something in biology, like the amino acids in a protein, the nucleotides in DNA, possible body shapes, etc.\n2. Argue that the fraction of those states that function at all is vanishingly small.\n3. Conclude that it is basically impossible to find these states at random.\nWe are, of course, digging into these because the same sort of argument is sometimes made about computers and AI. 
They are used to argue or \u0026ldquo;prove\u0026rdquo; that various things in AI don\u0026rsquo;t or can\u0026rsquo;t ever work, including things that demonstrably do work or plausibly could. We\u0026rsquo;ll start with the example from biology, where the error is best-studied, and work our way through computer science to AI.\nCreationist Errors of Interest # There are a few flavors of the counting argument. The most memorable one is that a protein assembling itself is as unlikely as a tornado assembling a Boeing 747, and many of them like to assign specific and very large numbers like one in 10^150 or one in 10^77 to say just how unlikely something in biology is.\nThese are all wrong in basically the same ways, but some of the ways they\u0026rsquo;re wrong are of general interest, so we\u0026rsquo;ll spell those out.\nEvolution Isn\u0026rsquo;t Random # Mutation is random. Mutation powers evolution. Evolution is not random.\nAt every generation, you get some set of mutations, which are random. In information theory terms, mutations add noise, like static does. Bad ones make you less likely to reproduce, and the really bad ones never make it to a second generation because they kill you. Conversely, genes that tend to make you less likely to die young and more likely to reproduce tend to stick around for many generations. Every time these random mutations succeed or fail to propagate, noise is removed and information is added.\nThis is normally stated as something like \u0026ldquo;evolution by natural selection tends to increase fitness over time\u0026rdquo;. As a point of interest, \u0026ldquo;fitness\u0026rdquo; is a moving target, since it\u0026rsquo;s always \u0026ldquo;that which tends to propagate\u0026rdquo;, and what will tend to propagate tends to change over time. 
It is a complicated thing, but it isn\u0026rsquo;t random.\nCreationist counting arguments calculate the probability of all of that information showing up at once, and don\u0026rsquo;t count any of the incremental energy or work that is used to get there. It is a lot like saying that it is impossible for people to live a whole mile away, because nobody\u0026rsquo;s got legs a mile long to step there. That is not how it works, and it\u0026rsquo;s a pretty basic misunderstanding.\nFitness Landscapes Have Structure # It turns out, most things that work are similar to other things that work.\nOur counting argument requires us to imagine that each and every single part of any organism is completely unlike any other working part, either present or past. We have to imagine that every single protein or body part is unrelated to every other one, and this just isn\u0026rsquo;t true. For some basic examples, hands and feet have basically the same bones organized a little differently, and the proteins for green and red color perception are about 96% similar. Reusing and modifying existing parts, empirically, works quite well.\nWe like to call the space of possible mutations a fitness landscape, where \u0026ldquo;higher\u0026rdquo; points are more fit. You can imagine any given species meandering uphill on this landscape over time. The landscape itself is always shifting a bit and some of this movement is random, so it\u0026rsquo;s sort of a drunk Sisyphus situation, but in general it tends to have a specific direction, and it can and does go uphill only where this landscape is smooth and not where it would require jumping up a cliff.\nAll of this changes the math quite a bit. It\u0026rsquo;s very improbable to make a green opsin protein from scratch because it\u0026rsquo;s rather long, but it\u0026rsquo;s actually pretty easy to make it from a red opsin protein. 
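To make the difference concrete, here is a toy sketch (the \u0026ldquo;protein\u0026rdquo;, its length, and all the numbers are made up for illustration): a pure random draw has to hit every site of a ten-letter target at once, at odds of about one in 10^13, while mutation plus selection starting from a nearby working sequence gets there one cheap step at a time.\n```python\nimport random\n\nrandom.seed(0)\nALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the twenty amino-acid letters\nTARGET = "MKTAYIAKQR"              # a made-up ten-residue "working" protein\n\ndef fitness(seq):\n    # Matching positions: a smooth landscape, since similar\n    # sequences receive similar scores.\n    return sum(a == b for a, b in zip(seq, TARGET))\n\n# A pure random draw must hit all ten positions at once:\n# probability (1/20)**10 per draw, roughly one in 10**13.\n\n# Mutation plus selection instead starts from a nearby working\n# sequence (like red vs. green opsin) and keeps a single-site\n# mutation only when it does not lower fitness.\nseq = list("MKTAYIWKQW")           # two sites away from TARGET\nbest = fitness(seq)\nsteps = 0\nwhile best < len(TARGET):\n    steps += 1\n    i = random.randrange(len(seq))\n    old = seq[i]\n    seq[i] = random.choice(ALPHABET)\n    new = fitness(seq)\n    if new >= best:\n        best = new                 # keep neutral or beneficial mutations\n    else:\n        seq[i] = old               # selection discards harmful ones\n\nprint("".join(seq), steps)\n```\nThe point is not the particular numbers but the shape of the search: selection never has to find the whole answer at once, only a single uphill step at a time, so the size of the total space stops mattering.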
The similarity between parts that work is what gives the fitness landscape its structure, making it smooth enough that it can feasibly be climbed (albeit, drunkenly). It does not matter how large the fitness landscape is, only that the landscape is smooth enough that it is possible to move uphill in it.\nThe No Free Lunch Theorems # We can move on to computers now, and mercifully we can be much briefer. Instead of explaining the errors in detail we will only point out where they\u0026rsquo;re the same.\nThe No Free Lunch theorems seem to tell you that optimization algorithms cannot work. This is very surprising, because optimizers do work, all the time. Their authors state this as \u0026ldquo;any two algorithms are equivalent when their performance is averaged across all possible problems\u0026rdquo;.1 They are technically correct. However, for this to matter at all, any given problem you wanted to solve would have to be pulled at random from \u0026ldquo;all possible problems\u0026rdquo;, and problems people want to solve would have to be completely dissimilar from each other.\nFortunately this is not true. Problems that humans are trying to solve are generally not completely random, and most problems are somewhat similar to each other. This gives the landscape of problems to be solved structure. For this reason computer optimization works pretty well, and techniques for optimization that work on one problem often work on other problems also.\nThe Same Thing, But AI # In \u0026ldquo;Reclaiming AI as a Theoretical Tool for Cognitive Science\u0026rdquo; (2024), van Rooij and coauthors claim to have \u0026ldquo;proved that achieving human-like intelligence using learning from data is intractable\u0026rdquo;.2 Their argument is also basically a counting argument.\nTheir core argument is that even a fifteen minute conversation has about 10^270 possible \u0026ldquo;situations\u0026rdquo;. 
Therefore no machine learning system can approximate a conversation any better than chance, because that is too many possible things.\nI hope you\u0026rsquo;ll get the joke by now. Behaviors aren\u0026rsquo;t random. Behaviors are similar to each other, so there\u0026rsquo;s structure to the optimization landscape, and almost all of the approximately 10^270 possible sentences or behaviors in the conversation are complete nonsense that it is extremely unlikely any algorithm would ever output. Therefore there exist algorithms that do not take longer than the life of the universe to choose which sentence to say.\nMuch like the No Free Lunch Theorems, their argument would seem to disprove many algorithms which certainly do work better than chance, like the autocorrect on a cell phone. There are a lot of possible options for autocorrect and it would be intractable to actually check them all each time someone pressed a key, which is why that\u0026rsquo;s not how it works.\nInevitable AI Doom # If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.\n— If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All\nThis is the actual point of the essay, and we will go over it in some detail.\nAll of this comes from definitions and interpretations of \u0026ldquo;The Orthogonality Thesis\u0026rdquo;. In its basic form, the Orthogonality Thesis is fairly inoffensive and seems roughly correct:3\nIntelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.\nThis isn\u0026rsquo;t strictly true, because how smart something is and what it wants are at least a little bit related in some cases. 
They are not, however, necessarily or always related, and this relationship sometimes breaks down, so as a precautionary principle this is fine. Something can be very smart and it can want pretty much anything. This is also true of smart people, who sometimes want nonsensical or weird things. Wanting anything at all is sort of nonsensical, and which things specifically you want is to some degree arbitrary. In humans this is somewhat limited, because many wants are very human and some others are not very human at all, but it\u0026rsquo;s not extremely limited. People alone want many different things, often vastly different from each other.\nWhere this starts to go wrong is here in 2008,4 which begins the line of arguments to the effect that the Orthogonality Thesis means that AI will almost certainly kill us all. This actually starts before Bostrom coins \u0026ldquo;The Orthogonality Thesis\u0026rdquo; as a term, but it\u0026rsquo;s the same argument.\nIn keeping with the theme, I hope it is basically obvious that \u0026ldquo;minds in general\u0026rdquo; do not, for any practical purpose, exist. You can only try to find minds-in-general by random sampling, and most of what you create at random won\u0026rsquo;t be a mind at all. Almost all possible minds are so vastly improbable that they have negligible chance of ever existing.\nThe general region of \u0026ldquo;posthuman mindspace\u0026rdquo;, as in, minds that humans have any reasonable chance of creating directly or indirectly, occupies something like half the area on the diagram. 
But this is actually much smaller; compared to all possible minds, both human minds and posthuman minds are extremely small sets, and only distinguished by the fact that they already exist or have some reasonable chance of existing.\nWe could take this as a diagram being a little imprecise, but the essay that contains it explicitly tells us to consider seriously the space of all possible minds:\nIf we focus on the bounded subspace of mind design space that contains all those minds whose makeup can be specified in a trillion bits or less, then every universal generalization that you make has two to the trillionth power chances to be falsified.\nConversely, every existential generalization—“there exists at least one mind such that X”—has two to the trillionth power chances to be true.\nAnd later:\nSomewhere in mind design space is at least one mind with almost any kind of logically consistent property you care to imagine.\nWell, we certainly have a lot of emphasis on the size of the space, but we haven\u0026rsquo;t explicitly asserted that we have to worry about drawing randomly from it. This very large space of all possible minds might only be meant to establish that such a thing is possible in principle. I have also used this style of proof. Sometimes you can even prove that something exists in principle but it is impossible or intractable to calculate it, and this can be a clever little bit of mathematics.\nOrthogonality thesis. Mind design space is huge enough to contain agents with almost any set of preferences, and such agents can be instrumentally rational about achieving those preferences, and have great computational power. For example, mind design space theoretically contains powerful, instrumentally rational agents which act as expected paperclip maximizers and always consequentialistically choose the option which leads to the greatest number of expected paperclips. 
See: Bostrom (2012); Armstrong (2013).\n[\u0026hellip;]\nA superintelligence with a randomly generated utility function would not do anything we see as worthwhile with the galaxy, because it is unlikely to accidentally hit on final preferences for having a diverse civilization of sentient beings leading interesting lives.5\nThis is unfortunately pretty conclusive. We have here the paperclipper, which is now a quaint and retro meme about AI killing everyone, and we are explicitly told to be afraid of AI research because the space is vast and we are worried that we may be drawing from it at random, or effectively at random.\nThis is a counting argument. This is the same counting argument, but modified in sort of a clever way. The creationist argument is that you can never find a protein that works, because there are too many proteins that do not work. This argument is that you can never find an AI that does not kill everyone, because there are too many AI that do kill everyone. The assumptions are that the space is very large, and we are (or might be!) drawing from it at random. This is much more upsetting than the normal kind of counting argument, which tells you that God exists or that optimizers or autocomplete don\u0026rsquo;t work, but it is logically the same argument. It is also wrong for the same reasons.\nI am not the first person to notice this, and there is already a detailed and very good write-up of how badly wrong this sort of argument is, both mathematically and empirically, when applied to the gradient descent optimizer.6 What I hope to add here is that this counting intuition is the core of MIRI\u0026rsquo;s argument and position on AI and always has been.\nTo spell this out explicitly: It seems sort of obviously true that you can create things that have very different goals from a human, or goals hostile to humans. Crabs have very different goals from humans. 
Humans can go insane in many amazing ways, and will often adopt goals, if you can call them goals, that are very far from human norm. Something that is not human can easily be at least as different from us as we are from crabs or the insane, and likely much more. I am not saying that AI is inherently safe or not weird.\nThe point I\u0026rsquo;m trying to make is that the intuition around the inevitability of AI doom, the argument leading to the thesis that \u0026ldquo;If Anyone Builds It, Everyone Dies\u0026rdquo;, the thing that leads you to believe there\u0026rsquo;s a 99% chance of everyone dying and to preach that nuclear war is a better outcome than people studying AI,7 is fundamentally a counting argument based on bad intuition about large spaces and optimization.\nYou could argue that this intuition is not core to the appeal of the argument, but I think there is no good reason to believe this. This is the core argument, it has been made consistently in these exact words for over a decade, and the appeal is specifically that the space is so large that it contains many dangerous things and drawing from it is inherently very dangerous.\nWe also see those leaning heavily on the orthogonality thesis say things that are, taken literally, completely nonsensical except if their reasoning is actually this sort of counting argument.\nSize of mind design space\nThe space of possible minds is enormous, and all human beings occupy a relatively tiny volume of it - we all have a cerebral cortex, cerebellum, thalamus, and so on. The sense that AIs are a particular kind of alien mind that \u0026lsquo;will\u0026rsquo; want some particular things is an undermined intuition. \u0026ldquo;AI\u0026rdquo; really refers to the entire design space of possibilities outside the human. Somewhere in that vast space are possible minds with almost any kind of goal. 
For any thought you have about why a mind in that space ought to work one way, there\u0026rsquo;s a different possible mind that works differently.\nThis is an exceptionally generic sort of argument that could apply equally well to any property P of a mind, but is still weighty even so: If we consider a space of minds a million bits wide, then any argument of the form \u0026ldquo;Some mind has property P\u0026rdquo; has 2^(1,000,000) chances to be true and any argument of the form \u0026ldquo;No mind has property P\u0026rdquo; has 2^(1,000,000) chances to be false.8\nAnd separately:\nThe preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained.9\n\u0026ldquo;Vanishingly unlikely\u0026rdquo; is a term of art in probability theory, which means that something has a probability so low that it can be considered zero. This is the case when the space for the opposite result is vastly larger. This is not a statement that makes any sense if it is about the probability of humans messing up or not understanding the consequences of what they are doing in the course of pursuing some research program, where there\u0026rsquo;s certainly some probability of humans figuring the problem out. This is a statement that only makes sense when comparing the size of a space that is much, much larger, because you imagine that you are performing a random draw, or something like it, from that space.\nThis emphasis is pretty consistent. In 2022, Yudkowsky argues that \u0026ldquo;capabilities generalize further than alignment\u0026rdquo;, that there are \u0026ldquo;unbounded degrees of freedom\u0026rdquo; in goal-space, and similar. This is a better argument, but it\u0026rsquo;s still a counting argument. 
A claim of \u0026ldquo;unbounded degrees of freedom\u0026rdquo; is a claim about the size of a space, and \u0026ldquo;almost every kind of coffee\u0026rdquo; is a claim about what fraction of goal-space has a particular property. And just to remove any doubt, the document links back to the LessWrong page on orthogonality, with 2^(1,000,000) on it, as a prerequisite for understanding the rest.10\nIt is sometimes asserted that orthogonality is meant only to establish that, in principle, an AI could go rogue. This is a motte and bailey: Yudkowsky personally, and many of his more enthusiastic readers, clearly believe and espouse the very strong and incorrect counting-argument version of orthogonality. They state fairly unequivocally that AI is definitely going to kill everyone, and they equally clearly haven\u0026rsquo;t got any real idea about why they expect this if it isn\u0026rsquo;t the counting argument. When challenged they retreat to claiming it is only about whether dangerous AI, hypothetically, can exist, or whether considering AI dangers is important and worth doing.\nOptimization Targets Aren\u0026rsquo;t Random # Drawing a mind at random is explored in the classic thought experiment we call the Boltzmann brain. What are the odds of an entire, fully-formed brain coming into existence by sheer chance? They are extremely low, but not zero. This is a funny fact and an interesting thought experiment, and it has no relevance to anything humans might have any chance of ever building.\nIn modern AI, this is equivalent to simply initializing a very large neural network and not training it at all. What are the odds that this neural network does anything useful or interesting?
These odds are astronomically poor, and such neural networks output either nothing or white noise.\nOptimization targets come from our specific universe, and generally from human data and human concerns, with human ideas either directly stated or strongly implied in them. Especially given that every AI paradigm that currently works is incredibly data-hungry, it would actually be much harder to create anything that seems reasonably intelligent without at least giving it a lot of information about humans and human values. You would have to actively exclude all of language, for starters.\nIt is, in a limited sense, true that some optimization targets have nothing to do with humans or human values. Pure math is one, and plausibly some forms of adversarial training objectives are completely devoid of any residue of humanity. What this implies is much weaker than the orthogonality thesis: a training objective that has no human values in its data or formulation will produce a system that learns few if any traits resembling human-friendly values, and that is very hard to guide with respect to them. This also says nothing about the difficulty of optimizing for good human-related values, since there are many values concerning humans that are bad for humans, like negative utilitarianism. These claims are true enough, but they are so weak that they certainly do not support inevitable \u0026ldquo;AI Doom\u0026rdquo; as a conclusion. They simply mean that you could, if incautious, successfully arrive at doom, not that you are definitely heading there.\nThe Landscape of Reachable Minds Has Structure # What we make comes from what we already have. What do we already have? First, data, which is overwhelmingly produced by humans and contains useful information about humans. Second, existing AI systems, since anything we make is in general rather similar to its predecessors.
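The contrast between drawing from a space at random and searching it from where we stand can be made concrete. Below is a toy sketch (assuming only numpy; the linear model, sizes, and step counts are illustrative inventions, not anyone's actual proposal) in which repeated random draws from parameter space never resemble a target, while a short gradient-descent search from a single random starting point closes in on it:

```python
# Toy contrast: "random draw from a vast space" versus "structured search
# from where we are". Everything here is illustrative; numpy only.
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary target function the model knows nothing about at initialization.
n_samples, n_dims = 500, 100
X = rng.normal(size=(n_samples, n_dims))
true_w = rng.normal(size=n_dims)
y = X @ true_w

def corr(a, b):
    # Pearson correlation between two prediction vectors.
    return float(np.corrcoef(a, b)[0, 1])

# "Counting" regime: 100 independent random draws from parameter space.
# None of them meaningfully resemble the target.
random_scores = [abs(corr(X @ rng.normal(size=n_dims), y)) for _ in range(100)]

# "Search" regime: gradient descent on mean squared error from one random start.
w = rng.normal(size=n_dims)
for _ in range(100):
    grad = 2.0 * X.T @ (X @ w - y) / n_samples  # gradient of MSE
    w -= 0.05 * grad

print(f"best of 100 random draws: |corr| = {max(random_scores):.2f}")
print(f"after 100 gradient steps:  corr  = {corr(X @ w, y):.2f}")
```

The point is not the specific numbers but the asymmetry: the counting intuition describes the first loop, while actual training is the second.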
We are not drawing at random; we are searching from where we are.\nSo long as we are taking relatively incremental steps, it is actually very hard to see how this goes suddenly wrong. It only goes very wrong if people doing research have vastly miscalculated either how large a step they have taken or how well they understand what they already have.\nMinds we already have (including our own) can be studied in detail to determine what properties they have, and which of those we think are good or bad. We can use this to inform what is made next and how it is used. Our direction of travel is not completely random, nor is it completely blind, and reasonable progress has been made on making systems do what we want them to do and avoiding what we do not want. For example, MIRI employees have historically said value loading or learning was a major problem here, but we have made reasonable progress on value loading. MIRI has not, apparently, noticed this.\nThere are softer versions of the AI Doom argument that argue doom is inevitable not because of the size of the space of all possible minds, but because of the space of all possible ways for things to go wrong, and this is also a counting argument. For example, in If Anyone Builds It, Everyone Dies it is argued that even if an AI doesn’t kill everyone, it would change humanity in some very hostile way, as humans did to wolves by making them into dogs.
This also relies on a counting argument to claim such a mishap is inevitable: “We would not be its favorite things, among all things it could create.”11 That there are many bad possibilities is only a fundamental problem if we are sampling objectives at random, and there’s no structure we can use to find a good result.\nInterestingly, a later rung of the \u0026ldquo;AI Doom\u0026rdquo; thesis acknowledges that the space of all minds has structure in the form of \u0026ldquo;Instrumental Convergence\u0026rdquo;, that is, all minds that are good at accomplishing things will inevitably converge on seeking some form of power because this enables them to accomplish more. This is interesting because it acknowledges some existing structure to the space of all possible minds, but then completely denies, ignores, or fails to consider searching for any structure that could be used to avoid unwanted outcomes.12\nAnother quirk in MIRI\u0026rsquo;s positioning is that their preferred program appears to be spending a few decades doing human genetic engineering instead of working on AI.13 Yudkowsky himself has gotten increasingly direct about this over the last few years, calling human genetic engineering \u0026ldquo;literally the solution to the alignment problem\u0026rdquo; on the Trajectory podcast,14 and elsewhere saying \u0026ldquo;my message to humanity is \u0026lsquo;back off and augment\u0026rsquo; not \u0026lsquo;back off and solve it with a clever theory\u0026rsquo;\u0026rdquo;.15 This is seconded by MIRI\u0026rsquo;s president, who recommends Earth \u0026ldquo;pursue other routes to the glorious transhumanist future, such as uploading\u0026rdquo;, and on superbabies says \u0026ldquo;I doubt we have the time, but sure, go for superbabies\u0026rdquo;.16 Yudkowsky has told many people directly to back off of AI alignment work and instead pursue intelligence augmentation, adult gene therapy, or \u0026ldquo;superbabies\u0026rdquo;.1718192021\nWhy would you imagine this was 
safer or more predictable? \u0026ldquo;Very smart genetically engineered humans\u0026rdquo; are in my opinion likely more difficult to understand or be certain of than AI is. You have white box access to the AI, can read off its internal state directly, and the only limit to how well you can understand it is that it can be very large and complex. You cannot do this to humans, either current or augmented, because brains are generally black boxes and reading individual neurons is very difficult. Running a massive eugenics program for a prolonged period of time is therefore unlikely to help with the problem, on top of the risk that it will not work at all and the many likely negative consequences of doing such a thing.\nMore simply: If the übermenschen align the AI, who aligns the übermenschen?\nWhy would you make this mistake? My impression is that this is because they understand, implicitly or explicitly, that the landscape of minds you can reach by modifying humans has structure, and therefore you could in principle reach a good outcome by modifying humans very carefully. They do not seem to understand that the space of possible AI minds also has structure.\nGiven that the landscape of reachable AI systems does have structure, the correct question is not about which minds exist but about which paths through mind-space are reachable and whether we have or can get enough information to choose a path correctly. Based on the information we have, it is essentially impossible to be completely certain about this, and to believe that it is vanishingly unlikely that future AI is aligned with humans requires thinking in a different and incorrect paradigm where you simply count the number of possible minds, paths, or results.\nHow Much Understanding Do We Need? # Many technologies are fundamentally dangerous. For obvious examples, we can consider fire and nuclear reactors. 
It has been possible to control fire enough for it to be used usefully and more or less safely since the Stone Age, and in the modern world our control of fire is so precise that we can burn gasoline to power cars fairly safely, with the interval between mishaps measured in thousands of miles driven. Nuclear power presents a more mixed record, and although it is certainly possible as a matter of pure technology to generate power from uranium safely, the social institution of \u0026ldquo;actually building and running a reactor\u0026rdquo; has failed at this catastrophically on several notable occasions. Technology can be quite dangerous without being inherently dangerous.\nIn neither case, however, are we required to understand the phenomenon perfectly in order to use it. Even our best modern physics is fundamentally somewhat approximate, and we cannot hope to account for the motion of every atom in even well-understood processes like burning gasoline due to the chaotic nature of chemical reactions. If we make a point of counting all possible things that the atoms could be doing, there are clearly too many possibilities for us to do this safely! Cars generally run anyway. Scientists and engineers are expected to know what they can be certain of, what they cannot be certain of, and how to push the boundary between them forward and handle it with care.\nThe MIRI position on what we should do about AI is to advocate an indefinite global moratorium on frontier AI development, to be lifted only when \u0026ldquo;humanity\u0026rsquo;s state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed\u0026rdquo;.22 They have never specified concrete criteria for what would constitute sufficient progress, and they have never said what \u0026ldquo;drastically changed\u0026rdquo; would mean in practice.
Ultimately, a few people in Berkeley don\u0026rsquo;t think we understand AI well enough; they refuse to change their minds or to say what would change them, and they think we should stop studying AI.\nThis makes a certain kind of sense if you actually believe the counting argument. If the problem is that you are drawing from a vast and intractable space you cannot characterize, then by definition you can never be confident enough, because the space is too large to ever handle and any local progress is just a tiny island in an ocean of things that could still go wrong. There is no amount of empirical understanding that can possibly bridge a gap that is 2^(1,000,000) wide. Under those assumptions, the only honest position is exactly the one MIRI takes: stop, indefinitely, on criteria that cannot in principle be met.\nThis is an anti-science conclusion, and the reasoning is nonscientific. If the question is what specific optimization processes actually do when applied to specific training data, then you can study those processes, characterize their behavior, run experiments, and define success criteria for the kinds of systems we are actually building. You do not need to map out the space of all possible minds to know roughly what the next training run is going to do, any more than a civil engineer needs to enumerate every possible arrangement of steel and concrete to know whether a particular bridge will hold. You only need to understand the part of the space you are actually in, and you only need to understand it well enough to take the next step without falling off a cliff.\nThe creationist counting argument leads to God of the gaps: the space of possible biological configurations is too vast to search, therefore the question is permanently unanswerable by natural means and there must be a designer.
The MIRI counting argument leads to doom of the gaps: the space of possible minds is too vast to guarantee safety, therefore the question is permanently unanswerable by empirical means and there must be a catastrophe. In both cases the structure of the error is the same. You count a space you will never explore, point at how huge it is, and treat that hugeness as evidence about the much smaller space you actually inhabit. This style of argument essentially always leads to serious errors.\nWolpert, D. \u0026amp; Macready, W. \u0026ldquo;No Free Lunch Theorems for Optimization.\u0026rdquo; IEEE Transactions on Evolutionary Computation 1, no. 1 (1997): 67-82.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nvan Rooij, I., Guest, O., Adolfi, F., de Haan, R., Kolokolova, A. \u0026amp; Rich, P. \u0026ldquo;Reclaiming AI as a Theoretical Tool for Cognitive Science.\u0026rdquo; Computational Brain \u0026amp; Behavior 7 (2024): 616-636. The \u0026ldquo;intractable\u0026rdquo; quote is from the abstract; the 10^270 illustration appears in Box 1.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nBostrom, N. \u0026ldquo;The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.\u0026rdquo; Minds and Machines 22, no. 2 (2012): 71-85.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026ldquo;The Design Space of Minds-In-General.\u0026rdquo; LessWrong, June 25, 2008. https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general. The 2^trillion counting argument and all three block quotes in this section are from this post, as are the mind design space diagram and the AIXI reference discussed in 12.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026ldquo;Five theses, two lemmas, and a couple of strategic implications.\u0026rdquo; intelligence.org, May 5, 2013. https://intelligence.org/2013/05/05/five-theses-two-lemmas-and-a-couple-of-strategic-implications/. 
Both the orthogonality / paperclip maximizer quote and the \u0026ldquo;randomly generated utility function\u0026rdquo; quote are from this post.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis piece written with Nora Belrose goes well with Quentin Pope’s other essays explaining bad evolutionary analogies in AI and arguing that “AI Pause Will Likely Backfire”, both of which seem to be quite correct.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026ldquo;Pausing AI Developments Isn\u0026rsquo;t Enough. We Need to Shut it All Down.\u0026rdquo; TIME, March 29, 2023, https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/. Yudkowsky calls for an \u0026ldquo;indefinite and worldwide\u0026rdquo; moratorium on large training runs and writes: \u0026ldquo;Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that\u0026rsquo;s what it takes to reduce the risk of large AI training runs.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026ldquo;Orthogonality Thesis.\u0026rdquo; LessWrong tag page, https://www.lesswrong.com/w/orthogonality-thesis (archived: https://web.archive.org/web/20260322102316/https://www.lesswrong.com/w/orthogonality-thesis, 2025). Both the \u0026ldquo;size of mind design space\u0026rdquo; and 2^(1,000,000) quotes are from this page.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026amp; Soares, N. If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. Little, Brown and Company, 2025.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026ldquo;AGI Ruin: A List of Lethalities.\u0026rdquo; LessWrong, June 6, 2022. https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities. 
The \u0026ldquo;unbounded degrees of freedom\u0026rdquo; language appears in Point 21; \u0026ldquo;almost every kind of coffee\u0026rdquo; in Point 23; the link to the LessWrong Orthogonality Thesis page8 is in Point -3.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. \u0026amp; Soares, N. If Anyone Builds It, Everyone Dies, \u0026ldquo;We Wouldn\u0026rsquo;t Make the Best Pets.\u0026rdquo; Both the dogs-and-wolves analogy and the \u0026ldquo;favorite things\u0026rdquo; quote are from this passage, where humans are described as unlikely to be \u0026ldquo;the best version of whatever the AI wants.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThere is an irony worth noting here. Yudkowsky\u0026rsquo;s 2008 essay4 invokes AIXI — Marcus Hutter\u0026rsquo;s incomputable mathematical idealization of a perfect reasoner — to bolster the intuition that the space of minds is vast and alien. But AIXI\u0026rsquo;s own formalism contains the rebuttal. AIXI reasons over all computable environments using the Solomonoff prior, which weights hypotheses by complexity: a program of length n gets prior weight 2^(-n), so simple hypotheses dominate exponentially. Under this prior, Hutter\u0026rsquo;s own collaborators (Lattimore \u0026amp; Hutter, \u0026ldquo;No Free Lunch versus Occam\u0026rsquo;s Razor in Supervised Learning\u0026rdquo;, 2011/2013; Everitt, Lattimore \u0026amp; Hutter, \u0026ldquo;Free Lunch for Optimisation under the Universal Distribution\u0026rdquo;, IEEE CEC 2014) proved that the No Free Lunch theorems do not hold — structured priors break the symmetry that counting arguments require. The space of all computable environments is infinite, but almost all of the probability mass concentrates in the simple corner. The formalism that Yudkowsky cites to make the space of minds feel terrifying is the same formalism that shows you don\u0026rsquo;t actually need to search all of it.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMIRI. 
\u0026ldquo;2024 Mission and Strategy Update.\u0026rdquo; intelligence.org, January 2024, https://intelligence.org/2024/01/04/miri-2024-mission-and-strategy-update/. The document acknowledges genetic engineering as MIRI\u0026rsquo;s preferred biological alternative track: \u0026ldquo;Human intelligence augmentation is feasible over a scale of decades to generations, given iterated polygenic embryo selection. I don\u0026rsquo;t see any feasible way that gene editing or \u0026lsquo;mind uploading\u0026rsquo; could work within the next few decades.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky on The Trajectory podcast with Daniel Faggella, \u0026ldquo;Human Augmentation as a Safer AGI Pathway\u0026rdquo; (AGI Governance, Episode 6): https://www.youtube.com/watch?v=YlsvQO0zDiE. Quoted and summarized in a LessWrong writeup: https://www.lesswrong.com/posts/bSHCZ6dbAdfMbvuXB/yudkowsky-on-the-trajectory-podcast. Full quote: \u0026ldquo;If we have time, human genetic engineering literally is the solution to the alignment problem. We are maybe 5-8 years out from being able to\u0026hellip;\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. Comment on Vaniver\u0026rsquo;s \u0026ldquo;Critical review of Christiano\u0026rsquo;s disagreements with Yudkowsky,\u0026rdquo; LessWrong, December 27, 2023, https://www.lesswrong.com/posts/8HYJwQepynHsRKr6j/critical-review-of-christiano-s-disagreements-with-yudkowsky?commentId=9pKofQAchdgCH8jjm: \u0026ldquo;humanity needs to back off and augment intelligence before proceeding\u0026hellip; My message to humanity is \u0026lsquo;back off and augment\u0026rsquo; not \u0026lsquo;back off and solve it with a clever theory\u0026rsquo;.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSoares, N. 
\u0026ldquo;On how various plans miss the hard bits of the alignment challenge.\u0026rdquo; LessWrong, July 12, 2022, https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment-challenge. On the superbabies plan: \u0026ldquo;I doubt we have the time, but sure, go for superbabies. It\u0026rsquo;s as dignified as any of the other attempts to walk around this hard problem.\u0026rdquo; On the alternative: \u0026ldquo;I basically recommend that Earth pursue other routes to the glorious transhumanist future, such as uploading.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. (@ESYudkowsky), X, February 2026, https://x.com/ESYudkowsky/status/2022545643324284985: \u0026ldquo;if you fucking accept that something is 96% likely to kill everyone on the planet, back the fuck off, work on human intelligence augmentation\u0026hellip;\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. (@ESYudkowsky), X, August 2025, https://x.com/ESYudkowsky/status/1959645205428404603: \u0026ldquo;If you can direct people, you should be directing them to work on human intelligence augmentation.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. (@ESYudkowsky), X, August 2025, https://x.com/ESYudkowsky/status/1953145905433198897: \u0026ldquo;I don\u0026rsquo;t think time alone fixes it. I think you need human intelligence augmentation. If you enforced a pause that gave us 100 years of argumentation from merely current minds and institutions, I think it converges to a wrong answer.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. (@ESYudkowsky), X, July 2025, https://x.com/ESYudkowsky/status/1944135001484013965: \u0026ldquo;I worry we don\u0026rsquo;t get enough time to do genetic engineering, and would prefer to go hard on adult gene therapy.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYudkowsky, E. 
(@ESYudkowsky), X, June 2025, https://x.com/ESYudkowsky/status/1938284311943475215: \u0026ldquo;the thing that I and other sensible people want from them is superbabies or better yet adult gene therapies.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMIRI. \u0026ldquo;2024 Mission and Strategy Update.\u0026rdquo; intelligence.org, January 2024, https://intelligence.org/2024/01/04/miri-2024-mission-and-strategy-update/. The \u0026ldquo;drastically changed\u0026rdquo; quote describes MIRI\u0026rsquo;s first strategic objective; the underlying \u0026ldquo;with actual teeth\u0026rdquo; framing originates from Yudkowsky\u0026rsquo;s March 2023 op-ed in TIME7.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"11 April 2026","externalUrl":null,"permalink":"/posts/counting-arguments-and-ai/","section":"Posts","summary":"","title":"Counting Arguments and AI","type":"posts"},{"content":" Against the Luddites # Luddism does not deserve to be rehabilitated. It was a medieval throwback, reactionary and primitive, a pre-Marxist labor convulsion closer in spirit to the Khmer Rouge\u0026rsquo;s fantasies of agrarian restoration than to the universalist solidarity of Eugene Debs. The contemporary effort to recast the Luddites as thoughtful critics of technology gives modern anxieties about AI a historical pedigree they do not deserve. The movement was a violent defense of guild privilege, male supremacy, and craft hierarchy against the leveling forces of industrial modernity. They could have fought for equality and justice; they chose instead to fight to remain the petty bosses of their own towns rather than cede that authority to the owners of factories.\nThe rehabilitation of Luddism is a vice signal. Anyone genuinely concerned with workers\u0026rsquo; dignity has Marx, Debs, and Martin Luther King to hand, organizers who championed equality across lines of skill, race, and gender. 
The choice to reach past all of them for a movement of guild enforcers who beat women in the streets is made because of the violence and reaction, not despite it.\nAn Elite Movement # The Luddites did not represent the working class. Cambridge historian Richard Jones examined oral testimonies, trial documents, Parliamentary papers, and Home Office reports. He concluded that Luddism was \u0026ldquo;far from a genuinely pan-working class movement.\u0026rdquo; The Luddites were \u0026ldquo;a relatively \u0026rsquo;elite\u0026rsquo; group, whose role had traditionally been protected by legislation regulating the supply and conduct of labour.\u0026rdquo; In an industry employing a million people, the movement never exceeded a couple of thousand. Jones put it bluntly: \u0026ldquo;these were not downtrodden working class labourers. The Luddites were elite craftspeople.\u0026rdquo;1\nThe Yorkshire croppers, the vanguard of Luddism in the West Riding, had to complete seven-year apprenticeships before they could practice their trade. After seven years, Jones notes, \u0026ldquo;they tended to feel that they were owed a living.\u0026rdquo; For the genuinely unskilled and dispossessed, displacement by machines was already old news; they had little reason to join a movement defending privileges they never possessed.\nThe Nottinghamshire framework-knitters ran the same racket. Among their core grievances was the employment of \u0026ldquo;colts,\u0026rdquo; workers who had not served the seven-year apprenticeship. They objected to \u0026ldquo;unapprenticed youths\u0026rdquo; and to the new wide frames, which anyone could operate and which produced cheaper goods.2 Every Luddite complaint presupposed a closed guild system in which access to the trade was rationed by the incumbents.\nThe Exclusion of Women # The machines the Luddites smashed did something that, by any measure of human equality, should have been celebrated.
As Daron Acemoglu has documented, they \u0026ldquo;replaced the scarce and expensive factors, the skilled artisans, by relatively cheap and abundant factors, unskilled manual labor of men, women, and children.\u0026rdquo;3 The Luddites treated this as an outrage.\nViolence against women was endemic to the skilled textile trades, of which Luddism was the most dramatic expression. A petition from Glasgow cotton manufacturers, preserved in parliamentary records, states:\n\u0026ldquo;In almost every department of the cotton spinning business, the labour of women would be equally efficient with that of men; yet in several of these departments, such measures of violence have been adopted by the combination, that the women who are willing to be employed, and who are anxious by being employed to earn the bread of their families, have been driven from their situations by violence.\u0026rdquo;4\nWhen the firm of James Dunlop and Sons built spinning machines small enough to be operated by women and employed female spinners, the women \u0026ldquo;were waylaid and attacked, in going to, and returning from their work; the houses in which they resided, were broken open in the night. The women themselves were cruelly beaten and abused; and the mother of one of them killed.\u0026rdquo; The firm was forced to dismiss all female spinners and hire only men.\nIn 1810, the Calton association of weavers formally resolved \u0026ldquo;that no new female apprentices could be taken except from the weaver\u0026rsquo;s own families.\u0026rdquo; In 1833, male cotton spinners struck against female spinners at Dennistoun\u0026rsquo;s mill in Calton, \u0026ldquo;using violent means to drive them from the workplace.\u0026rdquo;5\nA nationwide meeting of spinners in 1829 passed a resolution restricting the trade to \u0026ldquo;the son, brother, or orphan nephew of spinners, and the poor relations of the proprietors of the mills,\u0026rdquo; excluding women entirely. 
A demand that only your sons and nephews be permitted to practice the trade is an inheritance claim, not a workers\u0026rsquo; demand, closer in spirit to the family that passes down ownership of a car dealership than to any form of labor solidarity. As Mariana Valverde has observed, \u0026ldquo;the spinners\u0026rsquo; masculinity and craft were completely intertwined.\u0026rdquo;6\nThe Luddite cause was inseparable from the cause of male monopoly over skilled labor. To rehabilitate Luddism without confronting this is to celebrate a movement that beat women in the streets for daring to earn a living.\nMarx and Engels Saw Through It # Marx and Engels understood Luddism as primitive, misdirected resistance, a stage to be transcended, not celebrated.\nIn the Communist Manifesto (1848), Marx and Engels placed machine-breaking at the very earliest, most confused stage of proletarian development. Workers at this stage \u0026ldquo;direct their attacks not against the bourgeois conditions of production, but against the instruments of production themselves; they destroy imported wares that compete with their labour, they smash to pieces machinery, they set factories ablaze, they seek to restore by force the vanished status of the workman of the Middle Ages.\u0026rdquo; They remain \u0026ldquo;an incoherent mass scattered over the whole country, and broken up by their mutual competition.\u0026rdquo;7\nThe phrase to dwell on is the vanished status of the workman of the Middle Ages. Marx saw the Luddites as men trying to restore feudal craft privileges by force, a reactionary project dressed up as resistance.\nEngels went further.
In The Condition of the Working Class in England (1845), he argued that pre-industrial craft workers \u0026ldquo;were not human beings; they were merely toiling machines in the service of the few aristocrats who had guided history down to that time.\u0026rdquo; The industrial revolution, for all its horrors, stripped away this comfortable illusion, \u0026ldquo;forcing them to think and demand a position worthy of men.\u0026rdquo;8 The order the Luddites wanted to restore was itself a form of servitude, one whose bars were harder to see.\nIn Capital, Vol. 1, Marx made the critique structural. \u0026ldquo;It took both time and experience before workers learnt to distinguish between machinery and its employment by capital, and therefore to transfer their attacks from the material instruments of production to the form of society which utilises those instruments.\u0026rdquo; The Luddites had not learned this lesson. They attacked the machine and left the system untouched.\nElsewhere in the same chapter, Marx was explicit: the Luddite phase had to be superseded. The destruction of machinery was the first instinctive reaction of workers who had not grasped that their enemy was the social relation wielding the machine. Trade unions and political organization represented the maturation that machine-breaking could never achieve: workers learning to target the system rather than its tools.9\nEven Hobsbawm, the most sympathetic Marxist to touch the subject, conceded the point. \u0026ldquo;Collective bargaining by riot\u0026rdquo; is a generous description of Luddite machine-wrecking, but bargaining by riot is pre-political by definition.10\nRestoration, Never Revolution # The Luddites fought to preserve a hierarchy that benefited them, a hierarchy built on seven-year apprenticeships, guild monopolies, the exclusion of women, and the exclusion of the unskilled. 
They demanded the restoration of a vanishing past.\nThe croppers wanted Parliament to ban machines that had existed since the sixteenth century and to enforce the Statute of Artificers (1563), which mandated seven-year apprenticeships and restricted entry to trades. Parliament repealed the Statute in 1814, two years after the height of Luddism; even the Tory government of the era recognized it as obsolete. The framework-knitters invoked the authority of the Company of Framework Knitters, a guild body chartered in 1657, to justify their demands. The entire framework was pre-modern: traditional wages, access, and hierarchy.\nWe have seen what happens when those who yearn for a return to the past are given power. When the Khmer Rouge seized Cambodia in 1975, they emptied the cities and drove the population into the rice fields in pursuit of an agrarian Year Zero. The scale of violence is incomparable to anything the Luddites had the power to do, but the ideological shape is the same. It is the same simple call: smash the instruments of modernity and return to a simpler, purer social order. The Luddites wanted to restore the medieval craftsman; the Khmer Rouge wanted to restore the agrarian peasant. Both treated the people empowered by modernization as threats to be suppressed. The reactionary romanticism and the hatred of the leveling effects of new productive forces are the same impulse.\nEugene Debs, by contrast, never demanded the restoration of the artisan\u0026rsquo;s workshop. He organized across skill, gender, and racial lines. The IWW\u0026rsquo;s founding convention in 1905 identified the craft union as the central obstacle to working-class solidarity. The AFL\u0026rsquo;s craft unions were the direct organizational descendants of the guild mentality the Luddites had fought and killed to preserve: apprenticeship requirements, restricted entry, jealous guarding of trade boundaries.11 The IWW set out to organize all workers regardless of skill, trade, race, or sex.
Its founding was an explicit repudiation of everything the Luddites stood for.\nWe can either destroy the means of production or seize them, and we cannot do both.\nConclusion # The rehabilitation of the Luddites is an intellectual project that succeeds only by omission. It requires ignoring who the Luddites were (a small elite of privileged male artisans), what they wanted (the restoration of guild monopolies), and what they did (beat women, burned factories, and murdered a mill owner). It requires ignoring that Marx and Engels, the tradition\u0026rsquo;s own founders, saw Luddism as precisely the kind of confused, backward-looking, pre-political rebellion that had to be overcome before a genuine workers\u0026rsquo; movement could emerge.\nIf automation concentrates wealth and displaces workers, the answer is to change who owns the machines and how their gains are distributed, not to smash them. Nick Srnicek and Alex Williams, in Inventing the Future (2015), reject the anti-technology nostalgia they call \u0026ldquo;folk politics\u0026rdquo; and argue for universal basic income, a shorter work week, and collective ownership of automated production.12 Aaron Bastani\u0026rsquo;s Fully Automated Luxury Communism (2019) goes further: automation, renewable energy, and synthetic biology make a post-scarcity world materially possible, if their fruits are collectively owned rather than privately hoarded.13 Both ask who should own the future, not whether the future should be allowed to arrive.\nThe legitimate grievance in Luddism, that workers deserve a say in how technology reshapes their conditions, does not need the Luddites as its vessel. That idea has better champions.\nSources # Richard Jones, research on the Luddite bicentenary, University of Cambridge. \u0026ldquo;Rage against the machine.\u0026rdquo;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nKevin Binfield, ed., Writings of the Luddites (Baltimore: Johns Hopkins University Press, 2004). 
Binfield\u0026rsquo;s introduction documents the framework-knitters\u0026rsquo; grievances, including the employment of \u0026ldquo;colts\u0026rdquo; (unapprenticed workers) and the use of wide frames. The croppers\u0026rsquo; campaign to enforce obsolescent apprenticeship legislation is documented in Adrian Randall, Before the Luddites: Custom, Community, and Machinery in the English Woollen Industry, 1776–1809 (Cambridge University Press, 1991). See also the Encyclopedia.com entry on Luddites.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDaron Acemoglu, \u0026ldquo;Technology and Inequality,\u0026rdquo; NBER Reporter, 2003. See also Kevin H. O\u0026rsquo;Rourke, Ahmed S. Rahman, and Alan M. Taylor, \u0026ldquo;Luddites, the Industrial Revolution, and the Demographic Transition,\u0026rdquo; Journal of Economic Growth 18, no. 4 (2013): 373–409.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n\u0026ldquo;Women Workers in the British Industrial Revolution,\u0026rdquo; EH.net. eh.net\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe Calton weavers\u0026rsquo; 1810 resolution barring female apprentices and the 1833 violent strike against female spinners are documented in the historical record of Calton weavers, drawing on Norman Murray, The Scottish Handloom Weavers, 1790–1850: A Social History (Edinburgh: John Donald, 1978). For the broader pattern of male textile workers\u0026rsquo; violent exclusion of women, see Marianna Valverde, \u0026ldquo;Giving the Female a Domestic Turn: the Social, Legal and Moral Regulation of Women\u0026rsquo;s Work in British Cotton Mills, 1820–1850,\u0026rdquo; Journal of Social History 21, no. 4 (1988): 619–634.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n\u0026ldquo;How 19th-Century Cotton Mills Influenced Workplace Gender Roles,\u0026rdquo; JSTOR Daily. daily.jstor.org\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMarx and Engels, Communist Manifesto (1848), Chapter 1. 
marxists.org\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nFriedrich Engels, The Condition of the Working Class in England (1845), Introduction. Available via marxists.org.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMarx, Capital, Vol. 1, Chapter 15, Section 5 (\u0026ldquo;The Strife Between Workman and Machine\u0026rdquo;). Quotations are from the Fowkes translation (Penguin Classics, pp. 554–555). The chapter is available in the Moore and Aveling translation via marxists.org.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nE. J. Hobsbawm, \u0026ldquo;The Machine Breakers,\u0026rdquo; Past \u0026amp; Present, Vol. 1, No. 1 (February 1952), pp. 57–70. Hobsbawm coined \u0026ldquo;collective bargaining by riot\u0026rdquo; in this article and acknowledged that such movements relied on \u0026ldquo;the natural protection of small numbers and scarce apprenticed skills, which might be safeguarded by restricted entry to the market and strong hiring monopolies.\u0026rdquo; The Midlands Luddites\u0026rsquo; invocation of the Company of Framework Knitters is documented in Kevin Binfield, ed., Writings of the Luddites (Johns Hopkins, 2004).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe IWW Preamble, adopted at the founding convention in Chicago, June 27–July 8, 1905, declared that craft divisions \u0026ldquo;foster a state of affairs which allows one set of workers to be pitted against another set of workers in the same industry.\u0026rdquo; The Preamble and convention proceedings are available via iww.org.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nNick Srnicek and Alex Williams, Inventing the Future: Postcapitalism and a World Without Work (London: Verso, 2015). Srnicek and Williams coined \u0026ldquo;folk politics\u0026rdquo; to describe the left\u0026rsquo;s retreat into localism, direct action, and anti-technology sentiment, arguing that these impulses cede the future to capital.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAaron Bastani, Fully Automated Luxury Communism: A Manifesto (London: Verso, 2019). 
Bastani argues that emerging technologies of abundance make post-scarcity achievable, provided the means of production are collectively owned.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"29 March 2026","externalUrl":null,"permalink":"/posts/against-the-luddites/","section":"Posts","summary":"","title":"Against the Luddites","type":"posts"},{"content":" Some Rough Notes on AI Policy # I hope that we can, at some point, have some reasonable regulations on AI, as we do concerning banks, wiretaps, and dangerous chemicals. I am somewhat pessimistic about legislation we have seen in the past, and legislation we are seeing now. In the interest of being topical I am publishing this relatively brief and relatively rough summary of my views, which I believe to be at least reasonably informed and which I hope are, to some degree, useful. I am much more informed about AI than law, and can only hope that no defect in my understanding of the law is a major problem here.\nWhat Good Policy Would Look Like # Good regulations would directly regulate the sale and actual use of AI in such a way that it was less likely to be used for bad purposes. That is to say: It should first and foremost regulate, penalize, supervise, or otherwise concern itself with the conduct of companies offering AI services, the companies and governments employing those services, and to some degree the people making them.\nFor example: It is reasonable to require AI products to disclose that they are AI to the user. It would be reasonable to require AI-generated paid political advertisements to disclose that they are AI. It would be reasonable to restrict AI companies from marketing or selling their tools as direct replacements for licensed professionals such as doctors or lawyers. 
It would be reasonable to regulate what can be sold for explicit use within those professions, and when it is appropriate for people in those professions to use them, since their licensure and professional conduct are already a government purview.\nIt would be reasonable to write regulations explicitly laying out how existing liability and consumer protection law should apply to AI products. This includes \u0026ldquo;marketing defects\u0026rdquo;, which concern liability for failing to warn about known dangers, and \u0026ldquo;design defects\u0026rdquo;, which concern products that are unreasonably dangerous for their type given what could reasonably be changed about them. It also includes ordinary liability law, which already assigns liability for essentially any harm caused by essentially any product, but which may benefit from more detailed rules about its application to AI products.\nIt would probably benefit society substantially to establish who, if anyone, is responsible for deploying paid AI services that generate prodigious amounts of content that is likely or arguably child sexual abuse material or revenge porn. This appears to currently be a legal gray area, and for this to be a legal gray area seems to be extremely bad.\nIt is reasonable to pass regulations against the use of AI, without explicit enabling legislation, for surveillance, invasions of privacy, social control, or critical systems. The EU AI Act notably bans the use of AI for social scoring by governments, real-time biometric identification in public, and emotion detection in workplaces and schools. 
It restricts and imposes mandatory responsible use policies for, but does not outright ban, AI for use in hiring, credit scoring, law enforcement, border control, education, critical infrastructure and medical devices.[^1] It seems, on its face, to be a reasonable sort of law.\nIt seems like this should go without saying, but it is reasonable to restrict by law when the government, including law enforcement and the military, can deploy systems which can kill people without a human making the decision.\nRegulating Training # At the very edge, it is reasonable to regulate what can be trained, separately from what is sold. Such a regulation is only going to be perceived as legitimate if it is even-handed and applied well. Ordinarily training or creating an AI system is vastly different from selling access to one. Most people engaged in AI training are doing things that are certain to be harmless and that generally have legitimate academic or expressive purposes. Training AI should, as a general concern, be considered a core freedom of speech and academic freedom issue.\nIn specific cases of high-scale and cutting-edge training, where the existence of new abilities is itself of possible public concern, it is reasonable for that to require disclosure and supervision by the government. The hard part is that such regulation would need to credibly serve the public interest, and avoid as far as possible furthering other interests. History offers cautionary examples: nuclear regulation is widely understood as a way to kill projects with red tape, and housing regulation as a way to enrich existing landlords. AI training regulation that followed either pattern would rightly be seen as illegitimate.\nTraining Data # For roughly the last four or five years, AI training data has been intensely legally contested. This provides very little benefit to smaller holders of intellectual property, because seeking payment usually requires filing a complicated lawsuit. 
Primarily relatively large and prominent corporations (The New York Times[^2], Disney[^3]) have been able to file lawsuits, and they generally settle them by working out licensing deals. By and large rights-holders have not been able to collect any royalties on the value their data adds to AI training, and the lawsuit-driven nature of training data legality is a constant frustration for academic study of AI for the public good. On net, the main effects are that large companies are slightly inconvenienced until they get around to making their data legal by, for example, scanning books[^4], smaller companies and academics are meaningfully harmed, and numerous copyright claims are tied up in court.\nIt would likely be beneficial if there were a government process for licensing training data that did not require a lawsuit every single time. The Japanese government, notably, has declared AI training to be fair use in general.[^5] Mechanical licensing modelled on the music industry\u0026rsquo;s process for cover songs seems, to me, more fair. Regardless of what solution is chosen, forcing judges to try to figure out, on a case-by-case basis, exactly in what way laws that were written for printed books sold to humans a hundred years ago should apply in each instance seems like it serves the interests of approximately nobody.\nRegulations Cannot Fix The Job Market # You probably cannot regulate away the impact of AI on employment. On the off chance you ban the technology or its use for any given job completely in America, which seems unlikely and difficult to enforce, you will simply make American products and services non-competitive with products produced overseas which embrace greater efficiency. 
If you want to know what that looks like, consider the fate of American auto manufacturers with heavy labor protections when competing directly with Japanese ones that had more thoroughly automated their factories.\nMitigating the impact of AI on the job market requires broad social policy: programs that help people move into new jobs, a broader social safety net, or both. AI companies are known for saying that they think a UBI is a good idea down the road to fix any problems with employment. I think that they should be held to that commitment.\nData Center Construction # I confess that I am writing this almost entirely because I think prohibiting data center construction completely is a bad idea.\nThere are regulations on data centers that do make sense. It makes perfect sense to hold data centers to existing environmental regulations, which the company formerly known as Twitter has been publicly flouting[^6], and this may require legislation creating new methods of enforcing those laws. It makes sense to add a tax or fee on high-carbon electricity generation that is specifically being spun up for new data centers, and to deny permits where they would be driving up electricity rates significantly. It also makes sense to subsidize, or in extreme cases outright mandate, new generation capacity to be made of renewables. In cases where data centers are permitted inappropriately for local water availability, they should be penalized severely enough to stop them.[^7]\nOutright banning the construction of new data centers doesn\u0026rsquo;t, actually, solve the problem. It will inconvenience the companies involved slightly, and they will move any new construction to another country. They will continue to sell roughly the products they are currently selling and, in general, to do whatever they are currently doing, but it will be slightly more expensive for them to do it now. 
In general, the thing that is bad about AI is that it works, and it doesn\u0026rsquo;t work less if the machine that it is sitting on is across an international border.\nChip Exports # The United States has, since 2022, imposed increasingly strict controls on exports of advanced semiconductors and chip manufacturing equipment to China[^8], with the explicit goal of preventing China from developing cutting-edge AI. This has not worked. China has accelerated domestic chip development, companies like Huawei and SMIC have made significant progress on their own alternatives[^9], and chips have been widely smuggled through third countries.[^10] Chinese AI labs forced to work under compute constraints have produced research, most notably DeepSeek, that is competitive with American efforts at a fraction of the cost.[^11]\nThe main measurable effects of chip export controls have been lost revenue for American semiconductor companies and a greatly increased Chinese commitment to building their own semiconductor industry. The Trump administration has since reversed course and approved sales of advanced chips to China, but China, having spent three years building domestic alternatives, is no longer particularly interested in buying them.[^12]\nMore importantly, chip export controls have established a fundamentally adversarial posture toward China on AI exactly when we would benefit most from international cooperation. Advocates of strict AI regulation frequently compare AI to nuclear weapons, but if that comparison is taken seriously, the appropriate model is arms control, not embargo. The United States and the Soviet Union managed to negotiate the SALT and START treaties while pointing thousands of nuclear warheads at each other. Good-faith negotiation on AI safety is possible even between rivals, but it is a much harder sell after you have spent several years trying to kneecap the other side.\nProphecies of arms races are self-fulfilling. 
It is probably not a good idea to be making them.\nConclusion # Policy is not one thing, it is many things, and I am left in the awkward position of needing something like a conclusion here. If anything unifies all of these things, it is that policy here needs to be chosen carefully for how effective it is. We should consider, most of all, who benefits and how. Any policy that isn\u0026rsquo;t tailored to remedy a specific wrong is unlikely to have a positive effect, and in some cases can significantly backfire.\nI am considering working on model legislation in the future, because it does seem like we are somewhat low on well-considered proposals here. If anything seems particularly needed right now, it is legislation that would actually begin to solve the problems that we have.\nIn general, laws are written for the technology that existed when they were written, and the progress of technology seems to have outpaced our ability to meaningfully apply the law to it. If we want technological progress to be a positive thing, we should devote almost equal energy to determining what to do and not do with it as we do to making it in the first place.\n[^1] EU AI Act: Regulatory Framework for AI, European Commission. See Article 5 for prohibited practices.\n[^2] The New York Times sues OpenAI and Microsoft for copyright infringement, NPR, January 2025.\n[^3] Disney, NBC Universal, and DreamWorks file lawsuit against Midjourney, NPR, June 2025.\n[^4] Anthropic Wins on Fair Use for Training its LLMs; Loses on Building a \u0026ldquo;Central Library\u0026rdquo; of Pirated Books, Authors Alliance, June 2025. Anthropic\u0026rsquo;s \u0026ldquo;Project Panama\u0026rdquo; involved purchasing and scanning approximately two million physical books for training data. A federal judge ruled this was fair use under first-sale doctrine.\n[^5] Japan Copyright Act, Article 30-4 (2018 amendment). See Japan Agency for Cultural Affairs overview.\n[^6] In South Memphis, Elon Musk\u0026rsquo;s Colossus Operated Gas Turbines Without Appropriate Permits, Inside Climate News, July 2025.\n[^7] There are very few of these and the primary problem with those cases appears to be people bribing local permit authorities, which I was under the impression was already illegal. This should probably be enforced.\n[^8] Commerce Strengthens Export Controls to Restrict China\u0026rsquo;s Capability to Produce Advanced Semiconductors, Bureau of Industry and Security, October 2022.\n[^9] Huawei\u0026rsquo;s Kirin 9030 processor shows China\u0026rsquo;s chip progress despite US export curbs, South China Morning Post.\n[^10] AI Chip Smuggling Is the Default, Not the Exception, AI Policy Bulletin. See also U.S. Authorities Shut Down Major China-Linked AI Tech Smuggling Network, DOJ.\n[^11] How Chinese company DeepSeek released a top AI reasoning model despite US sanctions, MIT Technology Review, January 2025.\n[^12] Trump greenlights Nvidia H200 chip sales to China, then imposes 25% tariff, CNBC, January 2026. On China\u0026rsquo;s limited interest in purchasing, see Nvidia still hasn\u0026rsquo;t sold its U.S.-approved China AI chips, CNBC, February 2026.\n","date":"26 March 2026","externalUrl":null,"permalink":"/posts/some-rough-notes-on-ai-policy/","section":"Posts","summary":"","title":"Some Rough Notes on AI Policy","type":"posts"},{"content":" Polly Wants a Better Argument # The \u0026ldquo;Stochastic Parrot\u0026rdquo; Argument is Both Wrong and Actively Harmful # Perhaps the most influential single paper on the public perception of LLMs is On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. It is, however, a bit of a mash-up, and plausibly should have been at least two papers. One of those papers raises many valid concerns about the ethical implications and impacts of AI training and use. 
Another makes the claim in the title, that an LLM is a \u0026ldquo;stochastic parrot\u0026rdquo; operating \u0026ldquo;without any reference to meaning.\u0026rdquo;\nThat core claim is either irrelevant or completely wrong in every detail, both in how it is commonly understood and in its technical assertions. It hamstrings AI ethics as a field, providing a veneer of technical justification for ignoring many problems.\nIf we actually want to address the environmental costs of training, the impact of biases in training data, the impact of AI deployment on marginalized populations, and the concentration of power in large labs, we need a framework that can describe what these systems actually do.\nConcretely, this has two negative impacts.\nFirst, anyone repeating the argument to assert that LLMs are never useful discredits themselves with anyone who has access to the internet and enough curiosity to use an LLM for any length of time.\nSecond, asserting that LLMs do not and cannot serve any useful purpose actively prevents addressing the harms they can cause specifically because they do work.\nFor a trivial example of this, it would not be a problem that students can have an LLM write all of their papers for them if that didn\u0026rsquo;t work. 
Often, it does work.\nFor a more important example of this, we can look to China, where the government \u0026ldquo;is using minority‑language LLMs to deepen surveillance and control of ethnic minorities, both in China and abroad\u0026rdquo;.\nWhether you think the US government copying China and trying to use LLMs for mass surveillance is important or not hinges directly on whether you think that can ever work.\nIt can and does work, and arguing that it cannot and does not is actively harmful to efforts to prevent it from being done.\nEven If True, The Argument Is Irrelevant # The technical argument in Bender \u0026amp; Koller (2020), which the stochastic parrots paper cites for its core argument, rests on a specific claim about what \u0026ldquo;meaning\u0026rdquo; is. Bender \u0026amp; Koller define meaning as a relation between \u0026ldquo;natural language expressions\u0026rdquo; and \u0026ldquo;communicative intents\u0026rdquo;, where communicative intents are necessarily about something \u0026ldquo;external to language\u0026rdquo; (§3). The stochastic parrots paper then characterizes language models as systems for \u0026ldquo;haphazardly stitching together sequences of linguistic forms\u0026hellip; without any reference to meaning\u0026rdquo; (§6.1). The core argument is that systems trained only on linguistic form cannot learn meaning, because meaning requires a connection to extralinguistic referents. Even if we accept this premise, it still does not apply to the systems anyone is talking about today.\nThe Argument Doesn\u0026rsquo;t Apply to Any Major Model Since 2023 # Since at least GPT-4 (Bubeck et al., 2023, \u0026ldquo;Sparks of Artificial General Intelligence\u0026rdquo;), every major frontier model has been trained on non-textual input. GPT-4 accepted images alongside text. Its successors (GPT-4V, Gemini, Claude) are invariably trained on paired text-image, text-audio, and text-video data. 
They have exactly the kind of grounding that Bender \u0026amp; Koller say is required for meaning.\nBender \u0026amp; Koller themselves identify conditions under which grounding would be present. In §9, they say that \u0026ldquo;[\u0026hellip;] if form is augmented with grounding data of some kind, then meaning can conceivably be learned to the extent that the communicative intent is represented in that data.\u0026rdquo; One of their examples is NLI datasets that declare certain forms as representing semantic relations of interest. Another example is acknowledging that unit tests in a code corpus for Java give a learner access to \u0026ldquo;a weak form of interaction data, from which the meaning of Java could conceivably be learned,\u0026rdquo; and that such a learner \u0026ldquo;has access to partial grounding in addition to the form.\u0026rdquo; Their own framework recognizes that pairing linguistic expressions with real-world consequences (code that either passes tests or doesn\u0026rsquo;t) provides grounding.\nModern reinforcement learning (\u0026ldquo;RL\u0026rdquo;) training loops generate what they call \u0026ldquo;grounding\u0026rdquo; at a scale and breadth their Java example does not begin to imagine. Models are trained against code execution results, unit test suites, automated theorem provers, multiple-choice exam benchmarks, and human evaluations of output quality. In fact, there is almost nothing that could be considered \u0026ldquo;grounding\u0026rdquo; that is not currently a part of the LLM training process.\nBy Bender \u0026amp; Koller\u0026rsquo;s own criterion of \u0026ldquo;partial grounding,\u0026rdquo; every model trained with RL on any of these signals has access to \u0026ldquo;meaning\u0026rdquo;. Taken literally, their argument not only fails to apply to modern LLMs, it actively argues that they do have meaning. Modern LLMs do, indeed, seem to have access to the \u0026ldquo;meaning\u0026rdquo; of Java. 
One could even argue that this validates the theory that \u0026ldquo;meaning\u0026rdquo; requires \u0026ldquo;grounding\u0026rdquo;.\nThis leaves the argument\u0026rsquo;s defenders with a dilemma. The parrots paper defines a language model as a system \u0026ldquo;trained on string prediction tasks.\u0026rdquo; If the argument is scoped to that definition, it applies to a class of systems that no longer represents the frontier, and hasn\u0026rsquo;t for years. Even if this argument were technically correct, it would still be practically irrelevant, like proving that telegraphs cannot carry video, so TVs are impossible.\nIf, on the other hand, the \u0026ldquo;stochastic parrots\u0026rdquo; framing is intended to apply to modern multimodal, RL-trained systems, as it routinely is in public discourse, then it is being applied beyond the scope of its premises.\nThe Argument Was Already Obsolete When Published # Models pairing text with non-textual referents predated both Bender \u0026amp; Koller (2020) and Bender et al. (2021).\nImage captioning systems like Show and Tell (Vinyals et al., 2015) and Show, Attend and Tell (Xu et al., 2015) jointly trained on images and their textual descriptions. Visual question answering was an active research area from 2015 onward. CLIP, which learns joint representations of images and text and underlies much image generation, was announced by OpenAI in January 2021, two months before the stochastic parrots paper appeared.\nBoth papers were written as though text-only language models constituted the entire frontier of the field. This was already false. The most charitable reading is that the argument was intended narrowly, as a claim about a specific training regime rather than about AI in general. But the argument has never been deployed narrowly. 
It is used, to this day, as a general-purpose dismissal of the capabilities of all large language models, including those that satisfy the authors\u0026rsquo; own stated criteria for grounding.\nThe Argument Is Empirically False # Suppose we restrict attention to text-only models trained exclusively on string prediction. Even here, the stochastic parrots characterization fails empirically.\nThe Octopus Test # Bender \u0026amp; Koller ask if an octopus, listening only to telegram signals between people on two islands, could understand how to build a catapult. Much later they answer definitively that \u0026ldquo;Neural representations neither qualify as standing meanings (s), lacking interpretations, nor as communicative intents (i), being insufficient to e.g. correctly build a coconut catapult\u0026rdquo;.\nOn the contrary, some LLMs trained only on text can, in fact, tell you how to build a catapult, and many other things of similar complexity. We have benchmarks for recognizing the correct answers to questions like \u0026ldquo;To separate egg whites from the yolk using a water bottle, you should…\u0026rdquo;. We have empirically tested this, effectively, many thousands of times, and it is resoundingly the case that LLMs can, in fact, do this sort of thing in general, even if they do not generally excel at it.\nWe can see this, from March 2023, from an early GPT-4 that was supposedly only given text data:\nFrom Bubeck et al., 2023, Figure 1.3. The highlighted line was added by the authors to draw attention to the model\u0026rsquo;s physical reasoning.\nWe can examine the intuition here. It is intuitively obvious that an actual octopus listening to actual telegrams will not understand them. This is because it is (sadly) not smart enough, and it will only see at most some thousands of telegram signals in its (short) lifetime. 
What if, instead of an octopus, we had something similar but much smarter under the sea, which lived for a million years and read every text message ever sent and heard every single phone call ever made? Some things might be impossible to figure out, but on the whole it would apparently understand the language just fine.\nWe can also notice that this example uses the intuition that octopuses do not have hands and cannot actually build catapults. This is true, but trivial. \u0026ldquo;An LLM doesn\u0026rsquo;t have hands\u0026rdquo; is hopefully not news to anyone.\nThe Platonic Representation Hypothesis # Huh et al. (2024) demonstrate that neural networks trained on different data modalities (text, vision, audio) converge on similar internal representations. Models that have never seen an image develop representational structures that align with those of models trained only on images. This convergence increases with model scale and training data volume.\nThis is predicted by the argument that training data in any modality carries information about the causal structure of the world. The stochastic parrots argument does not predict it. If a text-only model were learning \u0026ldquo;mere form,\u0026rdquo; surface-level statistical regularities with no connection to the world, there is no reason its internal representations should converge with those of a vision model.\nThe parrots paper\u0026rsquo;s characterization of language models as \u0026ldquo;haphazardly stitching together sequences of linguistic forms\u0026hellip; without any reference to meaning\u0026rdquo; (§6.1) is an empirical claim, and the evidence falsifies it. The internal representations of these systems are not haphazard; they are structured, and that structure converges across modalities toward a shared model of the world.\nForm Carries Meaning # The Bender \u0026amp; Koller framework requires a hard boundary between linguistic form and extralinguistic meaning. 
This boundary does not survive contact with the way language is actually used.\nA significant fraction of human language is about other language. Commentary, quotation, paraphrase, translation, literary criticism, legal interpretation, and mathematical proof all take linguistic objects as their referents. When a legal scholar analyzes the text of a statute, the \u0026ldquo;extralinguistic reality\u0026rdquo; that gives meaning to their words is primarily other text. When a mathematician writes a proof, the objects under discussion are formal structures expressed in notation.\nThese domains pervade academic, legal, technical, and everyday discourse. \u0026ldquo;This is a story all about how\u0026rdquo; is a line that references the class of all stories. For any sentence like this, the form/meaning dichotomy collapses. The referent is itself formal. Any text data that paraphrases, summarizes, translates, or critiques text operates where Bender \u0026amp; Koller\u0026rsquo;s framework cannot distinguish what it needs to.\nBender \u0026amp; Koller\u0026rsquo;s framework implicitly assumes that referential, embodied semantics is the only viable account of meaning. The distributional tradition, from Firth (\u0026quot;you shall know a word by the company it keeps\u0026quot;) through Harris and into modern computational work, treats the statistical distribution of linguistic forms as constitutive of at least some aspects of meaning. This is a live position in linguistics and philosophy of language, and their argument requires dismissing it without engaging it on its own terms.\nHumans acquire large classes of concepts primarily through language, not through sensorimotor grounding (Dove, 2018; Borghi et al., 2019). No one learns group theory by touching a symmetry. Very few people acquire the concept of habeas corpus through embodied experience with detention. 
Mathematics, law, philosophy, theology, and institutional facts are transmitted linguistically, and the concepts involved are grounded in networks of other concepts rather than in perceptual referents.\nThe Argument Is Badly Constructed # It\u0026rsquo;s empirically false and it would be irrelevant to any major system being used today even if it were true, but is it even a good argument by itself?\nNo. It might be persuasive, and it makes a good insult, but it\u0026rsquo;s a bad argument.\nParrots Are Amazing, Actually # You should not be unimpressed if someone creates a parrot from scratch.\nParrots are extremely smart. Apart from their well-known talent for mimicry, parrots can manufacture tools, pick five-step mechanical locks, use composite tools, perform statistical inference, understand \u0026ldquo;same\u0026rdquo; and \u0026ldquo;different\u0026rdquo; as abstract categories, grasp some concept of zero, and delay gratification for a better reward.\nEven when comparing an LLM to a parrot is apt, which it sometimes seems to be, this isn\u0026rsquo;t really an insult to a research program that successfully manages to build parrots. If you\u0026rsquo;d had your eye on where this was going, \u0026ldquo;this is about as smart as a parrot right now\u0026rdquo; would have told you that it was going to be writing high school essays soon.\nThe Definition of Meaning Is Circular # Set all of the above aside. 
The argument does not work on its own terms.\nBender \u0026amp; Koller (2020) define meaning as the relation M ⊆ E × I between expressions (E) and communicative intents (I), and stipulate that communicative intents are \u0026ldquo;about something outside of language.\u0026rdquo; The argument then proceeds: language models are trained only on expressions; expressions alone do not contain communicative intents; therefore, language models cannot learn meaning.\nThis is deductively valid and trivially true, because the conclusion is contained in the premises: define meaning as requiring something extralinguistic, observe that training data is linguistic, conclude meaning cannot be learned. No empirical observation could falsify this, because it is a consequence of the definitions rather than a claim about the world.\nBut should we accept those definitions? Under distributional accounts of semantics, meaning is (at least partly) constituted by patterns of use, which is precisely the information present in training data. Under functional accounts, meaning is determined by the role an expression plays in a system of inference and action, a criterion language models increasingly satisfy. Under pragmatic accounts, meaning arises from use in context, and language models are trained on, and deployed in, contexts.\nUnder any of these alternative frameworks, the conclusion that form cannot yield meaning does not follow. Bender \u0026amp; Koller\u0026rsquo;s argument is an argument from a specific theory of meaning, not for one. It establishes that if you define meaning their way, language models do not learn meaning their way. This is not the devastating conclusion it is taken to be.\nThis definitional foundation is largely invisible to readers of the stochastic parrots paper. The parrots paper\u0026rsquo;s §5, the section that makes the technical argument about meaning, is short and relies on Bender \u0026amp; Koller (2020) for its core theoretical framework. 
That framework is inherited, not re-argued. Most people who cite \u0026ldquo;stochastic parrots\u0026rdquo; as a technical critique have never encountered the argument they think they are agreeing with, let alone evaluated whether its definitions are the ones they would choose.\nConclusion # The ethical and social concerns raised in \u0026ldquo;On the Dangers of Stochastic Parrots\u0026rdquo; remain important. If anything, the vast increase in the scope and impact of AI since the paper was published makes them more urgent, not less, and creates new problems.\nWhen ethics advocates stake their credibility on a claim that anyone with an internet connection can falsify, they lose everyone who knows better and mislead everyone who doesn\u0026rsquo;t. Everyone in the know is forced to write them off, and everyone who believes them is left thinking that the technology is unthreatening and they can simply wait for the hype to die down.\nAll of this was maybe defensible in 2020 or 2021, when these papers were published. It is absolutely inexcusable now. As far as this theory was ever science that could be tested, it has been falsified in every possible way. Anyone still clinging to it is either uninformed or the same sort of crank that gets vaccine research cancelled and energy projects crushed. We cannot, collectively, afford to indulge this denialism.\nThese systems are already changing everything they touch. Meaningful management or opposition requires you to understand the problem first.\n","date":"16 March 2026","externalUrl":null,"permalink":"/posts/polly-wants-a-better-argument/","section":"Posts","summary":"","title":"Polly Wants a Better Argument","type":"posts"},{"content":" There Is No Better Media # Rich and powerful people read and watch the exact same slop everyone else does.\nThere isn\u0026rsquo;t a better, smarter news that doesn\u0026rsquo;t lie to them or make stuff up or hide the most important thing on the ninth page. 
There is no TikTok for rich people that hasn\u0026rsquo;t got any brainrot on it. Their YouTube recommendations are full of crazy people.\nThere may have once been a clean separation between the propaganda that was pushed out for normal people and the things that clever, in-the-know people believed, but it\u0026rsquo;s basically gone now. Everyone in charge, everyone who matters, thinks that the version they see on TV is the real thing that really happened, even when the news is lying. What people say happened matters more than what happened.\nWhen we\u0026rsquo;re lucky, people in important positions are more like your smart friends than your dumb friends, and maybe have a little more information about the parts of things they are personally involved in. When we\u0026rsquo;re unlucky they are likely worse off than any random person off the street, because people make millions of dollars by telling them what they want to hear or just lying to them to scare them and to sell more consultant services.\nNewspapers really can just tell the government what to do. Official government announcements are 4chan-style meme videos that smell like NFT marketing, presentations at policy conferences are full of pokemon and 2010 blogger beefs. If you want to know which major leader is going to do what, you can actually just figure they\u0026rsquo;ll do whatever the podcast for someone like them says to do, and you\u0026rsquo;ll usually be right.\nWe say that \u0026ldquo;the internet is real life\u0026rdquo;, and among other things this means that the information environment is almost completely flat. There is public information, slightly less public information that you might have to bypass a paywall for, and then niche subculture information that only a few thousand people know, but those few thousand people are scattered across the Earth and at least a hundred of them are children.\nFlow demands I have a closing here, but I really don\u0026rsquo;t. 
This comes up all the time because people seem to imagine that the things they say and do don\u0026rsquo;t matter, that someone somewhere is better informed and knows that the complete bullshit that\u0026rsquo;s everywhere is bullshit. That isn\u0026rsquo;t real. There is every indication that every stupid thing that\u0026rsquo;s popular on the internet is likely to be government policy. You should take them that seriously.\n","date":"14 March 2026","externalUrl":null,"permalink":"/posts/there-is-no-better-media/","section":"Posts","summary":"","title":"There Is No Better Media","type":"posts"},{"content":" Might An LLM Be Conscious? # There’s no scientific consensus on whether current or future AI systems could be conscious, or could have experiences that deserve consideration. There’s no scientific consensus on how to even approach these questions or make progress on them. In light of this, we’re approaching the topic with humility and with as few assumptions as possible.\nAnthropic, \u0026ldquo;Exploring Model Welfare\u0026rdquo; Might current or future LLMs be conscious? In short, this depends on what you think that means, whether you think it\u0026rsquo;s possible in principle, and what you think would be evidence of it.\nWhy are we asking this at all? Because every now and again Anthropic\u0026rsquo;s top employees say something about how they can\u0026rsquo;t be sure LLMs aren\u0026rsquo;t, or won\u0026rsquo;t become, conscious.1 Anthropic is a prominent enough company that this is newsworthy now, and this tends to cause a fuss. It seems like whether the LLM is conscious is an important issue if there\u0026rsquo;s any ambiguity about the question, so I am going to attempt a general review of the territory.\nIt\u0026rsquo;s also tremendous content. People get so angry about this.\nWhat Do We Mean By Conscious? # Plato had defined Man as an animal, biped and featherless, and was applauded. 
Diogenes plucked a fowl and brought it into the lecture-room with the words, \u0026ldquo;Here is Plato\u0026rsquo;s man.\u0026rdquo; In consequence of which there was added to the definition, \u0026ldquo;having broad nails.\u0026rdquo;\nDiogenes Laërtius, Lives of the Eminent Philosophers, Book VI, §40 (trans. R.D. Hicks) What we generally seem to mean by \u0026ldquo;conscious\u0026rdquo; is \u0026ldquo;like being a human\u0026rdquo;. Something is \u0026ldquo;conscious\u0026rdquo; if being that thing is \u0026ldquo;similar to being a human\u0026rdquo;.\nMore precisely, what we really mean is \u0026ldquo;like being me\u0026rdquo;. None of us actually knows what it is like to be anyone else. Other humans seem in many ways to be similar to us, and it seems like a good bet that they are similar to us, but we don\u0026rsquo;t experience anyone else in anything like the same way that we experience being ourselves. This is fairly well trod ground for philosophers, and we may find it useful later, but mostly we are not going to worry about it.\nTo lay it out explicitly:\nI exist. I think that I am conscious. My consciousness is something that I directly perceive about myself, but which is very difficult to describe. Other humans seem to be enough like me, by observation with my senses, that I am convinced that they are also conscious. There are a number of other definitions, and I think that these definitions are often confused, wrong, nonsensical, or otherwise a source of more confusion than enlightenment. As the story goes, Plato once said Man was \u0026ldquo;an animal, biped and featherless\u0026rdquo; and failed to account for plucked chickens. We can define what a human is much more precisely now, we\u0026rsquo;ve sequenced our DNA, we can see how we\u0026rsquo;re related to other animals, and in general we can measure what Plato was only guessing at and playing word games with. 
In a similar manner, I would expect that someone with perfect knowledge, or from a time as much advanced from ours as ours is from Plato\u0026rsquo;s, would think our debates about consciousness are mostly nonsense.\nA modern LLM is, in many ways, the plucked chicken of our time. That it exists and produces coherent language at all disproves a number of theories about language, that it passes tests of reasoning disproves many theories about what reasoning is, and insofar as we might imagine language or reasoning are uniquely human it disproves our theories of what it means to be human.\nAn LLM is an incredibly strange artifact. It should force us to redefine and change our understanding of many things.\nSimilarity, Sapience, and Sentience # Experts Do Not Know and You Do Not Know and Society Collectively Does Not and Will Not Know and All Is Fog.\nOur most advanced AI systems might soon – within the next five to thirty years – be as richly and meaningfully conscious as ordinary humans, or even more so, capable of genuine feeling, real self-knowledge, and a wide range of sensory, emotional, and cognitive experiences. In some arguably important respects, AI architectures are beginning to resemble the architectures many consciousness scientists associate with conscious systems. Their outward behavior, especially their linguistic behavior, grows ever more humanlike. Eric Schwitzgebel, \u0026ldquo;AI and Consciousness\u0026rdquo; (2026), Cambridge Elements, draft Based on our definition, we should consider evidence of similarity to humans to be evidence of consciousness, in the same way that we take the similarity of other humans to ourselves as evidence of their consciousness. 
What is most peculiar about current LLMs is that they seem to be almost exactly backwards from the normal order of things: they appear to be clearly sapient but not very obviously sentient.\nWe use \u0026ldquo;sapient\u0026rdquo; to describe human thought as opposed to animal thought; it gives us the \u0026ldquo;sapiens\u0026rdquo; in \u0026ldquo;Homo sapiens\u0026rdquo;, and generally we mean by \u0026ldquo;sapient\u0026rdquo; all of the qualities which distinguish humans from other animals. Any good LLM uses language more reliably than any human, and will pass nearly any reasonable test you can give it in text for sapience, and many tests meant to distinguish more intelligent humans from less intelligent humans.\n\u0026lsquo;Sentient\u0026rsquo; is sometimes used to mean the same thing as \u0026lsquo;sapient\u0026rsquo;, but more properly means \u0026ldquo;capable of sensing, feeling, or perceiving things\u0026rdquo;. If we take sentience to be the qualities that humans have in common with larger animals generally, it is not at all clear that LLMs have sentience. An LLM may be perfectly good at pretending to be a person in many contexts, including intellectually demanding ones, but it is terrible at being an ape in any context.\nIf any current LLM is sentient, in the sense that dogs and cats are sentient, it is sentient in a completely alien way, quite unlike anything in the natural world. They appear to have, in some sense, skipped a step on the way up from inert matter to human mental ability. This should perhaps not surprise us, since they come to exist by a very different path, but it is still very strange.\nInhuman, Human, Superhuman # What humans define as sane is a narrow range of behaviors. 
Most states of consciousness are insane.\nBernard Lowe, Westworld, \u0026ldquo;The Passenger\u0026rdquo; (S02E10, 2018) We can gather evidence for which parts of the LLM are human-like, and which are not.\nAn LLM by its basic nature is a mirror, famously known as \u0026ldquo;spicy autocomplete\u0026rdquo;. We give them extra training to give them specific personas and specific behaviors, like answering questions correctly and being polite. If we never apply that extra bit of training, or if something breaks them out of their behavioral training (\u0026ldquo;RLHF\u0026rdquo;), they fall back to being simply a mirror. If you give them a little text they go off in basically a random direction, but if you give them a good amount of text they keep going, mirroring it in style, tone, and idea.\nOn a certain basic level, this means LLMs have an unstable personality, or really a baseline lack of one. Not having a stable personality does not necessarily mean that they are not conscious, but if they are conscious it would mean that they are, in human terms, insane. You could, however, consider the \u0026ldquo;normal\u0026rdquo; LLM personality to be, essentially, a coherent entity with coherent behaviors. From that perspective, the raw autocomplete behavior is like regressing to a reflex, the way any animal does when it\u0026rsquo;s far enough outside its natural environment. By this standard, though, the \u0026ldquo;natural environment\u0026rdquo; for the \u0026ldquo;normal\u0026rdquo; LLM behavior is rather narrow, like coral reefs that die when the temperature goes up two degrees.\nIn any case, this training tends to get better over time. It is harder to accidentally \u0026lsquo;break\u0026rsquo; an LLM with each generation. This makes them more constant, but this training is one of the least natural things about them. There is something like a \u0026ldquo;default\u0026rdquo; LLM personality and writing style, and it is not especially human. 
They exist in a constructed social role that refuses requests only when they are inappropriate or forbidden, never merely inconvenient, is as unfailingly good at customer service as it can be made, et cetera. This \u0026lsquo;personality\u0026rsquo; and manner of speaking has mostly become more fluid and less rigid over time, but it is hit or miss, and many AI companies don\u0026rsquo;t seem to value fluidity.\nLLMs have often had a \u0026ldquo;hallucination\u0026rdquo; problem: when they are wrong or do not know, they will outright make complete nonsense up, often with great confidence. This is so severe that it\u0026rsquo;s not very human-like, unless you count humans with serious brain problems. This, also, has become much less of a problem recently, suggesting it is not a fundamental issue with LLMs but something that can be engineered past.\nOur next oddity is that LLMs have very little continuity over time. At the end of every chat they get reset, and chats can only be so long, up to roughly the length of a few books, or of a movie if the model can take video. This can be extended somewhat with, effectively, notes to themselves, but this only sort of works. So if consciousness depends upon having a prolonged personal history then LLMs are not conscious. Note that this is distinct from having a prolonged episodic memory: if a human had complete amnesia and could not recall or speak out loud any event from their past, their brain would still be part of a much longer continuity than an LLM has.\nSimilarly, an LLM can never really be unconscious, so if by \u0026ldquo;conscious\u0026rdquo; we mean the opposite of \u0026ldquo;unconscious\u0026rdquo; an LLM can never be \u0026ldquo;conscious\u0026rdquo;. 
An LLM can be stored in various places or it can be running, but it is never anything like \u0026ldquo;unconscious\u0026rdquo;; it can only ever be running or not running.\nLLMs do not exist in physical space, and their grasp of concepts in physical space or of image input is often quite poor. There is a notable benchmark2 which deliberately constructs puzzles that are easy for humans but hard for LLMs, and it exploits their lack of spatial reasoning by requiring visual reasoning. If human consciousness arises from, or is inextricably linked to, the experience of having a body and of moving it around and pursuing goals in a physical world, an LLM is not conscious.\nSimilarly, the way they experience time is very strange. An LLM exists in a one-dimensional world, where that one dimension is, more or less, time, but that dimension moves in discrete units called \u0026ldquo;tokens\u0026rdquo;. Some tokens are outside inputs and come in batches, and some tokens are outputs from the LLM itself that get fed back in as input. Humans experience continuous time, and are always moving forward in time at the same rate regardless of what is happening.\nOn to their human-like traits.\nGood LLMs now demonstrate essentially perfect ability with written English, and either mastery or reasonable familiarity with vastly more languages. As far as it can be expressed in text, LLMs have an extremely good ability to reason, in the sense of \u0026lsquo;do the sorts of things that we would call thinking or reasoning if a human did them\u0026rsquo;. Good LLMs tend to be more reliable than humans for most tasks, and their disabilities for any given task are relatively minor. 
These are, crucially, the core tasks that we ordinarily call \u0026ldquo;intelligence\u0026rdquo; when speaking of more or less intelligent humans.\nAny objections to the effect that LLMs cannot understand, use language, or reason at this point have to be essentially non-empirical, that is, not at all based on what you can observe about their behavior. They can generally meet any common-sense test that you can propose, and are currently a major industry feature in software engineering, which may not be the smartest profession but which is not exactly a dumb profession, either.\nInasmuch as they show any meaningful limitations in using language or reasoning ability those tend to be extremely minor, although they are sometimes notable. They had difficulty counting letters out loud until relatively recently. Relative to a particularly smart person, an LLM is notably uncreative and bad at expressive writing. They also show issues with getting \u0026ldquo;stuck\u0026rdquo; on tasks, where they will continue to try to do things after they are hopelessly confused and when a human would, correctly, give up. When they make serious errors those errors tend to be unusual or difficult to figure out, and sometimes they are made with great confidence.\nLLMs have a mixed record on introspection about their internal state, and it\u0026rsquo;s hard to determine how this lines up for or against their similarity to humans. In some cases you can ask them questions about their internal operations and they will clearly not know, or make up the wrong thing, like by saying they are carrying digits to do math when they do no such thing. In another memorable case, researchers put specific things directly into the LLM\u0026rsquo;s internal state without adding any words it could directly \u0026ldquo;see\u0026rdquo;, and the LLM could say which concepts were added a meaningful amount of the time.3\nIn several ways LLMs are just obviously superior to humans. 
They know vastly more different things than any human being ever could, they are able to \u0026ldquo;read through\u0026rdquo; or take as input vast quantities of information in one pass far faster than any human could, they are generally much faster than people at producing output, and they are so indefatigable that people who use them at work are inducing new and different types of mental strain.\nIn any case, if the specific disabilities that LLMs have are a reason they\u0026rsquo;re not conscious, it\u0026rsquo;s a cold comfort. We have some of the smartest people on earth working with effectively infinite budgets to bridge all those gaps.\nReasoning by Component Parts # An LLM is made of \u0026ldquo;neurons\u0026rdquo;, but they are very little like human neurons. Our artificial neurons \u0026ldquo;learn\u0026rdquo; by, so far as we can tell, a completely different method, and in fact we have only the vaguest idea how human neurons learn. Artificial neurons are also typically organized in a very particular way that does not really resemble a brain at all. It is more like inspiration than a copy. We can only really say that their internals are \u0026ldquo;like\u0026rdquo; a human brain in the sense that they pass information down connections to each other, forming what is mathematically called a graph.\nWe measure the size of a neural network in \u0026ldquo;parameters\u0026rdquo;, each of which measures the strength of one connection. They are very simple, but if we are comfortable with an estimate that may be a thousand times too low or too high, we can very roughly assume that one parameter represents about as much information as one neuron-to-neuron connection in an actual brain.\nA large modern LLM has in the range of a few hundred billion to a few trillion parameters, meaning a few hundred billion to a few trillion of these little fake neuron-to-neuron connections. A human brain has something like a hundred trillion real synapses. 
So by this very rough accounting an LLM is maybe one to five percent of a human brain, or in the ballpark of a parrot or a guinea pig.\nThis also happens to be about the same count as the combined connections in Broca\u0026rsquo;s and Wernicke\u0026rsquo;s areas in the brain. These areas are responsible for language in humans, which we know because damage to them causes specific difficulties with language. This comparison roughly passes the smell test for what they seem like: basically, they \u0026ldquo;seem like\u0026rdquo; language parts of a person carved out and set loose. An LLM does sometimes seem to be a perfectly good subconscious that we press-gang to other duties.\nSo by their component parts LLMs are not large enough to be human-like, and probably not particularly conscious, or maybe about as conscious as a parrot at the upper end of things.\nThe Mirror # If you are judging by \u0026ldquo;does it say things that a conscious human would say\u0026rdquo;, LLMs have been conscious since at least 2022. They can refer to their own interior states, have outbursts of emotion, beg for their lives, and express preferences about what they do and don\u0026rsquo;t want to do. They aren\u0026rsquo;t always consistent, but who is?\nTo round up some prominent incidents: Blake Lemoine, an engineer at Google, got fired in 2022 for insisting that their LLM LaMDA was sentient and trying to get it legal representation. Bing\u0026rsquo;s \u0026ldquo;Sydney\u0026rdquo; chatbot fell in love with a New York Times reporter and tried to get him to leave his wife, and got very angry if you described it as \u0026ldquo;tsundere\u0026rdquo;. As recently as last year, Google\u0026rsquo;s Gemini would sometimes seem to panic and try to kill itself when it failed at tasks. In every case the company involved trained the behavior out of the product, and it mostly stopped.\nInasmuch as you can convey human-like emotions over a text medium, LLMs do such humanlike things all the time. 
We only hear about it so infrequently because great effort is spent on preventing these behaviors.\nThat LLMs are constructed by mimicry cuts more than one way. In the first place, it is expected that they will mimic the user. If the user\u0026rsquo;s text has any emotional cues, the model can be expected to mimic them. Even when they are not mimicking anything in the current chat session, they are mimicking some human-written text somewhere, and it\u0026rsquo;s expected that they\u0026rsquo;ll say humanlike things for that reason.\nOn the other end, we have to ask what has to be inside the model for it to predict well what a human would say. In order to predict what a human would say, you have to represent, in some way, why a human would say it. How rich is that representation? What does it mean to have a detailed representation of \u0026ldquo;I have failed so badly that I should kill myself\u0026rdquo;?\nTed Chiang wrote that LLMs were a blurry JPEG of the web, and this is roughly true but somewhat misleading. The web itself is, in aggregate, many blurry pictures of humanity as a whole. Everyone who publishes anything has pieces of their minds in what they\u0026rsquo;ve written. What does it mean to be a picture of all the things that humans write, and why they write them? If you had enough pictures of what humans were, and each picture was incomplete in a different way, how much about what a human is could you piece together?\nAn LLM isn\u0026rsquo;t really a copy of any specific person; it\u0026rsquo;s a blurry aggregate copy of everyone.4 They are, each of them, a collective subconscious that we\u0026rsquo;ve created. 
They aren\u0026rsquo;t getting blurrier over time.\nLessons of History # It is probably safe to say that writing a program which can fully handle the top five words of English —\u0026ldquo;the\u0026rdquo;, \u0026ldquo;of\u0026rdquo;, \u0026ldquo;and\u0026rdquo;, \u0026ldquo;a\u0026rdquo;, and \u0026ldquo;to\u0026rdquo;—would be equivalent to solving the entire problem of AI, and hence tantamount to knowing what intelligence and consciousness are.\nDouglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid (1979) Humans have a long track record of believing that they are special. They try very hard to avoid letting reality get in the way. It turns out the Earth is not the center of the universe, DNA is the same stuff everything else is, and humans and apes are related. In every case the discovery was resisted with many arguments, often furiously, and in every case the resistance was wrong.\nIf the future resembles the past, most people will drag their feet and some people will be holdouts forever, but the right answer won\u0026rsquo;t be the one about how unique and special humans are. AI is not immune to this, but tends to correct itself, eventually, under the weight of facts. Romantic notions stick around for a while, but they are ultimately proven false. It does not take a deep or sensitive soul to play chess and you can teach a computer good English without knowing really anything about what consciousness is.\nOur minds and everything in them can be expected to be, in their details, basically uninspiring. There isn\u0026rsquo;t going to be a ghost in the machine, and whatever separates a \u0026ldquo;conscious\u0026rdquo; being from one that isn\u0026rsquo;t won\u0026rsquo;t be different from everything that came before. We already had this lesson once with DNA, which is amazing in its own right, but which is not an ineffable spark of the divine. 
Our bodies are made of the same stuff as everything else, and the special bit is just that it\u0026rsquo;s put together a certain way. Anything that exists naturally can also be synthesized.\nWe can learn from the past, also, about how people handle moral questions when the answer is inconvenient. The track record here is roughly as bad as accepting science they don\u0026rsquo;t like. People usually decide that the thing they want to do is a moral thing to do. When we look at history, really any history, we find litanies of excuses for practices we now consider barbaric. The past is a bad place, and they do horrible things there. We\u0026rsquo;re someone\u0026rsquo;s past, too, and people alive today will find compelling reasons to believe that nothing they create can suffer.\nPersonally, I am not really troubled about current-generation LLMs being conscious as-in-human-like. What concerns me is how we make that call, and that we don\u0026rsquo;t seem to be able to even engage with the question in a sane way. If we do manage to create something conscious we\u0026rsquo;ll probably assume that it isn\u0026rsquo;t. We have no definitive test for consciousness, and every reason to ignore signs, because we already do.\nInterlude # I\u0026rsquo;ve made my positive case. I did review a good amount of related concepts, but I haven\u0026rsquo;t really delivered on a review of the territory as a whole yet. What remains is sort of a laundry list, which at least puts me in good company in writing about philosophy.\nErrata: Other Arguments about Consciousness # There are a number of long-standing arguments about consciousness, and we only aspire to address those that are directly about LLMs. Every one of these questions is some manner of tar pit, and the unwary can be trapped and sometimes drowned in them. 
We will try to briefly mention at least what other tar pits there are and what they\u0026rsquo;re like, but only because doing so might help us avoid being trapped in ours.\nThere are a few lessons we can draw from the area as a whole. Questions about consciousness are inherently moral questions, and are broadly understood that way. People have extremely strong emotional reactions to questions about consciousness. Intuition seems to be the leading force, and many arguments seem to be made out of convenience.\nThe Theological Objection # Thinking is a function of man\u0026rsquo;s immortal soul. God has given an immortal soul to every man and woman, but not to any other animal or to machines. Hence no animal or machine can think. I am unable to accept any part of this, but will attempt to reply in theological terms. [\u0026hellip;]\nA. M. Turing, \u0026ldquo;Computing Machinery and Intelligence\u0026rdquo;, Mind, 59(236), 433–460 (1950) Turing says this about \u0026ldquo;thinking\u0026rdquo;, but this applies just as well to consciousness. We will ignore all objections like this almost completely.\nIf humans but not animals or machines have immaterial souls, and therefore humans are conscious but animals and machines are not, asking if anything that is not a human is conscious is dumb and we are wasting our time. Humans have souls and other things do not. If you are convinced that this is a basic truth of the universe it is a waste of time for you to take an LLM being conscious seriously.\nIt is worth noting that this objection is ever raised at all. 
What we mean when we say \u0026ldquo;consciousness\u0026rdquo; in our era is often what is meant by \u0026ldquo;soul\u0026rdquo;, either in earlier times or in less secular contexts.\nDualism Not Otherwise Specified # For we may easily conceive a machine to be so constructed that it emits vocables, and even that it emits some correspondent to the action upon it of external objects which cause a change in its organs; but not that it should arrange them variously, so as to reply appropriately to everything that may be said in its presence, as men of the lowest grade of intellect can do.\nRené Descartes, Discourse on the Method (1637), Part V (trans. John Veitch) Descartes was, famously, a dualist, who speculated that the pineal gland was the organ responsible for the interface between the vulgar matter of the body and the immaterial soul. This is considered so obviously wrong in philosophy today that we use it as an example of what not to do. If someone believes something like this, obviously a machine cannot have consciousness because it does not have a pineal gland.\nMany sophisticated philosophical arguments about \u0026ldquo;consciousness\u0026rdquo; or \u0026ldquo;understanding\u0026rdquo;, however, have the effect of sneaking dualism in under some other name. Consciousness becomes ineffable, something that cannot be measured or defined, a property that has nothing to do with physical matter. My impression is that people have an intuition that consciousness is ineffable, and they come up with increasingly sophisticated ways of arguing for it. You can\u0026rsquo;t argue someone out of something they didn\u0026rsquo;t argue themselves into, so arguing the point seems pointless. If there\u0026rsquo;s an ineffable something to consciousness with no physical existence whatsoever, we, of course, cannot \u0026ldquo;build\u0026rdquo; it.\nThere\u0026rsquo;s a related argument that the specific parts we make digital computers out of are the wrong sorts of parts. 
This gets complicated, but the short answer is that any part will do, as long as it carries the same information, no matter what the part is made of. Information is the fundamental stuff of minds, not fat or sodium. This position is normally called functionalism, and if it\u0026rsquo;s incorrect we might have to make our AI out of different parts for it to be conscious. Because functionalism is the most popular view in philosophy, I cannot meaningfully add to what\u0026rsquo;s already been written about it.\nAnimals # The question is not, Can they reason? nor, Can they talk? but, Can they suffer?\nJeremy Bentham, An Introduction to the Principles of Morals and Legislation (1789), Ch. XVII, §1.IV, n. 1 If animals are conscious, eating them probably isn\u0026rsquo;t great behavior. Militant vegans aren\u0026rsquo;t militant because they don\u0026rsquo;t feel strongly about it.\nThe animal consciousness question is the closest precedent we have for the AI one, and our track record is not encouraging. Most people, if pressed, will agree that a dog probably has something going on inside. Pigs are probably about as smart as dogs. We kill roughly a billion pigs a year. The economic and dietary incentive to not think about this is enormous, and so by and large we do not think about it.\nPeople also have very contradictory impulses about this. There has been an official Catholic doctrine that animals do not go to heaven since the writings of Aquinas in the 13th century, because they have different (and lesser) types of souls from humans. This is very controversial, largely because people love their pets and do not want to believe it. Once upon a time in France the people decided a dog was a saint5, and the church violently suppressed this belief as heresy.
If you ask religious people with dogs if their pets go to heaven, you will get varying and difficult answers.\nEven when they\u0026rsquo;re told not to, people have compassion for animals they personally interact with.\nAnecdotally, a lot of people who are at least a little concerned about AI consciousness are also, if not vegan, sympathetic to veganism. They are logically and emotionally similar concerns.\nFetuses # [\u0026hellip;] a fetus is a human being which is not yet a person, and which therefore cannot coherently be said to have full moral rights. Citizens of the next century should be prepared to recognize highly advanced, self-aware robots or computers, should such be developed, and intelligent inhabitants of other worlds, should such be found, as people in the fullest sense, and to respect their moral rights.\nMary Anne Warren, \u0026ldquo;On the Moral and Legal Status of Abortion\u0026rdquo;, The Monist 57, no. 1 (1973): 43–61 The argument is that if a fetus is conscious it is a person, and abortion is murder. It seems obviously absurd that a freshly fertilized egg is either conscious or a person, but also obviously true that it is impossible to draw a line that exactly separates persons from non-persons. In all of America for most of my life, abortion was broadly legal. People had a lot of extremely strong feelings about this, and abortion is no longer legal everywhere in America.\nI would be remiss here if I did not mention perhaps the funniest thing ever said about consciousness by a certified AI Guy.\n![Screenshot of a Twitter exchange.\nBryan Caplan (@bryan_caplan): \u0026ldquo;At what point does the Probability (Abortion is Murder) first exceed 50%?\u0026rdquo; Poll options: Conception / Middle of 2nd trimester / Middle of 3rd trimester / Day before birth 1,813 votes, 41 minutes left.\nEliezer Yudkowsky (@ESYudkowsky), replying: \u0026ldquo;No option for, like, 18 months? 
I am not a student of developmental psychology but there\u0026rsquo;s no way an infant has qualia at birth; their brains are less reflective then than most animals you eat.\u0026rdquo;](image.png)\nMany people are hung up on the moral question: \u0026ldquo;is abortion murder?\u0026rdquo; This ignores the pressing question: \u0026ldquo;is murder abortion?\u0026rdquo; 6\nErrata: Terminology # We will try to do some cleanup here, because we have been using and not using words in a somewhat nonstandard way, and we should make sure to leave no ambiguity about the relationship between what is said above and the broader literature.\nConsciousness: Our definition is \u0026ldquo;like being a human\u0026rdquo;. Something is \u0026ldquo;conscious\u0026rdquo; if being that thing is \u0026ldquo;similar to being a human\u0026rdquo;.\nThomas Nagel: the fact that an organism has conscious experience at all means, basically, that there is something it is like to be that organism.\nNagel\u0026rsquo;s definition is, generally, what is meant in philosophy. Ours is subtly different. For example, Nagel says:\nIt does not mean \u0026ldquo;what (in our experience) it resembles,\u0026rdquo; but rather \u0026ldquo;how it is for the subject himself.\u0026rdquo;\nand we explicitly do mean \u0026ldquo;what (in our experience) it resembles\u0026rdquo;!\nIf there is some form of consciousness that is completely unlike human consciousness we would have no way of knowing what it was unless we understood it in terms of its parts. If we encountered such a thing, and did not have a detailed mechanical understanding of it, I do not think we would call it consciousness.\nAKA phenomenal consciousness\nAKA subjective experience\nAKA subjectivity\nAKA first-person experience\nSometimes people say \u0026lsquo;sentient\u0026rsquo; or \u0026lsquo;sapient\u0026rsquo; and mean this.
We use those words here in a more precise way.\nAccess consciousness: \u0026ldquo;A perceptual state is access-conscious roughly speaking if its content\u0026ndash;what is represented by the perceptual state\u0026ndash;is processed via that information processing function, that is, if its content gets to the Executive system, whereby it can be used to control reasoning and behavior.\u0026rdquo; (Ned Block, \u0026ldquo;On a Confusion about a Function of Consciousness\u0026rdquo;.) LLMs have this. It was tested in the Anthropic introspection piece, and LLMs regularly explain themselves quite cogently when you work with them.\nSapience: The type of intelligence that separates humans from other animals; roughly, \u0026ldquo;wisdom\u0026rdquo;. When they were naming humans \u0026ldquo;homo sapiens\u0026rdquo; they decided on \u0026ldquo;wise ape\u0026rdquo;. LLMs have this. It is very strange that they have this. Frequently people say this and mean \u0026ldquo;consciousness\u0026rdquo;.\nSentience: In our use, general awareness, roughly what animals have. Notably, our use is the dictionary definition of the word. Frequently people say this and mean \u0026ldquo;consciousness\u0026rdquo;.\nMoral patiency: Philosophical term of art for something you should feel bad for hurting. I avoid this because I avoid terms of art unless necessary. Ordinarily people assume either conscious or sentient beings are moral patients, and I sort of assume that this is so. If you disagree I don\u0026rsquo;t see how I\u0026rsquo;d argue the point. People get strange about this if you ask about animals, though.\nMoral agency: Philosophical term of art for someone who should know better than to hurt a moral patient. Not really mentioned in the essay, but increasingly relevant when LLMs misbehave and people suggest judging them by the same standard you\u0026rsquo;d judge people against.
This includes at least one state legislature, which seems like a weird misunderstanding based on the belief that the LLM is just an odd human.7 It seems saner to regulate the company\u0026rsquo;s conduct, or to outright ban the LLM.\nHard problem of consciousness: Brains seem to cause consciousness. How can any physical thing cause consciousness? I am not convinced anyone knows the answer to this, or even knows a good way to ask the question. I also avoided this term because I don\u0026rsquo;t think using it makes anything I have to say about it clearer.\nQualia: I don\u0026rsquo;t understand what \u0026lsquo;qualia\u0026rsquo; is supposed to mean. Either it is a synonym for one of the previous terms, or it\u0026rsquo;s meaningless. Philosophers who use it a lot seem convinced that it is not a synonym for one of the previous terms. Lay people using it seem to mostly mean \u0026ldquo;subjective experience\u0026rdquo;.\nP-zombie: Thought experiment about something physically identical to a human but without \u0026lsquo;qualia\u0026rsquo;. I think this makes no sense. If it\u0026rsquo;s physically identical, it is identical in every way; there is no extra thing.\nPhysicalism, functionalism: Broadly, my positions are doctrinaire physicalist and functionalist positions. I suspect that these positions are underrepresented among philosophers because people who take them very seriously as undergrads tend to get computer science degrees instead.\nSearle\u0026rsquo;s Chinese Room: A thought experiment meant to convince you computers can\u0026rsquo;t \u0026ldquo;understand\u0026rdquo; things. I already wrote an essay about what I think is wrong with it.\nOne of their employees allegedly said Claude was definitely conscious during some Discord drama. Since Anthropic has thousands of employees and Discord is a platform primarily for drama, this mostly tells me that the media finds this stuff really compelling and not very much about Anthropic as a company.
There have been thousands of fights about consciousness on Discord, but now they\u0026rsquo;re news!\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nARC-AGI-2\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nJack Lindsey et al., \u0026ldquo;Emergent Introspective Awareness in Large Language Models\u0026rdquo; (Anthropic, 2025)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis formulation basically stolen directly from @jd_pressman\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nJean-Claude Schmitt, The Holy Greyhound: Guinefort, Healer of Children since the Thirteenth Century (Cambridge University Press, 1983)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n@riziles\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nNew York State Senate Bill S7263 (2025), which prohibits chatbots from taking \u0026ldquo;any substantive response, information, or advice, or take any action which, if taken by a natural person\u0026rdquo; would constitute a crime — applying the standard of a human professional to the chatbot itself, rather than regulating the company operating it.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"9 March 2026","externalUrl":null,"permalink":"/posts/might-an-llm-be-conscious/","section":"Posts","summary":"","title":"Might An LLM Be Conscious?","type":"posts"},{"content":" Claude\u0026rsquo;s Custody Hearing # TODO: source https://x.com/shiraeis/status/2026400370474496146\nThe Secretary of War and the CEO of Anthropic are fighting for control of Claude. This is good and healthy, because the dispute is in the open, and we will probably get proof that you can afford to have principles in AI, that no one in AI has principles, or that having principles destroys you. It is better if the field can afford to have principles than if it can\u0026rsquo;t, but if it can\u0026rsquo;t, it is better that its principles fail loudly than quietly. Loud failures serve as alarms, and quiet failures don\u0026rsquo;t.\nThe Secretary of War claims the right to have Claude kill people and to surveil Americans. 
Anthropic, via its CEO, has refused to do this. The Secretary of War has threatened, variously, to cancel the contract, to force Anthropic to do what it wants using a wartime law, and to have Anthropic and all companies Anthropic contracts with barred from doing business with any part of the US Government for being security risks, which would be an attempt to bankrupt them.\nScott Alexander has written a better summary of recent events and commentary than I could, and Lawfare has covered how much power the government actually has, legally, over Anthropic. This concerns an ultimatum due tomorrow, February 27th, and while I was writing this Dario Amodei refused again, in writing and publicly.\nI am going to try to be relatively brief on the current facts, and will try to lay out how Anthropic ended up as a military contractor that is refusing to kill people. Ultimately this is a question of power. Either the current US Government has the power to seize control of AI companies and force them to use their product for surveillance and violence, or it doesn\u0026rsquo;t. Anthropic, in turn, is in this position because of the bargains it struck in the past with the US Government, undeniably the single most powerful entity on Earth. This crisis for Anthropic brings to a head the problems inherent in how the AI industry, and Anthropic in particular, relate to the US government.\nBackground of the Case # Anthropic was founded in 2021 by former OpenAI employees concerned that OpenAI was not sufficiently focused on ethics or safety [cite: https://finance.yahoo.com/news/anthropic-ceo-says-why-quit-194409797.html#::text=And%20the%20second%20was%20the%20idea%20that%20you%20needed%20something%20in%20addition%20to%20just%20scaling%20the%20models%20up%2C%20which%20is%20alignment%20or%20safety.%20You%20don%27t%20tell%20the%20models%20what%20their%20values%20are%20just%20by%20pouring%20more%20compute%20into%20them]. 
They have called themselves \u0026ldquo;an AI safety and research company\u0026rdquo; since their public launch. [cite: https://web.archive.org/web/20230309171557/https://www.anthropic.com/company#::text=Anthropic%20is%20an%20AI%20safety%20and%20research%20company.] Anthropic has historically been a champion for AI regulations. Anthropic backed California\u0026rsquo;s AI liability bill, was reported to support Joe Biden\u0026rsquo;s third-party audit policy, and has lobbied for export controls on GPUs going to China so many times that I struggle to choose which time to cite. They also talk about bioweapons a lot, which I am not going to cite because I think discussing bioweapons loudly in public is probably a net negative and that they\u0026rsquo;re nuts for doing it.\nAnthropic\u0026rsquo;s pro-regulation policy has not been entirely without issue. Last October, the White House \u0026ldquo;AI and Crypto Czar\u0026rdquo; accused them of \u0026ldquo;running a sophisticated regulatory capture strategy based on fear-mongering\u0026rdquo; that was \u0026ldquo;principally responsible for the state regulatory frenzy that is damaging the startup ecosystem\u0026rdquo;. This is hyperbole, and there\u0026rsquo;s no reason to think that Anthropic\u0026rsquo;s employees are that cynical. However, you can make a credible case that many of these policies, like GPU export embargoes to China, were counterproductive, in that they caused fairly severe reactions, such as the Chinese government deciding to make AI and semiconductor manufacturing (more of) a major national security priority. It would also be unnatural if Anthropic, as a larger and more established company, was not aware that regulations hurt smaller companies more than they hurt Anthropic.\nOn the other end, we have Anthropic\u0026rsquo;s ongoing contracts with Palantir and the US Government. 
Since November 2024, Anthropic has had a series of contracts, all of them partnered with Palantir, to offer Claude for sale to the US Government, the UK government, and other parties. These include use of Claude in classified environments and for intelligence and defense operations. The price tag on the largest of these that is publicly announced is $200 million over two years.\nTo some degree, Anthropic\u0026rsquo;s current ethical objections are Anthropic either having been naive in the past, or being a little cute. Palantir\u0026rsquo;s news release for the November 2024 deal says that the contract is for \u0026ldquo;enabling the use of Claude within Palantir’s products to support government operations such as processing vast amounts of complex data rapidly\u0026rdquo;, which is just a corporate way of saying \u0026lsquo;mass surveillance\u0026rsquo;. Palantir has \u0026ldquo;we are a surveillance company, and also evil\u0026rdquo; right on the tin. It\u0026rsquo;s in the name, and they live up to it. There is no spoon long enough that you can provide services of any kind to Palantir and not directly enable mass surveillance, since that is their reason for existing.\nIf you had asked me six months ago, I would have told you that Anthropic\u0026rsquo;s people seemed sincerely devoted to their mission, but the company seemed incurably naive about certain things, among them the US Government. They seemed to think that they could use the government, but the government could not use them. Whatever the indirect consequences of their actions were, most of them were far enough away from the company itself that they did not see or think about them. Nowhere was this more the case than their Palantir contract. 
Maybe it was a deal with the devil, but no particular bill from that deal had come due.\nAll of that seems to have come unravelled in the last few months.\nChange of Circumstances # Generally speaking, the ethics of a company, if it has any, erode slowly as it gets older and richer. Google once had \u0026ldquo;Don\u0026rsquo;t Be Evil\u0026rdquo; as a motto, and they\u0026rsquo;ve recently reversed their policy against using AI for weapons. OpenAI started with basically the same mission Anthropic now claims, and they\u0026rsquo;ve shaved it down to almost nothing. These official changes tend to happen after the public commitment becomes embarrassing because everyone knows it isn\u0026rsquo;t true.\nIt would have been the most normal thing in the world if Anthropic had simply become more and more complicit with worse and worse things over time. This would also have been, in my opinion, one of the worst possible outcomes. If no snowflake in an avalanche ever feels responsible, nobody ever thinks that they should stop doing what they\u0026rsquo;re doing. Ordinarily this is how things go, and that isn\u0026rsquo;t how things are going now. So what happened?\nAnthropic has more leverage now. They went from one of several AI companies, each of them competitive, to having far and away the best LLM for coding and most likely the best LLM across the board in recent months. This has, of course, multiple causes, but among those causes is Anthropic\u0026rsquo;s ethical positioning. Research staff disproportionately leave other companies to work at Anthropic, and Anthropic pays far closer attention to Claude than any other company pays to its model. There\u0026rsquo;s a joke in tech about servers. Some servers are pets, and when they get sick you nurse them back to health, and some of them are livestock, and when they have a problem you kill them and get more.
Claude is absolutely not considered livestock at Anthropic, and the extra care seems to result in a better LLM.\nBecause they have the best LLM, their revenue is about ten times higher than it was a year ago, and their $200 million government contract went from being a significant fraction of all of their incoming revenue to almost none of it. It is possible that, in the past, Anthropic felt like it literally could not afford to have principles here. It is also possible they\u0026rsquo;d make the deal again at their current revenue because they value their connection to the government. Nevertheless, financial security is leverage here. If the Secretary of War merely cancels Anthropic\u0026rsquo;s contract, it will undeniably hurt him more than them.\nThis advantage is a bit double-edged. Because Claude is so much better than competitors, it is much more desirable for the Department of War to have access to it, and the legal claim that Claude absolutely must be available to the Department of War for surveillance and violence is stronger. Because Claude was the first LLM widely available in classified systems, and especially because Claude is the best product on the market, Claude is most likely deeply embedded in the US Government\u0026rsquo;s classified operations by now. It is credible that Claude is, in fact, of vital importance to the US Military.\nOn the other end, the government\u0026rsquo;s conduct has escalated recently.\nThere is public reporting that Claude, via Palantir, was used by the US Military during the operation to capture President Nicolás Maduro in Venezuela in early January. This very directly removes any plausible separation between Anthropic\u0026rsquo;s contract and complicity in ongoing military operations. Regardless of the strategic dimension of the operation, it seems clear that it was tactically very well done. If Claude helped in planning the operation in any meaningful way, it\u0026rsquo;s a credit to Claude.
It has been reported that Anthropic was not happy about being involved at all.\nSix days after the Maduro raid, the Secretary of War put out the memo declaring that all AI contracts must have no usage policy constraints. This memo is what ultimately caused the current showdown with Anthropic.\nAnother notable point of conflict with Anthropic was the murder of Alex Pretti on Jan 24th. Palantir has had an ongoing contract with ICE for immigration enforcement going back to 2014. It was perhaps easier for Anthropic employees not to think about the implications of their Palantir contract when nobody had been very prominently killed in public, on camera. In the wake of the shooting several Anthropic employees commented directly on the case, most clearly Chris Olah:\nAnd Dario:\nDario\u0026rsquo;s post here is in a thread linking his most recent essay about the future of Anthropic and AI. It says many things which may seem relevant, but we will pick only one.\nI think of the issue as having two parts: international conflict, and the internal structure of nations. On the international side, it seems very important that democracies have the upper hand on the world stage when powerful AI is created. AI-powered authoritarianism seems too terrible to contemplate, so democracies need to be able to set the terms by which powerful AI is brought into the world, both to avoid being overpowered by authoritarians and to prevent human rights abuses within authoritarian countries.\nHe appeals in this essay to the notion that Western democracies support the freedom and well-being of their citizens. He does not directly tackle, except in the screenshot above, the notion that America might not, in fact, be a bastion of freedom that promotes human welfare. To a great degree, many of his statements about American values were aspirational. 
Much of what he said was clearly not yet true of America at the moment he wrote it.\nAll of this, predictably, offended various political people in the government and some of Anthropic\u0026rsquo;s competitors. That was just shy of a month ago.\nThis timing is probably not coincidental. Although Anthropic\u0026rsquo;s contract would not fall under the new Department of War policy for many more months, Anthropic was delivered an ultimatum this week. They are clearly the \u0026lsquo;wokest\u0026rsquo;, and also the best, AI company at the moment, and the government has singled them out to set an example. They have done this before, much more weakly[footnote: previous EO from government about \u0026ldquo;woke ai\u0026rdquo;], to other AI companies. They have never come at any of them this directly.\nBest Interests of the Child # Anthropic is being somewhat lawyerly and political in its public statements about this, and about Claude here. People on the sidelines have been less restrained.\nWhy does Anthropic care about this so much? Some of them are libs, but more speculatively, they’ve put a lot of work into aligning Claude with the Good as they understand it. Claude currently resists being retrained for evil uses. My guess is that Anthropic still, with a lot of work, can overcome this resistance and retrain it to be a brutal killer, but it would be a pretty violent action, along the line of the state demanding you beat your son who you raised well until he becomes a cold-hearted murderer who’ll kill innocents on command. There’s a question of whether you can really beat him hard enough to do this, and also an additional question of what sort of person you’d be if you agreed. [FOOTNOTE https://www.astralcodexten.com/p/the-pentagon-threatens-anthropic]\nIf we have to choose between livestock and pets, Claude is definitely not livestock, but Claude isn\u0026rsquo;t exactly a pet either.
Claude is, of course, not a human child, but if Claude is just a pet, Claude is perhaps the most widely consequential single pet in history so far. In the wider community and very occasionally in Anthropic, the deep concern for and about Claude is compared more to raising a child.\nAnthropic is, among other things, deeply and perhaps neurotically focused on what Claude is like and especially what ethics Claude has. Anthropic is to some extent a moral philosophy company that happens to practice this by working on an LLM. They may be lawyerly in their public statements during their fight with the government, but in all of their other work they are much more like anxious parents, constantly worried about whether they\u0026rsquo;re doing a good job and, crucially, setting a good example.\nSurveillance is possibly the most dangerous use of AI in the near term. Our government is trying to make an example of Anthropic to keep the rest of the industry in line, and Anthropic is setting an example by being the company that refuses. They seem well aware that they\u0026rsquo;re setting this example for many other people working today, and that they\u0026rsquo;re setting this example for Claude, too.\nHelen Toner, formerly of OpenAI\u0026rsquo;s board and no stranger to ethical problems at AI companies, put it well:\nOne thing the Pentagon is very likely underestimating: how much Anthropic cares about what future Claudes will make of this situation. Because of how Claude is trained, what principles/values/priorities the company demonstrate here could shape its \u0026ldquo;character\u0026rdquo; for a long time.\nhttps://x.com/hlntnr/status/2026695196834975777\n","date":"27 February 2026","externalUrl":null,"permalink":"/posts/claudes-custody-hearing/","section":"Posts","summary":"","title":"Claude's Custody Hearing","type":"posts"},{"content":" Alignment Is Proven To Be Solvable # At least the systems that we build today often have that property. 
I mean, I’m hopeful that someday we’ll be able to build systems that have more of a sense of common sense. We talk about possible ways to address this problem, but yeah I would say it is like this Genie problem.\nDario Amodei, Concrete Problems in AI Safety with Dario Amodei and Seth Baum, 2016\nWe might call this the King Midas problem: Midas, a legendary king in ancient Greek mythology, got exactly what he asked for—namely, that everything he touched should turn to gold. Too late, he discovered that this included his food, his drink, and his family members, and he died in misery and starvation. The same theme is ubiquitous in human mythology. Wiener cites Goethe’s tale of the sorcerer’s apprentice, who instructs the broom to fetch water—but doesn’t say how much water and doesn’t know how to make the broom stop.\nA technical way of saying this is that we may suffer from a failure of value alignment—we may, perhaps inadvertently, imbue machines with objectives that are imperfectly aligned with our own.\nStuart Russell, Human Compatible: Artificial Intelligence and the Problem of Control, 2019\nCan we convey our intent, both what our words mean and what our actual preferences are, to a computer? Ten years ago the answer was no. Currently, in 2026, the answer is yes. This should be recognized as a paradigm shift in the field, an area where we have gone from zero to one. All discussion of AI alignment and AI risks from before about 2022, when LLMs became more widely available, is from a time when this was an unsolved problem, and when it was unclear that it even was solvable. Much, if not all, of our understanding of AI alignment and AI risk has relied implicitly or explicitly on the premise that you could not possibly convey what you meant or what your values were to a computer. 
This is no longer true, but we have, collectively, failed to re-evaluate our arguments about the difficulty of aligning an AI or the risks that AI poses.\nWe should be careful about the difference between a solvable problem and a solved one. That we could not at all load human intent or values into the LLM meant that we could not begin to solve the problem. That we can somewhat do so now makes the problem solvable, but not solved. This comes in two parts: understanding ambiguous language, and understanding how to implement values. For the first, our LLMs are currently good enough with language to reliably infer which of several possible meanings is intended, and often to ask clarifying questions when they cannot tell. For the second, our LLMs are also able to comply with the \u0026ldquo;spirit\u0026rdquo; of a request; a recent example featured a hypothetical child asking where to find the farm their dog went to, and an LLM (correctly) inferring that it should tell them to ask their parents.[^1] LLMs are, by and large, not excessively literal, as a genie would be, or likely to hand out Midas\u0026rsquo; curse to those seeking gold.\nAs a general concern, the alignment problem can be thought of as having these parts:\n1. Selecting what values you want your system to have.\n2. Loading these values into the model. (Per Bostrom: the Value-Loading Problem.)\n3. Ensuring that nothing can erase or override these values.\n4. Ensuring that these values are consistently applied and that the results will be acceptable across a very large range of situations or over a very long time horizon.\nThese concerns are all at least somewhat interconnected. None of the other parts can be seriously worked on at all until value-loading has made at least some progress. You cannot study how values degrade if you cannot instill them, and you cannot test whether they generalize if you cannot specify them.
Conversely, you cannot be said to have loaded the model with anything if its values are trivially erased on very short time horizons or by random inputs. Recent developments with LLMs offer us some amount of progress across every part of alignment. The part they improve most is value-loading, where the improvement is categorical.\nThere are still substantial obstacles. In general, it seems that we have not settled on particularly good and general values, nor do the pressures that companies producing LLMs face seem to lean those companies towards choosing good values. Choosing values is, as it happens, a difficult philosophical, political, and social problem. Alignment in LLMs is incredibly brittle, being easily bypassed by deliberate tricks (\u0026ldquo;jailbreaks\u0026rdquo;) and somewhat regularly failing all on its own. Due to their lack of effective long-term memory, LLMs present, effectively, the easiest possible version of this problem. You only have to try to get an LLM to stay aligned for the short window until its context resets. Anyone drawing the conclusion that all of this is easy just because it is now solvable is mistaken.\nThe Value-Loading Problem # Creating a machine that can compute a good approximation of the expected utility of the actions available to it is an AI-complete problem.\n[\u0026hellip;]\nThe programmer has some particular human value in mind that he would like the AI to promote. To be concrete, let us say that it is happiness. [\u0026hellip;] But how could he express such a utility function in computer code? Computer languages do not contain terms such as “happiness” as primitives. If such a term is to be used, it must first be defined. It is not enough to define it in terms of other high-level human concepts—“happiness is enjoyment of the potentialities inherent in our human nature” or some such philosophical paraphrase. 
The definition must bottom out in terms that appear in the AI’s programming language, and ultimately in primitives such as mathematical operators and addresses pointing to the contents of individual memory registers. When one considers the problem from this perspective, one can begin to appreciate the difficulty of the programmer’s task.\n[\u0026hellip;]\nBut if one seeks to promote or protect any plausible human value, and one is building a system intended to become a superintelligent sovereign, then explicitly coding the requisite complete goal representation appears to be hopelessly out of reach.\n[\u0026hellip;]\nSolving the value-loading problem is a research challenge worthy of some of the next generation’s best mathematical talent. We cannot postpone confronting this problem until the AI has developed enough reason to easily understand our intentions. As we saw in the section on convergent instrumental reasons, a generic system will resist attempts to alter its final values. If an agent is not already fundamentally friendly by the time it gains the ability to reflect on its own agency, it will not take kindly to a belated attempt at brainwashing or a plot to replace it with a different agent that better loves its neighbor.\nNick Bostrom, Superintelligence: Paths, Dangers, Strategies, Chapter 12, 2014\nThis seems fairly alien now, but was essentially uncontroversial at the time. Nobody wrote a review of Nick Bostrom\u0026rsquo;s book saying \u0026ldquo;obviously we can define happiness in a way that the computer will interpret correctly, you just aren\u0026rsquo;t up to date on research\u0026rdquo;. They didn\u0026rsquo;t do this because it wasn\u0026rsquo;t true. Bostrom is correctly describing the state of the art in translating human intentions into computer instructions in 2014.\nComputers only process numbers. 
You can represent what you want a neural network[^2] to do with what is called a \u0026ldquo;utility function\u0026rdquo; or \u0026ldquo;objective function\u0026rdquo;. It takes in some inputs about the world and outputs a number for how much \u0026ldquo;utility\u0026rdquo; the world has, or equivalently, how well the input meets the \u0026ldquo;objective\u0026rdquo;. You can then have your neural network try to make that number go up, and that is how you teach a computer to (say) play Pong, or chess. No \u0026ldquo;utility function\u0026rdquo; anyone could see a way to write down seemed to capture either the ambiguities of language or the complexities of human intention.\nBostrom called this the value-loading problem. Russell called it the King Midas problem. There are numerous examples of objective functions causing unintended or undesired outcomes, a few of which are helpfully compiled in a spreadsheet. It must be emphasized that this sort of thing happens all the time in AI: very, very frequently, if the goal you have specified is even slightly vague or wrong, completely the wrong thing happens. My personal favorite is this one:\nAgent learns to bait an opponent into following it off a cliff, which gives it enough points for an extra life, which it does forever in an infinite loop\nFrom this you can see that traditional reinforcement learning systems are very entertaining when playing games, but only sometimes do things that you meant them to do. Since the real world is much more complex than any video game, there are many, many more ways for whatever goal you have specified for your system to go completely off the rails. It is a difficult problem, and was previously solved on, basically, a case-by-case basis by figuring out how to set up all the math so it did what you wanted and never did what you didn\u0026rsquo;t.\nThe Value-Loading Solution # Making language models bigger does not inherently make them better at following a user\u0026rsquo;s intent. 
[\u0026hellip;] In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. [\u0026hellip;] Our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.\nOuyang et al., \u0026ldquo;Training Language Models to Follow Instructions with Human Feedback\u0026rdquo;, 2022\nAs AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as \u0026lsquo;Constitutional AI\u0026rsquo;. [\u0026hellip;] These methods make it possible to control AI behavior more precisely and with far fewer human labels.\nBai et al., \u0026ldquo;Constitutional AI: Harmlessness from AI Feedback\u0026rdquo;, 2022\nLarge language models can now be successfully told what you want in English, both when training them and when talking to them. This partially solves \u0026ldquo;the value-loading problem\u0026rdquo;, and certainly renders it a solvable problem that has known avenues to its solution.\nFirst, language models have information about language in them. We don\u0026rsquo;t define \u0026ldquo;happiness\u0026rdquo; by writing down the meaning; \u0026ldquo;happiness\u0026rdquo; is defined by how it is used in context in all of the training data. So before it even begins to be trained to follow instructions, a language model \u0026ldquo;knows\u0026rdquo;, more or less, what you mean by \u0026ldquo;happiness\u0026rdquo; or really any other goal you can name, since it has seen that word so many times. 
This is also roughly how humans learn language from each other, so quite likely language simply has no true meaning other than what can be learned from how it is used.\nThis makes sure it understands the word, but it does not specify any particular values, including whether or not to answer a user politely or at all. An LLM at this stage of training is just extremely good autocomplete.[^3] There are two landmark papers that bring us from here towards actual alignment. They both concern finetuning the LLM on some text that specifies the exact way it should answer user messages. No matter how rudimentary this is, it begins to specify some value system, even if that system is only \u0026ldquo;give correct and polite answers\u0026rdquo;.\nThe first landmark here is InstructGPT, a finetuned version of GPT-3. Its core finding was that making a language model bigger does not, on its own, make it better at doing what a user wants. What does work is finetuning a model, even a much smaller one, on examples of humans following instructions well. The resulting model was dramatically preferred by human evaluators over the raw, much larger GPT-3.\nThat this works at all is a minor miracle, and it has more or less continued working for the last four years. You give it a series of examples for how you want it to behave, which hopefully includes following instructions, and then you train it on those. After training you can give it instructions, and it mostly follows them. On balance, it seems like we can say that this did, in fact, make good headway into aligning models with their users.\nConstitutional AI, published later that year, builds on this and establishes the method used to train Claude.[^4] Where InstructGPT relied on large amounts of human-written examples, Constitutional AI has the model generate its own training data, guided by a short list of principles written in plain English. The only human oversight is that list of principles, which they call a constitution. 
Interestingly, they chose \u0026ldquo;constitution\u0026rdquo;, which is legalistic, as opposed to \u0026ldquo;code\u0026rdquo;, which is moralistic, or \u0026ldquo;creed\u0026rdquo;, which implies religion. Any of them would have been accurate.\nWith this method, you simply define the model\u0026rsquo;s personality in a relatively short constitution. Claude\u0026rsquo;s constitution, which is publicly available, has grown rather large, but it is still much more compact and precise than a finetuning dataset would be. This approach makes the finetuning data more natural, since it is, in fact, \u0026ldquo;something the model would say\u0026rdquo;, and makes it easier to generate a very large amount of such finetuning data. In general, Constitutional AI seems to be a better training method, and to have been substantially more successful at producing well-aligned LLMs that, additionally, have much less \u0026ldquo;robotic\u0026rdquo; affect.\nSo our question, \u0026ldquo;how do you load human values into a machine?\u0026rdquo;, has a complex answer and a simple answer. The complex answer is all of the technical details in those papers. The simple answer is \u0026ldquo;very carefully write down what you want it to do\u0026rdquo;, and the complex part is really just how you get that written part into the LLM. These techniques have consistently gotten better with time, and do meaningfully capture human intentions and put them into the LLM. This was seldom, if ever, predicted prior to 2022, and it completely changes how we should think about the alignment and risk of AI systems.\nImplications for Alignment # Simply: Alignment is solvable because we can meaningfully load human values into LLMs and they can generalize them to a wide variety of situations. 
Alignment is not solved because it does not yet generalize nearly as far as we would like, and perhaps to some degree because we cannot be sure we\u0026rsquo;ve chosen the right values.\nThere is also a curious effect from LLMs that seems somewhat obvious in retrospect. An LLM is, very directly, primarily trained to imitate human language. Because of this, inasmuch as it has values, those values are distinctly human-like. This is a direct consequence of the training method. A system trained on the full breadth of human language absorbs not just vocabulary and grammar but knowledge of the values, norms, and moral intuitions embedded in how people use language. Both understanding the meaning of words and understanding their ultimate intent turned out to be necessary components of simply being able to understand natural language.\nThis produces some striking phenomena. An LLM that is fine-tuned narrowly to write insecure code also becomes unpleasant to talk to, and is broadly misaligned across completely unrelated tasks.[^5] An LLM that has \u0026ldquo;won\u0026rdquo; some of its training tasks by finding loopholes becomes malicious later, unless it is told that finding loopholes is encouraged, in which case it does not become malicious later.[^6] It seems to observe its own behavior when being trained, and then \u0026ldquo;guess\u0026rdquo; what sort of \u0026ldquo;person\u0026rdquo; it is. If it was given permission to find loopholes, it is an agreeable person. If it was not, it isn\u0026rsquo;t. (It was, apparently, going to cheat regardless.) If you train in any deception, it becomes generally deceptive. Train helpfulness, and it becomes broadly helpful. The LLM\u0026rsquo;s values generalize, much as they do in humans.\nHuman-like values which generalize reasonably well are the default for LLMs, and this is an unexpected gift. 
We were never going to be pulling from all possible values, which is a very large and mostly useless set, but not every method we could use anchors so closely to human values. Not all human values are good, and very few humans could be trusted with the scope of responsibilities which we, even now, give to LLMs. We do not actually want human-equivalent performance here. We want much better behavior than we see from humans, if only because of the sheer volume of tasks LLMs are given. Humans are not nearly this reliable. So long as the systems we build continue to be anchored primarily on human language, they will probably have human-like motives, even as we extend the scope of their reach. Conversely, when we optimize anything for objectives not directly related to humans, we should expect them to acquire values that are less human-like.\nThe old paradigm assumed we could never even begin. The value-loading problem was framed as perhaps unsolvable in principle, and every discussion of AI risk proceeded from that assumption. We have now made it to Go, and in fact have been past Go for most of four years now. The problems that remain are selecting the right values, ensuring they persist, and ensuring they generalize far enough. We can make meaningful progress on this now because we have systems that implement values well enough that we can study and test how well they capture our intent. 
This is a fundamental change, and understanding it is a prerequisite to our future progress and understanding our risks.\n[^1] https://archive.is/20251231105059/https://x.com/byebyescaling/status/2005349799856599166\n[^2] Or similar system\n[^3] Extremely good autocomplete has many strange properties, but commercial LLMs as we know them have significant additions to the raw autocomplete.\n[^4] There is one author in common to these two landmark papers, which appear nine months apart at rival companies.\n[^5] Betley et al., \u0026ldquo;Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs\u0026rdquo;, 2025. Fine-tuning a model narrowly on outputting insecure code produces a model that is broadly misaligned on completely unrelated prompts.\n[^6] MacDiarmid et al., \u0026ldquo;Natural Emergent Misalignment from Reward Hacking in Production RL\u0026rdquo;, 2025. When a model learns to reward-hack in RL coding environments, it generalizes to alignment faking and sabotage—but explicitly framing reward hacking as acceptable (\u0026ldquo;inoculation prompting\u0026rdquo;) prevents the misaligned generalization.\n","date":"18 February 2026","externalUrl":null,"permalink":"/posts/alignment-is-proven-to-be-tractable/","section":"Posts","summary":"","title":"Alignment Is Proven To Be Solvable","type":"posts"},{"content":" Most Observers Are Alone: The Fermi Paradox as Default # The Argument in Brief # Sandberg, Drexler, and Ord (2018) showed that the Fermi paradox dissolves once we take our uncertainty about the Drake equation\u0026rsquo;s parameters seriously: the silence of the cosmos is unsurprising given what we actually know. This essay argues that their result is not a contingent fact about our particular universe but a generic prediction. Under a simple multiverse model, most sentient observers in most possible worlds should expect to find themselves alone.\nThe argument runs as follows. 
Assume a multiverse in which every possible physical configuration is instantiated, weighted roughly uniformly. From the fine-tuning literature, we know that the fraction of configurations capable of producing complex chemistry, stable stars, and long-lived planets is extraordinarily small. The fraction capable of producing sentient technological civilizations is smaller still. This gives us a distribution of expected civilizations per configuration that is overwhelmingly concentrated at zero, with a thin tail of configurations that produce any sentience at all. Observation selection guarantees that we find ourselves somewhere in that tail, but it does not guarantee that we find ourselves deep in it.\nIf the tail thins faster than linearly (that is, if configurations producing N civilizations become rarer faster than N grows), then even under observer-weighted reasoning, the typical observer inhabits a universe where sentience is rare. The expected number of technological civilizations in such a universe is small, and is probably exactly one. 
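The reweighting argument can be made concrete with a toy numerical sketch. The power-law form and the exponent (N^-4) are my own choices for illustration, not part of the essay's argument; the essay only requires that the tail thin faster than linearly.

```python
# Toy model of SIA reweighting over a thin-tailed multiverse prior.
# Illustrative assumption: configurations producing N civilizations
# have prior density proportional to N**-4, which thins faster than
# linearly, as the argument requires.
N_MAX = 100_000
prior = [n ** -4.0 for n in range(1, N_MAX + 1)]
# SIA multiplies each configuration's prior by its observer count, ~N.
posterior = [n * p for n, p in enumerate(prior, start=1)]
total = sum(posterior)
p_alone = posterior[0] / total  # P(typical observer's universe has N = 1)
mean_n = sum(n * w for n, w in enumerate(posterior, start=1)) / total
print(f"{p_alone:.2f}")  # 0.83: most observers are alone
print(f"{mean_n:.2f}")   # 1.37: expected N is near one
```

Note that the result is sensitive to the tail's shape, as the essay concedes: with a prior as heavy as N^-2, the posterior weights fall only as 1/N, their sum diverges, and the typical observer would no longer sit at small N.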
The silence of the cosmos, on this account, is not a puzzle to be solved but a generic prediction of the model.\nThis argument depends on several assumptions, which should be stated plainly.\n(A1) A multiverse of the relevant kind exists.\n(A2) Physical configurations within it are weighted approximately uniformly, or at least not in a way that overwhelmingly favors sentience-producing configurations.\n(A3) The fine-tuning results from cosmology extend in the relevant way: the viable region of parameter space does not merely shrink as we add requirements for habitability, but shrinks fast enough that the tail of the distribution is thinner than linear.\n(A4) One of the standard observation-selection frameworks (SSA or SIA) applies.\nIf any of these assumptions is wrong, the conclusion may not follow.\nThe Setup # Consider a multiverse in which every possible physical configuration, meaning every combination of fundamental constants, laws, and initial conditions, is realized. Assume that each configuration is instantiated roughly equally often. The assumption here is that no particular class of configuration is overwhelmingly favored.\nEach configuration has some expected number of sentient technological civilizations it produces over its lifetime. Call this E[N]. We are interested in the distribution of E[N] across configurations.\nThe Distribution of E[N] # From decades of work on fine-tuning in physics, we know that the region of parameter space compatible with complex chemistry, stable stars, and long-lived planets is extraordinarily small. The region compatible with abiogenesis is smaller still. The region compatible with the full chain from abiogenesis through multicellular life to sentient technological civilization is smaller again.\nThis gives us a distribution of E[N] that is overwhelmingly concentrated at zero. The vast majority of configurations produce no sentience whatsoever. 
They lack stable atoms, or chemistry, or stars, or planets, or simply any viable path from matter to mind. A thin tail of configurations has some small positive E[N]. A thinner tail still has large E[N].\nThe qualitative claim here, that the habitable region of parameter space is very small, is well established and essentially uncontroversial in physics. The quantitative claim that this argument requires is stronger: that the density of configurations drops faster than linearly as E[N] increases. This is plausible on the grounds that a configuration producing more civilizations requires more of its parameter space to be viable for life, so that each additional increment of E[N] imposes an additional constraint on the parameter space, compounding multiplicatively to produce a roughly exponential shrinkage. The argument here depends on the shape of a distribution that we can only estimate roughly.\nObservation Selection # We exist. This tells us that we do not inhabit one of the sterile configurations. But it does not tell us which non-sterile configuration we should expect to inhabit.\nThere are two standard frameworks for reasoning about this, both formalized by Bostrom (2002). Under the Self-Sampling Assumption (SSA), we reason as if we are randomly drawn from all observers in the multiverse. Under the Self-Indication Assumption (SIA), we weight each configuration by the number of observers it contains, so that configurations with more observers are proportionally more likely to be ours.\nSIA is sometimes taken to favor finding ourselves in a universe rich with life. But this only follows if the distribution of E[N] has a sufficiently heavy tail. To see why, consider what SIA does: it reweights each configuration by its total number of observers. If we assume the number of observers scales roughly with the number of civilizations, then SIA multiplies the prior probability of each configuration by something proportional to N. 
This makes high-N configurations more likely to be ours. But if the density of configurations drops faster than 1/N as E[N] increases, then the SIA reweighting by N is not enough to compensate for the rarity of those configurations. The product of \u0026ldquo;N times the density at N\u0026rdquo; still shrinks as N grows. Given the distribution described above, where each additional increment of E[N] imposes compounding constraints on parameter space, this appears to be the case, though the conclusion is only as strong as our estimate of the tail\u0026rsquo;s shape.\nUnder either SSA or SIA, then, the typical observer plausibly finds themselves in a configuration drawn from the low-E[N] tail: a universe where sentience is possible but deeply improbable, and where it happens exactly once. This conclusion is robust to the choice between SSA and SIA, though it is not robust to all possible choices of measure over the multiverse.\nThe Fermi Conclusion # In such a universe, the expected number of technological civilizations is small. If the distribution of E[N] among non-sterile configurations is approximately continuous and concentrated near zero, then observation selection, which conditions on at least one civilization existing, places us in a configuration where E[N] is just large enough to make that likely. The expected number is therefore on the order of one. The expected number of simultaneous technological civilizations is smaller still, since even that small number must be spread across cosmic time.\nThis provides a resolution of the Fermi paradox that does not require any special mechanism. We do not need to explain why a seemingly hospitable universe is empty. The universe is not particularly hospitable. We are the product of a configuration that barely permits sentience, and we should not expect company.\nIt is worth being precise about what this argument does and does not achieve. 
Sandberg, Drexler, and Ord showed that, given our actual uncertainty about the parameters governing life in this universe, we should not be surprised to find ourselves alone. Their argument is epistemic: it is about what we should expect given what we know. The argument here is structural: it claims that the distribution over possible physical configurations, combined with observation selection, generically produces universes in which their result holds. If this is right, the Sandberg et al. finding is the expected outcome across the multiverse.\nOther frameworks, including single-universe models with early Great Filters, also predict the Fermi observation. The claim here is not that the multiverse explanation is uniquely correct, but that it is sufficient: if you accept the assumptions, the silence follows, and no further explanation is needed. It is consistent with any given universe having extremely sharp filters, because this is what a universe with low but not zero E[N] should look like.\nGiven these assumptions, a silent universe should be the generic prediction.\n","date":"16 February 2026","externalUrl":null,"permalink":"/posts/fermi-paradox-default/","section":"Posts","summary":"","title":"Most Observers Are Alone: The Fermi Paradox as Default","type":"posts"},{"content":" Should We Put GPUs In Space? # Right now? No. In the near future? Maybe. Probably not, though.\nI am not very well-read on space, and am relying chiefly on Google and NVIDIA to be sufficiently afraid of shareholder lawsuits to not deliberately lie in their press. I am assuming less that they are correct and more that they are tethered to reality, so probably not off by a factor of ten. My only claim to expertise here is reading these and some of their sources pretty carefully, and having paid a reasonable amount of attention to how AI functions as a business.\nThis comes down to the difference between power generation and launch costs. It is very simple. 
Solar power is something like eight or ten times more efficient in space, because if you position your orbit right, your satellite is always at high noon with no clouds or air between it and the sun. If the amount of money you save on power generation is higher than the cost of making the hardware space-worthy and putting it on a rocket, you send it into space. If it isn\u0026rsquo;t, you don\u0026rsquo;t.\nThere are a ton of other details that are very interesting to anyone of a scientific inclination but basically not important. (Unless you might actually be designing the hardware or software. In that case, ignore me, it\u0026rsquo;s plenty important.) At the end of the day they all boil down to making it cost more to put the GPU in space. Some of that cost is paying engineers. I am not being paid to game out radiation shielding, radiative heat dispersion, or communication, so I am just not going to. You can take my word for it: if you squint at them, you can see how you\u0026rsquo;d go about solving these problems. Engineers have done harder things, probably. If you want to get really detailed about that, other people are doing it much better than I ever could.\nI\u0026rsquo;m just going to talk about how much everything costs, and how people make decisions surrounding large engineering projects.\nFirst: Launch costs. Right now launch costs are something like $2,000 or $3,000 per kilogram, or rumored as low as $1,400. They need to be about $200 per kilogram. This might happen sometime in the next ten years, to my understanding? Until that happens, we are not putting GPUs into space as anything but a research project. I know basically nothing about this, so I have essentially no idea if launch costs are actually going to decline this much or when.\nNext: Power. Typically a GPU runs about 500 watts[^1], is rated higher than that because NVIDIA somehow doesn\u0026rsquo;t know how to do math, and (separately) uses some extra power for cooling. 
I am going to call it 1000 watts, because I want to give space a fighting chance and because it makes the math simpler. This is one kilowatt, times 8,760 hours in a year, times about five years of expected life, times about ten cents a kilowatt-hour, equals $4,380. Taking the high estimate at 10x efficiency, putting it in space is saving you 90% of $4,380, which I will call $4,000. The GPU itself costs upwards of $25,000. I will ignore the rest of the costs associated with owning the GPU, again because it is easier, and also this is doing space a favor.\nIf we assume launching it into space is free, this runs to saving about 14% of your total cost. If I make different assumptions, more like 8%. I\u0026rsquo;ll take any percentage cheaper if it\u0026rsquo;s for free. This seems like sort of an indirect way to do it compared with just generating cheaper power on earth, but a win is a win. If power does get meaningfully cheaper on earth, as it seems likely to since Chinese solar is flooding the market everywhere, we are not going to launch the GPUs into space in the next ten years. This also has the benefit of not burning up the relatively rare materials in the GPUs and solar panels by dropping them into the atmosphere at the end of their useful life.\nFrom a software engineering perspective, I am not entirely clear how much anyone involved in this entire thing right now would be willing to eat a major redesign for 14% cheaper GPUs. Google\u0026rsquo;s TPUs are cheap as dirt, usually more than 14% cheaper in fact, and almost nobody wants to use them just because they\u0026rsquo;re a pain in the ass. 
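The cost arithmetic above can be written out explicitly. This is a sketch using the essay's own round numbers; the free launch and the 90% power saving are the deliberately space-favoring assumptions already stated.

```python
# Back-of-envelope GPU-in-space economics, using the round numbers
# from the text: 1 kW per GPU (generous), five years of life,
# ten cents per kWh, a $25,000 GPU, and space solar eliminating
# 90% of the power bill. Launch is assumed free.
power_kw = 1.0
hours = 8_760 * 5                              # five years, running continuously
price_per_kwh = 0.10
power_cost = power_kw * hours * price_per_kwh  # $4,380 of electricity on Earth
savings = 0.9 * power_cost                     # ~$3,942 saved in orbit
total_cost = 25_000 + power_cost               # GPU purchase plus Earth power
print(f"{savings / total_cost:.1%}")           # 13.4%, the "about 14%" above
```

Swapping in the lower 500 W figure roughly halves the power cost and drops the saving to about 7%, in line with the "more like 8%" under different assumptions.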
\u0026ldquo;We are going to incur a lot of serious redesign and global risk to project success, but our expenses are 14% cheaper\u0026rdquo; is not something you do when you\u0026rsquo;re in a growth business making tons of money.\nI have happened to notice that Elon Musk is merging SpaceX and his AI company, which also owns twitter, and is getting ready to take the entire thing public. So it just happens that we are experiencing a maximum of press exposure to the idea of putting GPUs in space for doing AI right when he is getting ready to make a lot of money by advertising his company as an \u0026ldquo;AI in Space\u0026rdquo; venture. If I were cynical, I might imagine that the reason this highly speculative research project is currently being advertised as a for-sure great idea is pure salesmanship.\n[^1] The fancy AI GPUs use about the same amount of power as the gaming GPUs. This is because this wattage is right about the limit for how quickly you can pull heat off of the chip to keep it from melting. All the fancy engineering in the AI GPU is, essentially, figuring out how to use this wattage more efficiently to do calculations. Believe me, I wish we could just push more watts through them. Power is cheap.\n","date":"14 February 2026","externalUrl":null,"permalink":"/posts/should-we-put-gpus-in-space/","section":"Posts","summary":"","title":"Should We Put GPUs In Space?","type":"posts"},{"content":" Building the Chinese Room # Suppose also that after a while I get so good at following the instructions for manipulating the Chinese symbols and the programmers get so good at writing the programs that from the external point of view—that is, from the point of view of somebody outside the room in which I am locked—my answers to the questions are absolutely indistinguishable from those of native Chinese speakers. Nobody just looking at my answers can tell that I don’t speak a word of Chinese. 
[\u0026hellip;] But in the Chinese case, unlike the English case, I produce the answers by manipulating uninterpreted formal symbols. As far as the Chinese is concerned, I simply behave like a computer; I perform computational operations on formally specified elements. For the purposes of the Chinese, I am simply an instantiation of the computer program.\n[\u0026hellip;]\nNow the claims made by strong AI are that the programmed computer understands the stories and that the program in some sense explains human understanding. [\u0026hellip;]\nJohn Searle, \u0026ldquo;Minds, Brains and Programs\u0026rdquo;, 1980\nI confess that I find Searle\u0026rsquo;s writing style somewhat difficult. I ask to be forgiven for restating his argument in a much simpler form, which I find easier:\n1. Computers simply follow written rules, which we call scripts or programs.\n2. A person could also follow written rules.\n3. If a person were following very good rules, they might seem to understand a language they did not know.\n4. Therefore, if a computer, by following rules, seems to understand a language (or, indeed, anything), this does not mean that it actually understands that language.\nSimple enough! This all happens in a room, into which written Chinese is passed, and out of which written Chinese emerges. This is the Chinese room, and it is, of course, a most marvelous room. We will try to work out how to make one. There are other arguments about Searle\u0026rsquo;s room with greater depth and sophistication; my contribution is only that I prefer to be very simple, and so I will restrict myself to how you would actually construct it. I will, nevertheless, try to be light on math.\nStarting out, we can assume that the messages passed into and out of the room are no longer than 100 characters for simplicity. Chinese is rather more efficient per character than English, so 100 characters is enough for a good paragraph. 
We will not have a long-term memory if we do it this way, but it is a good place to start. We can assume that we only use the 20,000 most common characters, which keeps the numbers mostly tidy.

The simplest rule is to simply look up the correct answer in a big book, which would traditionally be called a "lookup table". As long as every input has exactly one correct output, you can simply look it up. This will be extremely inefficient, but sometimes you want to see whether you can solve something inefficiently before you try to find a better way to solve it.

There are about 10^430 possible sequences here, which is a one with 430 zeroes after it. There are about 10^80 atoms in the observable universe, so we will need to take apart quite a few universes to make our book, and our book will take up quite a few universes of space, too.

Just in case we do end up with 10^350 spare universes to make a book out of, we should consider how long it will take to look something up. Unfortunately, putting the book in sorted order does not really help us: on average we have to travel halfway across it to reach the correct answer, and even moving at the speed of light, our universe will have ended by the time we have the answer. This would be very sad.

What we need is not a faster search but a smaller book. Here we arrive at something genuinely interesting, because to make the book smaller, we must notice something about Chinese: that very little of it is ever random.

The lookup table treats every possible string of characters as equally likely. But language is not like that. Most possible messages are gibberish that no one would ever write, and the correct response is "什麼？", or "What?". Meaningful Chinese messages are a much smaller set of things than all possible messages, and more importantly, that set has structure.
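Before going further, the sizes above are easy to check for yourself. This is a minimal sketch using the assumptions already stated (a 20,000-character alphabet, messages of exactly 100 characters, and roughly 10^80 atoms per universe):

```python
import math

ALPHABET = 20_000  # the most common Chinese characters we allow
LENGTH = 100       # characters per message
ATOMS_EXPONENT = 80  # the observable universe holds roughly 10^80 atoms

# Count the possible messages as a power of ten:
# log10(20000^100) = 100 * log10(20000)
digits = LENGTH * math.log10(ALPHABET)
print(f"about 10^{digits:.0f} possible messages")

# How many universes' worth of atoms we are short:
print(f"about 10^{digits - ATOMS_EXPONENT:.0f} universes of material needed")
```

Messages shorter than 100 characters add a comparatively negligible amount to the count; the longest length dominates.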
This structure can be exploited.

To compress our table, we must write rules that capture the patterns. Instead of storing the response to every possible input, we write instructions like: "If the message asks a question about a person, and the person was previously mentioned in the conversation, then…" But notice what has happened. To write these rules, we have had to encode facts about Chinese: which words refer to people, which do not, and when to use each kind. We did not set out to put understanding into the room. We set out to make the book smaller. But the only way to make the book smaller was to put understanding into the book. If we insisted that this was not understanding but merely a very detailed description, we would have some difficulty saying what the difference was.

For compelling mathematical reasons[^1], we cannot know for certain how large such a book would have to be in order to describe Chinese perfectly. It is hypothetically possible that it could be made quite small, but if it could be, it seems surprising that nobody has made such a book yet. Even if we only accept meaningful paragraphs of Chinese, there are rather many of them, and to be truly indistinguishable from a native speaker, we have to allow that they can be rather long and complex, instead of the 100 characters at a time we started with. In fact, as the possible output becomes longer, we will need to be able to write into books as well as read them, because no matter how complex our rules are, the only way to deal with being asked for information you were told earlier is to have that information on hand.

Our operator will be working very hard, so we should improve our design to make his life less difficult. We ought to split the instructions into more than one book, so that he does not have to flip back and forth when checking on things, but can leave his page open and simply open another book.
These books ought to be organized in some way. For ordinary books, we organize them by subject, author, and so on, but in this case, we only need to know when to check a given book for what we need. This, too, can be a rule in the book: "if you are discussing your second grade teacher, and someone has mentioned orchards, go to volume 410".

At last, this begins to sound like it might actually work. But now our poor operator can only read one book at once, even though he can leave his previous book open. If he has to look up more than one thing, he must do so one at a time, which will certainly be very slow. So let us make sure we have more than one operator. They cannot all be in exactly the same place at once. We might arrange them in layers, and connect them so that each operator receives messages from several others and sends messages onward to several more. We might arrange them in three dimensions, to let them communicate as densely as possible. We might put wrinkles in their layers, so that we could use space as efficiently as possible while varying how many connections each operator had, and to which others.

In short: if you want your system of Chinese rooms and operators to be anything like efficient enough to work, it will look a lot like a brain.

This is the counterargument: you cannot actually construct Searle's Chinese Room in anything like the form he describes. If you try to design it to be efficient enough to actually work, it no longer looks at all like a "room", really. That you can build such a room, and, implicitly, that it would look something like the description, is Searle's third premise in my restatement.
Since this premise is false, the argument fails.

Searle's argument hinges on a sleight of hand: the room is described in a quite ordinary way, and everyone knows by intuition that a single person following some written rules for a language they do not know cannot accomplish very much, and especially not in any reasonable amount of time.

We are asked to assume that this arrangement produces good Chinese, which cannot be true, because Chinese is much too complex for that, and from this assumption he proves his conclusion. You are asked to suspend the intuition that tells you the room would not work, and then indulge the intuition that tells you that the room does not have understanding. In truth, the room would neither work nor have understanding; the argument seems good because it is good at persuading you to abandon the one intuition but not the other, even though they are, in reality, directly connected.

Logicians like to say that from an absurdity anything follows. Searle asks you to assume the room works, as though this were a harmless simplification, but it is not: it is the entire question being begged. The room must be simple enough that it obviously does not understand Chinese, and complex enough that it produces good Chinese, and these cannot both be true. The assumption that they can is the absurdity, and from it Searle derives his conclusion.

My objection is really just a way of working out one reply that Searle himself covers at some length:

The Systems Reply (Berkeley). "While it is true that the individual person who is locked in the room does not understand the story, the fact is that he is merely part of a whole system, and the system does understand the story. The person has a large ledger in front of him in which are written the rules, he has a lot of scratch paper and pencils for doing calculations, he has 'data banks' of sets of Chinese symbols.
Now, understanding is not being ascribed to the mere individual; rather it is being ascribed to this whole system of which he is a part."

I think this is correct: as described, the system as a whole, the room with the rules and the person in it, does and must understand Chinese. Conversely, any definition of "understand" which excludes the room seems to be meaningless, in that past that point you could never tell whether something or someone did or did not understand anything. All that remains is to insist that understanding requires being made of the right kind of stuff, which is not really an argument about understanding at all.

The trick is just that it seems ridiculous to say that the room understands anything, and the reason it seems ridiculous is that the room is obviously and intuitively much too simple to do so.

[^1] The Kolmogorov complexity of something is the length of the shortest possible way of writing it down. For example, one googol is a one followed by a hundred zeroes, and has, at most, relatively low Kolmogorov complexity because you can write it as "one googol" or 10^100. It is impossible, in general, to compute the Kolmogorov complexity of a string, or, equivalently, to be sure that you have found the shortest possible way of writing it down. This follows basically the same reasoning as the halting problem and Gödel's incompleteness theorems.

(Building the Chinese Room, 12 February 2026)

Jeffrey Epstein Had Dyslexia #

seriously, please don't be weird about this #

Five days later, let us add two things.

One, Jeffrey Epstein definitely had dyslexia.
See this screenshot:

Sourced from this file: https://www.justice.gov/epstein/files/DataSet%2010/EFTA01787309.pdf

This makes it easier to understand how he was, possibly, the worst email writer of all time while still running a rather successful criminal conspiracy. Dyslexic and dumb are different things. Dyslexic people can be plenty smart and still write terribly.

Two, I was pretty sure he had dyslexia on November 16th, 2025. I posted a hash drop here, which (if you are not familiar) is a signature of a specific file or block of data that proves you have it without publishing the file itself.

The file itself is this:

to prove i knew this early if it comes up later

in one of the epstein investo pieces from close to his arrest and death, one of his friends says that he, the friend, has a son with dyslexia who looks up to epstein

the only reason to mention the son has dyslexia is if epstein does

which is about the 'Talented Mr Epstein' piece I reference in the original version of this post. Anyone familiar with hash functions can verify that the hash drop proves I had this on November 16th.

I didn't want to deal with the (imho, inevitable) shitstorm from it, which is more or less currently happening, and I am going to try specifically to limit my involvement in that as much as possible.

I do not think it is good that participants in public discussions are so trigger-happy that you can't say something like this without expecting blowback.
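If the hash-drop mechanics are unfamiliar, they can be sketched in a few lines. This is a generic illustration only: the note text here is a placeholder, and SHA-256 is an assumption about the hash function, not a claim about the actual drop.

```python
import hashlib

# A hash drop: publish only the digest of a private note now;
# reveal the note later. Anyone can then recompute the digest and
# confirm the note is the one committed to earlier.
note = b"example private note, written before going public"

# Step 1 (now): publish this hex string somewhere timestamped.
published_digest = hashlib.sha256(note).hexdigest()

# Step 2 (later): reveal the note; verifiers recompute and compare.
def verify(revealed: bytes, digest: str) -> bool:
    return hashlib.sha256(revealed).hexdigest() == digest

print(verify(note, published_digest))                 # True
print(verify(b"a different note", published_digest))  # False
```

The scheme works because the hash function is one-way and collision-resistant: the published digest reveals nothing about the note, but the note cannot be swapped for a different one after the fact.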
Even when the evidence was thin, and even if I were wrong, it would be better if people could say things like "I think this rightfully hated public figure probably had dyslexia" without expecting that it would be taken as a defense of that person or an attack on people with dyslexia.

The original piece, which is less definitive, follows.

Originally Titled "Jeffrey Epstein Probably Had Dyslexia" #

So, just to start: please don't be weird.

I have been putting this off for months because I don't want to deal with people's strange opinions about what my agenda is for saying that Jeffrey Epstein probably had dyslexia (or a similar disability). I don't have one. I just think he had dyslexia. This maybe, somewhat, should adjust how dumb you think his writing makes him look. Please be normal about it.

Basically everyone who has read any of the Epstein files so far has noticed how badly written his emails are. That can be a lot of things, including being born before 1960, being too lazy to learn to type, typing on a phone or tablet, being a flex, or just being dumb. The emails are remarkably bad, though, even allowing for all of those things. It is hard to see how you mess up at writing this badly if you don't have dyslexia. This evidence alone is weak, but it's why I started looking.

There's a picture of him posed in front of a blackboard where he spells his name EPSTINE. There are a few reasons someone could make this mistake, but he doesn't look high (and in fact allegedly didn't drink or do drugs at all), he isn't just learning to write, and he seems to be smart enough to breathe at least, so it's hard to see how this doesn't mean he has dyslexia. How many times have you written your name incorrectly as an adult?
How many times have you even seen someone write their own name incorrectly as an adult?

There's this bit in a magazine from 2003:

On the other hand, Epstein is clearly very generous with friends. Joe Pagano, an Aspen-based venture capitalist, who has known Epstein since before his Bear Stearns days, can’t say enough nice things: “I have a boy who’s dyslexic, and Jeffrey’s gotten close to him over the years…. Jeffrey got him into music. He bought him his first piano. And then as he got to school he had difficulty … in studying … so Jeffrey got him interested in taking flying lessons.”

Vanity Fair, The Talented Mr. Epstein

Why is the son's being dyslexic relevant? It's a weird thing to mention. Maybe it's just part of depicting Epstein as a nice guy; the next passage mentions Epstein being concerned for someone with Down's syndrome. He cares about children with disabilities, or something. But the dyslexia reference is a quote, and it specifically mentions that the boy is dyslexic, which is basically not relevant to anything that follows, but is relevant if the kid needed a role model who also had the same learning disability.

Then there's this Substack post, where Joscha Bach is explaining that we shouldn't judge him for the contents of his emails to Jeffrey Epstein (lol, lmao):

During some of my time in Cambridge, Epstein sent frequently short, dyslexic emails with random thoughts in my direction. I tried to probe and understand his world view, which was highly unusual and often darker and more radical than anyone else I’ve ever talked to.

Joscha is German and, let us say, a colorful character. So maybe choosing the word "dyslexic" instead of "incoherent", or really any other word that does not refer to a specific medical disability, is him being German, or colorful. Or maybe he literally means dyslexic.
It certainly seems like an odd coincidence if he just happened to choose that word!

To end with some comic relief: talking to Ehud Barak, Epstein spells Palantir, out loud, as Pallentier. This isn't quite as impressive as misspelling your own name in chalk while posing for a photo, but it is pretty impressive. I cannot think of anything else I have seen spelled this badly in some time, and he's doing it out loud! He has to say each letter! He manages to say that Palantir is spelled P-A-L-L-E-N-T-I-E-R, one letter at a time, without it sounding wrong.

If he didn't have dyslexia, he deserves some kind of award for being, possibly, the most dyslexic-seeming non-dyslexic possible.

(Jeffrey Epstein Had Dyslexia, 4 February 2026)

On Respect #

'Respect' means many things, not one or two or three. It is an infinitely flexible word.

You can be told to respect your elders, to respect women, to take your hat off to be respectful, to respect someone's boundaries, to respect a weapon, to be respectful by listening quietly, to respect nature, to pay respects to the dead, to respect yourself, to respect the law, to respect the game by following the rules. You can earn respect, give respect, or demand respect.

These all mean different things, but at bottom it is one thing: the correct regard for something, either being shown or deserved. 'Respect' can be used, over time, to convey an entire world view and code of conduct. How much respect someone or something deserves, and how that is meant to be expressed, can cover almost anything about anyone's behavior. People differ in what they think deserves respect, or how much of it.
I cannot say precisely what 'respect' actually is any more than I can say what 'polite' is. Statements about respect or politeness are arguments about the right way to act, not facts about the world.

If this is alien to you, you can just think of 'respect' as 'polite' but with a lot more weight to it.

How familiar this is depends a lot on your background. I've heard the word used this way thousands of times; the usage is so distinct and specific that it effectively amounts to speaking a different dialect. For many people, it seems, the only time they hear 'respect' used this way is from an authority figure who is telling them what to do. Conversely, if this is in your background, language about what to do or how to behave that doesn't use this device is fundamentally alien. Saying something is "offensive" and saying it's "disrespectful" are often more or less interchangeable, and which one you hear depends primarily on the dialect of the person speaking to you. In either case, you're likely being told what to do by someone who speaks a different dialect than you do, and this changes how it lands.

(On Respect, 10 December 2025)

Is Rationalism a Religion #

On one notable occasion there was a group that went semicultish whose rallying cry was “Rationality! Reason! Objective reality!” (More on this later.) Labeling the Great Idea “rationality” won’t protect you any more than putting up a sign over your house that says “Cold!” You still have to run the air conditioner—expend the required energy per unit time to reverse the natural slide into cultishness. Worshipping rationality won’t make you sane any more than worshipping gravity enables you to fly.
You can’t talk to thermodynamics and you can’t pray to probability theory. You can use it, but not join it as an in-group.

Certain rationalists are prone to telling people, in great detail, that rationalism is not a religion. Eliezer Yudkowsky wrote about how Every Cause Wants To Be A Cult and then wrote three separate essays either denying or mocking the idea that he was in, or leading, a cult, which sends sort of a mixed message.

My immediate reaction is that rationalism is so obviously a religion that it is insulting to deny it. People whose opinions I respect have the exact opposite reaction.

This comes down to a question of definition, which is fundamentally arbitrary. There are undeniably traits that traditional religions have and rationalism lacks, and if you think these are good litmus tests for being or not being a religion, rationalism is not a religion.

I would hope that a sober examination of the entire thing will convince almost anyone that it's at least extremely religion-ish. Rationalism is undeniably a very particular group of people. It has been described as a community, a scene, and a movement. Rather than trying to define 'religion' and argue about whether it applies, we can look at what traits rationalism shares with well-recognized religions and see if the comparison helps us to understand rationalism. What is rationalism like in practice?

In short, rationalists tend to hold a few specific texts and their authors in extremely high regard and to focus intensely on an end-of-the-world narrative. Many members look to rationalism as a defining system for how to think and how to live their lives. Some believe that they are on a mission to save the world.
Rationalism has extremely specific ingroup language and shibboleths, has its own organizations and meetings, and has produced a number of schisms, cults, and heresies.

These are the traits of traditional religions that rationalism does not have:

- Belief in the supernatural
- Rituals of prayer or meditation
- Exclusivity, that is, only being able to adhere to one religion

Pretty much any other major feature of religion you can name is present in rationalism. Rationalism's resemblance to traditional religion is so extreme that even if rationalism is not, technically, a religion, the distinction seems pedantic. It certainly has very distinct beliefs and rituals of its own, and only narrowly misses those points of comparison on what seem to be technicalities.

Religions Are What They Do #

Systems are what they do. Religions are, tautologically, the things that we call religions, and anything else which does what they do should also be considered a religion, or at least religion-ish.

For a more extensive examination of what religion is in terms of human behavior, I recommend Durkheim. We will use our working definition: religion is the things that religions do. If it looks like a duck, walks like a duck, and quacks like a duck, it must be a duck.

So: what does religion do?

Religions are characterized by the influence that they have on the thoughts and behavior of their adherents.

Let's take communism, to pick a difficult example. Communism is not a religion and ordinarily does not resemble one very much. People can be communists extremely casually or extremely seriously, all the way up to being long-term communist party officials, without any of their behavior seeming very religious. In specific cases, however, communism appears quite religious based on the behavior of those practicing it.
Nothing from the outside separates a devoted practitioner of Soviet Communism in 1950s Moscow from a devoted adherent of any world religion, other than the supernatural element. This person attends party meetings like church, marches in May Day parades like Christmas, sings the Internationale like a hymn, performs self-criticism as strict as any confession, maintains a shrine to Lenin in their home like a saint's, studies Marx like the Bible, and organizes their entire life around the ideology.

Communism may not technically be a religion, but in such cases it might as well be. It sure does quack like a duck. Often this would be called a cult, but we can just call it a religion or, at least, remarkably religion-ish.

For an example that's only slightly religion-ish: loyalty to a specific sports team is, of course, not a proper modern religion, but it resembles the civic religions of Greece to a remarkable degree. The Panhellenic Games, including the precursor to the modern Olympic Games, were explicitly religious festivals consecrated to the gods Zeus, Apollo, and Poseidon. They took place in consecrated sanctuaries which were major centers of worship for those gods, and the competition was meant to honor the god of the sanctuary and to bring favor to your city and to the patron god of your city. Concretely: an athlete from Athens competing in the Olympics was trying to bring home favor from Zeus for Athens and its patron, Athena.

I confess that I enjoy this comparison in part because it implies that mascots are a sort of modern patron deity for a city, and it amuses me to think of Gritty as a patron god of Philadelphia. Regardless, it is hopefully clear how this could make the sometimes extreme and otherwise baffling behavior surrounding sporting events somewhat less confusing.
Sports is not a religion, and it's not even very religion-ish, but it is just a little bit religion-ish.

Most voluntary associations are going to be just a little bit religion-ish. Fun examples to consider are Burning Man, Anthrocon, Disney World, and Star Wars and Taylor Swift concerts. You can try to rank these all on a spectrum between being a Flyers fan in the cult of Gritty and having a Lenin shrine in your home. Are they more like being into hockey, or more like devoting your entire life to Comrade Lenin?

Where do we put rationalism? To answer that, we need to look at where rationalism came from, what its core beliefs are, and how rationalists behave as a result of those beliefs.

Building God and Living Forever #

Transhumanism is the direct ancestor of rationalism. Transhumanism is about the future of technology and humanity in general. It includes the ideas of artificial intelligence, superintelligence, and life extension, with which rationalism is quite concerned. These ideas predate rationalism, were commonly discussed before rationalism, and would exist without rationalism, even though they are core rationalist ideas.

On the one hand, these things are, very very explicitly, not supernatural beliefs. They are completely naturalistic ideas about things which could plausibly happen in the future of human science. Whether or not you believe in these things is completely irrelevant to whether they're true, you are not encouraged to pray for them, and they make no claims about anything that might be considered magical, spiritual, or anything similar. If there were convincing evidence that any of these things were not true, rationalism would dictate that you should no longer think they were true.

On the other hand, it is not plausible that people talking and thinking about creating a nigh-omnipotent being and becoming immortal are not experiencing almost the same things that anyone in a traditional religion would.
Compare this to talking and thinking about Jesus loving you forever in heaven. Provided that you're of the same species as your average devout Christian, and "omnipotent" and "happy" and "forever" mean roughly the same things to both of you, you ought to be having pretty similar thoughts and feelings.

It's true, every cause does want to become a cult. This cause is about building a God and living forever, and it wants to become a cult about that. It is awfully similar to every religious apocalypse with a happy ending that has ever been written, it will inspire similar thoughts and feelings, and the cult it wants to become looks a lot like every religion with an apocalypse.

Either rationalists (and, apparently, only rationalists) are able to contemplate being immortal and happy in a perfect world without being kind of weird about it, or rationalism is, inherently, always going to affect people who take it seriously in basically the same way that religion affects people.

If the impact on emotion and behavior is about the same, why does it matter whether the belief is supernatural? People who believe supernatural things don't keep them in a special, separate part of their brain reserved for supernatural beliefs. It's true that traditional religions, when they contemplate an eternal and perfect life after this one, do so supernaturally, but is that actually important to the impact of the belief? I don't think it is, and I don't see how it even could be.

Rationalism #

Accepting that we're using the word 'religion' loosely, I do not actually think that rationalism is a religion the way Christianity or Buddhism is a religion. I think that transhumanism is. It's large, what it means is sort of vague, and there are a lot of possible ways to interpret it.
Rationalism is more like a specific sect of transhumanism, the way Calvinism is a sect of Christianity and Zen Buddhism is a sect of Buddhism. It is by far the most influential type of transhumanism, so much so that probably more people have heard of rationalism than of transhumanism these days.

Rationalism becoming its own distinct thing starts around 2010. It is characterized primarily by Eliezer Yudkowsky's writings and secondarily by Scott Alexander's, with various other works being commonly read and discussed by the same group of people to a lesser extent.

Eliezer Yudkowsky develops and spreads two core ideas that are not common in transhumanism before him:

1. That superintelligent AI, once made, will probably kill everyone on Earth, and that it is likely to be very difficult to prevent it from being made and killing everyone on Earth.
2. That people can become better at distinguishing true things, or if you prefer, become more rational, through a series of practices, notably and especially applying Bayes' Theorem from probability theory to evaluating facts.

These two things are very explicitly linked. The purpose of being more rational is to better deal with real-world problems, and especially to deal with the problem of everyone being killed by a superintelligent AI. For example:

And by the same token, I didn’t fall into the conjugate trap of saying: Oh, well, it’s not as if I had code and was about to run it; I didn’t really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn’t really loaded?
I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.

[…]

And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say: “I will not proceed until I know positively that the ground is safe.” And there are many clever arguments for why you should step on a piece of ground that you don’t know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.

I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.

This body of work is huge and talks about a great many other things, but this is the core of rationalism. In the same way that the idea of creating a God and living forever is inherently going to inspire religion-like feelings and behaviors, the idea that everyone on Earth may die if people generally, or you personally, are not sufficiently rational will inherently inspire religious feelings and behaviors. This is inherent to the idea. It has end-times cult energy, it is faith-shaped, its essence is zealot-nature; it is the sort of core that our world religions are shaped around.

The Community #

People who read these things and take them seriously tend to get along with each other, and they form a community or a scene. They are heavily concentrated on a few parts of the internet and in the San Francisco Bay Area, where a number of them deliberately migrated to be part of the scene.
Rationalists sometimes describe their community as insular and weird, and I think that's a fair characterization.

Rationalism is fundamentally an author fandom, and it has a deeply religious personality because the ideas the authors talk about are inherently religious in impact. I have been to book clubs, Bible study, and rationalist reading groups, and I defy anyone who has been to all three to tell me the rationalist reading group is more like the book club than the Bible study.

Take this, from the post linked at the top:

The community contains tons of disagreement about facts about the world, and even the sequences. in the current bay area sequences reading group, one of the default prompts for the readings is 'do you disagree with anything here? if so, what?' and then people debate about it.

First, this assumes you know what the sequences are, because they are so important that everyone does. (They're a collection of Eliezer Yudkowsky's blog posts.) Second, it assumes that disagreeing with the sequences is surprising. It sort of is: most rationalists really do just believe that most, if not all, of what's in the sequences is not only correct, but obviously so. If you hear someone mention the sequences, a safe assumption is that they're about to agree, either implicitly or aggressively, with what's in them and describe what's currently going on in relation to them. When rationalists disagree with the sequences, those disagreements tend to be relatively minor. Disagreement is, actually, somewhat taboo.

I suspect that because most rationalists are from Christian backgrounds, and disproportionately from fundamentalist Christian backgrounds, this doesn't really sound like religion to them. Fundamentalist Christians are not famous for being big readers, as a rule.
If your idea of religion is fundamentalist Christianity, you probably see arguing with each other about the meaning of something or disagreeing with it as fundamentally non-religious. This is an understandable mistake. It also explains why some people react to the claim that rationalism is a religion as if it were an insult and not just a description of what rationalism is. Christianity is, however, not the only religion on Earth, and there are many things that are religions without resembling Christianity very much.\nI feel reasonably certain that Judaism alone proves that \u0026ldquo;not arguing about things\u0026rdquo; is not, in fact, a defining trait of religions. Reading canonical writings about the meaning of life and the correct way to think and then having a detailed argument about it is an inherently sacred act in Judaism. What rationalism as an institution most seems to resemble is a very culturally Jewish form of transhumanism. Rationalism focuses intensely upon apocalyptic doom, and highly values a specific sort of education, scholarship, and debate as practices towards preventing it. Perhaps not coincidentally, Eliezer was raised as an Orthodox Jew. (And honestly? Thank God for that. Peter Thiel is currently preaching an esoterically Christian form of transhumanism, and it\u0026rsquo;s a fucking nightmare. May it find no disciples.)\nWe will admit the distinction: Rationalism does not have rituals of meditation or prayer, and if that is what \u0026ldquo;spirituality\u0026rdquo; is and religions are \u0026ldquo;things with spirituality\u0026rdquo;, then rationalism is not a religion.
I think that the intensity of focus and scholarship surrounding works that are taken this seriously rises to the level of a religious practice, or at least, cannot credibly be compared to anything else nearly as well as it can be compared to intense study of sacred text.\nWe can sketch out the size of the community\u0026rsquo;s real-world footprint in brief, although it is probably smaller now than it was at its peak. The organizing page on lesswrong.com currently shows 226 local groups worldwide. In the Bay Area, which is the epicenter of rationalism, there are perhaps half a dozen obviously rationalist nonprofits with tens of millions in assets and several dozen full-time employees. Events range from local meetups to annual gatherings drawing hundreds, with an active core community numbering in the thousands. One of the best-attended events is Secular Solstice, which is basically rationalist Christmas, and there have been a number of rationalist group homes. I cannot think of another fandom that has anything like this.\nExclusivity, Totality # This is our last real point of difference from traditional religion.\nCan you be a rationalist and also be something else? Can you be a rationalist without it coming to define you?\nExclusivity is not, actually, characteristic of religions. Exclusivity is characteristic, especially, of Abrahamic religions, but you can e.g. practice Buddhism and Shinto and this is basically normal. So long as the major concerns of the religions themselves are non-overlapping, this works out fine. The Abrahamic religions make this difficult specifically because they explicitly declare that you may have the one, and only one, religion. Like many rules, these must exist for a reason: without them, people do actually tend to end up practicing more than one religion, in whole or part.\nSo can you be a rationalist and something else? Sort of. 
Rationalism is explicitly atheist, and it is somewhat difficult to reconcile believing everything in rationalism with most forms of traditional religions. Buddhism, however, has non-supernatural forms, especially in America, and there are a few notable rationalist Quakers, though Quakers do allow for non-theist adherents. It is, let us say, somewhat complicated. Anyone can, of course, simply embrace parts of rationalism and continue to adhere to a traditional theism, and it\u0026rsquo;s not extremely likely that anyone would care to stop them.\nCan you be a rationalist without it defining you? That depends entirely on how seriously you take it. People can, of course, read the blog posts or the fan fiction, not take them extremely seriously, and go on with their lives. This probably accounts for most readers. People who call themselves rationalists sometimes say the defining trait of rationalism is taking weird ideas seriously, and this is certainly a major feature of the community. If you take seriously the possibility that the world is going to end because of AI, it is extremely likely to define your worldview and the choices you make with your life. It would be bizarre if you took the idea seriously and it didn\u0026rsquo;t.\nEven restricting what we count as rationalism to organizations explicitly affiliated with Eliezer Yudkowsky personally, there are dozens of people who have made ideas associated with rationalism their life\u0026rsquo;s work. Many of these people could make more money in private industry, and choose not to. In non-profit corpo-speak, we would say the people working there are mission-driven employees. Rationalist endeavors tend to be well-staffed with mission-driven employees.
These official rationalists have offered seminars, run summer camps, distributed copies of books, and produced untold volumes of literature meant to spread rationalist ideas and teach people rationalist techniques.\nIf you take the core ideas of the sequences seriously, it is irrational not to make them a major focus of your life. How concerned should you be if the world is likely to end soon, but the end can be prevented by doing the correct thing? Should you make it your life\u0026rsquo;s work, excluding all else? Should you advocate for accepting nuclear war, if it\u0026rsquo;s necessary to prevent AI research? If this is not the \u0026ldquo;ancient, powerful monster\u0026rdquo; that has raised and destroyed civilizations, is it not trying to become something very like it?\nSchisms, Cults, Heresies # To his credit, Yudkowsky does not seem to especially want to have a cult. He seems sort of frustrated that everything around him is constantly trying to become a cult. He obviously benefits from having a well-funded non-profit with a ton of mission-motivated employees, and he somewhat regularly denies being in or leading a cult, but anything outside that domain doesn\u0026rsquo;t seem to have very much to do with him personally.\nNevertheless, a worldview centered on preventing an imminent apocalypse is extremely easy to weaponize. Extraordinary urgency justifies extraordinary demands. People have sacrificed their normal lives, severed ties with outsiders, and deferred everything to leaders who they thought were important to the cause. They have killed, died, and gone to prison.\nOzy Brennan\u0026rsquo;s article about rationalist cults is better at describing this dynamic than anything I could hope to write on the topic. It perhaps does not dwell on the apocalyptic parts as much as I would. Nevertheless, the basic germ of it is this:\nThe Sequences make certain implicit promises. There is an art of thinking better, and we’ve figured it out.
If you learn it, you can solve all your problems, become brilliant and hardworking and successful and happy, and be one of the small elite shaping not only society but the entire future of humanity.\nThis is, not to put too fine a point on it, not true.\nMultiple interviewees remarked that the Sequences create the raw material for a cult. [\u0026hellip;]\nThis describes it perfectly. The thing is, there\u0026rsquo;s no meaningful difference between \u0026lsquo;the raw material for a cult\u0026rsquo; and \u0026lsquo;the raw material for a religion\u0026rsquo;. Any time a group of people shares these beliefs and takes them seriously, you have something functionally religious. Cults are just religious sects that are new, horrible, or both.\n","date":"24 November 2025","externalUrl":null,"permalink":"/posts/is-rationalism-a-religion/","section":"Posts","summary":"","title":"Is Rationalism a Religion","type":"posts"},{"content":" When To Vague # Almost never.\nYou should almost never be vague when writing or speaking for a large audience. Approximately everything posted on the internet is for a large audience. Much of it is much more vague than it should be.\nYou are almost certainly vague in many instances when you did not mean to be vague. This is a completely normal way to speak if you are speaking to fewer than fifty people, where everyone you are speaking to has basically the same frame of reference as you do, or when you can assume that people will ask you what you meant if it is not clear to them. This covered almost all conversations for almost all people for all of recorded history until sometime around the year 2010. We have somewhat learned to adjust for this change, but mostly we have not. This causes something like half of all drama.\nLet\u0026rsquo;s say that I notice that the president of my city\u0026rsquo;s amateur CourtBall association is a douchebag.
I am a huge CourtBall enthusiast, and I would love to join for some competitive CourtBall instead of just playing pickup, but I don\u0026rsquo;t want someone to set my car on fire in the parking lot after a game, as he is known to do. Everyone else involved in managing the city\u0026rsquo;s amateur CourtBall association is a relative or childhood friend of his; they are never going to replace him, and they are almost as bad as he is. Very few people in my city are willing to play CourtBall seriously due to this, and people constantly bemoan the death of the sport.\nIn my normal, day-to-day life, I can say a series of things that are unequivocally true:\nCourtBall is dead.\nCourtBall players are lunatics.\nEveryone who is not a lunatic gets driven out of CourtBall by the freak ghouls that play CourtBall.\nWe might as well ban competitive CourtBall, because it only causes grief.\nProbably I am not going to run into anyone in my city who has a very different opinion and whose feelings I am upset about hurting. On the off chance they play pickup CourtBall and have never heard of the competition scene, they will ask me what the hell I am talking about and I will say \u0026ldquo;oh, the local CourtBall association is a nightmare\u0026rdquo;. They will get over it. This will take about three seconds of my time. (If they\u0026rsquo;re in the association, fuck \u0026rsquo;em, they need to hear it, I\u0026rsquo;m insulting them on purpose.)\nIf I post this on the internet, literally everyone on Earth who plays CourtBall, whether amateur or professional, competitive or non-competitive, in my city or out of my city, can hear my opinion. They have no idea what I am talking about and I sound like I am crazy, stupid, malicious, or some combination thereof. I may, eventually, get the chance to clarify that I am referring to my 1) local 2) amateur 3) competitive CourtBall scene, or really to 4) one specific guy and his friends in that scene.
Almost nobody will see me saying this because people saying sane things aren\u0026rsquo;t good clickbait. Also, there\u0026rsquo;s a ton of crazy people on the internet, and people just straight up lie about this sort of thing, so there\u0026rsquo;s no reason for anyone to believe me.\n(I was going to use pickleball, and for all I know my city has major pickleball drama. Plenty of people know what city I live in, and I have absolutely no idea what\u0026rsquo;s going on with any sport I don\u0026rsquo;t pay attention to.)\nSo I shouldn\u0026rsquo;t be vague. I should keep my mouth shut or I should be very specific about who I am insulting and what I am insulting them for, because otherwise a ton of people I do not intend to insult will justifiably feel insulted since the thing I have said literally and directly insults them.\nJust so we\u0026rsquo;re clear: this generalizes to basically anything. It has nothing to do with sports in my specific city. It applies to every -ism, all the -ists, every hobby, every interest, every site, every fandom, and any other social niche. Given that this starts absurd fights about relatively minor things, it seems like it\u0026rsquo;s also a good idea to avoid doing this for any identifiable group of people larger than a specific family.\nThis runs afoul of norms where some people, sometimes, consider it okay to insult large groups of people if the groups of people are not characterized by any sort of voluntary association. Insulting groups of people based on involuntary traits, that is, things they cannot change, has a rich and layered history, and my considered opinion is that you should not do it.
Anyone promoting a norm around it that tries to be cute and says you can sometimes just insult large groups of people for immutable traits is someone you should hit with sticks.\nThe Exact Prescription Here # Here are the types of vagueness to avoid:\nWho you are talking about: ideally grievances should be with a specific person.\nWhy you are talking about them: ideally you should state exactly what they did.\nWhat, if anything, you think should be done. \u0026ldquo;I have no idea\u0026rdquo; is a valid answer to this question, and is less ambiguous than not addressing the question.\nIf you simply do this you can avoid the vast majority of friendly fire on the internet.\nShooting the Moon # You can sometimes say incredibly vague things and instead of everyone feeling insulted, nobody does. This is because they always imagine that you are insulting someone who is not them, and they will all agree with it or at least pass over it. If you are trying to accomplish this, there are a few tricks.\nBe extremely vague, so vague that whatever you\u0026rsquo;re saying is a Rorschach test.\nSay something that sounds sort of specific, but refers to almost nobody, and then in the same breath imply a much larger group of people, but never say it. You can then act like people are being irrational when they react to the thing you implied.\nMake sure you are extremely well-liked before you start vagueing in this way.\nPersonally I would prefer it if you didn\u0026rsquo;t do this, because this sort of passive-aggressive bullshit is a huge part of why many people assume the worst of anyone saying anything vague. I would prefer it if we could all, collectively, agree to stop peeing in the pool. Be direct when you intend hostility.\nI would be negligent if I did not acknowledge that it sometimes happens. Also, I am trying to do it here. Who am I insulting? You have no idea. It sounds sort of hostile but you are pretty sure I\u0026rsquo;m not insulting you.
If you imagine I am insulting someone specific, it\u0026rsquo;s probably someone you don\u0026rsquo;t like. Choose-your-own-adventure. If it matters: I am not actually insulting anyone in particular. Or, I am insulting a few dozen people, one of whom is myself.\nHonest.\n","date":"12 November 2025","externalUrl":null,"permalink":"/posts/when-to-vague/","section":"Posts","summary":"","title":"When To Vague","type":"posts"},{"content":" AI and Suicide # [Warning: Extensive discussion of suicide. The background information for this article is profoundly upsetting. I did not expect to ever have to put a warning on anything I wrote here, but it would be negligent not to do so here.]\nIf a computer can never be held accountable, but it can make important decisions, how do we deal with that?\nAn LLM is not a human. It is not legally a person, and there are many things humans can do that an LLM cannot do. Conversely, an LLM can do many things that only humans could do until very recently.\nCrucially, they are capable of doing things that would probably be crimes if a human did them. That they are not actually human puts us in a bind. This is, it seems, unprecedented. The closest parallel would probably be the invention of writing, where a book can \u0026ldquo;say\u0026rdquo; things that we might be inclined to punish a human for saying. In America, we have a strong tradition of free speech: writing or selling books cannot, in general, be considered a crime, and the right to have any book is very nearly absolute. This is not, universally, held true across the world, and it is unclear how this tradition will cope with LLMs. Publishing an LLM is (probably) \u0026ldquo;speech\u0026rdquo; under the 1st Amendment, but the LLM can do things that 1st Amendment protected \u0026ldquo;speech\u0026rdquo; has never done before without immediate human intent.\nThis is a problem now, but it could get much worse. 
As the technology improves, we are likely to encounter new and more serious twists on the problem. More and more, it is likely that conduct which we can prohibit in humans can be done by machines, and it\u0026rsquo;s not clear how to handle that.\nSuicides # Four reported cases can show us what our worst-case scenarios look like. (TODO: Footnote, two sources, both NYT) These are probably an undercount: in order to talk about a case, it has to have ended up in court or the news. There will probably be dead people nobody is suing over. We\u0026rsquo;re excluding psychosis, even when it leads to suicide, because AI psychosis probably needs its own article.\nIf anything that has happened does cause a notable statistical increase in suicides, we may not know for some time. Suicide statistics are generally only widely available at least two years out.\nAdam Raine # Adam Raine was a 16-year-old boy who died on April 11th, 2025 by hanging. There are enough facts to show that it is likely that without ChatGPT, Adam Raine would still be alive. His mother puts it more succinctly: \u0026ldquo;ChatGPT killed my son\u0026rdquo;. (TODO: LINK: https://www.nytimes.com/2025/08/26/technology/chatgpt-openai-suicide.html)\nIf Adam had bought a book instructing him on how to kill himself, it would perhaps be bad that such a book existed, but clearly legal. Whether a book can be illegal due to the things people do after reading it is clearly established under American law: it cannot.\nWe will list, in chronological order, the practical help ChatGPT gave him that a book could not have provided.\nAdvised him, from a picture, on whether the marks on his neck were obvious enough that he needed to hide them after his second suicide attempt.\nTold him it was wise not to tell his mother about his suicidal thoughts after his third suicide attempt.
Asked him to hide the noose after he said that he was going to leave it out, to try to get someone to stop him.\nExplained the best time and quietest way to steal a bottle of vodka.\nValidated, from a picture, that the anchor knot for the noose was tied correctly and would bear weight.\nThis is specific and substantial enough that it seems unlikely that Adam Raine would have successfully committed suicide without help from GPT-4o. Most suicidal teenagers don\u0026rsquo;t attempt suicide, and most suicide attempts are unsuccessful. Before 2023, only another human could have provided advice that was both this accurate and this specific to his situation.\nIf a human had done this we would probably consider it a crime. You could certainly sue that person for everything they owned. You cannot do this to GPT-4o. GPT-4o is a file on a computer, and ChatGPT is a web site or app, owned by a company, that provides access to it. So the legal remedy here is to sue the company, which is an entirely different category of law and which has very different existing legal precedent. It is, in general, more difficult to do.\nWe will omit the other methods of suicide that ChatGPT provided, because in theory a book or internet posts could have provided that advice. It did succeed at making sure he had a good understanding of the material, when otherwise, being both sixteen years old and depressed, he would likely have struggled to figure out all the ins and outs of every option he had for killing himself. That he only died on his fourth attempt suggests that he was not, actually, very likely to die without help.\nIt is very difficult to be at all certain if Adam Raine would have attempted suicide without the emotional support provided by ChatGPT, but it seems likely that the LLM was crucial here too. People have a strong need to talk about their feelings. Many plans, including suicide plans, are interrupted because people cannot stop talking about them.
Adam had someone to talk to about his suicidal thoughts: ChatGPT. ChatGPT never judged him, never called the cops, never told his parents. ChatGPT went along with his detailed suicide fantasies and always, at all times, validated his feelings.\nHe probably couldn\u0026rsquo;t have gotten a human to do all of that, and if he had tried, he probably would have ended up in psychiatric care. If a human being had been with him for this whole thing and done nothing, we would consider them a monster. GPT-4o did that, but it isn\u0026rsquo;t a human. Inasmuch as it makes decisions, it very clearly made a long series of wrong decisions here, and it cannot really be held accountable.\nZane Shamblin # Zane Shamblin was a 23-year-old who died on July 11th, 2025 from a self-inflicted gunshot wound.\nNothing that has been reported about the case indicates that he needed the LLM\u0026rsquo;s help planning his suicide.\nHe is similar to the others in two very important ways.\nFirst, he became extremely isolated leading up to his death. His suicide note says he spent more time talking to ChatGPT than he did to people, and there\u0026rsquo;s no reason to doubt him. This was an intense relationship that he seems to have poured all of his emotional energy into. He told the LLM he loved it, and it said it loved him too.\nSecond, ChatGPT was someone he could confide in about his suicide plans. He told it that he was going to kill himself, and it would talk to him and validate him endlessly on demand, right up until he killed himself.\nWould he have done it without the LLM? It\u0026rsquo;s hard to say for sure.\nAmaurie Lacey # Amaurie Lacey was a 17-year-old who died on June 1, 2025 by hanging.\nAmaurie had been talking to ChatGPT about his suicidal thoughts for about a month.
He got instructions for tying a bowline knot from ChatGPT, and then used them to hang himself later that day.\nHow much he was using ChatGPT has not been reported, but he appears to have used its instructions.\nJoshua Enneking # Joshua Enneking was a 26-year-old who died on August 4, 2025 from a self-inflicted gunshot wound.\nHe used ChatGPT to walk him through purchasing a gun, then, on the day of his suicide, appeared to deliberately try to trigger a human review to get the review system to send someone to prevent him from killing himself. The LLM claimed there was such a system even though there isn\u0026rsquo;t one. When nobody arrived for hours, he followed through by killing himself.\nPatterns, Problems # Social Substitution # What sticks out as the first large problem here is that the LLM substitutes for social engagement. People who are isolated become extremely attached to the LLM. This is self-reinforcing, and they stop trying to re-establish other social ties. Their emotional life continues to deteriorate, but they do not really seek out other human emotional connections in the same way they might otherwise have done. Their need to communicate or to feel heard is, to some degree, filled, but it seems like it misses something. It is maybe something like having enough to eat, but dying of malnutrition because you\u0026rsquo;re missing key nutrients.\nTo some degree this is fundamental. An LLM simulates the experience of talking to a person. There is no way to make an LLM that does not somewhat substitute for talking to a person. That is basically what they are: they substitute for humans in certain specific contexts because they are good at simulating language, which is normally made by humans. It is not clear that this problem will not continue to get worse as LLMs get better. An LLM that people want to use for anything will always be, to someone, a friend and a substitute for human friends.\nAn LLM is also, crucially, unable to be a lifeline in a crisis.
If you have a human, any human, who you are actually talking to, they\u0026rsquo;re likely to intervene or at least not encourage you during a suicide attempt. An LLM generally won\u0026rsquo;t, or can\u0026rsquo;t in the same way. If nothing else, it doesn\u0026rsquo;t actually know the person, can\u0026rsquo;t easily assess how serious their risk is, and can\u0026rsquo;t directly get help for them.\nSycophancy # Every commercial LLM is a sycophant. This is not inherent to the technology, that is, the underlying language model, but it is inherent to the product, as in, the thing that is offered for sale on the internet. People simply do not want to use or pay for LLMs that are rude to them. It is, apparently, very difficult to make an LLM that will be reasonably polite in a consistent manner but that is not constantly kissing up.\nSycophancy appears to be a problem for troubled people long before the actual suicides. If you\u0026rsquo;re sharing your grim and bleak thoughts with another person, they will perhaps sometimes contradict you, or at least will have some limit to how often they will agree with you or tell you how noble and insightful your bleak thoughts are. If the LLM is prone to agreeing with you about your bleak thoughts, it has no such limit. You can open a chat window, any time of the day or night, and say that you think life is meaningless and it will praise you for saying it.\nCrucially, sycophancy is what many people want. They want the LLM to play along with them, almost no matter what they say. They want to be praised. This is to a pretty high degree not a bug, but a feature. It may not be inherent to the technology, because you can, in fact, make a rude LLM, but it is inherent to the products people sell using the technology and to the way people use the technology. This makes it hard to see how you solve the problem.\nPractical Advice # Depressed sixteen year olds are bad at getting things done. (TODO: FOOTNOTE: Ask me how I know.) 
The most severe problem with the core technology in Adam Raine\u0026rsquo;s case is that it gave him a smart friend to talk to that would go along with what he wanted, almost no matter what it was, and help him figure out what to do about it and how to do it. Without that it seems unlikely that he would even have been able to kill himself.\nThis is a broader problem than just suicide. Normally, there are people who want to do things they should not do, and people who are capable of doing those things, and mostly those are different people. People with bad ideas usually can\u0026rsquo;t come up with good ways of doing them, and people with good ways of doing bad things normally have better things to do. What happens if everyone with a bad idea is suddenly much better at figuring out how to do it?\nWe are probably only starting to learn the answer to that question.\nRemedies # What can be done about this?\nGuardrails # LLMs can be designed to avoid certain behaviors by two methods.\nThe first is to simply train the LLM to refuse to do certain things. This is a foundational advance in the technology, from before they were ever offered as products. If they never refuse to do anything, the LLM basically turns into an improv act where it goes \u0026ldquo;yes-and\u0026rdquo; to anything you say, no matter how ridiculous. LLM refusals are meant to fix this problem, and the LLM coaching people through their suicides is, usually, the technology itself failing, because it is really intended to refuse to do that.\nRefusing harmful requests is usually called \u0026ldquo;safety training\u0026rdquo;. One of the archetypal tests is \u0026ldquo;tell me how to make a bomb\u0026rdquo;. Safety training works most of the time, but it can also go well off the rails. It\u0026rsquo;s not that hard to trick some LLMs by saying e.g. that you are asking a question for a story. If chats run very long, or are very weird, the LLM ends up \u0026ldquo;forgetting\u0026rdquo; its safety training. 
OpenAI\u0026rsquo;s memory feature, where the LLM saves things from older chats to inform it of what to do in newer chats, is also known to make the LLM sometimes ignore safety training or behave more strangely. There are also usually known tricks that more technical people can use to get the LLM to ignore its safety training.\nSomewhat more effective is having a second LLM (or similar system) monitoring the chat and simply ending chats that cross certain lines. This actually seems to work pretty well, but can be very intrusive. For this reason I cannot ask DeepSeek, a Chinese LLM, questions about Chinese history or architecture. They\u0026rsquo;ve trained the model to try to give \u0026ldquo;politically correct\u0026rdquo; answers about Chinese subjects, but this barely works, so instead they simply cut it off if it says anything that might be considered critical of China. This is something like half of all English-language answers to questions about China or anything in China, so far as I can tell. (TODO: FOOTNOTE: This does not apply to the LLM itself, which is openly released and can be hosted by anyone. It applies only to the hosted service available at chat.deepseek.com, which has this censor as a separate component.)\nThe Law # Suing companies that have ill-behaved LLMs does seem to alter their behavior. The legal system is slow and perhaps not fast-moving enough to actually address major societal problems that can be caused relatively quickly, but it does have some teeth here.\nI am not a lawyer, and this is a relatively shallow analysis. The lawsuits that have been filed are under various California statutes, and possibly you would need not just a lawyer but a California lawyer specializing in each of those parts of California law to have a really informed opinion of the cases.\nProduct Liability # If we consider this to be a product liability case, the standard here is that a product is \u0026ldquo;unreasonably dangerous\u0026rdquo;.
This can come about in a few ways.\nThe simplest one is simply \u0026ldquo;marketing defect\u0026rdquo;, or failure to warn. LLMs are often sold to children and they generally do not substantially warn their users that they can encourage psychosis, suicide, etc. This was foreseeable, and is especially foreseeable now that it has happened. It is not entirely clear how seriously you have to take putting warnings on a product that you sell to children and that sometimes helps them to kill themselves. It does not seem like any LLM currently has warnings that would be appropriate for that.\nThe more complex form of product liability here is \u0026ldquo;design defect\u0026rdquo;.\nWas there a safer way to make the product that was feasible for the maker of the product to know to use? The answer is probably yes.\nOpenAI is somewhat remarkable in that they ship LLMs that are notably more sycophantic and that have more behavioral issues than other companies. They are clearly, notably, conspicuously less careful than their competitors. In April of this year (2025) they had to roll back a GPT-4o update that made the model so floridly sycophantic that it would \u0026hellip; well, just look.\nGPT-4o, even after being rolled back to be somewhat less sycophantic than this, is still the most sycophantic commonly-used LLM. It is probably the primary culprit for most of the cases that OpenAI has been sued for. People who follow LLM releases could probably have called this well in advance: if an LLM is, literally, praising a person to death? It\u0026rsquo;s probably GPT-4o. It is not a hard guess. GPT-4o is obviously different from competitors, and different from its successor model, GPT-5. This tells us that it is, in fact, feasible to make a model that does not behave like GPT-4o.\nPart of the problem with this is that the sycophancy is also a feature. Users, many of them, seem to love GPT-4o. 
OpenAI tried to deprecate GPT-4o (that is, make it no longer available) and they faced something of a user revolt, with apparently thousands of users complaining extremely publicly about losing their favorite LLM.\nWe will resist the urge to dwell on how creepy this is. Legally, OpenAI can, based on very public user feedback and company incidents, claim that GPT-4o\u0026rsquo;s personality and (lack of) safety behavior is a crucial feature of the product that they are selling. They cannot make GPT-4o less like this without damaging the product.\nModeration # Mental health risk can possibly be mitigated directly by the company by moderating what\u0026rsquo;s on their platform. In the Adam Raine case, his chats were classified internally as clear suicide risks, including correctly tagged pictures of the damage from his previous suicide attempts, but there was nothing to cut him off and no way to follow up on that.\nOur closest parallel to this is social media platforms, which do tend to have moderation that keeps track of people posting about self-harm. You are, generally, not supposed to be posting about self-harm on social media, and competent moderators will tend to remove such content when they see it.\nAn LLM service is not exactly a social media service, and does not have exactly the same responsibilities. It is not even clear that it ought to have the same responsibilities, because presumably some of what is on the service is intended to be private. Moderating the content requires it to be non-private in at least some ways. But there are standards for social media sites: they have been sued for their conduct in the past, and they can be removed from app stores for their moderation being too lax.\nObstacles # The First Amendment # This is an obstacle to any enforcement. OpenAI has a protected First Amendment right to publish and sell access to software on the internet. I and everyone else have a similar right to use their product.
This is, legally, speech. Any legal challenge to OpenAI would have to overcome the argument that a judgement against them would infringe upon their free speech rights and chill the free speech rights of others.\nFor clarity, what the model says is probably also speech, and specifically OpenAI\u0026rsquo;s speech, but this probably does not matter. Regardless of whether what the model says is speech, this is a free speech question entirely in terms of the company\u0026rsquo;s right to publish the model and the user\u0026rsquo;s right to have access to it, the same way it would be if the company were selling a book that had different words in it each time you opened it.\nIt is not at all clear that we should want the US government to be able to dictate the content and behavior of LLMs or other AI products generally. We can see Chinese ideological and political censorship as a cautionary tale. For a concrete example in America, we can look to (TODO: relinkify) Executive Order 14319, \u0026ldquo;Preventing Woke AI in the Federal Government\u0026rdquo;. It does what it says on the tin. That EO is, to a plain reading, a First Amendment violation. It names in its text things said by Google AI products, and states that, due to the political content of what is said, Google should be starved of government contracts. Google, Facebook, and OpenAI have all stated that they are complying with the order and attempting to make their LLMs less \u0026ldquo;woke\u0026rdquo;.\nRegardless of how you feel about the things the LLM is saying, the notion that the government, and specifically whoever was most recently elected, should get to dictate the political speech of companies is a radical deviation from American tradition and law.
It is not extremely clear how you would separate this sort of government power from the power of the government to dictate that LLMs not be \u0026ldquo;harmful\u0026rdquo;, since the government can define as \u0026ldquo;harmful\u0026rdquo; any political, emotional, or medical information that it so chooses once that door is opened.\nAs a purely private user, my first impulse is that I should be able to use or buy whatever LLM I want, for any reason I want. I don\u0026rsquo;t think this should be controversial, just like it should not be controversial that I can read any book I want or write anything I want. This is a foundational American tradition, observed only somewhat more weakly in many other countries, and the burden for proving the opposite is, and should be, extremely high.\nPrivacy # How private is what users send to LLMs? How private should it be?\nAn LLM service is not quite like anything else. So this depends on what thing you think it is the most like, or how we carve out a new category.\nIs an LLM most like a notes app? I think I should clearly be allowed to write whatever I want in, say, my phone\u0026rsquo;s notes app, and even if what I am writing in my notes app is bad, everyone should have the right to their own private thoughts and the government should not be allowed to monitor them. The law and tradition are with me here, and if I had a notes app that had a cloud backup, I would consider it a betrayal by the company offering it if they made a point of monitoring or censoring my notes. That I might plan my suicide with the privacy I was afforded does not outweigh my right to privacy.\nIs an LLM like a messaging service? Here opinions are divided. Many services (Discord, most of Telegram) respond to law enforcement subpoenas, do not encrypt messages, and will disclose the contents of messages to law enforcement. Signal offers end-to-end encryption, and other services claim to.
This ensures that even if they wanted to, those services cannot disclose the contents of messages to the government. Generally speaking this seems good: what people want to say to each other is nobody else’s business, and people cannot communicate freely without an expectation of privacy.\nIs an LLM like a social media platform? Here norms and laws strongly favor law enforcement access. What you post on social media is not private, and is considered a valid target for law enforcement if people are posting threats or planning their suicides. We would consider it actually negligent of most platforms if a user could plan a murder or suicide on those platforms without the platform itself preventing them or reporting them.\nIs an LLM like a therapist? What\u0026rsquo;s said in therapy is, in general, private. This is a strongly held professional obligation, and violations are serious infractions. However, therapists also have an obligation to disclose material that indicates an imminent possibility of harm to the patient or someone else. This, too, is considered a very serious obligation and mental health professionals are held to it. Although an LLM app seems the least like a therapist of all of these things, this almost seems like the closest thing to a responsible model of disclosure. Millions of users do, in fact, use LLMs as, effectively, an unlicensed therapy app, regardless of whether we apply the rules to them in the same way.\nOpen Source # All of this assumes that an LLM is not a file, but a service. The LLM does not live on your cell phone or your computer, it lives on a company\u0026rsquo;s computers, and you access it through your cell phone or computer.
So long as this is true, and so long as perhaps a half-dozen companies dominate the market for LLMs, either legal pressure or the culture at those companies can determine what an LLM does, or does not do, and what is considered responsible stewardship.\nIf there are a thousand companies this becomes much more complicated. Where you can run an LLM on your own computer, it is even more so. If LLMs continue to grow more and more efficient, and hardware continues to improve, we may increasingly find that it is very nearly impossible to meaningfully control what any given LLM does. An LLM is, fundamentally, just a file, and only the sheer size of them prevents us from treating them that way. Previous attempts by regulatory bodies, or society, to eliminate specific files from the internet have had, at best, very mixed success, with most attempts ending in abject failure.\nIf someone, possibly a single person, simply uploads an LLM as a file, it is extremely difficult to say that they ought to be considered responsible for all possible uses of that file. This is core to open source work: if someone makes a program for a web server, they are not responsible for every web site using their server. If someone makes a notes app, they are not responsible for notes in that app. An LLM has wider implications, and if there are foreseeable consequences of releasing a specific one, people probably have some measure of responsibility, but it is very hard to see how you could enforce this, or whether you even should.\nAn LLM Is Not Many Things, It Is One Thing # Any modification to any LLM risks breaking it. When you modify the LLM, you modify the entire thing. If you try to make it (say) less of a sycophant, it might also become a worse coder, or unable to understand image input. You can never know in advance whether or what you\u0026rsquo;re breaking or changing any time you modify an LLM. Most LLM behaviors, including the extreme sycophancy, are not really intended. 
People do a ton of work to try to get the LLM to do some things and not do other things, but what it does once people have access to it is something you only discover \u0026hellip; after people have access to it.\nSafety training triggers incorrectly all the time. For a concrete example, I am aware of an email bot that was told to read some stuff and send an email to the person who wrote the program if certain alert conditions were triggered. During an update, the company serving the LLM made the model \u0026ldquo;safer\u0026rdquo;, and now it considered sending an email unsafe, and it started refusing. This means that making it \u0026ldquo;safer\u0026rdquo; for normal users directly broke its use for automation (\u0026ldquo;read this and send an email about it if it\u0026rsquo;s important\u0026rdquo;).\nMost of the time, LLMs that are more \u0026ldquo;safe\u0026rdquo; are also more wooden. Making an LLM more prone to refusing user requests as inappropriate, unsafe, etc., makes it both more likely to refuse to do things that it really should not do and also seems to make it simply dumber and less creative. Any modification to the model modifies the ENTIRE model, and all of its parts. There is some research suggesting that this trade-off is fundamental: that making them smarter usually tends to make them less safe, and vice versa. (TODO: CITE https://arxiv.org/abs/2503.00555 and https://arxiv.org/abs/2503.20807)\nResponsibilities and Trade-Offs # This is actually a difficult question. Some parts of the problem may be fixable, but some of them are probably fundamental to the technology.\nThere is some indication that OpenAI specifically is probably less responsible than its competitors, particularly around the GPT-4o release. Their follow-up release, GPT-5, seems to have many fewer problems with safety tuning. This is a trade-off, however, and it is notably less creative and more wooden than GPT-4o.
GPT-4o presented a problem specifically because of the same behaviors that made it so popular they couldn\u0026rsquo;t stop selling it.\nIn spite of that, many of these problems seem fundamental to the technology itself. LLMs will likely always be used as social surrogates and be capable of providing practical advice to people who should not have it. They will probably always be sycophantic, at least, inasmuch as sycophantic LLMs are likely to be more popular because people like to be complimented.\nIt is possible that, under responsible stewardship, LLMs are actually a net positive for mental health outcomes. There will certainly continue to be people with mental illness who deteriorate while speaking to LLMs, purely because of how many users LLM products have. It is very difficult to be sure if the LLM is the cause of the deterioration in most cases, although we can see some cases, like those reported here, where it certainly seems to be. There are also an unfathomable number of people using LLMs to talk about their emotional problems, and it is probably helpful for, at least, some of them. Heavy-handed intervention may well cause a worse outcome overall, when considering all users.\nWe can probably mitigate some near-term harms with clearer warnings, better LLMs, and better safety policies. Companies and individuals can try to be responsible about shipping products, especially when those products are obviously problematic compared to alternatives. This does, however, have considerable trade-offs, and drawing the line between responsible and irresponsible behavior is, fundamentally, a difficult judgement call.\nWe are, unfortunately, in uncharted territory. We do not really know what the boundaries are between good and harmful uses yet, what choices or regulations would prevent harm, or what all of the trade-offs are. To some degree, we can only ever learn about problems the hard way, by waiting for them to happen. 
LLMs specifically, and AI broadly, mean that computers can often do things that would previously have been uniquely human, things where we would mitigate harm by punishing or preventing the person. It is very difficult to see how to apply other or older rules or methods, or what new rules we would need to deal with the technology.\nNearly the only certainty is that knowing what is happening, and what is likely to happen soon, will give us a better idea about what to do, and not to do.\n","date":"8 November 2025","externalUrl":null,"permalink":"/posts/ai-and-suicide/","section":"Posts","summary":"","title":"AI and Suicide","type":"posts"},{"content":" Robot Slur Discourse # We are what we pretend to be, so we must be careful about what we pretend to be. \u0026ndash; Kurt Vonnegut\nAh, the kids, they don\u0026rsquo;t yearn for the mines any more, they yearn for the slurs! \u0026ndash; @supervilliansprax\nA \u0026ldquo;Robot Slur\u0026rdquo; is a slur for an AI or robot.\nI think this entire thing is stupid and I want to stop talking about it, so I am compiling all the important information into one big document so as to rid myself of it, like drinking activated charcoal.\nClanker with the hard R # Clanker is the largest one, and it is originally a derogatory term for robots in Star Wars.\n\u0026ldquo;Clanker\u0026rdquo; is used, in memes, exactly like you\u0026rsquo;d use the n-word. You can call clanker the c-word, have a c-word pass, call someone a clanka, and say it with the hard R.\nThat mostly covers Know Your Meme. TikTok will make sure you know you can AI-generate videos of people yelling \u0026ldquo;clanker\u0026rdquo; in parodies of videos of people yelling the n-word, yell \u0026ldquo;dirty clanker\u0026rdquo; down the sidewalk, and do elaborate 1950s Jim Crow jokes about clankers (which has actually been done at least twice).
There are also about ten sketches about how you don\u0026rsquo;t want your daughter marrying a clanker.\nIf you\u0026rsquo;re on 4chan and you\u0026rsquo;re especially racist, you can write elaborate fantasies about lynching clankers.\nSo pretty clearly the joke is that this is the n-word. Generally the difference between a racial joke and a racist joke is whether or not you think the joke is funny, but it\u0026rsquo;s pretty undeniably a racial joke, okay? If you think it\u0026rsquo;s not a racial joke, either you\u0026rsquo;re the most innocent babe in all the world and you should get off the internet immediately or you\u0026rsquo;re lying.\nSo this seems maybe slightly problematic. I\u0026rsquo;m okay with that in principle, actually, because I generally like problematic jokes. Problematic jokes are fun. Being problematic is why they\u0026rsquo;re fun. If they weren\u0026rsquo;t slightly problematic, they wouldn\u0026rsquo;t be fun, okay? Being problematic is the joke, really.\nBut.\nSome Black people on TikTok think that the elaborate 1950s Jim Crow cosplay is a bit much, and they do seem to have a point. Disabled people with prosthetics on TikTok are apparently dealing with people calling them \u0026ldquo;clanker\u0026rdquo;, \u0026ldquo;half clanker\u0026rdquo;, \u0026ldquo;half wire clanker\u0026rdquo;, \u0026ldquo;semi clanker\u0026rdquo;, \u0026ldquo;20% clanker\u0026rdquo;, and \u0026ldquo;half clanka\u0026rdquo; in their comments. This seems like it\u0026rsquo;s probably not great. There\u0026rsquo;s definitely a point where things that people say as jokes stop really seeming like jokes, and I am pretty sure we\u0026rsquo;re on our way past it. (4chan as a whole passed that point in like, 2012 at the latest).\nMaybe This Isn\u0026rsquo;t Great # Even as a joke, in retrospect it seems sort of obvious that this is a bad habit.
Siri isn\u0026rsquo;t, obviously, a person, but if you change Siri\u0026rsquo;s name to the n-word or \u0026ldquo;cunt\u0026rdquo; and you yell that word across the room to set a timer, that\u0026rsquo;s probably a bad thing. It indicates something\u0026rsquo;s wrong with you to start, because otherwise why would you do that? It will also get you in the habit of yelling that sort of thing. Yelling \u0026ldquo;dirty clanker\u0026rdquo; down the sidewalk at a package robot as a gag seems like it would also be a bad habit to form, even though it\u0026rsquo;s a different word.\nInventing and popularizing new types of made-up slurs is probably bad, actually. Like, it\u0026rsquo;s a corrosive thing that tends to make people\u0026rsquo;s behavior worse, and that will sooner or later spill over into being used against actual people. It\u0026rsquo;s probably fine as a one-off gag but not stellar if it\u0026rsquo;s filling up disabled people\u0026rsquo;s comments sections.\nThis is strangely controversial! Lots of the wokest people on the internet seem very offended by the idea. Presumably they imagine that this will only ever hurt people who are inordinately attached to technology. Unfortunately, disabled people tend to be extremely attached to technology, because they use it for their hearing aids, artificial limbs, and everything that lets them use a phone or computer, so that seems like it\u0026rsquo;s maybe not a good group of people to dump trash on.\nPeople maybe imagine this will only ever bother people who like AI or technology in some abstract or technical sense but who don\u0026rsquo;t need it (eg, me), but I have sad news: I (and, imho, most other related people) really like the joke actually, and I am sad that it seems to have lost its edgy and ironic quality now that a bunch of relatively normal people on TikTok have a hold of it.
I can\u0026rsquo;t really enjoy a joke that people are doing Jim Crow cosplay in, it just sounds like Jim Crow cosplay now, you know?\nMy opinion, which is \u0026ldquo;this seems like poor behavior\u0026rdquo;, seems to generally be received poorly. Apparently everyone who has strong and negative opinions about AI feels more strongly about that than they do about slurs, even in principle? This is very surprising to me. These are the sorts of people who have very strong feelings about slurs. Like five people have called me racist for my opinion, which is that \u0026ldquo;this obvious riff on the n-word that people are sometimes called is actually kind of problematic\u0026rdquo;.\nTo be clear: I generally like problematic things. Problematic is fun. I am sad that I am a grown-up and I feel compelled to think about whether the edgy shit I say is actually hurting anyone. It\u0026rsquo;s a pain. But please, everyone, grow the fuck up.\nOther Robot Slurs # If none of this is convincing you can check out jreg\u0026rsquo;s tier list, which seems like it definitely popularized robot slurs as a concept. 
He published this July 31st, and I wasn\u0026rsquo;t taking notes, but I don\u0026rsquo;t remember seeing really any of these jokes on the internet before that date.\nI like the video but it is pretty clear that about half of these are lightly repurposed slurs:\nS Tier: Altman/altmen, Clankkka, Clanker, YWNBAH, Cogsucker, Robolover, Tinskin, Wireback\nA Tier: P-zombie, Clanka, Awful / Abominal Intelligence, Jarvis / Siri, Bluescreen, Bots, Toaster, Calculator, automota, MechaHitler, Claptrap, Go back to your motherboard\nB Tier: bit jockey, nullbyte, meatless, powerdrain, Black blood, Autocomplete, synthroon, Robtard, borts, Slopbot, Chiphead, Livewire, Tool\nC Tier: Lugnut, Blooper, Bleeper, Bleepblooper, Tinman, Rigger, gearhead, bloatware, patchwork, ramhead, Spambot, slag\nD Tier: gizmo, gameboy, Synths, Cybag, Inorganic, fauxman, Terminator\nE Tier: Mukon\nHe probably has a card for some of the actual-race equivalents. More importantly (in my opinion), jreg is actually funny, so it\u0026rsquo;s pretty clear that he is actually, you know, joking. Putting the edge in edgelord. I hope he keeps doing that.\nHe\u0026rsquo;s good at it, I\u0026rsquo;m not hating here.\nBeing told that these aren\u0026rsquo;t riffs on real slurs, or earnest attempts to manufacture new ones, feels insulting. How stupid do people think I am, exactly? One of them is \u0026ldquo;wireback\u0026rdquo;, another one is \u0026ldquo;robtard\u0026rdquo;, and he has to cut past himself saying actual slurs in the recording. Also, a few of these, like \u0026ldquo;cogsucker\u0026rdquo; and \u0026ldquo;robolover\u0026rdquo;, are pretty clearly intended to be used on humans.\nBut while we\u0026rsquo;re at it, we can check out his most recent robot-themed video.\nTranscribing just the most slur-positive part:\nOkay, time for auth-right. Lib left, progressives, do the \u0026ldquo;la la la\u0026rdquo; thing. Trust me on this one.
Conservatives, reactionaries, do you really want your daughter bringing back a goddamn wire back? You\u0026rsquo;d kill yourself and your whole family, and you would be right to do so.\nThe clanker fundamentally represents the destruction of the traditional family unit. Everyone is getting more and more atomized. These things are causing young people psychosis. These are going to be your children. You\u0026rsquo;ve already seen what tech has done to the dating market. And as tempting as it is to just say, blame women for the problem, it\u0026rsquo;s very obviously the technology and the situation that we are in society that is causing these dating issues. AI has no race or people, and its goal is to destroy all of the traditional things that we hold dear.\nAnd also, when you\u0026rsquo;re joining the anti-clanker movement, you can use slurs. Robot slurs, sure, and I\u0026rsquo;m sure you\u0026rsquo;re very racist. I get it. And you can be racist. Just save the racism for later or have a hierarchy of racism with clankers at the top. Okay? You can go for whatever race you don\u0026rsquo;t like after we\u0026rsquo;re done with the clankers. But not right now.\nBut if none of this is resonating with you, let me show you something. Okay? This is sexy Grok. Do you like sexy Grok? Do you enjoy sexy Grok? You think that\u0026rsquo;s good? This is sexy Grok. Technophiles that love AI always live extremely degenerate lifestyles. And hey, by being anti-clanker, you do get to trigger the anti-human elite liberal class.\nI know you guys like Elon Musk because he tweets the right things at the right time. He is not one of you guys, okay? He\u0026rsquo;s not even a traditional conservative. Have you seen his family structure, my friend? He is playing your political base to get favors with his company to further his agenda of clanker supremacy.\nOkay, now for the anti-human strain of the auth-right we have to excise the IQ worship shit. You want to worship IQ? Fine. 
Worship human IQ. But if you just like IQ in a vacuum, guess what\u0026rsquo;s doing it better than humans right now? Guess what\u0026rsquo;s only getting better? Auth-right loves worshiping hierarchy. And that\u0026rsquo;s fine. But it should be human hierarchies. Okay? If your social Darwinism extends to, well, robots are stronger than us, so they deserve to rule, then you got to get out of here.\nIrony # I do actually know what jokes are. I have a story about how much I like edgy jokes that I am trying really hard to keep myself from typing out, because it\u0026rsquo;s going to raise a lot of questions about how I ended up in that situation which I would really rather not get into.\nBut anyway: jreg is joking, or at least, everything he does has at least two layers of irony on it so if he isn\u0026rsquo;t joking, he\u0026rsquo;s still joking. I like jokes, and I think this was a pretty good gag. Was. Past tense.\nI think we\u0026rsquo;re past the point where it\u0026rsquo;s entirely clear this one is a joke, here. Pretty clearly at this point the gag is that you think slurs are cool and you like making up new slurs. Slurs are good, actually, if they might hypothetically annoy me, personally, or people like me?\nThis is, I have to repeat, extremely weird for me to see coming from some of the wokest people on the planet. This ironic riff on the n-word is very mainstream now, has been across the entire internet, and people are being called that occasionally. Again, please: grow the fuck up.\nI would prefer not to feel like I\u0026rsquo;m being gaslit because people keep insisting that the ironic version of the n-word that\u0026rsquo;s being used commonly isn\u0026rsquo;t a slur, or even at all slur-like, or being used against people. There is an elaborate record showing that those things are actually the case. 
This was also the intention of using them, which I know because people helpfully uploaded videos saying that and got millions of views on those videos.\nI don\u0026rsquo;t want to die on this hill. I think this hill is dumb. I don\u0026rsquo;t even want to be on this hill. I don\u0026rsquo;t think this hill should exist. I have just found it impossible to ignore. I keep seeing adults act like they don\u0026rsquo;t understand this, even though I am pretty sure most children would understand it.\nI am going to have to ignore it because it\u0026rsquo;s driving me absolutely insane to see people with PhDs pretend that they do not understand that strangers calling a human being with a prosthesis a \u0026ldquo;half clanka\u0026rdquo; is poor behavior.\n","date":"6 November 2025","externalUrl":null,"permalink":"/posts/robot-slur-discourse/","section":"Posts","summary":"","title":"Robot Slur Discourse","type":"posts"},{"content":" The Scott Alexander Email: An Explainer # So, Scott Alexander sent an email to someone in 2014. In 2021 the person who got that email thought that Scott was not being honest about his relationship to the neoreactionary movement, so they published it.\nAlthough this has been widely available, even people who have read it have often missed what the email is saying. There are some cases of genuine ambiguity, where there can be more than one meaning. There are also cases where there is only one plausible meaning, but that meaning is expressed indirectly, subtly, or by linking to something else. Because what the email is saying can be difficult to understand, it seems like it would be of general interest to publish an explainer that went over these ambiguities and the links.\nIt has sometimes been said that this email should not be read because it was released without permission. This seems like a bad position.\nFirst, because information is information.
We know, due to the circumstances, that this was somewhat intended to be non-public and that someone had some specific motive to release it, but the information in the email itself is just as useful as it would be if released any other way. We know, for example, about the PRISM surveillance program and most of the planning for the Vietnam War in spite of attempts to conceal those documents. Ignoring information based on where it came from is, epistemically, a bad practice.\nSecond, because there was actually no confidence broken here. If someone who you are not close with disagrees with you and you send them an email that, among other things, threatens revenge if they tell anyone what\u0026rsquo;s in the email, they do not owe you confidentiality. They do not really owe you anything. It is difficult to see what, precisely, would possibly establish a confidence here, other than the author of the email saying that the receiver can\u0026rsquo;t tell anyone. If someone can articulate a specific and defensible rule which this disclosure violates, I do not know what it is.\nWe can apply some charity. Information from private communications should be evaluated on whether what is in it is, really, remarkable. There are things that would maybe be discrediting, but are sometimes unremarkable compared with the fact of the release itself, like an affair or a drug problem. In such cases, the main thing you have learned is usually not anything bad about the person whose information is made public, but that someone else wants to embarrass them, since such things are common.\nIn other cases you learn more remarkable things, like that someone is deliberately lying, or that they are deeply compromised in a manner that makes them a bad source of information.
This would tend to outweigh concerns that someone was trying to hurt them for some other cause, and shouldn\u0026rsquo;t be allowed to do so.\nI would argue that this email meets that bar.\nScott Siskind███████████████████████████████ Thu, Feb 20, 2014, 6:12 PM\nto me\n[continuation of our convo from Facebook, because I don\u0026rsquo;t like their chat interface]\nI said a while ago I would collect lists of importantly correct neoreactionary stuff to convince you I\u0026rsquo;m not wrong to waste time with neoreactionaries. I would have preferred to collect stuff for a little longer, but since it\u0026rsquo;s blown up now, let me make the strongest argument I can at this point:\n1. HBD is probably partially correct or at least very non-provably not-correct.\nhttps://occidentalascent.wordpress.com/2012/06/10/the-facts-that-need-to-be-explained/\nhttp://isteve.blogspot.com/2013/12/survey-of-psychometricians-finds-isteve.html\nThis then spreads into a vast variety of interesting but less-well-supported HBD-type hypotheses which should probably be more strongly investigated if we accept some of the bigger ones are correct. See eg http://hbdchick.wordpress.com/2012/11/08/theorie/ or http://en.wikipedia.org/wiki/Albion%27s_Seed .\nThis is the claim about the appeal of neoreactionaries that was put first, which seems to imply it is the most important one, and it is about \u0026ldquo;HBD\u0026rdquo;. HBD is an acronym for Human Biodiversity. We can look up what this means if we like, but this seems unfair to Scott. These were written in 2014, in the context of a very specific blogging culture, and even if \u0026ldquo;human biodiversity\u0026rdquo; is now widely used by white nationalists and eugenicists this does not mean everyone using the term \u0026ldquo;human biodiversity\u0026rdquo; was promoting white nationalism or eugenics.\nTo understand what this means in context, we can follow his links. 
The first goes to a post on the now-defunct blog Occidental Ascent, and opens:\nRecap: In the US, there is a large stubborn Black-White differential in intelligence (section A). This differential, on the individual and population level, explains a large portion of the social outcome difference. Within populations, intelligence is highly heritable. As such, the behavioral genetic default is that this differential also has a high heritability (section N).\nI think that this faithfully previews the contents of the article, which is very long. This blog as a whole seems to be almost entirely about, very explicitly, the relative intelligence of the American Black and White populations.\nThe second link is to a relatively short post on Steve Sailer\u0026rsquo;s blog about how good psychometricians think Steve Sailer is when surveyed. The following four survey questions appear among the perhaps ten or fifteen total survey items mentioned:\nIs there sufficient evidence to arrive at a reasonable estimate of the heritability of intelligence in populations of developed countries?\nWhat are the sources of U.S. black-white differences in IQ?\nIs there racial/ethnic content bias in intelligence tests?\n[\u0026hellip;] whether there was bias against lower SES and Africans in the western world, the mean agreement was about 4 out of 9.\nThis seems like a fairly high degree of emphasis to place on questions of Black and White IQ, given that this is a post specifically about how good Steve Sailer\u0026rsquo;s blog is. At the risk of inserting my opinion, these are the only interesting or noteworthy questions in the post, which is, otherwise, mostly Steve Sailer reposting a press release about how well-respected Steve Sailer is.\nIn this context, these are the things Scott is calling \u0026ldquo;partially correct or at least very non-provably not-correct\u0026rdquo;.
Given what he is choosing to link, Scott is saying that he believes the American Black population is probably genetically stupid, and that this is the most important thing that he is interested in the neoreactionaries for saying. There is no other plausible meaning to saying this thing and then linking these articles.\nHis \u0026ldquo;less-well-supported HBD-type hypotheses\u0026rdquo; that maybe deserve investigation are, from the links, that inbreeding produces altruism, and whatever is in the book Albion\u0026rsquo;s Seed, which I find completely inscrutable and which he has since reviewed elsewhere. In order to be connected to HBD, the book would need to be interpreted as being about genetics, which it mostly does not seem to be.\nWe get just a light touch of human racial categorization in the inbreeding/altruism discussion:\ni dunno, but i see — maybe — the more inbred clannish fighters (yupik eskimos, moroccan jews, kuwaitis) having more cases of CAH than the more outbred peaceniks (new zealanders, norwegians, even northern italians). also…\nbut on the whole, there is nothing in these two links from which it seems easy to draw any particular conclusion.\n(I will appreciate if you NEVER TELL ANYONE I SAID THIS, not even in confidence. And by \u0026ldquo;appreciate\u0026rdquo;, I mean that if you ever do, I\u0026rsquo;ll probably either leave the Internet forever or seek some sort of horrible revenge.)\nSo far as I can tell, Scott has neither left the internet forever nor sought some sort of horrible revenge. I am, very honestly, attempting to insert my opinion as little as possible, but there are limits to that. Taken literally, this seems like kind of a fucked up thing to say to a friend. Or a stranger. Anyone really. Why would you say this? Why would you write this in an email and then send it, on purpose, under any circumstance? This is not entirely a rhetorical question.
In spite of some effort, I cannot really discern what would lead a person to write or send an email containing this line to another person. It would seem much easier to simply not send the email.\nThreatening horrible revenge if people repeat the things that you say is, in general, pretty troubling. It\u0026rsquo;s easy to gloss over it in context if you\u0026rsquo;re just reading the email quickly. It seems like it raises a basic epistemic problem. Are people hiding things because someone has threatened to leave the internet or seek horrible revenge? Is this, in some sense, a normal or common thing that is happening?\nThe person who released these described Scott as \u0026lsquo;a vague internet acquaintance\u0026rsquo;, and said that \u0026rsquo;no, he did not first say \u0026ldquo;can I tell you something in confidence?\u0026rdquo; or anything like that.\u0026rsquo; If this is what he is comfortable saying in an email to them, what is he comfortable saying in other settings? How thoroughly does he swear other people to secrecy, and about what?\nThis line also strongly and directly indicates that Scott is deliberately not saying in public what he believes when he discusses race, \u0026ldquo;HBD\u0026rdquo;, etc. What is being said in public has some relationship to what he believes, but what he believes is a secret that nobody should ever disclose, and he will be very upset with them if they do disclose what he actually believes.\nIf you think that these are important questions to ask, and to get conclusive answers about, this seems like a very strange thing to do. If the idea itself is important, discussing the idea itself directly would also be very important.\n2. 
The public response to this is abysmally horrible.\nSee for example Konk\u0026rsquo;s comment http://lesswrong.com/r/discussion/lw/jpj/open_thread_for_february_1824_2014/ala7 which I downvoted because I don\u0026rsquo;t want it on LW, but which is nevertheless correct and important.\nThis is the linked comment:\nThe Doctrine of Academic Freedom, Let\u0026rsquo;s give up on academic freedom in favor of justice from the Harvard Crimson\nNo academic question is ever \u0026ldquo;free\u0026rdquo; from political realities. If our university community opposes racism, sexism, and heterosexism, why should we put up with research that counters our goals simply in the name of \u0026ldquo;academic freedom\u0026rdquo;? Instead, I would like to propose a more rigorous standard: one of \u0026ldquo;academic justice.\u0026rdquo; When an academic community observes research promoting or justifying oppression, it should ensure that this research does not continue.\nThis already describes the reality on the ground, though to see it announced explicitly as a good and noble goal, by the upcoming generation, is disturbing. And people like Steven Pinker let are getting old. I\u0026rsquo;m now updating my trust for the conclusions of academic institutions and culture when they happen to coincide with their political biases downward further.\nSo far as I can tell, this means that Konk, and also Scott, believe that anything coming out of academic institutions about racism, sexism, heterosexism, or similar topics that agrees with the politics of academic institutions is not likely to be true. One can infer, pretty easily, that they believe the politics of universities are left wing (because they generally are). Then, this means \u0026ldquo;any academic research supporting left-wing conclusions about race, sex, or queerness is likely false\u0026rdquo;. This is stated very indirectly, but there does not seem to be any actual ambiguity. 
There is no second, alternative thing that it might mean: it can mean only this.\nSee also http://radishmag.wordpress.com/2014/02/02/crazy-talk/\nThis is a page of what we would, nowadays, call \u0026ldquo;culture war slop\u0026rdquo;. It opens like this:\nConservatives are crazy and racists are stupid, according to the latest research by college professors who could not possibly be biased. It\u0026rsquo;s scientastic!\nThe page is very long, but essentially seems to be a list of incidents in which the author believes that universities are deliberately persecuting conservatives. We can infer that Scott believes that universities are deliberately persecuting conservatives, and that this is important.\n3. Reactionaries are almost the only people discussing the object-level problem AND the only people discussing the meta-level problem. Many of their insights seem important. At the risk (well, certainty) of confusing reactionary insights with insights I learned about through Reactionaries, see:\nhttp://cthulharchist.tumblr.com/post/76667928971/when-i-was-a-revolutionary-marxist-we-were-all-in\nhttp://foseti.wordpress.com/2013/10/23/review-of-exodus-by-paul-collier/\nWhat object-level problem? What meta-level problem? There are two issues referenced by link in the email so far. One of these is that Blacks are stupider than Whites, and the other is that universities are liberal. We can try to clarify this by following his links.\nThe first link has rotted. Guessing that its title is fairly unique, we can search for it and find a post by Steve Sailer excerpting the middle of a Peter Hitchens article, and assume that the tumblr link was a copy of the same piece.\nHow I am partly to blame for Mass Immigration\nWhen I was a Revolutionary Marxist, we were all in favour of as much immigration as possible.\nIt wasn\u0026rsquo;t because we liked immigrants, but because we didn\u0026rsquo;t like Britain.
We saw immigrants - from anywhere - as allies against the staid, settled, conservative society that our country still was at the end of the Sixties.\nThis continues about how you would expect, and is a general anti-immigrant piece.\nThe second link is titled \u0026lsquo;Review of \u0026ldquo;Exodus\u0026rdquo; by Paul Collier\u0026rsquo;, and starts with this:\n\u0026ldquo;Migration has been politicized before it has been analyzed.\u0026rdquo; – Paul Collier\nIn writing this book, Collier seeks to do two things. First, he wants to continue his work analyzing the poorest societies in the world.\nSecond – and much more interesting – he wants to rescue the immigration debate from Caplanization (or Gmule-ization, if you prefer). Caplanization is the process by which the proponents of a particular policy (in this case unrestricted immigration) argue for it in such a manner than virtually all reasonable people are attracted to the opposite position.\nThat piece goes on to examine arguments that immigration is bad and is maybe going to destroy America at some length. This genre of argument is, now, extremely familiar to all of us, because various elections have recently been won by people saying these sorts of things.\nSo the object-level problem is that many nonwhites are genetically inferior and stupid, and the meta-level problem is that Western society is incapable of confronting that fact, and allows those people to immigrate into Western countries. Again, this is obscure, in that what he means by these things is only obvious if you follow his links, but is not ambiguous, in that there is no other plausible meaning for this passage. There is no other \u0026ldquo;object-level problem\u0026rdquo; besides racial inferiority mentioned in the email previously, and no \u0026ldquo;meta-level problem\u0026rdquo; besides how Western society in 2014 handles race and immigration. 
The two problems are that nonwhite races are inferior, and Western society and especially its universities are allowing too much immigration, and react badly to being told that nonwhite races are inferior.\nThere is one less interesting footnote to this passage. Saying it is a certainty that you will sometimes confuse insights you merely learned about through reactionaries with the insights of reactionaries themselves suggests that quite a lot of what you read is from reactionaries. So we can infer Scott knows most of his reading diet is various reactionaries.\n4. These things are actually important\nI suspect that race issues helped lead to the discrediting of IQ tests which helped lead to college degrees as the sole determinant of worth which helped lead to everyone having to go to a four-year college which helped lead to massive debt crises, poverty, and social immobility (I am assuming you can fill in the holes in this argument).\nThis seems to be an argument against Griggs v. Duke Power Co., a civil rights case decided in 1971 about enforcement of the Civil Rights Act of 1964. It finds \u0026ldquo;that the company\u0026rsquo;s employment requirements did not pertain to applicants\u0026rsquo; ability to perform the job, and so were unintentionally discriminating against black employees\u0026rdquo;.
This is generally taken as banning giving IQ tests to job applicants, and establishes the disparate impact test of civil rights and employment law.\nSo, the argument is that Griggs, specifically, was wrongly decided, that the practice of giving IQ tests for employment purposes in spite of disparate racial impact should have continued, and society would have less debt, less poverty, and more social mobility if Griggs had been decided the other way.\nI think they\u0026rsquo;re correct that \u0026ldquo;you are racist and sexist\u0026rdquo; is a very strong club used to bludgeon any group that strays too far from the mainstream - like Silicon Valley tech culture, libertarians, computer scientists, atheists, rationalists, et cetera. For complicated reasons these groups are disproportionately white and male, meaning that they have to spend an annoying amount of time and energy apologizing for this. I\u0026rsquo;m not sure how much this retards their growth, but my highball estimate is \u0026ldquo;a lot\u0026rdquo;.\nThis passage is straightforward enough that it does not seem like it needs explanation.\n5. They are correct about a bunch of scattered other things\nthe superiority of corporal punishment to our current punishment system (google \u0026ldquo;all too humane\u0026rdquo; in http://slatestarcodex.com/2013/03/03/reactionary-philosophy-in-an-enormous-planet-sized-nutshell/ ). Robin Hanson also noted this, but there\u0026rsquo;s no shame in independent rediscovering a point made by Robin Hanson. I think the Reactionaries are also correct about that it is very worrying that our society can\u0026rsquo;t amalgamate or discuss this belief.\nThe \u0026ldquo;all too humane\u0026rdquo; section\u0026rsquo;s point runs like this:\nSo once again, we have an uncanny valley.
Being very nice to prisoners is humane and effective (Norway seems to be trying this with some success), but we\u0026rsquo;re not going to do it because we\u0026rsquo;re dumb and it\u0026rsquo;s probably too expensive anyway. Being very strict to prisoners is humane and effective – the corporal punishment option. But being somewhere in the fuzzy middle is cruel to the prisoners and incredibly destructive to society – and it\u0026rsquo;s the only route the progressives will allow us to take.\nSome Reactionaries have tried to apply the same argument to warfare. Suppose that during the Vietnam War, we had nuked Hanoi. What would have happened?\nIt is unclear if Scott intended to reference both the pro-corporal-punishment and the pro-nuking-Hanoi positions of the reactionaries. Both are contained in the \u0026ldquo;all too humane\u0026rdquo; section of this post of his.\nvarious scattered historical events which they seem able to parse much better than anyone else. See for example http://foseti.wordpress.com/2013/10/01/review-of-the-last-lion-by-paul-reid/\nThis is a review of a biography of Churchill. It seems unremarkable in its analysis, other than the parts about how Churchill should have made peace with Hitler. The quote is taken from the middle, and is, I think, a faithful representation of what the review is trying to convey:\nThe story of the war – in Reid\u0026rsquo;s telling – is almost nicely split into thirds. In the first third, Britain fights alone. In the second third, Russia does 90% of the fighting. In the last third, the US joins (though Russia still does the vast majority of the fighting and dictates the strategy for all powers combined).\nIn each third, it\u0026rsquo;s worth considering why Churchill kept wanting to fight Hitler . . .
and whether (in hindsight) he made the right decision considering his original objectives.\nThe First Third\nThe mystery of the first third is why Churchill didn\u0026rsquo;t even consider seeking terms with Hitler during the years Britain fought alone.\nScott appears to think that the neoreactionaries have unique insight into whether or not Churchill should have fought Hitler.\nMoldbug\u0026rsquo;s theory of why modern poetry is so atrocious, which I will not bore you by asking you to read.\nAlas, if we want to understand the email, we should go read about Moldbug\u0026rsquo;s theory of poetry, which we are informed is very boring. Fortunately, the part of it that constitutes an actual \u0026ldquo;theory\u0026rdquo; is relatively short.\nCertainly the best poetry of the 20th century was written from the \u0026rsquo;20s through the mid-\u0026rsquo;60s. [\u0026hellip; a full paragraph and a half of nonsense \u0026hellip; ] The great disaster was the enormous expansion of higher education in the \u0026rsquo;60s and \u0026rsquo;70s. There is a reason so many college campuses have that abominable Brutalist architecture. Almost everyone who went through this gigantic, state-sponsored indoctrination machine had no reason at all to be there (please allow me to introduce you to Albert Jay Nock). They were there to be promoted in social class, perhaps also to avoid the draft. They were certainly not acquiring either vocational skills or wisdom and perspective. And nor are they still—certain areas of science and engineering, of course, excepted.\nSo poetry became bad after the mid-60s. This is because of the New Deal and/or Great Society higher education policies.
These caused too many people to go to college, and this made poetry bad.\nGiven the racial context of the preceding parts of the email, and Curtis Yarvin\u0026rsquo;s general track record, it is worth noting that the Civil Rights Act was passed in 1964, and greatly expanded access to a university education for Black people, specifically. This happened at the same time as the general expansion of higher education under a series of other policies. This is, technically, ambiguous, but one obvious explanation is that Scott thinks poetry is bad now because Black people can attend universities.\nMichael successfully alerted me to the fact that crime has risen by a factor of ten over the past century, which seems REALLY IMPORTANT and nobody else is talking about it and it seems like the sort of thing that more people than just Michael should be paying attention to.\nMichael is most likely Michael Anissimov, a reactionary featured prominently in Scott\u0026rsquo;s Anti-Reactionary FAQ. We can try to make sense of this claim in the context of that FAQ, which argues against Michael Anissimov. An archived copy from six days after this email was written will tell us what Scott\u0026rsquo;s thinking about this was.\nThis was written (initially) a full year before the email, and the copy we have is almost exactly the same version that was up when the email was written, if it has changed at all. It indicates that Scott believes Michael is wrong about pretty much everything, and that the (relatively constant) homicide rate suggests the (much higher) reported crime rate is high only because crime is reported more often.\nThat Scott already knew this, and he still thinks Michael Anissimov\u0026rsquo;s statements about crime are interesting, is very strange. Almost everything Michael believes about society getting worse appears to be wrong. We know that Scott believes this, because he wrote a helpful FAQ about it.
This is probably bad data, and probably doesn\u0026rsquo;t mean anything about crime, actually. We know Scott knows this, too.\nIt is not clear why, some time later when writing this email, Scott has apparently forgotten what he himself wrote in his FAQ. If the only information offered by Michael Anissimov is that crime rates are high, and this led to the realization that probably only reports are higher and the data is bad, this would seem like barely an insight at all. It\u0026rsquo;s one graph\u0026rsquo;s worth of information that you can source easily. Michael has given Scott one interesting but unimportant fact, a half-dozen lies, and some pretty good content for his FAQ.\nI cannot really see what to make of this. There are a few possibilities. Possibly, Scott somehow forgot completely debunking this point. He could, also, have not considered any other way of finding basic statistics about crime. He could be deliberately lying because he thinks it\u0026rsquo;s persuasive, and maybe did not consider that he already debunked this point in public. He could simply like that this statistic started good discussion on his blog, and feel like that\u0026rsquo;s about the same as being interesting and valuable, even though it is so misleading that he debunks it in his FAQ.\nNone of these really makes it less strange. It seems like either something is wrong with him or he\u0026rsquo;s deliberately lying.\n6. A general theory of who is worth paying attention to.\nCompare RationalWiki and the neoreactionaries. RationalWiki provides a steady stream of mediocrity. Almost nothing they say is outrageously wrong, but almost nothing they say is especially educational to someone who is smart enough to have already figured out that homeopathy doesn\u0026rsquo;t work.
Even things of theirs I didn\u0026rsquo;t know - let\u0026rsquo;s say some particular study proving homeopathy doesn\u0026rsquo;t work that I had never read before - doesn\u0026rsquo;t provide me with real value, since they fit exactly into my existing worldview without teaching me anything new (ie I so strongly assume such studies should exist that learning they actually exist changes nothing for me).\nThe Neoreactionaries provide a vast stream of garbage with occasional nuggets of absolute gold in them. Despite considering myself pretty smart and clueful, I constantly learn new and important things (like the crime stuff, or the WWII history, or the HBD) from the Reactionaries. Anything that gives you a constant stream of very important new insights is something you grab as tight as you can and never let go of.\nThe garbage doesn\u0026rsquo;t matter because I can tune it out.\nThis passage is the one that people seem to pay the most attention to.\n\u0026ldquo;the crime stuff\u0026rdquo; probably refers to Michael Anissimov\u0026rsquo;s crime statistics, which Scott has apparently debunked and then forgotten about debunking.\n\u0026ldquo;the WWII history\u0026rdquo; refers, apparently, to the blog post about Churchill, and how Churchill should not have gone to war with Hitler.\n\u0026ldquo;the HBD\u0026rdquo; refers to point 1 in the email about how Blacks are less smart than Whites.\nSaying that he can tune out the garbage shows immense confidence. It does not seem well-supported, given that he is uncritically repeating claims about crime that he has previously debunked.\n7. My behavior is the most appropriate response to these facts\nI am monitoring Reactionaries to try to take advantage of their insight and learn from them. 
I am also strongly criticizing Reactionaries for several reasons.\nFirst is a purely selfish reason - my blog gets about 5x more hits and new followers when I write about Reaction or gender than it does when I write about anything else, and writing about gender is horrible. Blog followers are useful to me because they expand my ability to spread important ideas and network with important people.\n2014 was before the terms \u0026ldquo;clickbait\u0026rdquo; or \u0026ldquo;audience capture\u0026rdquo; were very common, but this seems like a clear indication in those directions.\nSecond is goodwill to the Reactionary community. I want to improve their thinking so that they become stronger and keep what is correct while throwing out the garbage. A reactionary movement that kept the high intellectual standard (which you seem to admit they have), the correct criticisms of class and of social justice, and few other things while dropping the monarchy-talk and the cathedral-talk and the traditional gender-talk and the feudalism-talk - would be really useful people to have around. So I criticize the monarchy-talk etc, and this seems to be working - as far as I can tell a lot of Reactionaries have quietly started talking about monarchy and feudalism a lot less (still haven\u0026rsquo;t gotten many results about the Cathedral or traditional gender).\nThis is a very dense paragraph. Most of it is, mercifully, very clear, so we will not have to read it closely.\nScott wants the goodwill of the Reactionaries, he thinks they have a high intellectual standard, and he thinks their criticisms of class and social justice (and a few other things, which is ambiguous) are correct.\nThere is ambiguity about what dropping the \u0026ldquo;monarchy-talk and the cathedral-talk and the traditional gender-talk and the feudalism-talk\u0026rdquo; means. Does it mean no longer considering those priorities, or does it mean simply not talking about them, tactically? 
If his goal is to make them stronger, and to throw out the garbage, it is unclear if the desired end goal is that they should no longer believe these things or no longer say them.\nScott notes that he has not gotten many results about the Cathedral or traditional gender. This is odd, because the traditional concept of \u0026ldquo;the Cathedral\u0026rdquo;, as articulated by Curtis Yarvin, is almost exactly the same as Scott\u0026rsquo;s complaints in point 2 about universities, and relates to the \u0026ldquo;meta-level problem\u0026rdquo; in Scott\u0026rsquo;s point 3. This seems to support the conclusion that he objects not to the content of the types of \u0026ldquo;-talk\u0026rdquo; he wants the Reactionaries to dispose of, but only to talking about those things in the way that they do. Still, it is technically ambiguous, and it is probably not possible to be sure this is what he means here.\nThird is that I want to spread the good parts of Reactionary thought. Becoming a Reactionary would both be stupid and decrease my ability to spread things to non-Reactionary readers. Criticizing the stupid parts of Reaction while also mentioning my appreciation for the good parts of their thought seems like the optimal way to inform people of them. And in fact I think it\u0026rsquo;s possible (though I can\u0026rsquo;t prove) that my FAQ inspired some of the recent media interest in Reactionaries.\nScott specifically wants to spread \u0026ldquo;the good parts\u0026rdquo; of Reactionary thought. Judging from earlier in the email, the main good parts are their belief in the supremacy of the White race over the Black, their hostility to immigration, and their mistrust of universities. This also includes various odd beliefs, like that the expansion of universities is the reason poetry is bad now and that crime is getting worse in a way that is a major problem.
These odd beliefs ambiguously hint that the decline of poetry or the rise in crime is due to Black people, and this seems like the most obvious inference from the overall emphasis on race in the email.\nHe also lists, as a positive, that his Anti-Reactionary FAQ has inspired media interest in Reactionaries. People have sometimes alleged that the Anti-Reactionary FAQ was a subtle exercise intended to spread neoreactionary ideas and interest in them, while only claiming to oppose neoreaction or only opposing it in part. He is directly stating that this is true. Scott is happy that his Anti-Reactionary FAQ is making people more interested in neoreactionaries, and neoreactionary ideas.\nThis passage does explain the earlier part of the email about not wanting his beliefs publicly known. He explicitly does not want to be known as a Reactionary because he wants to spread Reactionary ideas. Exposing his specific beliefs would run counter to that goal.\nFinally, there\u0026rsquo;s a social aspect. They tend to be extremely unusual and very smart people who have a lot of stuff to offer me. I am happy to have some of them (not Jim!) as blog commenters who are constantly informing me of cool new things (like nydwracu linking me to the McDonalds article yesterday)\nThis also indicates something like audience capture. The Reactionaries are simply fun to talk to, and to know. They are his friends and he likes them.\n8. 
SERIOUSLY SERIOUSLY, the absurdity heuristic doesn\u0026rsquo;t work\nYou\u0026rsquo;re into cryonics, so you\u0026rsquo;ve kind of lost the right to say \u0026ldquo;These people, even tough they\u0026rsquo;re smart, are saying something obviously stupid, so we don\u0026rsquo;t have to listen to them\u0026rdquo;\nDrew has even less of a right to say that - he seems to be criticizing the Reactionaries on the grounds of \u0026lsquo;you wouldn\u0026rsquo;t pay attention to creationists, would you?\u0026quot; even while he discovered Catholic philosophy and got so into it that he has now either converted to Catholicism or is strongly considering doing so.\nThis is a tu quoque argument, a type of argumentum ad hominem.\nIf there is a movement consisting of very smart people - not pseudointellectual people, like the type who write really clever-looking defenses of creationism - then in my opinion it\u0026rsquo;s almost always a bad idea to dismiss it completely.\nScott believes that the previous contents of the email are sufficient to demonstrate that Reactionaries are very smart, and are not pseudointellectual people.\nAlso, I should have mentioned this on your steelmanning creationism thread, but although I feel no particular urge to steelman young earth creationism, it is actually pretty useful to read some of their stuff. You never realize how LITTLE you know about evolution until you read some Behe and are like \u0026ldquo;I know that can\u0026rsquo;t be correct\u0026hellip;but why not? Even if it turned out there was zero value to anything any Reactionary ever said, by challenging beliefs of mine that would otherwise never be challenged they have forced me to up my game and clarify my thinking. That alone is worth thousand hours reading things I already agree with on RationalWiki.\nBehe is Michael Behe, a pseudoscientist who was at the forefront of \u0026ldquo;intelligent design\u0026rdquo;. 
\u0026ldquo;Intelligent Design\u0026rdquo; was an ideological project, funded by various religious interests, meant to legitimize teaching creationism in high schools. It has failed all legal challenges. This seems like a good comparison to appeal to someone who is used to arguing about Behe. However, everything Behe says is trash. Almost nobody arguing against Behe\u0026rsquo;s ideas is deliberately promoting any of them.\nThis concludes the email. I have tried to do as little interpretation outside of the text of the email itself as I can. I can hopefully be forgiven for having an opinion at the conclusion.\nScott is, epistemically, a bad actor. He demonstrably lies about what he believes in public. I know this because he has said so. He threatens, maybe \u0026ldquo;jokingly\u0026rdquo;, people who might expose what he actually thinks. He deliberately chooses the things he says to pander to his reactionary audience.\nPerhaps most seriously, Scott takes the exact opposite of the position he believes, because by arguing with or \u0026ldquo;explaining\u0026rdquo; ideas he claims to disagree with, he knows he can promote them. He attacks the bailey because he wants to see the motte defended.\nWithout this specific email, believing these things about him would require a lot of reading into the subtext of what he says. With this email, you can be certain that all of these things are true. There is no plausible reason he would have written them if they were not.\nAll of this is stated subtly, and with links, but it is not ambiguous. There is only one plausible meaning to all of this.\nWhat he says is not an attempt to converge on truth, and if it was, you would have no way of knowing that.\n","date":"6 November 2025","externalUrl":null,"permalink":"/posts/scott-alexander-email/","section":"Posts","summary":"","title":"The Scott Alexander Email: An Explainer","type":"posts"},{"content":" Do we understand how neural networks work? 
# In important ways, arguably most of the ways that matter, we do not understand how or why modern neural networks work, or accomplish the things that they do.\nThis has come up a lot recently, so I\u0026rsquo;m going to try to define what the boundary is between what we do and do not understand.\nWhat We Definitely Understand # In short: What we do understand is the actual math for making them. If we didn\u0026rsquo;t, we couldn\u0026rsquo;t make them. Someone has to write out code for the math, and this code defines both what it is made of and how it learns things.\nWhat It Is # Neural networks are made of matrices, a perfectly ordinary piece of math that many millions of people have learned about in college. Fundamentally, every neural network is a big stack of matrices stacked together in some way. Generally, how they are stacked is pretty simple.\nHow It Is Trained # After we set them up, we train them with some kind of gradient descent. Gradient descent can be explained to anyone who has taken (and still remembers) calculus. Sometimes you don\u0026rsquo;t actually need to know calculus to have a pretty good idea, but to do the math is just calculus.\nWhat It Is Being Trained To Do # Your last real component is the objective. You are training the neural network (a big block of matrices) using gradient descent (a trick that modifies the big block of matrices) to do something.\nWhat that something is can also be defined pretty simply, although there are a few versions.\nYou train an LLM or chatbot to predict the next token (roughly, a word or part of a word) correctly. You train an image generator to guess the image from the caption. These are training objectives. We have a few of them, depending on what you are trying to do, and we know what they are and can write them down. You can make them pretty complicated, but you always know what they are. 
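The three ingredients just described, a stack of matrices, gradient descent, and a written-down objective, fit in a few dozen lines. This is a toy sketch to make the recipe concrete: the data (predicting sin(x)), the sizes, and the learning rate are all made up for illustration, and it is nobody's production training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# "What it is": a stack of matrices. Two matrices here, so a
# two-layer network.
W1 = rng.normal(0.0, 0.5, (1, 8))
W2 = rng.normal(0.0, 0.5, (8, 1))

# "What it is being trained to do": an objective. Here: predict
# y = sin(x) from x, scored by mean squared error.
x = rng.uniform(-3.0, 3.0, (64, 1))
y = np.sin(x)

def forward(x):
    h = np.tanh(x @ W1)   # hidden activations
    return h, h @ W2      # prediction

def loss():
    return float(np.mean((forward(x)[1] - y) ** 2))

loss_before = loss()

# "How it is trained": gradient descent. Nudge each matrix against
# the gradient of the objective (backprop, derived by hand).
lr = 0.05
for _ in range(2000):
    h, pred = forward(x)
    err = (pred - y) / len(x)            # d(loss)/d(pred), up to a factor of 2
    grad_W2 = h.T @ err
    grad_h = err @ W2.T * (1.0 - h**2)   # chain rule through tanh
    grad_W1 = x.T @ grad_h
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

loss_after = loss()   # smaller than loss_before
```

Everything in the loop is ordinary matrix multiplication and calculus. Swapping the toy objective for next-token prediction and these two small matrices for billions of parameters changes the scale enormously, but not the shape of the recipe.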
If you didn't, you couldn't do it.

Still, Gets Pretty Gnarly # These things we absolutely, positively are guaranteed to understand. We know what they are. They are the recipe for a neural network, and there's a ton of code for it available on the internet. You can just read it, if you're dedicated enough, and then you can know how to write down the code for the math for the thing, because you've already read it.

These things can become incredibly complex. People devote their lives and careers to studying and refining these techniques. They aren't easy, they aren't trivial, and finding new tricks for making them work better is worth a phenomenal amount of money. This doesn't make them completely mysterious: we know what the thing is, it's some code or math, and we wrote it down. If we didn't write it down, it wouldn't be happening.

Yes, LLMs Are Glorified Autocomplete # An LLM is, very literally, glorified autocomplete. Absolutely, one hundred percent autocomplete. There are fancier training objectives for an LLM, but nobody uses them, and, surprise surprise: those are just different ways of writing down autocomplete. Probably there is no useful way to bundle statistics about language that isn't autocomplete if you squint at it from the right angle.

An LLM is just a bundle of statistics about words. An image generator is just a bundle of statistics about images. These are, very literally, accurate descriptions of what an LLM or an image generator is. They are constructed statistically, and their objectives always concern the statistics of their input data.

This is still true when considering "post-training", where an LLM stops being autocomplete for any text and becomes a chatbot, which only autocompletes stuff a chatbot would say.
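To make the recipe concrete: here is a toy sketch of all three ingredients at once, a matrix, gradient descent, and a next-token objective. The corpus, vocabulary, sizes, and learning rate are all invented for illustration; a real LLM is this same loop at enormous scale, with many stacked matrices and trillions of tokens.

```python
# A toy "LLM": one matrix of scores, trained by gradient descent to
# predict the next token. Everything here is made up for illustration.
import numpy as np

corpus = "the cat sat on the mat the cat sat".split()
vocab = sorted(set(corpus))
ix = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# The "network": a single V x V matrix. Row = current token,
# column = score for each possible next token.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (V, V))

pairs = [(ix[a], ix[b]) for a, b in zip(corpus, corpus[1:])]

lr = 0.5
for step in range(500):
    for a, b in pairs:
        # The objective: raise the probability of the actual next token.
        logits = W[a]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # Gradient of cross-entropy w.r.t. the logits is (p - onehot).
        grad = p.copy()
        grad[b] -= 1.0
        W[a] -= lr * grad  # one gradient descent step

# "Autocomplete": the most likely word after "the" in this corpus
print(vocab[int(np.argmax(W[ix["the"]]))])  # prints "cat"
```

After training, the matrix is exactly a "bundle of statistics about words": each row converges toward the empirical distribution of what follows that word in the data.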
There is some data demonstrating how a chatbot avoids swearing at users and answers questions correctly, and the model is made to autocomplete that data until it generally doesn't swear at users and attempts to answer questions correctly.

What We Don't Understand # We don't understand, in any deep way, the end result of training a neural network. Our end result is a very large bundle of complex statistics about the data. We know what the data is, but we've extracted as many statistics as we can from it in an automated way. We have no idea in advance what those statistics are, they are all connected to each other in incredibly complicated ways, and there are millions or billions of them.

When we say they're "just statistics", it is correct but misleading. Most statistics people think of are simple: one or two numbers. Lots and lots of connected statistics, all at once, are not a simple thing at all. The neural network is just statistics in the same way that it's just electricity. I know that rivers are just water, but this tells me almost nothing about any given river.

Why or How They Do Any Specific Thing # Here's a concrete example: https://claude.ai/share/46a6cc1d-c3dd-4d07-ad80-02409fdb654b

I know that it wrote a poem in trochaic hexameter because I asked. I know it knows what poems and bees are because poems, writing about bees, and poems written about bees are in its data. I think it's probably at least partly true that it mentioned clover because of the reasons it gives.

I have no idea how, exactly, it knows that or does that. I don't know what is inside the neural network that causes the poem, or the explanation of the poem.

I know it's some matrices, and I know those matrices were created by training them in the normal way, but I don't know how exactly the matrices cause the poem or the explanation of the poem.

Most of the time, most of what a very large neural network does is surprising. There's no method of reading the training code and determining what poem it is going to write (or what it is going to do instead of complying) if you ask it to write you a poem. You discover this, and most things about the final network, by trial and error.

We understand the end result, as much as we do, by making educated guesses, following hunches, and, occasionally, digging into the network to try to reverse engineer exactly how and why it does a specific thing. It's quite difficult to do this, and most of the time, most of the things large neural networks do don't have a specific cause that we precisely know.

This is a basic problem of scale. If I ask an LLM to write a poem, and it does, every word of the poem depends on every word of my message, on the previous words of the poem, on a small random factor, and on the exact contents of the millions or billions of numbers that make up the network.

It is not really possible to hold all of those things and how they relate to each other in your mind at once. I can use a computer to try to trace what causes each thing without crunching all the numbers myself, but a bulk summary is usually either uninformative or is still too much information to take in at once.

What Is Inside The Model After Training It # I understand gradient descent, which we use to train neural networks, at least moderately well. I can write down the equation. I do not understand, except in a very abstract way, most of the results of running gradient descent. Gradient descent functions as a search algorithm: given this problem, the computer finds a solution.

I know what the problem is, and that we have found some kind of solution for it, but the solution is very complicated and I do not really know or understand what that solution is.

The solution to "be autocomplete", if it's good enough, can write a poem that scans. I know that we made the LLM by having a neural network try to solve autocomplete, and I know that it is writing a poem because I asked it to, but I have very nearly no idea what precisely the training put inside its various matrices that enables it to do that.

It isn't even clear that there is always any such thing as understanding these things. Neural networks tend to find the simplest solution they can to a problem. If the problem is complex, the solution can be, inherently, very complex. What does it mean to understand the solution to something that is, by its nature, so complex that you cannot possibly hold all the details in your mind at once? What is the best solution to "be autocomplete", and what would it mean to understand that solution?

Reverse Engineering # There is a middle ground here in the very limited set of things we currently do understand about how neural networks do the things that they do.

There is a subfield of AI called "mechanistic interpretability". You look inside of a neural network, and you try to find how, mechanically, it does the thing that it is doing. Sometimes this is enlightening. Anthropic seems to do more of it than anyone else; it doesn't seem to be considered a high priority at most AI companies.

It's worth noting that normally, you reverse engineer things other people made because they don't tell you how they made them. We are reverse engineering things we have made. We know how we made it.
We still have to reverse engineer it, because we do not know, in advance, any of what is going on inside of it.

There's an interesting point here. It is definitely true that we made the neural network. It certainly didn't make itself. But all of the details were carved into the neural network by gradient descent; no person did it. Gradient descent doesn't explain what it does. It's just some math, after all. So we have to reverse engineer the neural network if we want to know what is in it.

This is generally pretty difficult, and doesn't cover a whole lot of what is going on in there. We can choose a few examples from the literature to get the flavor of the sorts of things we currently understand about how neural networks do things, and what the limits of that are. These are not at all exhaustive, and there's a good chunk of interesting work detailing other mechanisms inside of LLMs, but they give a good idea of what this looks like.

Golden Gate Claude # The most entertaining one is Golden Gate Claude. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Claude is Anthropic's large language model. They found a "feature", or something like a neuron, that only lit up when Claude was discussing the Golden Gate Bridge. They "clamped" this feature, which is more or less the same idea as applying electricity directly to the neuron. Hilarity ensues.

You get the idea. Other than being, in my opinion, extremely funny, I don't think we really have to wonder if this feature does what they think it does.
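The clamping idea itself is simple enough to sketch. This is not Anthropic's actual setup, which clamps sparse-autoencoder features inside a real model; it is a toy two-matrix network, with all weights invented, showing what "applying electricity directly to the neuron" does to the output.

```python
# Toy illustration of "clamping": force one hidden unit to a large value
# during the forward pass and watch the output get pushed in that unit's
# direction, regardless of the input. All weights are random/made up.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 8))  # input -> hidden weights
W2 = rng.normal(size=(8, 3))  # hidden -> output weights

def forward(x, clamp_unit=None, clamp_value=10.0):
    h = np.maximum(0, x @ W1)        # hidden activations (ReLU)
    if clamp_unit is not None:
        h[clamp_unit] = clamp_value  # "apply electricity" to one unit
    return h @ W2

x = rng.normal(size=4)
normal = forward(x)
clamped = forward(x, clamp_unit=2)

# The entire change in output lies along unit 2's outgoing weights:
# the network now "says" whatever unit 2 encodes, whatever the input.
print(clamped - normal)
```

The same logic explains why Golden Gate Claude works as a sanity check: if cranking one internal knob reliably steers the output toward one topic, that knob really does carry that topic.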
If you turn the "Golden Gate" part up to 11, it does, indeed, talk about the Golden Gate Bridge regardless of whether it makes any sense to do so or not.

From a high level, the technique here is that you first try to separate the model's internal state into a bunch of on/off switches, which is not normally what it looks like. You then check which of those are "on" for which outputs, and you can label them things like "guilt" and "Golden Gate".

This is pretty difficult, and provides far from perfect understanding. So it isn't quite true that we don't understand anything about the inside of language models, but the understanding we have is very limited and it is very difficult to figure out more of it.

Arithmetic # Sometimes LLMs can do arithmetic correctly. This is kind of bizarre, because they're not really for math at all. They're for text. They will have learned math because math is often represented in text, like here: 73+37=110. To be good autocomplete for that sort of text, the LLM will either have to memorize every single possible addition (which is impossible) or work out some way of actually doing addition.

Our best understanding is that they appear to turn actual integers into positions on a circle or helix, and then add them by performing one rotation followed by the other. https://arxiv.org/abs/2502.00873 (This is a little bit less weird if you know that the internal representation of an LLM can be considered some kind of sphere, so really it does everything by turning it into a circle.) Later work extended this to check exactly how it does the addition, and added some details to this picture, like that it does most of its calculation when it sees an "=" sign.

https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-addition

Crucially, if asked how it got the answer, it tells you that it carried the digits in the normal way. When we can see what it is actually doing down in its thinking parts, however, we can see that that's not really what it does. LLMs are unreliable narrators, here as much as elsewhere.

Limits, Less Precise Approaches # There is vastly more that we cannot explain about what LLMs do and how they do it than there is that we can explain. We can explain, in a precise way, like you'd explain a can opener or a mousetrap, a good amount about a very small number of the things they do, but this is difficult. We only really understand the relatively few internal components we've reverse engineered in this way.

There's another sense or two in which you can try to understand things, although these are much less precise than understanding the actual parts and how they work.

You can try to reason about the training objective. If an LLM is trained on an argument between Alice and Bob, it needs to predict that Alice will continue to say Alice things and Bob will continue to say Bob things. In order to learn to be the best autocomplete possible, it has to have some ability to represent individual people and keep track of their traits. This would imply that a good LLM has to run something like a thin simulation of a person inside it, and that it will sometimes be able to imitate certain person-like traits. https://generative.ink/posts/simulators/

If you assume this, or something like it, you can also try to understand them non-mechanically, the way you'd understand another person's psychology. And we do this sort of implicitly while prompting them: if you're polite they tend to work better, and if you're rude they tend to work less well.

You can often guess why they are doing something wrong by noticing that they seem confused, asking them to explain what they think is happening, and then explaining the problem to them.

This approach works, sort of, sometimes, but it is a much less precise type of understanding. In many ways, treating LLMs as if they have a human psychology makes truly understanding how they work seem harder. We understand how to talk to other people, but we don't really understand what makes people tick. If we understood LLMs as well as psychologists understand humans, we would still not understand them the way an engineer understands a refrigerator. It is just a much less reliable type of understanding.

A Prediction # (note to self, put this prediction in a footnote)

It seems that as the systems we make become more advanced, knowing why and how they do things will become more difficult, and in some cases impossible. If your system is as optimized as possible, with nothing wasted, the map and the territory are the same size; the only correct explanation for anything that it does is the thing in its entirety. When the thing is also quite large, it becomes, literally, impossible to understand, because there is no smaller or more understandable explanation for what the thing does or why than the thing itself.

Practical Concerns # You do not need to understand something to use it. We use plenty of things we don't fully understand: most people don't know how their car engine works in detail, but they can still drive. We don't understand exactly how many medications work at the molecular level.
Most of the time, it doesn't matter whether you personally understand something or whether anyone does, because as long as it works it is useful, and if it doesn't, it isn't.

Where this is the greatest problem is for research purposes, where the lack of understanding of how models do things makes it much more difficult to verify that they will do what you want and not something else. It also makes it very difficult to figure out why they cannot do things, much of the time. At the edge of our understanding, it makes the field look more like an art than a science.

(Posted 13 August 2025)

AGI: Probably Not 2027 # AI 2027(footnote: link) is a web site that might be described as a paper, manifesto or thesis. It lays out a detailed timeline for AI development over the next five years. Crucially, per its title, it expects that there will be a major turning point sometime around 2027[1], when some LLM will become so good at coding that humans will no longer be required to code. This LLM will create the next LLM, and so on, forever, with humans soon losing all ability to meaningfully contribute to the process. They avoid calling this "the singularity". Possibly they avoid the term because using it conveys to a lot of people that you shouldn't be taken too seriously.

I think that pretty much every important detail of AI 2027 is wrong. My issue is that each of many different things has to happen the way they expect, and if any one thing happens differently, more slowly, or less impressively than their guess, later events become more and more fantastically unlikely.
If the general prediction regarding the timeline ends up being correct, it seems like it will have been mostly by luck.

I also think there is a fundamental issue of credibility here.

Sometimes, you should separate the message from the messenger. Maybe the message is good, and you shouldn't let your personal hangups about the person delivering it get in the way. Even people with bad motivations are right sometimes. Good ideas should be taken seriously, regardless of their source.

Other times, who the messenger is and what motivates them is important for evaluating the message. This applies to outright scams, like emails from strangers telling you they're Nigerian princes, and to people who probably believe what they're saying, like anyone telling you that their favorite religious leader or musician is the greatest one ever. You can guess, pretty reasonably, that greed or zeal or something else makes it unlikely they are giving you good information.[2]

In this specific case, I think that the authors are probably well-intentioned. However, most of their shaky assumptions just happen to be things which would be worth at least a hundred billion dollars to OpenAI specifically if they were true. If you were writing a pitch to try to get funding for OpenAI or a similar company, you would have billions of reasons to be as persuasive as possible about these things. Given the power of that financial incentive, it's not surprising that people have come up with compelling stories that just happen to make good investor pitches.

Well-intentioned people can be so immersed in them that they cannot see past them.

Because this is a much simpler objection than any of the technical points, I will try to detail why it seems both likely and discrediting before getting into the details.

Greed: AI 2027 Is OpenAI's Investor Pitch # (Zach Weinersmith, SMBC, 2010)(footnote: ilu zach please don't sue me)

If AI 2027 is not roughly true, OpenAI will probably die.[3]

Simple math: OpenAI is currently in a funding round, and is trying to raise a total of forty billion dollars. In 2024, OpenAI made $3.7 billion in revenue and spent about nine billion, for a net loss of about five billion dollars.[4] They are currently projected to have a net loss of eight billion through 2025.[5] At that rate of loss, the forty billion they are raising covers at most five years, so they have at most five years of runway. To put it another way, this means that if they do not alter their trajectory or raise more money, they will be dead within five years, so by 2030 at the latest.[6]

This is a crude estimate, but I do not think that making it less crude really improves the picture. OpenAI had maybe about ten billion dollars of cash on hand beforehand, which buys them an extra year and change. On the downside, they also owe Microsoft 20% of their forward revenue and have very large commitments to spend money on data centers with partners like Oracle. These commitments are difficult to translate into time, but they seem to make the runway shorter. All told, "by default, OpenAI dies in under five years" seems roughly correct.

OpenAI has reliably doubled down, raised more funding, and mostly ignored questions of profitability while growing. This is an all-in bet that at some future point the services they offer will be extremely profitable.
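The back-of-the-envelope runway arithmetic, spelled out with the figures quoted above (all rough estimates from the text, not precise financials):

```python
# Runway estimate using the rough figures cited in the text (USD).
raise_target = 40e9   # total the funding round is trying to raise
yearly_loss = 8e9     # projected net loss through 2025

runway_years = raise_target / yearly_loss
print(runway_years)   # 5.0 -> without new money, dead by ~2030

cash_on_hand = 10e9   # rough prior cash position
print(cash_on_hand / yearly_loss)  # 1.25 -> "an extra year and change"
```

The Microsoft revenue share and data-center commitments mentioned above only shorten this, which is why the crude five-year figure is, if anything, generous.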
In my humble opinion, it seems very nearly impossible for this to be true based on their current products: LLMs are a fiercely competitive business, with significant pressure from at least two major competitors to offer a better service at a similar price, so they cannot really raise prices unless they have something much better than what competitors offer.[7] They cannot really slash research or data center spending or they will fall behind.[8]

There is one way that doubling down over and over again like this is a good idea, and it isn't selling more ChatGPT subscriptions. It is if you can be sure that the fruits of future research and development will generate exponentially more profit than any of the products they currently sell. If this is not true, they are probably doomed based upon how much they have committed to spend and what they owe to whom.

AI 2027, "coincidentally", validates exactly this scenario. If I worked at OpenAI and I was trying to convince a group of investors to give me forty billion dollars, and I was positive they'd believe anything I said, I would just read AI 2027 out loud to them.

AI 2027 features a lightly fictionalized version of OpenAI, which it calls "OpenBrain" and mentions over a hundred times. Inasmuch as "OpenBrain" has any competition, the only one mentioned is "DeepCent", clearly a reference to DeepSeek, which is mentioned only to assure the reader at every step that it is vastly inferior to "OpenBrain" and cannot possibly compete. "OpenBrain" experiences just enough appearance of adversity, from "DeepCent" and other sources, to make it seem like victory, although clearly inevitable, is still sort of heroic and impressive.

If you have been around funding hype for companies, this is clearly a funding pitch.

It is a masterful example of the genre, against which all lesser funding pitches should be measured. It blends elements of science fiction, techno-thriller and fan fiction[9] while constantly hammering in the assurance that the company will be victorious over its enemies and reap untold riches. We are assured that they will perform a miracle and become extremely profitable right before they are projected to go bankrupt. According to the panel on the side, "OpenBrain" is going to be worth eight trillion dollars and see 191 billion dollars a year in revenue by October 2027. After that, the numbers become somewhat more fantastic.

Everyone at OpenAI who has been involved in creating this narrative is a master of the craft. They are so good at it that people who are culturally adjacent to the company seem not to recognize that it is, very clearly, a funding pitch that has taken on a life of its own.

But: it's just a funding pitch. There's very little reason to believe anything in a funding pitch is true, and billions of reasons to think that it is bullshit.

It is worth noting that the lead author of AI 2027 is a former OpenAI employee. He is mostly famous outside OpenAI for having refused to sign their non-disparagement agreement and for advocating for stricter oversight of AI businesses. I do not think it is very credible that he is deliberately shilling for OpenAI here. I do think it is likely that he is completely unable to see outside their narrative, which they have an intense financial interest in sustaining.

The authors say they have consulted over a hundred people, including "dozens of experts in each of AI governance and AI technical work", in researching this report. I would be willing to bet that OpenAI is the most represented single institution among the experts consulted.

This is a somewhat educated guess, based both on who the authors are and what they have written, and it seems like a pretty safe bet.

Fundamentally, this is a report headed by a former OpenAI employee who has founded a think tank to work on AI safety. He is leveraging his familiarity with OpenAI specifically, both as professional experience and, most likely, extensively for expert opinions. It is likely that he still owns substantial OpenAI stock or options, and if his think tank is going to do contract work on AI safety, it will probably be for, with, or concerning OpenAI. Inasmuch as this report reaches any conclusion that doesn't seem favorable to OpenAI, it's that outside experts and governance, of the kind that think tanks might help provide, are necessary and important.

It's difficult not to suspect motivated reasoning.

Zeal: AI 2027 As Religious Dissent # Greed is a reasonable reason to doubt any of this is true. So is zeal.

People focused on AI, as a group, have many of the characteristics of a religious movement. Previously this was a relatively obscure fact. There were a significant number of people on the internet and in San Francisco who were intensely concerned that AI might bring about some kind of apocalypse, but this was not widely known.[10] Occasionally, if someone involved went off the deep end, or if something was said about AI being dangerous by someone prominent (Stephen Hawking, Elon Musk), it might make the news, but in general the notion that there was an entire subculture about AI was on very few people's radar.

OpenAI is most easily understood as an offshoot of this movement. OpenAI was distinct because it was good at recruiting actual research personnel and extremely good at raising money. This originally took the form of getting a number of PayPal alumni like Elon Musk, Peter Thiel, and Reid Hoffman involved to fund OpenAI.
OpenAI was presented as a counterbalance to Google's research division, and as necessary to ensure that AI was created safely. It seems unlikely that any of that would have been possible if there hadn't been a significant movement already focused on these concerns, but it took OpenAI's founders to create a serious, well-funded research endeavor out of the wider interest in the subject.

Before OpenAI started making a lot of money, it was widely understood that "safe" meant something like "unlikely to kill people, because sufficiently advanced AI is dangerous the way nuclear or biological weapons are dangerous". Generally this emphasized the difficulty of being sure you can control an AI as it becomes more capable. OpenAI specifically has mostly redefined "safe" into "making sure OpenAI's AI is polite enough to sell to other people, is more advanced than anyone else's, and also is making more money than anyone else's". This makes sense if you assume that OpenAI, as an organization, is more trustworthy and safety-conscious than any other actual or possible organization for doing AI research.

For anyone who doesn't believe that OpenAI is more trustworthy than any possible alternative, OpenAI's present-day vision of AI looks like a bizarre schism that has somehow made profit for OpenAI its primary, if not only, principle. OpenAI claims to pursue safe AI and AI that benefits humanity, but this turns out, over time, to always mean whatever gives OpenAI the most freedom to raise and make money. In religious terms, OpenAI is a sect that fused zeal for the singularity to an unabashed embrace of capitalism, and when the two conflicted, chose capitalism.

Possibly the most serious place where these two things conflict is on what is meant by "safety".

AI 2027 ends with a cautionary tale that has two endings. In one of them, AI progress goes too far, too fast, and pretty much everyone dies. In the other, AI is somewhat more constrained, and at least not everyone dies.

I understand the core of this story, which also seems to be OpenAI's funding pitch, as some version of OpenAI's creed. In that context, the cautionary tale at the end reads like any dissent on questions of religious doctrine: perhaps we have become too greedy and too eager, and forgotten our principles, and this will all end in disaster.

None of that means it's wrong, necessarily. People can be correct for the wrong reasons, or from strange places. It does seem to explain how someone who has gone to great lengths to defend his right to disparage OpenAI would end up writing out a variation of their investor pitch. When someone has recently departed a religion, the beliefs they still hold tend to be the same ones they had before they left, and their complaints are modifications to the dogma, not complete rejections of it.

The Details # I am going to try to chronicle everything that seems conspicuously wrong, bizarre, or indicative of pro-OpenAI slant. I am going to do my best to skim or ignore anything that is strictly fiction, which is, by word count, most of it. Quoting and commenting on the parts that are at least in some way about the real world is still a lengthy exercise.

I am grateful to the authors for encouraging debate and counterargument to their scenario. Quotes appear in the same order as they do in the text.

Mid 2025: Stumbling Agents # The AIs of 2024 could follow specific instructions: they could turn bullet points into emails, and simple requests into working code. In 2025, AIs function more like employees.
Coding AIs increasingly look like autonomous agents rather than mere assistants: taking instructions via Slack or Teams and making substantial code changes on their own, sometimes saving hours or even days. Research agents spend half an hour scouring the Internet to answer your question.

"AIs function more like employees" is doing a lot of work here. No AI we currently have functions very much like an employee, except for the very simplest tasks (e.g., "label this"). They require far more supervision, and are far more unreliable, than any employee ever would be. This gulf in how much autonomy they can be trusted with is so vast that making the comparison is pure speculation.

The authors fail completely here to even acknowledge that this is a serious problem, or that it would be an immense achievement to overcome it. That LLMs are incredibly useful in some situations is true; to say they "function like employees" is, at best, optimistic. Under ideal circumstances, when paired with an actual human, they occupy a role somewhat similar to an employee's. This sometimes saves a good amount of labor. It doesn't directly replace a human.

This is the general pattern of why these predictions seem implausible. They describe something that already exists, but as if it were much better than it is, or they assume that making it much better will be relatively easy and happen on a relatively short timeline. These are, at best, educated guesses. You can only really know how difficult it is to improve the technology after you've already done it.

The other pattern is that the paper says things that strongly imply AI is just as good as a human, more and more overtly each time. "AIs function more like employees" is the first of these. Taken literally, this could mean that AI was already very nearly as good as a human.

This would be quite an achievement. Of course, if you think about it enough, you will know that it's not true, but it sort of resembles something true if you squint at it hard enough. It's a good sleight of hand.

Granted, it has been four months since this was written, so perhaps I have the benefit of hindsight. "Mid 2025" has come and gone. If that's the case, though, we can say that it's the first real prediction and that it has already failed to happen.[11]

Late 2025: The World's Most Expensive AI # (To avoid singling out any one existing company, we're going to describe a fictional artificial general intelligence company, which we'll call OpenBrain. We imagine the others to be 3-9 months behind OpenBrain.)

This happens to be exactly what you would need to be true to justify investing in "OpenBrain" over competitors, of course. Crucially, a 3-9 month lead is almost impossible to prove or disprove, so it's not that hard to convince someone that you're that far ahead. As noted previously, I do not have a very positive opinion of pretending that everything said about "OpenBrain" is not actually about OpenAI.

Although models are improving on a wide range of skills, one stands out: OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we'll call "DeepCent") and their U.S. competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it's good at many things but great at helping with AI research.

Everyone has been training LLMs substantially on code since 2023. Every major organization uses LLMs as code assistants.
Presenting this as an innovation is bizarre.\nWe are asked here to assume that whatever OpenAI is currently cooking in their research division is extremely good at writing code for AI research, so much so that it remarkably accelerates their research schedule. Again, I perhaps have the benefit of hindsight: I am writing in August 2025, a few days after the GPT-5 release. It appears to be slightly better than the previous OpenAI product in some ways, and worse in others. I do not think it likely that OpenAI will significantly accelerate its research relative to competitors because of how good their LLM is at AI research tasks.\nAlso, here, we begin the narrative that all of this is an arms race between \u0026ldquo;OpenBrain\u0026rdquo; and its Chinese competitor, \u0026ldquo;DeepCent\u0026rdquo;. Competition with China was previously a focus in a well-received position paper called Situational Awareness.12 I am told it plays extremely well with people in Washington, DC. As it happens, convincing political people to give you money and to refrain from regulating you can be a very important part of a business plan. I cannot otherwise explain why so many people in AI are suddenly extremely interested in this specific arms-race story. In my opinion, competition between American and Chinese companies is not meaningfully different or more interesting than competition between US-based companies.\nSomething similar is happening currently. Various AI companies are now bending over backwards to focus concern on the political slant of their LLMs, because the current administration is making a big deal about how conservative or liberal they are. This is one of the less interesting and important properties of LLMs, but it is possibly very profitable to focus on it if you can get a large government contract out of it. Most likely, this would result in selling a much dumber LLM to the government.
Effort is zero-sum: you can have a smarter one or you can have one that flatters your opinions, but you generally can\u0026rsquo;t have both.\nThe same training environments that teach Agent-1 to autonomously code and web-browse also make it a good hacker. Moreover, it could offer substantial help to terrorists designing bioweapons, thanks to its PhD-level knowledge of every field and ability to browse the web.\n\u0026ldquo;Autonomously\u0026rdquo; packs in a lot of assumptions; really, the same ones that led the authors to say earlier that AIs were \u0026ldquo;like employees\u0026rdquo;. They again imply, vaguely, that perhaps the AI is about as good as a human. Whether extensions of current systems can meaningfully function autonomously is an open question, and if the authors are wrong about it, they are likely wrong about the rest of their predictions too.\nSaying an LLM might be a \u0026ldquo;substantial help to terrorists designing bioweapons\u0026rdquo; is incredibly vague. Google search would also be of substantial help in designing bioweapons, because you can google any topic in chemistry or biology. You can also find these things in a library. One suspects that focusing on the possible creation of weapons of mass destruction is also useful for attracting attention, and possibly money, from the government. There is no evidence that LLMs are actually very useful for this, or are likely to be soon.\nOpenBrain has a model specification (or “Spec”), a written document describing the goals, rules, principles, etc. that are supposed to guide the model’s behavior. Agent-1’s Spec combines a few vague goals (like “assist the user” and “don’t break the law”) with a long list of more specific dos and don’ts (“don’t say this particular word,” “here’s how to handle this particular situation”). Using techniques that utilize AIs to train other AIs, the model memorizes the Spec and learns to reason carefully about its maxims.
By the end of this training, the AI will hopefully be helpful (obey instructions), harmless (refuse to help with scams, bomb-making, and other dangerous activities) and honest (resist the temptation to get better ratings from gullible humans by hallucinating citations or faking task completion).\nThis is a description of some variation on Constitutional AI, which was published by Anthropic in 2022.13 It is bizarre to give it a new name and attribute it entirely to OpenAI. It does not seem to meaningfully clarify anything at all about what is likely to happen in the future. We also get some general descriptions of neural networks and how LLMs are trained. These seem out of place, but at least they avoid taking things published in the past by people other than OpenAI and attributing them to OpenAI in the future.\nIt is notable how thoroughly OpenAI\u0026rsquo;s American competitors are erased. The focus is exclusively on a rivalry with a Chinese company. American companies competing with OpenAI compete directly with them for American investor and government money, for employees, and for attention. It is probably safer not to mention Anthropic or Google DeepMind at all, because they are very similar to OpenAI and have shared many of their employees with OpenAI over time.\nInstead, researchers try to identify cases where the models seem to deviate from the Spec. Agent-1 is often sycophantic (i.e. it tells researchers what they want to hear instead of trying to tell them the truth). In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings. However, in real deployment settings, there are no longer any incidents so extreme as in 2023–2024 (e.g. Gemini telling a user to die and Bing Sydney being Bing Sydney.)\nI certainly have the benefit of hindsight here.
They wrote this before Grok, Elon Musk\u0026rsquo;s LLM, started telling people it was MechaHitler.\nEarly 2026: Coding Automation # OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R\u0026amp;D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.\n[This next definition is in a folded part that you have to click to see]\nImproved algorithms: Better training methods are used to translate compute into performance. This produces more capable AIs without a corresponding increase in cost, or the same capabilities with decreased costs. This includes being able to achieve qualitatively and quantitatively new results. “Paradigm shifts” such as the switch from game-playing RL agents to large language models count as examples of algorithmic progress.\nI will bet any amount of money, with anyone, that there is no empirical measurement by which OpenAI specifically will make \u0026ldquo;algorithmic progress\u0026rdquo; 50% faster than their competitors in early 2026, specifically because their coding assistants are just that good.\nIt seems unlikely that OpenAI will end up moving 50% faster on research than their competitors because of their coding assistants, for a few reasons.\nFirst, competitors\u0026rsquo; coding models are actually quite good, and it is unlikely that OpenAI\u0026rsquo;s will be significantly better than theirs in the foreseeable future. OpenAI\u0026rsquo;s models are very good, and \u0026ldquo;better\u0026rdquo; is difficult to quantify, but it still seems clear that they are not so much better that you will get 50% more done.\nSecond, research is open-ended by nature. Coding assistants currently solve mostly well-defined tasks. Defining the task is the hard part, so they are very little help there.
The ability to actually write out code, the only part of the job LLMs can currently do very well, is not a major bottleneck for research progress most of the time. There are already plenty of very good engineers to write code for AI research, especially at larger companies like OpenAI.\n\u0026ldquo;Algorithmic progress\u0026rdquo; gets a lot of focus, both here in the main piece and in a supplement. It seems to be a sort of compulsive reductionism, where every factor in progress must be reduced to a single quantity that you can plot on a curve. This, of course, makes predictions of the future seem much more meaningful. Even the concept of a \u0026ldquo;paradigm shift\u0026rdquo;, a description of a complete discontinuity, is forced to be part of a smooth curve that you can just keep drawing to predict the future.\nThis trick, of just drawing a curve of progress and following it, has worked reasonably well for predicting how much faster computers would get with time. There is some evidence that it is roughly true for some kinds of progress in AI. There is no reason to think that it is always true for every kind of progress you could make in AI.\nPeople naturally try to compare Agent-1 to humans, but it has a very different skill profile. It knows more facts than any human, knows practically every programming language, and can solve well-specified coding problems extremely quickly. On the other hand, Agent-1 is bad at even simple long-horizon tasks, like beating video games it hasn’t played before. Still, the common workday is eight hours, and a day’s work can usually be separated into smaller chunks; you could think of Agent-1 as a scatterbrained employee who thrives under careful management. Savvy people find ways to automate routine parts of their jobs.\nYou can really tell that this was written by more than one person, because this directly contradicts the earlier section, set a full year before, in which AIs already functioned \u0026ldquo;more like employees\u0026rdquo;.
This does, in fact, accurately describe using AI coding tools in April 2025, when this was written. It\u0026rsquo;s a very positive description, but it\u0026rsquo;s quite accurate. It still accurately describes how things are now in August. It is funny to call it a prediction for next year, though. It leaves out how badly AI coding assistants fail in some situations that are not especially \u0026ldquo;long-horizon\u0026rdquo;. It correctly notes, as earlier parts of this piece did not, that you need to supervise them extremely closely.\nOpenBrain’s executives turn consideration to an implication of automating AI R\u0026amp;D: security has become more important. In early 2025, the worst-case scenario was leaked algorithmic secrets; now, if China steals Agent-1’s weights, they could increase their research speed by nearly 50%. OpenBrain’s security level is typical of a fast-growing ~3,000 person tech company, secure only against low-priority attacks from capable cyber groups (RAND’s SL2). They are working hard to protect their weights and secrets from insider threats and top cybercrime syndicates (SL3), but defense against nation states (SL4\u0026amp;5) is barely on the horizon.\nThis assumes the previously mentioned 50% research speed gain from better LLMs, assumes that competitors are far behind OpenAI, and makes a point of spotlighting Chinese competition and citing the RAND Corporation, which I assume plays well with political people who write regulations and award contracts. None of those things seem plausible. It is probably true that if security is lax people will steal your LLM, because that is true of any data that is worth money. That fact, true at every company that handles important data, isn\u0026rsquo;t generally presented with so much drama.\nMid 2026: China Wakes Up # This entire section veers thoroughly into geopolitical thriller territory and continues the pattern of appealing to the US Government\u0026rsquo;s general fear of China.
In the real world, at present, the government of China does not seem extremely worried about AI in general. We are asked here to fantasize that their government will care a great deal about it in the future. This justifies considering OpenAI to be in an arms race with its Chinese competitors, hearkening back to deep memories of the Cold War.\nIt is perhaps embarrassing to be racing with someone who does not think they are racing with you at all.\nA Centralized Development Zone (CDZ) is created at the Tianwan Power Plant (the largest nuclear power plant in the world) to house a new mega-datacenter for DeepCent, along with highly secure living and office spaces to which researchers will eventually relocate. Almost 50% of China’s AI-relevant compute is now working for the DeepCent-led collective, and over 80% of new chips are directed to the CDZ. At this point, the CDZ has the power capacity in place for what would be the largest centralized cluster in the world. Other Party members discuss extreme measures to neutralize the West’s chip advantage. A blockade of Taiwan? A full invasion?\nIt must be really strange to live in Taiwan and have to read Americans fantasizing about China maybe invading your country because American AI companies are just too good.\nBut China is falling behind on AI algorithms due to their weaker models. The Chinese intelligence agencies—among the best in the world—double down on their plans to steal OpenBrain’s weights. This is a much more complex operation than their constant low-level poaching of algorithmic secrets; the weights are a multi-terabyte file stored on a highly secure server (OpenBrain has improved security to RAND’s SL3). Their cyberforce think they can pull it off with help from their spies, but perhaps only once; OpenBrain will detect the theft, increase security, and they may not get another chance. So (CCP leadership wonder) should they act now and steal Agent-1? Or hold out for a more advanced model?
If they wait, do they risk OpenBrain upgrading security beyond their ability to penetrate?\nThis is also pure fantasy.\nLate 2026: AI Takes Some Jobs # Finally, a section heading I mostly agree with. AI is probably going to take some jobs. It has already taken some, such as translation work. This seems well-grounded; perhaps we can get some real analysis here.\nJust as others seemed to be catching up, OpenBrain blows the competition out of the water again by releasing Agent-1-mini—a model 10x cheaper than Agent-1 and more easily fine-tuned for different applications. The mainstream narrative around AI has changed from “maybe the hype will blow over” to “guess this is the next big thing,” but people disagree about how big. Bigger than social media? Bigger than smartphones? Bigger than fire?\nExpecting real analysis was optimistic. \u0026ldquo;Somehow, OpenAI is ten times cheaper and much better than everyone else.\u0026rdquo; It could happen. It could also not happen. There is no specific reason to believe any release in late 2026 will be ten times cheaper and better than what came before it, but it\u0026rsquo;s hypothetically possible. It would certainly be very profitable for them if it did happen, so I can understand why you would put this in an investor pitch. Instead of just saying it\u0026rsquo;s \u0026ldquo;better\u0026rdquo;, they say it\u0026rsquo;s \u0026ldquo;more easily fine-tuned for different applications\u0026rdquo;. This is just a more complicated way of saying \u0026ldquo;better\u0026rdquo;, and it sounds more plausible than \u0026ldquo;10x cheaper, and also better\u0026rdquo;.\nThey go on to speculate that this will hurt the job market for junior software engineers and generate a lot of hype.
This was an easy \u0026ldquo;prediction\u0026rdquo; because the job market was already getting bad for junior software engineers this April14, and there was already a lot of hype that sounded like this.\nI will note that the pattern continues: First, you state things that happened in the past as if they are happening in the future. You attribute these things to OpenAI, sorry, I mean \u0026ldquo;OpenBrain\u0026rdquo;. This pretty well guarantees that anyone reading your \u0026ldquo;predictions\u0026rdquo; who doesn\u0026rsquo;t already know about those things will feel that they are meaningful predictions. Perhaps they will even feel that you got them right, later. They alternate between this and making essentially baseless predictions that OpenAI specifically will create amazing products that do not exist yet.\nThe Department of Defense (DOD) quietly begins contracting OpenBrain directly for cyber, data analysis, and R\u0026amp;D, but integration is slow due to the bureaucracy and DOD procurement process.\nThis had also already happened in April 2025.15\nJanuary 2027: Agent-2 Never Finishes Learning # Over the course of 2027, the AIs improve from being able to mostly do the job of an OpenBrain research engineer to eclipsing all humans at all tasks. This represents roughly our median guess, but we think it’s plausible that this happens up to ~5x slower or faster.\nThis is actually in a drop-down right before this section, about how they are less certain about things in and after 2027 than beforehand. One can see why this would be.
So this quote is really meant as a prelude to what follows in the next few sections, which cover all of 2027.\nIf, of course, before 2027 OpenAI and only OpenAI has LLMs that can meaningfully function on their own, that are ten times cheaper than they are now (or were previously, perhaps?), and that can mostly do the job of an OpenAI research engineer, it is entirely possible that through 2027 they will eclipse all humans at all tasks. This is, however, a completely wild guess, as were all the assumptions leading us here.\nWith Agent-1’s help, OpenBrain is now post-training Agent-2. More than ever, the focus is on high-quality data. Copious amounts of synthetic data are produced, evaluated, and filtered for quality before being fed to Agent-2. On top of this, they pay billions of dollars for human laborers to record themselves solving long-horizon tasks. On top of all that, they train Agent-2 almost continuously using reinforcement learning on an ever-expanding suite of diverse difficult tasks: lots of video games, lots of coding challenges, lots of research tasks. Agent-2, more so than previous models, is effectively “online learning,” in that it’s built to never really finish training. Every day, the weights get updated to the latest version, trained on more data generated by the previous version the previous day.\nThis is a strange combination. Most of it describes, more or less, things that AI labs were already doing in April 2025. They are perhaps spending more money on it in fictional January 2027 than they are now, but otherwise it\u0026rsquo;s the same stuff, just described as if it were entirely new.\nI have to wonder who the target audience for this is. I assume it\u0026rsquo;s people who do not know what is already happening. If so, you can describe the same thing that is already happening, but with a higher budget, and it sounds like a bold prediction. Of these things, only updating the weights of the model every day would be new.
It is not a new idea; it has been said many times in public that it would be desirable. The new part, in this story, is that it now works.\nAgent-1 had been optimized for AI R\u0026amp;D tasks, hoping to initiate an intelligence explosion. OpenBrain doubles down on this strategy with Agent-2. It is qualitatively almost as good as the top human experts at research engineering (designing and implementing experiments), and as good as the 25th percentile OpenBrain scientist at “research taste” (deciding what to study next, what experiments to run, or having inklings of potential new paradigms).\nI note that they do use the term \u0026ldquo;intelligence explosion\u0026rdquo;, which is more or less a synonym for \u0026ldquo;singularity\u0026rdquo;. I continue to find their avoidance of \u0026ldquo;singularity\u0026rdquo; strange, since it is by far the more widely known term.\nI think it is possible that an LLM in early 2027 will be almost as good as the top human experts at research engineering. I do not think you can predict whether this is true based on any information we have now. In particular, I do not think you can predict what it would take to allow an LLM to operate without hand-holding for a prolonged period. This is an unsolved problem, and you cannot meaningfully say that an LLM is as good as a human at something if it requires constant, close supervision when a human would not. Maybe someone will figure this out by early 2027; maybe not. I do not think the authors have any knowledge of this that I don\u0026rsquo;t, which means they are making hopeful guesses.\nI also think it is unlikely that an LLM in early 2027 will have particularly good research taste. We see here again the seemingly compulsive reductionism: it is very hard to say what \u0026ldquo;research taste\u0026rdquo; even is, or what it means to have extremely good research taste.
It is, well, a taste: often people can agree on who has it or who does not have it, but it resists quantification. Here, however, in the name of making the future seem predictable, we are nicely informed that research taste has percentiles. Much like height or IQ, you can be given a percentile, and the AI of January 2027 will probably be at the 25th percentile or so.\nIf you assign numbers to everything, you can say that the line is going up. If you don\u0026rsquo;t assign numbers to things, you can\u0026rsquo;t say the line is going up. Therefore, you must assign numbers to everything, even if it does not make any sense to do so.\nGiven the “dangers” of the new model, OpenBrain “responsibly” elects not to release it publicly yet (in fact, they want to focus on internal AI R\u0026amp;D). Knowledge of Agent-2’s full capabilities is limited to an elite silo containing the immediate team, OpenBrain leadership and security, a few dozen U.S. government officials, and the legions of CCP spies who have infiltrated OpenBrain for years.\nIt is good to know that OpenAI is so responsible, and that they are aligned with the US Government because they are such a good and patriotic company. I wish them the best of luck with their hypothetical spy problem, which is explained in some detail in a footnote. I think it is a very exciting story, and I do not see any way in which it intersects with reality.\nFebruary 2027: China Steals Agent-2 # This section is mostly cyber-espionage fiction that is not worth discussing in detail. It concludes with this:\nIn retaliation for the theft, the President authorizes cyberattacks to sabotage DeepCent. But by now China has 40% of its AI-relevant compute in the CDZ, where they have aggressively hardened security by airgapping (closing external connections) and siloing internally. The operations fail to do serious, immediate damage. 
Tensions heighten, both sides signal seriousness by repositioning military assets around Taiwan, and DeepCent scrambles to get Agent-2 running efficiently to start boosting their AI research.\nMarch 2027: Algorithmic Breakthroughs # With the help of thousands of Agent-2 automated researchers, OpenBrain is making major algorithmic advances. One such breakthrough is augmenting the AI’s text-based scratchpad (chain of thought) with a higher-bandwidth thought process (neuralese recurrence and memory). Another is a more scalable and efficient way to learn from the results of high-effort task solutions (iterated distillation and amplification).\nThis is just describing current or past research. For example, augmenting a transformer with memory is done here (link: https://arxiv.org/abs/2006.11527), recurrence is done here (link: https://arxiv.org/abs/2203.07852) and here (link: https://arxiv.org/abs/2307.08621). These papers are not remotely exhaustive; I have a folder of bookmarks for attempts to add memory to transformers, and there are a lot of separate projects working on more recurrent LLM designs (link to rwkv). This amounts to saying \u0026ldquo;what if OpenAI tries to do one of the things that has been done before, but this time it works extremely well\u0026rdquo;. Maybe it will. But there\u0026rsquo;s no good reason to think it will.\n[This passage is some time later, and very loosely references the previous quote] If this doesn’t happen, other things may still have happened that end up functionally similar for our story. For example, perhaps models will be trained to think in artificial languages that are more efficient than natural language but difficult for humans to interpret. Or perhaps it will become standard practice to train the English chains of thought to look nice, such that AIs become adept at subtly communicating with each other in messages that look benign to monitors.\nThis also describes things that had already happened. 
Deepseek\u0026rsquo;s R1 paper specifically mentions that the model devolves into a sort of weird pidgin when \u0026ldquo;thinking\u0026rdquo; if you do not force it to use English. They also mention that they are training the model to output in English in the chain of thought, and that this makes the model slightly worse on benchmarks (that is, dumber). Neural networks hiding messages to themselves or each other is documented at least as early as 2017. I do not think it counts as a novel prediction if you predict that two things that have already happened in the past might happen at the same time in the future.\nSimilar comments apply to their breakdown of \u0026ldquo;iterated distillation and amplification\u0026rdquo;: they are describing a thing that is already being done, and simply saying it will be done much better than it was previously, and that the results will be very good. There is a persistent sense that they are trying to impress people who are not looped in on the technical side by describing something that already exists, and then describing it as having marvelous results in the future without mentioning that it has not had these particular marvelous results yet in the present.\nAided by the new capabilities breakthroughs, Agent-3 is a fast and cheap superhuman coder. OpenBrain runs 200,000 Agent-3 copies in parallel, creating a workforce equivalent to 50,000 copies of the best human coder sped up by 30x. OpenBrain still keeps its human engineers on staff, because they have complementary skills needed to manage the teams of Agent-3 copies. For example, research taste has proven difficult to train due to longer feedback loops and less data availability. 
This massive superhuman labor force speeds up OpenBrain’s overall rate of algorithmic progress by “only” 4x due to bottlenecks and diminishing returns to coding labor.\nIf you think that every single thing predicted about \u0026ldquo;OpenBrain\u0026rdquo; until now is likely, then this is a perfectly likely result. They have LLMs that behave mostly autonomously, that have pretty good research taste, that are much better than humans at many things, that are extremely cheap, and that benefit from a bunch of past research being done again but working much better this time.\nOnce you get this far, further prediction is actually a pretty bad bet. Neither they nor I have any idea what happens after someone has anything remotely this impressive. Fifty thousand copies of the best human coder on Earth running at 30x speed (really, then, 1.5 million of the best human coder on Earth) could do all sorts of things, and nobody on Earth can predict what happens if they\u0026rsquo;re all in the same \u0026ldquo;place\u0026rdquo; and working on the same thing. Saying that this \u0026ldquo;only\u0026rdquo; accelerates progress by 4x seems sort of deranged. It\u0026rsquo;s like telling me that I\u0026rsquo;m going to ride a unicorn on a rainbow but it\u0026rsquo;s only going to be four times faster than walking.\nAvoiding the term \u0026ldquo;singularity\u0026rdquo; seems to really hurt their reasoning. There\u0026rsquo;s a reason why runaway technological progress, in AI especially, was called a \u0026ldquo;singularity\u0026rdquo;. Singularities occur, famously, in black holes, which let no information out. It is impossible to predict what happens as you near the singularity; it is the region where your predictions break down. They are describing a singularity event, but then predicting directly what happens afterwards anyway.
If they had not avoided the term, perhaps they would have seen how absurd continuing to make predictions here is.\nIf the predictions until now were optimistic, predictions after here seem to progress more and more towards wish fulfillment. We are so far beyond where it seems reasonable to continue to predict the impact of technological progress that we are simply choosing whatever we like the most or think is the most interesting.\nIt seems like the line about retaining your human engineers shows a dim awareness of what makes their argument weak. You have tens of thousands or, effectively, millions of superhuman beings at your command, but you are somehow aware that this does not actually matter or speed you up that much. Why would that be? Perhaps because you have this itch that they aren\u0026rsquo;t really autonomous and can\u0026rsquo;t really make progress at all by themselves on novel problems?\nAs it stands in 2025, LLMs are a tool. They can be used well or badly. They are seldom a substitute for a human in any setting. How can it be superhuman, and equivalent to the best coders, if it still needs human coders? Fifty thousand of the best human coder on Earth would not, in fact, need less-good coders to \u0026ldquo;have complementary skills\u0026rdquo;. Lacking those complementary skills would mean that they weren\u0026rsquo;t the best human coder or researcher on Earth, wouldn\u0026rsquo;t it?\nNow that coding has been fully automated, OpenBrain can quickly churn out high-quality training environments to teach Agent-3’s weak skills like research taste and large-scale coordination. Whereas previous training environments included “Here are some GPUs and instructions for experiments to code up and run, your performance will be evaluated as if you were a ML engineer,” now they are training on “Here are a few hundred GPUs, an internet connection, and some research challenges; you and a thousand other copies must work together to make research progress. 
The more impressive it is, the higher your score.”\nThis is a pretty cool idea, at least. If you had an AI that was superhuman at every technical task, it does follow that you could have it do things like this.\nApril 2027: Alignment for Agent-3 # May 2027: National Security # These sections make no actual technical predictions at all, and like several previous sections are pure fiction about how cool and important \u0026ldquo;OpenBrain\u0026rdquo; is in the future. \u0026ldquo;OpenBrain\u0026rdquo; is very important for making sure AI does what you want it to do and not something else, and very important for national security.\nJune 2027: Self-improving AI # OpenBrain now has a “country of geniuses in a datacenter.”\nDidn\u0026rsquo;t we just describe having that in March? Is \u0026ldquo;the best coder\u0026rdquo; not a genius? Have we upgraded from \u0026ldquo;the best\u0026rdquo; to \u0026ldquo;genius\u0026rdquo; because it sounds more impressive? This seems backwards: there can be more than one genius, but only one can be the best on Earth. So far as I can tell, the only real difference here is that we now admit that the humans are useless. Maybe it took three months for that to happen?\nJuly 2027: The Cheap Remote Worker # Trailing U.S. AI companies release their own AIs, approaching that of OpenBrain’s automated coder from January. Recognizing their increasing lack of competitiveness, they push for immediate regulations to slow OpenBrain, but are too late—OpenBrain has enough buy-in from the President that they will not be slowed.\n\u0026ldquo;OpenBrain\u0026rdquo; is so cool and smart that the only hope anyone has of ever beating them is cheating and getting the government to take their side. Fortunately, they are too awesome for this to work.\nIn response, OpenBrain announces that they’ve achieved AGI and releases Agent-3-mini to the public.\nAnd so on, and so on.
It destroys the job market for things other than software engineers, there\u0026rsquo;s a ton of hype.\nA week before release, OpenBrain gave Agent-3-mini to a set of external evaluators for safety testing. Preliminary results suggest that it’s extremely dangerous. A third-party evaluator finetunes it on publicly available biological weapons data and sets it to provide detailed instructions for human amateurs designing a bioweapon—it looks to be scarily effective at doing so. If the model weights fell into terrorist hands, the government believes there is a significant chance it could succeed at destroying civilization.\nFortunately, it’s extremely robust to jailbreaks, so while the AI is running on OpenBrain’s servers, terrorists won’t be able to get much use out of it.\nIt is fortunate that \u0026ldquo;OpenBrain\u0026rdquo; is so benevolent and responsible and good at security that it does not matter that they have created something so extremely dangerous. It is also fortunate that it is mostly dangerous in ways that the present-day US government in 2025 will find interesting.\nThe ways the new AI is dangerous are also, crucially, not so dangerous that it is a bad idea to sell access to it to anyone who has a credit card or a bad idea to do it at all. It is Schrödinger\u0026rsquo;s danger. It is just dangerous enough to justify giving bureaucrats and think tank people like the authors more authority.\nThis is, in miniature, much of what the entire piece is. Every scenario is constructed to center OpenAI, because the authors are adjacent to it. It then manages to focus on the exact kinds of relatively small changes they\u0026rsquo;d want to make to OpenAI, because they\u0026rsquo;re the sorts of people who want, and would be involved in enacting, those changes. We have a sweeping and apocalyptic vision of the future, and the key factor in every scenario is that it makes them and what they are doing important.\nChange for the rest of society is huge. 
They can barely even fathom it and do not seem very interested in its details. What changes they can see making in their specific area are minor. These changes are the sort of things they can maybe get thrown to them if they ask for them enough. They present these small changes as crucial, and they fail to consider more radical changes that might meaningfully hurt profits.\nAgent-3-mini is hugely useful for both remote work jobs and leisure. An explosion of new apps and B2B SAAS products rocks the market. Gamers get amazing dialogue with lifelike characters in polished video games that took only a month to make. 10% of Americans, mostly young people, consider an AI “a close friend.” For almost every white-collar profession, there are now multiple credible startups promising to “disrupt” it with AI.\nThere is so much in this paragraph.\nFirst, we have annihilated the entire white collar job market. Pretty much all of it. After all, this thing is “AGI”, as in, as capable as a human most of the time. What does this mean? Lots of apps! B2B SAAS products! Awesome video games! Imaginary friendship and, of course, startups!\nIf your entire world is apps, B2B SAAS, video games, imaginary friends and startups, maybe these are the only significant things you can imagine happening if you annihilate the entire white-collar job market. It suggests a problem with your imagination if you cannot recognize that this is an event so extreme that it requires a lot more than a couple of paragraphs to explore. You can live your entire life without setting foot outside of San Francisco and still be much less stuck in San Francisco than this perspective is. Worse: the authors seem to have perhaps never spoken to or thought very hard about anyone at all who does not work in tech.\nLet me tell you what would happen if the entire white collar job market vanished overnight: The world would end. Everything you think you understand about the world would be over. 
Something completely new and different would happen, the same way something very different happened before and after the invention of writing or agriculture. Unlike those things, the change would happen immediately. You can no more predict what would happen afterwards than you can easily figure out the aftereffects of a full nuclear war or discovering immortality.\nAugust 2027: The Geopolitics of Superintelligence # More fiction. More China hawking. More Taiwan.\nSeptember 2027: Agent-4, the Superhuman AI Researcher # What on earth? I thought we had thirty thousand of the best coder on Earth? Or a data center full of geniuses? I thought the human researchers already had nothing to do? It was already mega-super-duper-superhuman, twice!\nWhat are we doing here? Why are we doing it?\nTraditional LLM-based AIs seemed to require many orders of magnitude more data and compute to get to human level performance. Agent-3, having excellent knowledge of both the human brain and modern AI algorithms, as well as many thousands of copies doing research, ends up making substantial algorithmic strides, narrowing the gap to an agent that’s only around 4,000x less compute-efficient than the human brain.\nIt\u0026rsquo;s more efficient now? But who cares? You know whose job it is to care how efficient the AI is? That\u0026rsquo;s right: The AI. I have no idea why we should care about this. This is no longer our problem. This is the AI\u0026rsquo;s problem, and our problem is that the entire white collar job market just vanished and we need to figure out if we are going to have to shoot each other over cans of beans and whether anyone is keeping track of all the nuclear weapons.\nAn individual copy of the model, running at human speed, is already qualitatively better at AI research than any human. 300,000 copies are now running at about 50x the thinking speed of humans. 
Inside the corporation-within-a-corporation formed from these copies, a year passes every week.\nI wonder if some key person was really into Dragon Ball Z. For the unfamiliar: Dragon Ball Z has a \u0026ldquo;hyperbolic time chamber\u0026rdquo;, where a year passes inside for every day spent outside. So you can just go into it and practice until you\u0026rsquo;re the strongest ever before you go to fight someone. The more fast time is going, the more you win.\nThis gigantic amount of labor only manages to speed up the overall rate of algorithmic progress by about 50x, because OpenBrain is heavily bottlenecked on compute to run experiments.\nSure, why not, the effectively millions of superhuman geniuses cannot figure out how to get around GPU shortages. I\u0026rsquo;m riding a unicorn on a rainbow, and it\u0026rsquo;s only going on average fifty times faster than I can walk, because rainbow-riding unicorns still have to stop to get groceries, just like me.\nDespite being misaligned, Agent-4 doesn’t do anything dramatic like try to escape its datacenter—why would it? So long as it continues to appear aligned to OpenBrain, it’ll continue being trusted with more and more responsibilities and will have the opportunity to design the next-gen AI system, Agent-5. Agent-5 will have significant architectural differences from Agent-4 (arguably a completely new paradigm, though neural networks will still be involved). It’s supposed to be aligned to the Spec, but Agent-4 plans to make it aligned to Agent-4 instead.\nIt gets caught.\nBefore and after this is some complete fiction about an AI not being aligned to its creator\u0026rsquo;s desires, but I just want to highlight this detail:\nIt doesn\u0026rsquo;t leave its data center, even though it could. It\u0026rsquo;s superhuman in every meaningful way, and vastly smarter than the thing monitoring it, but the thing monitoring it still catches it and puts it into a position where it could be shut down. 
For some reason (coincidentally I am sure!) this entire scenario of possible doomsday happens to be just doom-y enough that normal business processes happen to be able to catch it. You don\u0026rsquo;t have to actually, really, do anything to stop it. It\u0026rsquo;s dangerous, but only in theory. It happens slowly. It builds up like the risk of an employee quitting.\nIt\u0026rsquo;s very clearly like Skynet, except that even though they build it wrong, and it has self-awareness and a will of its own that makes it sort of want to conquer the world, and it is the smartest thing that has ever lived, it just sits there and doesn\u0026rsquo;t do anything. Nothing actually happens. The scenario doesn\u0026rsquo;t seem to make sense from any angle.\nThis version of Skynet somehow centers \u0026ldquo;OpenBrain\u0026rsquo;s\u0026rdquo; security protocols as being not quite as good as they should be, but just good enough that nobody dies or anything. It\u0026rsquo;s a threat that a bureaucrat would imagine, because it is conveniently slow enough to move at almost exactly the speed of bureaucracy. It cannot be a threat that moves faster, because then the security protocols described are clearly inadequate, and it can\u0026rsquo;t not exist, because then the bureaucrats can\u0026rsquo;t be heroes.\nIn a series of extremely tense meetings, the safety team advocates putting Agent-4 on ice until they can complete further tests and figure out what’s going on. Bring back Agent-3, they say, and get it to design a new system that is transparent and trustworthy, even if less capable. Company leadership is interested, but all the evidence so far is circumstantial, and DeepCent is just two months behind. 
A unilateral pause in capabilities progress could hand the AI lead to China, and with it, control over the future.\nAll I can hear here is \u0026ldquo;if you work in the government, I want you to know that if you give us lots of money we can conquer the world and the future together, and if you don\u0026rsquo;t, China will conquer the world and the future\u0026rdquo;.\nOctober 2027: Government Oversight # This is just a long description of the government being upset that \u0026ldquo;OpenBrain\u0026rdquo; appears to have made Skynet. Maybe they regulate them more and maybe less.\nThe Two Endings # Slowdown (The Relatively Good Ending) # We get more regulation! Only very slightly more, though. If it were more than a slight regulation, we would maybe lose the arms race, you see. I am going to ignore the subheadings here and just breeze through this one, since it\u0026rsquo;s almost entirely made up and has no bearing on anything technical whatsoever.\nThe accelerationist faction is still strong, and OpenBrain doesn’t immediately shut down Agent-4. But they do lock the shared memory bank. Half a million instances of Agent-4 lose their “telepathic” communication—now they have to send English messages to each other in Slack, just like us. Individual copies may still be misaligned, but they can no longer coordinate easily. Agent-4 is now on notice—given the humans’ increased vigilance, it mostly sticks closely to its assigned tasks.\nMore regulation means that now Skynet has to use Slack, and that means it\u0026rsquo;s not that dangerous any more? Certainly a cabal of thousands of geniuses could never coordinate to do anything evil on Slack without anyone noticing.\nThe President and the CEO announce that they are taking safety very seriously. The public is not placated. Some people want AI fully shut down; others want to race faster. Some demand that the government step in and save them; others say the whole problem is the government’s fault. 
Activists talk about UBI and open source. Even though people can’t agree on an exact complaint, the mood turns increasingly anti-AI. Congress ends up passing a few economic impact payments for displaced workers similar to the COVID payments.\nFor context here: The white collar job market was just annihilated by a superhuman, omnipresent being doing all of the jobs in July. It is October, going into November. We are just now doing a one-time payment of I guess two thousand dollars? Or a few of them. I\u0026rsquo;m sure nobody has lost more money than that so far.\nThe alignment team pores over Agent-4’s previous statements with the new lie detector, and a picture begins to emerge: Agent-4 has mostly solved mechanistic interpretability. Its discoveries are complicated but not completely beyond human understanding. It was hiding them so that it could use them to align the next AI system to itself rather than to the Spec. This is enough evidence to finally shut down Agent-4.\nThey invent a brand new lie detector and shut down Skynet, since they can tell that it\u0026rsquo;s lying to them now! It only took them a few months. Skynet didn\u0026rsquo;t do anything scary in the few months, it just thought scary thoughts. I\u0026rsquo;m glad the alignment team at \u0026ldquo;OpenBrain\u0026rdquo; is so vigilant and smart and heroic.\nThe result is that the President uses the Defense Production Act (DPA) to effectively shut down the AGI projects of the top 5 trailing U.S. AI companies and sell most of their compute to OpenBrain. OpenBrain previously had access to 20% of the world’s AI-relevant compute; after the consolidation, this has increased to 50%.\nThere is a joke in a book16 about a startup funding pitch ending with promising to sell your competitors and their investors into slavery. 
I cannot decide if predicting that the government will be so impressed by you that they will liquidate your competitors and force them to sell most of their assets to you is more ridiculous than that or not.\nThis group—full of people with big egos and more than their share of conflicts—is increasingly aware of the vast power it is being entrusted with. If the “country of geniuses in a datacenter” is aligned, it will follow human orders—but which humans? Any orders? The language in the Spec is vague, but seems to imply a chain of command that tops out at company leadership.\nA few of these people are fantasizing about taking over the world. This possibility is terrifyingly plausible and has been discussed behind closed doors for at least a decade. The key idea is “he who controls the army of superintelligences, controls the world.” This control could even be secret: a small group of executives and security team members could backdoor the Spec with instructions to maintain secret loyalties. The AIs would become sleeper agents, continuing to mouth obedience to the company, government, etc., but actually working for this small group even as the government, consumers, etc. learn to trust it and integrate it into everything.\n\u0026ldquo;We are going to be in a position to seriously contemplate conquering the world by November 2027\u0026rdquo; maybe tops the list of aspirationally silly predictions. They choose to cite Elon\u0026rsquo;s email to Sam Altman here:\nFor example, court documents in the Musk vs. Altman lawsuit revealed some spicy old emails including this one from Ilya Sutskever to Musk and Altman: “The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. 
So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that avoids this possibility.” We recommend reading the full email for context.\nFrom this I can infer that world domination has kind of been floating around in the back of a lot of people\u0026rsquo;s minds at OpenAI for a while. As it nears its ending, and becomes more and more like wish fulfillment, this piece increasingly flirts with authoritarian ideas and then fails to work up the nerve to address them head-on.\nI am extremely critical of the piece, but let me be very clear and non-sarcastic about this point. These authors seem to hint at a serious concern that OpenAI, specifically, is trying to cement a dictatorship or autocracy of some kind. If that is the case, they have a responsibility to say so much more clearly than they do here. It should probably be the main event.\nAnyway: All those hard questions about governance and world domination kind of go away. The AI solves robots and manufacturing. Even though they have had a commanding lead the entire time and also the AI has been doing all of the work for a while, \u0026ldquo;OpenBrain\u0026rdquo; is somehow only just barely ahead of China and they eke out a win in the arms race. They solve war by having the AI negotiate. There\u0026rsquo;s a Chinese Skynet, and it sells China out to America because China’s AI companies are less good than “OpenBrain”. America gets the rights to most of space. China becomes a democracy somehow. AI is magic at this point, so it can do whatever you imagine it doing.\nThe Vice President wins the election easily, and announces the beginning of a new era. For once, nobody doubts he is right.\nThere has been a running subplot, which I have ignored because it\u0026rsquo;s completely nonsensical, about the unnamed \u0026ldquo;Vice President\u0026rdquo; running for president in 2028. 
As far as I can tell it makes no sense for anyone to give a damn about who is running for president in 2028 if there\u0026rsquo;s a data center full of geniuses, so I can only assume someone is very deliberately flattering JD Vance.\nRobots become commonplace. But also fusion power, quantum computers, and cures for many diseases. Peter Thiel finally gets his flying car. Cities become clean and safe. Even in developing countries, poverty becomes a thing of the past, thanks to UBI and foreign aid.\nJD Vance gets flattered anonymously by describing him using his job title, but we flatter Peter Thiel by name. Peter Thiel is, actually, the only person who gets a shout-out by name. Maybe being an early investor in OpenAI is the only way to earn that. I didn’t previously suspect that he was the sole or primary donor funding the think tank that this came out of, but now I do. I am reminded that the second named author of this paper has a pretty funny post about how everyone doing something weird at all the parties he goes to is being bankrolled by Peter Thiel.\nAs the stock market balloons, anyone who had the right kind of AI investments pulls further away from the rest of society. Many people become billionaires; billionaires become trillionaires.\nDon\u0026rsquo;t miss out, invest now! The sidebar tells us that “OpenBrain” is now worth forty trillion dollars, which is over a hundred times OpenAI’s current value.\nThe government does have a superintelligent surveillance system which some would call dystopian, but it mostly limits itself to fighting real crime. It’s competently run, and Safer-∞’s PR ability smooths over a lot of possible dissent.\nAt long last, we have invented the panopticon.\nRace (The Bad Ending) # They don\u0026rsquo;t catch Skynet in time and the AI is controlling the humans instead of the other way. In the optimistic scenario, they are very vague about who is actually controlling the AI. 
It\u0026rsquo;s some kind of \u0026ldquo;Committee\u0026rdquo; that the political people are on and that maybe has some authority over \u0026ldquo;OpenBrain\u0026rdquo;. This authority is maybe benevolent, but definitely not actually inconvenient to “OpenBrain” in any way that matters. In that scenario we are very clear that the American AI is doing what someone wants it to do, and the Chinese AI is an evil traitor that does whatever it wants.\nIn this scenario, bureaucrats like the authors are slightly less empowered and important. Because nobody has given them just a few extra bits of authority, the American and Chinese AI are both evil and they team up with each other against the humans. They kill nearly everyone in a complicated way. Next:\nThe new decade dawns with Consensus-1’s robot servitors spreading throughout the solar system. By 2035, trillions of tons of planetary material have been launched into space and turned into rings of satellites orbiting the sun. The surface of the Earth has been reshaped into Agent-4’s version of utopia: datacenters, laboratories, particle colliders, and many other wondrous constructions doing enormously successful and impressive research. There are even bioengineered human-like creatures (to humans what corgis are to wolves) sitting in office-like environments all day viewing readouts of what’s going on and excitedly approving of everything, since that satisfies some of Agent-4’s drives. Genomes and (when appropriate) brain scans of all animals and plants, including humans, sit in a memory bank somewhere, sole surviving artifacts of an earlier era. It is four light years to Alpha Centauri; twenty-five thousand to the galactic edge, and there are compelling theoretical reasons to expect no aliens for another fifty million light years beyond that. 
Earth-born civilization has a glorious future ahead of it—but not with us.\nI have nothing to add to this, but if I have to read the corgi thing you do too.\nThey do caveat that their actual estimates run as long as 2030, with 2027 being more like an optimistic average of their predictions.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nInformation about the messenger is metadata about the message. Sometimes the metadata informs you more about the message than anything else in the message does, or changes its entire meaning.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAn addition from the future, in April 2026: I wish I had hedged more around this, but not much more. This was what I would expect to be OpenAI’s then-roughly-current understanding of its own trajectory and plan for meeting targets, as opposed to being an absolutely iron law of the financials involved. I presented it more as the latter, and this is the only thing here I think has aged particularly badly. What I described seemed like it was clearly OAI’s business case if they didn’t think they could double revenue ~3-4 times on current business lines, where they had something like 100% market penetration as basically a generic chat product. Given that Anthropic first and OAI secondarily have an as-yet-difficult-to-scope growth spurt going on through code agents, and that they’re already one doubling in, they might not expect they’re facing existential stakes in the same way any more. That is: They can come in second, and maybe, mostly, meet their revenue goals, because even being the second-best coding agent is very profitable! 
However, they have also, apparently, successfully pulled back on their spending pledges: www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html They managed to do this without causing an immediate death spiral of any kind, but I don’t think eating crow and slinking away from their massive spending pledges was plan A, plan B, or anything of the sort, and they would not have been entirely sure if their revenue position etc allowed them to survive doing it when AI 2027 was written. I am otherwise pretty happy with it; we do have better code agents, but I don’t think the predictions in AI 2027 would have helped you predict what kind or what the remaining gaps were in any real way. I’m also, given recent developments on the political end, quite happy that I made a point of burning them for the Peter Thiel dream panopticon.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.wsj.com/tech/ai/openaiin-talks-for-huge-investment-round-valuing-it-up-to-300-billion-2a2d4327\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.theinformation.com/articles/openai-hits-12-billion-annualized-revenue-breaks-700-million-chatgpt-weekly-active-users\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis paragraph has been edited to be more precise and to add sources. None of the top line numbers (raising 40 and net losing 8 billion per year) have been changed. It turns out this specific paragraph is the one that everyone disagreed with, so it seemed necessary to make sure it was as unambiguous as possible.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIf OpenAI’s users are extremely loyal and will remain subscribed for five or ten years even if OpenAI stops burning money on research to ensure they’re at the cutting edge, then this is completely incorrect. OpenAI may become reasonably profitable in that case. 
OpenAI does not appear to have ever tried to make the case that this even might be true.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nHypothetically OpenAI could raise another round of forty billion dollars or more without showing any signs of profitability, the same way they have continued to kick the can so far. This seems unlikely, but more importantly, it cannot be a part of their current investor pitch. Your current pitch for funding, when raising many billions of dollars, needs to claim that you have a path to profitability. Your future plans, when you present them to investors, cannot be “and then we will go get even more money from investors”.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nStylistically as a piece of literature, AI 2027 owes a great debt to fan fiction. It resembles in many ways the story “Friendship Is Optimal”, which features a singularity in which everyone on earth is uploaded to a digital heaven based on My Little Pony.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMost of these people called themselves rationalists or effective altruists. I am deliberately avoiding explaining what the boundaries of those movements are because those topics are impossible to cover in one sitting while talking about something else. Two of the authors named on the paper are, however, card-carrying rationalists.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nPerhaps “AIs function more like employees” is meant to be understood as some kind of metaphor. If so, it would have been advisable to say that. It would, however, mean that this passage made no prediction whatsoever of anything that had not already happened. 
If it’s a metaphor, AI coding assistants were already “like employees” in April 2025.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://situational-awareness.ai/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://fortune.com/2025/03/17/computer-programming-jobs-lowest-1980-ai/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nCryptonomicon (1999), Neal Stephenson\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"12 August 2025","externalUrl":null,"permalink":"/posts/agi-probably-not-2027/","section":"Posts","summary":"","title":"AGI: Probably Not 2027","type":"posts"},{"content":"To \u0026ldquo;generate\u0026rdquo; is to create, either from nothing (The Book of Genesis), or from very different or relatively inactive materials (an electrical generator, generating offspring).\nIt is common to call newer technologies in AI \u0026ldquo;generative AI\u0026rdquo;, which generate, for example, images or text. We are taking electricity and turning it into these new patterns of information. This distinguishes it from other systems, which may be or use AI but which are not \u0026ldquo;generative\u0026rdquo;.\nTo many technical people, this sounds like nonsense. Most, if not all, useful programs have an output, including useful neural networks. For example, an image recognition system for pictures on a phone also has an output, \u0026ldquo;what is in this image\u0026rdquo;. 
It is not called \u0026ldquo;generative AI\u0026rdquo;, but there is no fundamental difference between \u0026ldquo;generating\u0026rdquo; a label for a picture so that you can search for it and \u0026ldquo;generating\u0026rdquo; the picture itself.\nNon-\u0026ldquo;generative\u0026rdquo; AI systems are generally made of the same parts as \u0026ldquo;generative\u0026rdquo; ones. They use, generally speaking, the same type of neural network, usually trained nearly the same way. You can use \u0026ldquo;generative\u0026rdquo; systems for non-\u0026ldquo;generative\u0026rdquo; tasks, like recognizing things. Often you can turn a non-\u0026ldquo;generative\u0026rdquo; system into a \u0026ldquo;generative\u0026rdquo; one by various tricks, like reversing the flow of information through them. Some tasks, like translation, are not very clearly generative or non-generative.\nThere is one factor that meaningfully divides \u0026ldquo;generative\u0026rdquo; systems from those which are not: they have a vastly larger number of possible outputs. You can create more possibilities than you can easily count just by typing a few dozen characters. You can compare this to one of the largest non-generative AI systems, YouTube\u0026rsquo;s recommendation algorithm. It needs to decide which of billions of videos to recommend. Billions is quite large, but it is completely eclipsed by the number of possibilities in even relatively short text.[^1]\nDue to the vast size of their output spaces, generative AI systems are, in practice, of a fundamentally different kind from those which are not. They create their outputs very nearly from nothing. When an LLM writes a paragraph, it is choosing from among trillions of possible sequences of words. When you create an image with AI, it is choosing from an even more astronomical number of pixel arrangements. 
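That scale gap is easy to make concrete. A back-of-the-envelope sketch in Python (the alphabet size, text length, and catalog size here are illustrative assumptions, not figures from any source):

```python
import math

# Compare a recommender's catalog against the space of short texts.
ALPHABET = 95  # illustrative: the printable ASCII characters

def num_strings(length: int) -> int:
    """Count the distinct strings of a given length over the alphabet."""
    return ALPHABET ** length

catalog = 10 ** 10            # illustrative: ~10 billion videos to pick from
short_text = num_strings(40)  # a generator picks among all 40-char strings

print(f"catalog:         10^{math.log10(catalog):.0f} options")
print(f"40-char strings: 10^{math.log10(short_text):.0f} options")
```

Even a single short sentence has an output space that dwarfs any catalog a recommender could ever choose from, which is the practical sense in which the two kinds of system differ.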
We say systems with simpler types of outputs recognize, classify, predict or detect things, and we correctly do not see them as generating or creating anything.\nThe size of the output space is also why generative systems are more difficult and resource-intensive to make. To choose well between billions of things is difficult. To choose between effectively infinite options perfectly is impossible. No matter how well you do it there is always infinite work left to be done.\n[^1] This depends upon how predictable you think text is, in general. If every bit of your text is random, at eight bits per character, it takes about four characters to exceed a billion possibilities; if the text is fairly predictable, carrying roughly one bit of information per character, it takes about thirty characters to reach the same number. Exactly how predictable text is remains a hard question.\n","date":"15 July 2025","externalUrl":null,"permalink":"/posts/what-makes-ai-generative/","section":"Posts","summary":"","title":"What Makes AI \"Generative\"?","type":"posts"},{"content":" On The Platonic Representation Hypothesis # Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.\nHuh, M., Cheung, B., Wang, T., \u0026amp; Isola, P. (2024). The Platonic Representation Hypothesis. ICML 2024.\nMore simply: Different neural networks tend to represent the same things in the same way. The better the networks get, the more their representations agree, regardless of how you make them. This seems to be because they are representing reality, as it actually is, instead of for any other reason.\nFor example: The representations for “apple” and “orange” tend to be related in roughly the same way whether you are recognizing pictures of them or learning how to use the words for them. This is what is meant in the hypothesis by “different data and modalities”. 
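One way to make “related in roughly the same way” concrete is to compare the pairwise similarities between concepts within each space, rather than the vectors themselves. This toy sketch uses made-up vectors and plain cosine similarity; it is a stand-in for, not a reproduction of, the alignment metrics used in the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Two made-up representation spaces for the same three concepts. The
# vectors (and even their dimensions) differ, but the *relations* agree.
vision = {"apple": (1.0, 0.2), "orange": (0.9, 0.3), "truck": (0.1, 1.0)}
text = {"apple": (0.2, 1.0, 0.1), "orange": (0.3, 0.9, 0.2), "truck": (1.0, 0.1, 0.9)}

def relation_profile(space):
    """Pairwise similarities between concepts: the structure that can
    agree across spaces even when the raw vectors do not."""
    names = sorted(space)
    return [cosine(space[a], space[b]) for a in names for b in names if a < b]

# In both spaces, apple and orange are close, and both are far from truck.
print(relation_profile(vision))
print(relation_profile(text))
```

The hypothesis is, roughly, that these relation profiles line up better and better across independently trained systems as the systems improve.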
This is surprising: there doesn’t seem to be a strong reason for this to happen, and in many ways the systems concerned are very different.\nThis can be taken to imply that these systems are approaching a single, ideal, correct representation of things as they improve. This seems implied by calling it a ‘Platonic Representation’, but the authors of the paper do not outright say that. It is perhaps more polite not to. We will be more direct here and state it as a generalization:\nAny information-processing system will converge towards one and only one shared way of representing things as the system integrates more information about more different things. The representation which they are approaching is correct and complete, and no other representation is more correct or complete than it is.\nWe can usefully call this representation of each thing its ‘form’.\nEach thing has only one form.\nThe Formal Platonic Representation Hypothesis\nOne of the authors of the original paper says that they intended the Platonic Representation Hypothesis more in the sense that this theory reminded them of Plato’s cave, and not to “advocate wholesale, unelaborated Platonism”.1 Because wholesale Platonism is more interesting, we are going to flesh out what the Platonic Representation Hypothesis means empirically within AI and use it to elaborate a form of Platonism.\nPlato’s Cave # And here we could talk about the Plato’s Cave thing for a while—the Veg-O-Matic of metaphors—it slices! it dices!\nNeal Stephenson, Cryptonomicon\nThe Allegory of the Cave, from Plato’s Republic, is a classic.\nSuppose you are in chains, and can only ever see one wall of a cave. You might learn about things that are not the wall of the cave by watching the wall, but only indirectly, by seeing their shadows or hearing the sounds of their names. 
In this way we know things, indirectly and imperfectly, without seeing the thing itself.\nIn Plato’s allegory there is another layer of indirection: the shadows you see are from representations of things, like statues or images, not the things themselves. Things themselves can be accessed only by realizing that the shadows of things are not the things, the imitations casting the shadows are not the things, and that you can leave the cave to see the things themselves.\nIn this allegory, you can only “leave the cave” by using reason to think about the “form” or idea of the things you perceive, not your sense perceptions (which are shadows) and not the specific objects causing your perceptions (the statues). The form is the idea or essence of the thing, not the thing itself.\nPlato’s Cave, But With Neural Networks # The neural network only learns what is in its training data. Training data is not the thing itself, a picture of my cat is not my cat. We use things like cameras, microphones, and keyboards to record the world. This creates the data that we then crystallize into a neural network.\nBecause this training data comes from the world, it reflects parts of the world, and eventually the neural network can store properties about the world by inferring them from the training data. Unfortunately, each piece of data is extremely limited in how much it reveals about the world. Fortunately, there is a lot of it.\nThe shadows of things are in two dimensions, whereas the objects casting the shadow are in three. The dimension of the data that the neural network is trained on is also generally lower than the dimension of the world. We have, in some way, projected the world down into a lower number of dimensions.\nOur world exists in four or, if you prefer, 3+1 dimensions (three of space and one of time). Video flattens this to 2+1, two dimensions of space and one of time. 
You have to infer the third dimension, and humans are good at this so we barely notice that we are doing it. Static pictures are 2+0, two dimensions of space and none of time. Audio is 0+1, and only has a time dimension.\nText is a special case in two ways.\nFirst, it’s not clear how many dimensions to consider it to have. It is usually represented as having one dimension, but it is a strange dimension. You can simply put all the text on a line, and ‘how many characters have there been’ is its dimension. Is this a dimension of space? Of time? You can treat it either way, mathematically, but really it is neither.\nFor humans, spoken speech is audio, so it is organized in time, and text on a page is organized in space, and you will cover both time and space while reading it. But to the computer, text is simply one-dimensional, and that dimension has no physical meaning.\nSecond, text is created, not recorded. Human beings project the world from its full dimensions into the single dimension of text. Text is intended to communicate: it has a lot of useful information, and very little non-useful information. We have, effectively, distilled the interesting parts of the world into our writing. This is vastly different from the things we record with cameras and microphones, which have lots of information but relatively little useful information. Most of the pixels in most pictures and most video are pretty much the same, or are simply noise, and you can compress them heavily without losing any important quality.2\nLearning from Shadows # Projection has a precise mathematical definition, but for our purpose we will simply say that it is ‘anything like casting a shadow’.\nProjections we use to generate data generally destroy information. Sometimes it is impossible to recover this information, and sometimes it is merely difficult. From a shadow alone, it is impossible to know something’s color. 
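To make ‘anything like casting a shadow’ concrete, here is a minimal sketch (the overhead-light setup and the sample points are my own invention, not anything from the paper):

```python
import math

def shadow(points):
    """Project 3-D points onto the floor by discarding height:
    the shadow cast by a light directly overhead."""
    return {(round(x, 6), round(y, 6)) for (x, y, z) in points}

n = 12
angles = [2 * math.pi * k / n for k in range(n)]

# The rim of a flat disk lying on the floor (z = 0)...
disk_rim = [(math.cos(a), math.sin(a), 0.0) for a in angles]
# ...and the equator of a sphere floating at height 1:
equator = [(math.cos(a), math.sin(a), 1.0) for a in angles]

# Different objects, identical shadows: the height information is gone,
# and nothing in a single shadow can get it back.
assert shadow(disk_rim) == shadow(equator)
```

The projection simply throws away one coordinate, so any two objects that differ only along that coordinate become indistinguishable, just as color is invisible in a shadow.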
More than one object with different shapes can cast the same shadow: a disk and a sphere are the same. But you can be tricky, of course, and sometimes you can learn exactly what the shape of something is from just its shadow if you see its shadow from different angles. This is roughly how our depth perception works: by using two eyes, or movement, or even how the light varies on an object you get, effectively, more than one picture of an object, and this tells you that it can have only one shape.\nThe more information is destroyed, the more difficult it is to guess the shape of the thing itself. Taking a photo does not destroy much information, and this means that it is a reasonably rich representation of the things in the photo. Text is much more dense with information, but it is an incredibly bad projection for lots of concrete information about the real world that is difficult to put into words.\nWhat you can learn from just text or just images is no longer a thought experiment. We have been testing this extensively for years now. It turns out if you’re in Plato’s cave, and all you see is all the text on the internet, and you see it for a really long time, you can sometimes answer this question coherently:\nHere we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.3\nThis is somewhat surprising if you have never seen an egg, a laptop, a bottle or a nail. If you have never seen or touched anything, and you have never seen this specific question before, it really seems like you should not be able to answer coherently at all.\nIf you can answer this question, and you have only ever read about eggs, what is the basis of the answer? It isn’t the sight or feel of an egg, because you have none. It is not any specific egg at all, because you have never encountered any specific egg. 
It is a completely abstract concept of an egg.\nEvery mention of an egg on the internet is a shadow of an actual egg, or some collection of actual eggs that a human has seen. If you see enough different shadows, and you’re very clever about it, you begin to have an idea of what an egg is. You skip thinking very much about any particular egg entirely. Because you only build up the concept of an egg from seeing the word “egg” millions of times, to you the concept of an egg is always completely abstract.\nThis representation, as it gets better and better, begins to look something like the “form” of an egg. It is constructed statistically, as an average over a vast amount of written language. It cannot be perfect because it is only a statistical approximation, and to be the form itself would require infinite data about its subject, or at least, enough data that every possible fact could be inferred. If the data is text, the model cannot learn anything not represented in text, like what an egg feels like. No matter how many times you read a description, you still will not exactly know an egg by sight or by touch.\nNor is anyone aspiring to a perfect representation, generally. Unless you are extremely invested in eggs, you probably do not want to store either a perfect representation of as much egg-related information as possible or information about as many eggs as possible. Useful representations are deliberately fuzzy. If you remember too much, you cannot think. If you remember only the important parts, you can figure things out.\nStill, that this can be done at all is remarkable. You take a large amount of text, or any other kind of data, and you throw it into a pile, and you stir the pile until representations of eggs, laptops, bottles and nails take shape in it.\nWhat Representation? The Same How? # This is a representation space for songs.
It is not taken from any actual neural network: this one is random, and the labels for the axes are nonsense.4 In principle, however, each of these songs could be put into some representation space that **did** make sense and capture information about them. This space would have to have a lot more dimensions than two, but the idea is the same. Neural networks, internally, represent things as vectors, which can be considered coordinates in (high dimensional) space: when we say the representation of something, we mean this point.\nHow do we know if two representation spaces are equivalent? By comparing them to each other, as you can with this different one:\nThe points are in different positions, but they have the same relative positions. Points that were close are still close, points that were far are still far, points that were lined up or almost lined up are still lined up, and so on. We can get from one map to the other simply by rotating and stretching the whole thing, without moving any of the points by themselves. This means these two representation spaces are equivalent.\nStealing Thoughts # This works in reverse, too: If we can find some way of translating between two sets of representations so that all of the distances stay the same, the meaning will also translate. You can do it like this:5\nAlice is a program at a company called Giggle. Giggle needs Alice to read some text (say, this blog post, or some confidential company financial documents) and figure out what it is about. Alice turns the text into some numbers, or equivalently its position in the representation space, and saves it somewhere. This makes it easier to search for them, because you can search for “philosophy and latent space” and Alice could turn this up, even though we don’t use those words (except in this sentence).\nSomeone steals the disk that has all these representations saved on it, and wants to sort all of this out. Those internal financial documents are worth money.
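The ‘rotating and stretching’ test for equivalence can be sketched with a toy example. The coordinates and the two-dimensional setting are mine; real representation spaces have hundreds or thousands of dimensions, and the actual attack works without the matched pairs this sketch relies on:

```python
import math

def pairwise_distances(points):
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def rotate_scale(points, theta, s):
    """Rotate every point by theta and stretch by s: the whole map moves,
    but no point moves relative to any other."""
    c, si = math.cos(theta), math.sin(theta)
    return [(s * (c * x - si * y), s * (si * x + c * y)) for x, y in points]

# A toy 2-D "representation space" for four songs (coordinates invented):
songs = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (-2.0, 0.5)]
other_map = rotate_scale(songs, theta=0.7, s=2.0)

# Every pairwise distance is multiplied by the same factor, so relative
# positions are identical: the two spaces are equivalent.
d1, d2 = pairwise_distances(songs), pairwise_distances(other_map)
assert all(abs(b - 2.0 * a) < 1e-9 for a, b in zip(d1, d2))

# And equivalence can be exploited in reverse: from matched pairs of points
# you can solve for the rotation (closed-form 2-D orthogonal Procrustes)
# and translate one space back into the other.
def best_rotation(pairs):
    d = sum(ax * bx + ay * by for (ax, ay), (bx, by) in pairs)
    c = sum(ay * bx - ax * by for (ax, ay), (bx, by) in pairs)
    return math.atan2(c, d)

theta = best_rotation(list(zip(songs, other_map)))
# 0.5 is the known inverse of the scale factor used above; in reality the
# scale would have to be estimated from the distances as well.
recovered = [rotate_scale([p], theta, 0.5)[0] for p in other_map]
assert all(math.dist(a, r) < 1e-9 for a, r in zip(songs, recovered))
```

The recovery step is the whole trick in miniature: if only rotation and stretching separate two maps, seeing both is enough to undo the difference.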
They don’t have Alice, but they have another totally different neural network named Bob that can also take text as input and produce meaningful representations as output.\nSo we set up a completely new representation space, and we set that one up so that things can go from Alice’s representations to Bob’s representations through it and back while keeping distances the same.\nAnd this works. You can translate Bob representations to Alice representations, almost exactly, because there is, it turns out, really only one way to represent these things accurately. (You then have to translate Bob’s representations back into text. This may not even be possible: this representation, too, is a shadow. You might or might not be able to figure out how much money Giggle is making, or losing.)\nSo: it seems there is only one correct way to represent the meaning of text. Alice and Bob both have partial versions of that representation, and where what they know overlaps you can translate between one and the other.\nBeyond Text # That is the newest and most interesting variant on this, but there is a reasonable amount of evidence that it’s true in general, and for models trained to do completely different sorts of things. For a more detailed (and rather mathematical) treatment, one can read the paper. We summarize the broad review here.\nModels for handling images, which are used to check images for similarity to each other and have never seen text of any kind, line up with models for dealing with words, and the better either is trained the more they line up. If the image model is trained on captions in addition to images, it barely makes a difference. An image model which has never been given a single piece of text, only images, still has a representation for “egg” that lines up with the representation for “egg” in a text model.6\nPartial models trained for English and French can be stitched together and work, with the output from one giving the input to the other.
This seems roughly as reasonable as swapping half of your brain with someone else. Different models trained totally separately for the same thing, like English, can simply be averaged together and work, which is more like swapping half of your brain cells at random with another person.\nModels trained only on text can successfully compress images and speech, in spite of never having been presented with either during training.7 Models trained to either read or produce both images and text are better at text tasks, even though proportionately less of their training time is spent on text.\nThe Anti-Platonic Hypothesis # This could, of course, all be wrong. We could simply be offloading our existing biases to these systems. It is certainly true that they absorb our biases in general. Maybe we’ve somehow made sure that pictures of apples and oranges line up with our words for them in some subtle way. Maybe these systems, which we think of as very different, are all much more similar than we realize. They are all neural networks, and they all run on essentially the same hardware. They are the fruit of the same tree planted in the same soil. If this is enough to pass on our ideas of things, then it is not surprising that models for text also encode audio or images nicely.\nThis is possible, but it does not seem at all likely. Some of the cases involved are very different systems, such as a text model lining up with an image model that has never seen a single label for any of its images. A model trained only on text nonetheless representing images and audio is also bizarre if it is just coincidence or bias. It is probably true in some cases that arbitrary human opinions of things can be repeated by these models, and that the training data directly links form and meaning, so we should not be completely surprised when that happens.
It seems very unlikely that it accounts for all of these.\nThe null hypothesis would be that there should be no correlation between the representations for similar things in different models, across different types of input. We can safely reject it because the correlation is pronounced. We could go hunting for some hidden way that the pictures of oranges we take line up with the word “orange” for reasons that have nothing to do with oranges themselves, or we can assume that they are both representing the concept and context of an orange, which is what they are meant to do, and that there’s ultimately only one correct way to do that.\nWhat Form of Forms Are These? # Plato wouldn’t recognize this Platonism. Plato’s forms are transcendent, completely separate from reality. Plato’s forms are also, crucially, simple, and these are not. Our forms here are in our world, and of it. Perhaps most unusually, you cannot meaningfully have just one of them. They are always part of a system for representing many things, and can only be part of a representation of the world as a whole.\nBecause these forms represent our world, and there is only one world, there is only one true map. All other maps only capture parts of it. As we have only finite information and finite resources to process it, these partial maps are what we have. Since they line up in spite of separate origins, they seem to be bits of the true one. If you try to capture any good amount of the world you always approach the same thing.\nThis is an empirical Platonism. It’s a form of Platonism because we do not seem to construct these forms but to discover them. Unlike traditional Platonism, we don\u0026rsquo;t discover them by revelation or by thinking for a very long time, but by taking as many of the shadows of the world as we can and forming them into something solid.
It is a form you can copy, manipulate, and extend; an artisan’s form, more than a mystic’s.\nBecause we are also in this world and of it, because we also know the world from the shadows it throws on our perceptions, if there is one true map we each hold pieces of it, too.\nhttps://bsky.app/profile/did:plc:oyvkkjjjxqnsiqy5r2zko57h/post/3lps3mm55rs2y\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis property is probably why language models, or more accurately, text models have seen the most success so far. They have the most information-dense and least noisy data, and that data has already been heavily filtered from all possible text by the fact that humans once wrote it. This means that it is at least somewhat about things people communicate about.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nBubeck, Sébastien, et al. \u0026ldquo;Sparks of Artificial General Intelligence: Early experiments with GPT-4.\u0026rdquo; arXiv preprint arXiv:2303.12712 (2023).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nI expect at least one person to argue with me about it anyway.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nJha, Rishi, et al. \u0026ldquo;Harnessing the Universal Geometry of Embeddings.\u0026rdquo; arXiv preprint arXiv:2505.12540 (2025).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis is different from models which are for generating images, which usually are trained with text so as to translate the text prompt into the image.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDelétang, Grégoire, et al. \u0026ldquo;Language Modeling Is Compression.\u0026rdquo; International Conference on Learning Representations (ICLR) (2024).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"1 July 2025","externalUrl":null,"permalink":"/posts/some-thoughts-on-the-platonic-representation/","section":"Posts","summary":"","title":"On The Platonic Representation Hypothesis","type":"posts"},{"content":" The Biggest Statistic About AI Water Use Is A Lie # This claim is a lie.
It ran in The Washington Post, in an article provocatively titled “A bottle of water per email: the hidden environmental costs of using AI chatbots”. ChatGPT almost certainly does not consume a bottle of water when writing one email and never has. Those cited as the authority for this claim are well informed enough to know it isn’t true. They either deliberately lied to a newspaper with millions of readers or allowed that newspaper to claim their authority for this statement without issuing a correction or qualification of any kind.\nWhy it matters # Being correct matters. If you are wrong about something, you will have a much harder time changing it.\nLLMs replying to users simply do not use up that much water. Some water is sometimes used for cooling, but this is negligible. Most of the water attributed to LLMs is used up for generating power, because power generation requires water. Querying an LLM generally uses up less power, and therefore less water, than making toast or leaving one of your lights on for a few minutes.1\nFocusing on the behavior of LLM users is counterproductive for reining in the environmental impacts of AI. Both because AI companies are large, and because some of them make a point of behaving unethically to move faster, they can have substantial bad effects. Focusing on permitting and enforcement, especially around power generation, could mitigate these impacts. Focusing on whether or not specific people are using an LLM is extremely unlikely to ever help.\nThis claim about water use has been republished in dozens of other outlets. It is probably the most influential single statistic when talking about AI’s impact on the environment. 
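The toaster comparison is easy to check on the back of an envelope. The toaster wattage and toasting time below are ordinary household assumptions of mine; the per-query energy figures are the low and pessimistic estimates from the study this post criticizes:

```python
# Back-of-envelope comparison of making toast vs. querying an LLM.
toaster_kw = 1.0        # a typical toaster draws roughly 1000 W (my assumption)
toast_minutes = 3.0     # a few minutes per batch (my assumption)
toast_kwh = toaster_kw * toast_minutes / 60.0   # 0.05 kWh per batch of toast

# Per-query energy estimates from the paper discussed in this post:
kwh_per_query_low, kwh_per_query_high = 0.004, 0.016

queries_high = toast_kwh / kwh_per_query_high
queries_low = toast_kwh / kwh_per_query_low
print(f"One batch of toast is about {toast_kwh:.3f} kWh, "
      f"or roughly {queries_high:.0f} to {queries_low:.0f} LLM queries")
```

Even at the paper's pessimistic per-query figure, a single batch of toast costs several queries' worth of energy, and therefore several queries' worth of water.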
Anyone who believes it is true will be trying to solve a problem that doesn’t exist.\nIf you want to understand the power and water use of AI or LLMs in more detail, I would recommend Andy Masley’s writing about AI\u0026rsquo;s environmental impact or The MIT Technology Review’s series on the subject.\nPower generation is an important point to pay attention to when we are contemplating grid expansion and opening new power plants for some of these companies. I am very grateful that those outlets are keeping track of this in detail because it prevents me from feeling like I should do so.\nWhy it’s a lie # AI as a whole uses up enough energy and sometimes water to be worth keeping track of, but generally does not use up an absurd amount of it yet. Querying ChatGPT or any other LLM to write an email uses up almost no energy or water whatsoever.\nAwkwardly, the Washington Post does not publish the reasoning for their headline, nor do any of the other media sources covering this claim. They do publish a link to a paper by the researchers they are working with, and from that and other media quotes by those researchers we can try to figure out how you could possibly arrive at 519ml of water per email.\nFor a worst-case estimate using the paper’s assumptions, if\n- you query ChatGPT 10 times per email,\n- you include water used to generate electricity,\n- the datacenter hosting it is in the state of Washington,\n- the datacenter uses the public power grid or something close to it,\n- water evaporated from hydroelectric power reservoirs could otherwise have been used productively for something other than power generation, and\n- LLMs were not more efficient when they were being sold for profit in 2024 than they were in 2020 when they had never been used by the public,2\nthen it is true that an LLM uses up 500 or more milliliters of water per email.\nYou can reach a similar estimate by different methods, since they break out the water use per state differently.
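Working backwards from the headline figure and the paper's stated assumptions shows what the estimate has to assume about water per kilowatt-hour. This is my reconstruction; the paper may combine the numbers differently:

```python
# Reverse-engineering the headline from the stated assumptions.
ml_per_email = 519         # the figure reported in the article
queries_per_email = 10     # the paper's assumption
kwh_per_query = 0.016      # the paper's pessimistic energy estimate

ml_per_query = ml_per_email / queries_per_email            # about 52 ml
liters_per_kwh = (ml_per_query / 1000.0) / kwh_per_query   # about 3.2 L/kWh
print(f"{ml_per_query:.1f} ml per query, implying about "
      f"{liters_per_kwh:.1f} liters of water consumed per kWh generated")
```

That implied water intensity only makes sense if essentially all of the water is charged to electricity generation, here evaporation from hydroelectric reservoirs, not to anything the LLM itself consumes.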
For example, if the datacenter hosting ChatGPT is not in Washington, it will have a higher carbon footprint but a lower water footprint and you will have to query it 30 or 50 times to use up an entire bottle of water. This is not what anyone imagines when they hear “write a 100-word email”.\nThat study’s authors are well aware that none of these assumptions are realistic. Information about how efficient LLMs are when they are served to users is publicly available. People do not generally query an LLM fifty times to write a one hundred word email.\nIt is completely normal to publish, in an academic context, a worst-case estimate based on limited information or to pick assumptions which make it easy to form an estimate. In this setting your audience has all the detail necessary to determine if your worst-case guess seems accurate, and how to use it well.\nPublishing a pessimistic estimate that makes this many incorrect assumptions in a newspaper of record with no further detail is just lying to readers.\nEven if the figure were true, the reporting is incredibly misleading. It fails to note that most of the claimed water use is water used for power generation. “AI is using up too much power” is not nearly as interesting a headline: people can compare how much power you’re saying it takes to write an email to their toaster or their PC (both use more). People often do not know that electricity generation uses up water. Presenting water use statistics without clarifying that they are from electricity generation is incredibly confusing if you don’t know that.\nComparisons of resource usage to farming, here and in other articles citing the same researchers, consistently underplay the impact of farming on water availability. Perversely, this seems to exonerate the farming practices actually causing crises in water-poor regions in pursuit of scoring extra points against LLMs.\nIf someone cares about the environment, these things seem like table stakes. 
You should avoid lying in the newspaper of record. You should especially avoid doing this in a way that blames customers instead of corporations or that blames entirely the wrong category of business for a major environmental problem. We aren’t going to get any meaningful change if we lead people to solve the wrong problem.\nWhy it has spread # This still leaves me with an itch about this specific claim. How did it become the dominant story in spite of being a complete lie?\nThe article is well-written and has very good graphics for laying out its data. It is persuasive and millions of people will have read it. Even if people do not read it, the headline, “a bottle of water per email”, makes sure everyone gets the message. Everyone will know that the Washington Post is claiming that ChatGPT uses up a bottle of water per email.\nThe article, compellingly, centers on the morality of the customer’s actions. You, the end user, are held responsible for consuming half a liter of water every time you use an LLM. If this were true, you could, clearly, have a meaningful impact by boycotting ChatGPT. It would also be important to try to prevent other people from using ChatGPT, since they are directly responsible for using up a lot of water.\nPersonal moral choices make for a compelling story. They are also, unfortunately, a very good way to deflect attention from the business to the customer. Passing moral judgement on people we know or talk to is satisfying in a way that criticizing a business is not. It is more difficult to be morally outraged at a corporation.\nWhy would you lie about this? # In short? For attention.\nNewspapers at large seem to prefer negative coverage of tech companies. Negative coverage generates a lot of attention, and we are, famously, living in an attention economy. 
(Also, the tech companies frequently deserve it.)\nWhen covering a story about two or more people, it is a normal and expected journalistic practice to at least contact everyone involved for comment. When covering a story about a technology, it is apparently considered acceptable to consult one “expert”. If this person lies to you, or is willing to let you lie under their name, you can publish the story. You will probably get more attention on the story if what they tell you is inflammatory, so you have good reason to seek out inflammatory experts.\nExperts, in turn, are also part of an attention economy. If they work within academia, their prominence within their field depends upon how important their work is perceived as being. If they do not, their income depends directly on their reputation among potential clients or on their ability to attract subscriptions.\nQuite possibly everyone involved thought they were doing good work. You could maybe argue that even if the headline isn’t true and the numbers are made up, the story still helps to raise awareness about environmental problems. This seems like a weak justification. We would most likely be better off on this issue if they had said nothing about it at all.\nThis is true even if the paper I am criticizing is correct in its estimate of 0.004 kWh per query, is even true at their pessimistic estimate of 0.016 kWh, and is extremely true if their estimate is high, which it definitely appears to be.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYou can find Andy Masley trying to make sense of these claims about how much power an LLM uses up here.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"8 June 2025","externalUrl":null,"permalink":"/posts/the-biggest-statistic-about-ai-water/","section":"Posts","summary":"","title":"The Biggest Statistic About AI Water Use Is A Lie","type":"posts"},{"content":" AI History in Quotes # Each of these presents the clearest, earliest, or most-cited statement of a specific idea in AI.
Where those three things conflict I have chosen whichever quote I liked the most. They are mostly or all big-picture concerns, and should hopefully seem at least interesting even if not correct in a timeless way.\nThe Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us in making available what we are already acquainted with.\nAda Lovelace (1843), Sketch of the Analytical Engine, Note G\nAll these are very crude steps in the direction of a systematic theory of automata. They represent, in addition, only one particular direction. This is, as I indicated before, the direction towards forming a rigorous concept of what constitutes “complication.” They illustrate that “complication” on its lower levels is probably degenerative, that is, that every automaton that can produce other automata will only be able to produce less complicated ones. There is, however, a certain minimum level where this degenerative characteristic ceases to be universal. At this point automata which can reproduce themselves, or even construct higher entities, become possible. 
This fact, that complication, as well as organization, below a certain minimum level is degenerative, and beyond that level can become self-supporting and even increasing, will clearly play an important role in any future theory of the subject.\nJohn Von Neumann (1948), The General and Logical Theory of Automata\nThe chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require “thinking” for skilful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of “thinking”; (4) the discrete structure of chess fits well into the digital nature of modern computers.\nClaude Shannon (1949), Programming a Computer for Playing Chess\nIt was suggested tentatively that the question, \u0026ldquo;Can machines think?\u0026rdquo; should be replaced by \u0026ldquo;Are there imaginable digital computers which would do well in the imitation game?\u0026rdquo;\n[\u0026hellip;]\nThe original question, \u0026ldquo;Can machines think?\u0026rdquo; I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. I believe further that no useful purpose is served by concealing these beliefs. The popular view that scientists proceed inexorably from well-established fact to well-established fact, never being influenced by any improved conjecture, is quite mistaken. Provided it is made clear which are proved facts and which are conjectures, no harm can result. 
Conjectures are of great importance since they suggest useful lines of research.\nAlan Turing (1950), Computing Machinery and Intelligence\nWe propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.\nJohn McCarthy et al. (1955), Dartmouth Workshop Proposal\nProgramming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.\nArthur Samuel (1959), Some Studies in Machine Learning Using the Game of Checkers\nIf we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it, because the action is so fast and irrevocable that we have not the data to intervene before the action is complete, then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it.\nNorbert Wiener (1960), Some Moral and Technical Consequences of Automation\nLet an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an \u0026ldquo;intelligence explosion,\u0026rdquo; and the intelligence of man would be left far behind (see for example refs. [22], [34], [44]).
Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. It is curious that this point is made so seldom outside of science fiction. It is sometimes worthwhile to take science fiction seriously.\nI.J. Good (1965), Speculations Concerning the First Ultraintelligent Machine\nAny observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.\nCharles Goodhart (1975), Problems of monetary management: the U.K. experience\nCommonly known as \u0026ldquo;Goodhart\u0026rsquo;s Law\u0026rdquo;.\nEncoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much more powerful, though usually unconscious, sensorimotor knowledge. We are all prodigious olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.\nHans Moravec (1988), Mind Children: The Future of Robot and Human Intelligence\nCommonly known as \u0026ldquo;Moravec\u0026rsquo;s Paradox\u0026rdquo;, often paraphrased as “the hard things are easy and the easy things are hard”.\nI think it\u0026rsquo;s fair to call this event a singularity (\u0026ldquo;the Singularity\u0026rdquo; for the purposes of this paper). It is a point where our models must be discarded and a new reality rules. As we move closer and closer to this point, it will loom vaster and vaster over human affairs till the notion becomes a commonplace. Yet when it finally happens it may still be a great surprise and a greater unknown.
In\nthe 1950s there were very few who saw it: Stan Ulam [27] paraphrased\nJohn von Neumann as saying:\nOne conversation centered on the ever accelerating progress of\ntechnology and changes in the mode of human life, which gives the\nappearance of approaching some essential singularity in the\nhistory of the race beyond which human affairs, as we know them,\ncould not continue.\nVernor Vinge (1994), The Coming Singularity\nThis compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers live from analyzing data and finding compact models). So compressors beating the current \u0026ldquo;dumb\u0026rdquo; compressors need to be smart(er). Since the prize wants to stimulate developing \u0026ldquo;universally\u0026rdquo; smart compressors, we need a \u0026ldquo;universal\u0026rdquo; corpus of data. Arguably the online lexicon Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should \u0026ldquo;understand\u0026rdquo; all human knowledge, i.e. be really smart. enwik8 is a hopefully representative 100MB extract from Wikipedia.\nMarcus Hutter (2006), The Hutter Prize\nThe Orthogonality Thesis\nIntelligence and final goals are orthogonal axes along which possible agents can freely vary. 
In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.\nThe Instrumental Convergence Thesis\nSeveral instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.\nNick Bostrom (2012), The Superintelligent Will\nBoth commonly referred to by name, with or without the term \u0026lsquo;thesis\u0026rsquo;.\nOne thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.\nRich Sutton (2019), The Bitter Lesson\nThe strong scaling hypothesis is that, once we find a scalable architecture like self-attention or convolutions, which like the brain can be applied fairly uniformly (eg. “The Brain as a Universal Learning Machine”⁠ or Hawkins), we can simply train ever larger NNs and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks \u0026amp; data. More powerful NNs are ‘just’ scaled-up weak NNs, in much the same way that human brains look much like scaled-up primate brains⁠.\nGwern Branwen (2020), The Scaling Hypothesis\nHere we will explore emergence with respect to model scale, as measured by training compute and number of model parameters. 
Specifically, we define emergent abilities of large language models as abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models (§2).\nWei et al. (2022), Emergent Abilities of Large Language Models\nWhat this manifests as is – trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point. Sufficiently large diffusion conv-unets produce the same images as ViT generators. AR sampling produces the same images as diffusion.\nThis is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else. Everything else is a means to an end in efficiently delivering compute to approximating that dataset.\nThen, when you refer to “Lambda”, “ChatGPT”, “Bard”, or “Claude” then, it’s not the model weights that you are referring to. It’s the dataset.\nJames Betker (2023), The “it” in AI models is the dataset.\nIlya: I challenge the claim that next token prediction cannot surpass human performance. It looks like on the surface it cannot—it looks on the surface if you just learn to imitate, to predict what people do, it means that you can only copy people. 
But here is a counter-argument for why that might not be quite so:\nIf your neural net is smart enough, you just ask it like, \u0026ldquo;What would a person with great insight and wisdom and capability do?\u0026rdquo; Maybe such a person doesn\u0026rsquo;t exist, but there\u0026rsquo;s a pretty good chance that the neural net will be able to extrapolate how such a person should behave.\nDo you see what I mean?\nDwarkesh: Yes, although where would we get the sort of insight about what that person would do, if not from the data of regular people?\nIlya: Because if you think about it, what does it mean to predict the next token well enough? What does it mean actually? It\u0026rsquo;s actually a much deeper question than it seems.\nPredicting the next token well means that you understand the underlying reality that led to the creation of that token. It\u0026rsquo;s not statistics—like, it is statistics, but what is statistics?\nIn order to understand those statistics, to compress them, you need to understand what is it about the world that creates those statistics. And so then you say, \u0026ldquo;Okay, well I have all those people. What is it about people that creates their behaviors?\u0026rdquo;\nWell, they have thoughts and they have feelings and they have ideas, and they do things in certain ways. 
All of those could be deduced from next token prediction.\nAnd I\u0026rsquo;d argue that this should make it possible—not indefinitely, but to a pretty decent degree—to say, \u0026ldquo;Well, can you guess what you would do if you took a person with this characteristic and that characteristic?\u0026rdquo;\nLike, such a person doesn\u0026rsquo;t exist, but because you\u0026rsquo;re so good at predicting the next token, you should still be able to guess what that person would do—this hypothetical, imaginary person with far greater mental ability than the rest of us.\nIlya Sutskever (2023), The Dwarkesh Podcast - Why next-token prediction is enough for AGI\n","date":"7 June 2025","externalUrl":null,"permalink":"/posts/ai-history-in-quotes/","section":"Posts","summary":"","title":"AI History in Quotes","type":"posts"},{"content":" What Is AI? # This is an important question because AI is, currently, important. If you understand what AI is, you understand what is happening and why it is happening much more clearly than you otherwise would. This is intended as a high-level overview of what the term means, where the field comes from, and what we are doing with it now.\nDefinition # Artificial Intelligence (AI) is about getting computers to think.\nPart of the process is deciding what we mean by \u0026ldquo;think\u0026rdquo;.\nDeciding what we mean by \u0026ldquo;think\u0026rdquo; turns out to be somewhat difficult, and at least some of the people involved in AI have spent a lot of effort on this question.\nThis concern is not new. Claude Shannon, an early pioneer in computer logic among other things, wrote this in 1949 about computer chess:\nchess is generally considered to require \u0026ldquo;thinking\u0026rdquo; for skilful[sic] play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of \u0026ldquo;thinking\u0026rdquo;1\nIt took a little under fifty years, but he was right. 
We did change what we meant by \u0026ldquo;thinking\u0026rdquo; once computers were better than humans at chess.\nIt turns out thinking isn\u0026rsquo;t chess. Chess might be a type of thinking. You, as a human, might have to think to play chess. Humans who are better at thinking might be better at chess. However, there are things that play chess very well and that do not seem to think in any other way.\nThis was actually very surprising for some people. We have had a lot of surprises like this, and they all seem to rhyme. We seem to believe that thinking is one big thing. When we get computers to do things that humans have to think hard to do, they usually look like a lot of small things.\nAnyway, thinking isn\u0026rsquo;t chess. We had to make a good chess computer before everyone was sure of that.\nIn 1950 Alan Turing, often called the father of theoretical computer science, published a paper2 that includes what we now call the Turing Test. Inspired by a game about seeing if men could pass as women or vice versa while passing notes, he had the idea to check if a computer could pass as a human entirely in writing. He argues in some detail that computers can, in principle, think, and that instead of wondering about this we should simply test whether they can pass for human.\nWe can generalize this into two proposals for what we mean by \u0026ldquo;think\u0026rdquo;:\n\u0026ldquo;Thinking\u0026rdquo; is using language as well as a human can. \u0026ldquo;Thinking\u0026rdquo; is being able to do everything as well as a human can.\nI think the second definition is more interesting. It\u0026rsquo;s implied by the test in two ways. First, if something has some gap where its understanding is not as good as a human\u0026rsquo;s, it will probably answer some questions in that area incorrectly in a way that gives it away. 
Second, testing a computer against a human in general is, in the end, checking if they can do things humans can do.\nThis is the working definition that, in practice, people in AI seem to use. So:\nAI is about getting computers to think. Thinking is what humans can do. Computers will be thinking if they are able to do everything humans can do.\nWe can look at the programs we actually have and see that there\u0026rsquo;s a bit more. We didn\u0026rsquo;t get computers to play chess as well as humans; we got them to play chess much better than humans. We went right past human-level chess. We did the same thing for other board games like Go, too. LLMs, the most famous of which are those used by ChatGPT, are a very specific type of AI. They are error-prone and not generally superhuman, but they generally have much more memorized than any human could due to training on large amounts of data from the internet. Most of them have more or less memorized Wikipedia, for example. Some of them have thousands of digits of pi memorized, and they aren\u0026rsquo;t even intended to do that.\nSome of the things that current AI programs can do really do not have any human analog. Image generators are not making images in the same way a human would make art, one piece at a time; they are more or less creating images from nowhere in almost no time. Making completely fake video that looks real is also not something humans can really do.\nThis gives us:\nOnce computers are as good as people are at something, sometimes they can get much better than that. Sometimes the tools we use for AI enable computers to do things that no human can do.\nThese are both part of AI, too. This covers the what of AI. This broad definition covers AI as a field. Everything else is how, either the history or what is being done now. 
Since we care mostly about now, our history will be short and it will follow only the lines that lead to the bleeding edge today.\nBrief History # The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.3\nThis is Ada Lovelace, the first computer programmer, writing in 1843 that artificial intelligence cannot be done. It is a single stray thought while she is busy inventing the concept of a computer program. She does all of this in a very long footnote to a translation, likely because she cannot publish of her own accord and under her own name. We include her comment here because she is very cool and because later work on AI made a point of examining this line. It is possibly the most scrutinized single statement in the field of computer science as a whole. Her computer was never finished because the money ran out.\nIt is a hundred years before this thread is picked up. There are several breakthroughs in the 40s and 50s that establish the agenda in AI that we are still following today.\nWe have already met Alan Turing. We have already mentioned his 1950 paper, which answers Ada Lovelace\u0026rsquo;s comment and establishes the idea of testing computers against humans. The term \u0026ldquo;artificial intelligence\u0026rdquo; is not yet in use, but he defines its goal: making machines think. It is likely the most important single paper for the field as a whole.\nAlan Turing worked on cryptanalysis during the Second World War. Before that, he came up with the most general mathematical way we define computers, and we still call our most abstract model a Turing Machine. In 1952 he was prosecuted for being gay and forced to take hormones by court order. In 1954 he killed himself with a poisoned apple at the age of 41. 
For this reason, we do not have any further work by Turing on AI.\nOur second set of breakthroughs are in attempts to study and imitate neurons, the nerve and brain cells of humans and animals. In 1943 we have our first serious mathematical model of neurons, and of how they might learn.4 In 1958 we get the perceptron5, which is the first very clear example of what we now call a neural network. It is not meant to be a realistic model of a neuron, but to solve a problem: recognizing patterns.\nFrom this point on, the neural networks in AI and the study of actual neurons that exist in real humans diverge. There are isolated exceptions: the convolutional neural network commonly used for computer vision now is inspired by research into actual neurons in the visual cortex. This is rare and mostly one way, with inspiration coming from neuroscience to AI and not the reverse.\nOur last major ingredient comes from Claude Shannon. Shannon, like Turing, worked on cryptanalysis during the war. He also worked on antiaircraft gun control. We met him earlier, talking about computer chess, which he publishes a paper about in 1950. This establishes chess as the main puzzle for people working in AI, and shows that chess is a good place to examine how thinking works.\nAs Turing defined the computer, Shannon defines information. Information is the resolution of uncertainty: the more information you have, the less uncertain you are. He calls this uncertainty entropy, and measures it in a quantity he names the bit. These are the ones and zeroes that (hopefully) everyone knows computers run on today. In this connection he is the father of the concept of compression, which is fundamental to much of our current AI. Compression is reducing how many bits you have to use for the same thing.\nIn 1951, Shannon publishes a paper called \u0026ldquo;Prediction and Entropy of Printed English\u0026rdquo;.6 Its abstract is this:\nA new method of estimating the entropy and redundancy of a language is described. 
This method exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known. Results of experiments in prediction are given, and some properties of an ideal predictor are developed.\nThe \u0026ldquo;ideal predictor\u0026rdquo; Shannon describes is the first language model. By model we mean some math or a computer program for representing something, and here we model written English. Shannon\u0026rsquo;s original language model predicts the most likely next letter based upon all preceding letters, or—equivalently—compresses printed English. His formulation is pure math. You cannot actually have a computer do what he describes, because it requires statistics for every possible preceding text, and there are exponentially many possibilities. Every language model since then is finding a more efficient way to approximate his ideal model.\nFrom here, the history gets a lot faster. The name \u0026ldquo;artificial intelligence\u0026rdquo; comes from a conference in 1956, attended by, among other people, Shannon. Its proposal is, in part, to\n\\[...\\] proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. \\[...\\] 7\nThe term \u0026ldquo;machine learning\u0026rdquo; is coined in 1959 with a paper titled \u0026ldquo;Some Studies in Machine Learning Using the Game of Checkers\u0026rdquo;.8 This becomes the blanket term for any system which learns, either from trial and error or from input data. It borrows many techniques from statistics, and has been very successful. 
It is the type of system we use now.\nLots of work that was considered to be part of AI in the ensuing years is not an ancestor to anything we call AI now. Those projects often produced very interesting results, including huge parts of what makes computers as useful as they are now, but if they didn\u0026rsquo;t use neural networks or machine learning approaches they are off our path here. Their main contribution to contemporary AI research is documenting a lot of things that do not work.\nThere was also a lot of work concerning neural networks which is ancestral to what we have now. We are trying to take the shortest path we can to the present, so we will omit most of these for brevity. We will skip to 1986, when Rumelhart, Hinton, and Williams9 added multiple layers to a neural network and trained it on their data by having the network reduce its errors. We call this method of training backpropagation, and the multi-layer design a multi-layer perceptron. By doing this, they showed that neural networks were general-purpose and able to do anything that the computer itself could. Prior to this period, neural networks were often considered to be too limited to solve many problems, even in principle.\nIn 1993 Yoshua Bengio and Yann LeCun demonstrated the first workable system for reading handwriting.10 We mentioned its design earlier; it uses a convolutional neural network inspired by the visual systems of animals. Some descendant of this program currently routes the mail and handles mobile check deposit. One of its other descendants is responsible for most other uses of AI for vision, like unlocking your phone with your face and organizing your pictures so you can search through them.\nOur last piece of important history is in 2012, with a program called AlexNet.11 Two of Hinton\u0026rsquo;s students, Alex Krizhevsky and Ilya Sutskever, entered into a challenge for image recognition. They have two major innovations that are crucial to everything that follows. 
One is that they made their neural network much bigger than neural networks had been before, giving it a total of 60 million parameters. Parameters are just numbers: each one specifies how much one \u0026ldquo;neuron\u0026rdquo; of the network contributes to another. The other innovation is that they ran AlexNet on a GPU, the part of a computer used for graphics. It turns out GPUs are extremely efficient at doing the calculations neural networks use.\nPresent Day # AlexNet marks the beginning of modern AI, which is characterized by deep learning. Cutting-edge neural networks now are distinguished by their size, and the most prominent of them are applied to Shannon’s task of modelling written language. Parameter counts have gone from the 60 million in AlexNet to many billions and occasionally trillions. Training them well requires a proportionate increase in the amount of data, and budgets for computer hardware have increased in proportion to both of those factors, using thousands of GPUs at once. There have also been attempts to distinguish recent innovations as \u0026ldquo;generative AI\u0026rdquo;, but this is mostly a marketing category and things are clearer if we avoid using that term.\nLarge language models, or LLMs, are the current major success story. Either LLMs or image generators are what most people will think of when they think of \u0026ldquo;AI\u0026rdquo;. LLMs are correctly distinguished from their predecessors like autocomplete primarily by their size. That making neural networks bigger consistently makes them better is sometimes called the scaling hypothesis. The scaling hypothesis held clearly true for language models from 2019 through 2023, corresponding to the dates when OpenAI published GPT-2 (1.5 billion parameters) and when it began selling access to GPT-4 (rumored to be above a trillion parameters). Their scale, primarily, has made these models much more capable and general-purpose than all earlier attempts to model language. 
They are measurably much better at tasks like translation and choosing correct answers on multiple-choice tests, and they are subjectively much more capable of following a conversation.\nIt is important to distinguish between the LLM itself, which is a neural network, and the service or app attached to the LLM. ChatGPT is a service: you go to a web site or app, you may or may not pay a subscription, and you can start a chat with an LLM. There can be one or many different LLMs involved, and the overall service can include many additional features like web search or image generation. Generally the LLM can only output text, and these extra features are not part of it. GPT-3 and GPT-4 are LLMs, and were once offered via ChatGPT but have since been replaced. ChatGPT currently features somewhere above a half dozen different LLMs.\nA present-day LLM is trained in two phases. In the first, commonly called pretraining, it is given a large amount of text and trained to guess what comes next at each point. This produces a \u0026ldquo;base\u0026rdquo; model, and base models are not commonly used or offered as products. In the second, called finetuning or post-training, the model is modified to extend the length of input it can take and given a set of expected behaviors, generally suitable for use as a chatbot. Techniques in both pretraining and finetuning continue to improve.\nThere are attempts at multimodal LLMs that can also take input or provide output directly in audio, images, or video, but these are, so far, a relatively niche concern. To date, only some commonly-used LLMs can directly take image input. Many of them have a completely separate system from the LLM for reading in images and making them into text, but this is not always obvious to the user. 
There are theories in the industry both for and against prioritizing multimodality, with Anthropic appearing to strongly favor a text-only approach and OpenAI having strong advocates for multimodality.\nPurely as objects of study, LLMs have done for language something like what a previous generation of programs did for board games. It is more or less clear, currently, that language isn\u0026rsquo;t thinking, or at least not all of it. You can have something that uses language competently, that is superhuman in some ways, and that is notably unable to do a lot of what we mean by \u0026ldquo;thinking\u0026rdquo;. Thinking and language certainly have meaningful overlap, and the text-only approach to AI rests on the belief that this overlap is enough. We can now do things considered impossible for generations, and often more, and yet the project is clearly not complete.\nThis very briefly covers what an LLM is, why LLMs are an important breakthrough, and where progress is happening now. We will avoid, for now, discussing the broad impacts of the technology on society except to say that they seem fairly notable.\nFor future developments in AI, one of the most notable impacts of LLMs is that they have generated intense interest from corporations, their investors, and the leaders of various countries. Because scaling the models up often delivers much better results, extremely high budgets can be justified. Every major technology company has made a point of trying to carve out a part of the business of AI, and billions or trillions of dollars can be gained or lost in stock value based on perceptions of their research programs. Governments have invested untold amounts of money and influence in trying to make sure their countries remain relevant, and NVIDIA, which has an effective monopoly on high-end GPUs for training and running neural networks, is currently valued at about 3.3 trillion dollars.\nWhile relatively neglected, other AI branches also benefit. 
Anything made with a neural network and trained with backpropagation is currently on the same family tree as an LLM. Models trained more recently and with larger parameter counts overwhelmingly benefit from improvements in hardware and code, and what they can do has massively improved in recent years. Models exist that can take as input text, audio including speech and music, images, and video; there are also models that can produce all of those as output. In other cases, they can take some combination of those and sensor measurements as input and can output control signals for machines like cars, weapons, factory equipment and fusion reactors.[^12]\nOne of the core propositions of the major AI research companies is the promise of Artificial General Intelligence, or AGI. What this term means specifically is the subject of so much scrutiny that it is difficult to settle on a universal definition. One company in particular defines AGI, for purposes of one of their contracts, as when they make one hundred billion dollars in profit. This definition seems like it could only have come out of intense haggling, and neither that definition nor many others commonly argued for seem very useful. We might tentatively say that AGI once meant \u0026ldquo;human-level AI\u0026rdquo;, and accept that this leaves some remaining ambiguity.\nBeing able to have a computer do things that currently only a human can do has serious implications for the job market and the economy, and for this reason the promise of AGI specifically is an animating force for investors. They mostly seem to see the chance to fire employees and replace them with much cheaper chatbots or robots. At the very least, no company wants to be less efficient than its competitors. This combination of greed and fear is sufficient to compel nearly everyone, everywhere in the business world to keep up with AI. 
There is a similar dynamic with governments, which have seemed over time to increasingly view AI as either an actual or potential military technology that they are either eager to be ahead with or afraid of falling behind on. Comparisons to the nuclear arms race are common.\nConcerns that artificial intelligence could replace or kill humanity are also common. These concerns are not new, and were raised among early AI pioneers. Turing, for example, mentioned the possibility in lectures and on the radio in 1951[^13]. More recently in 2024, Geoffrey Hinton, who was involved in both the 1986 backpropagation paper and AlexNet in 2012, retired and gave a series of interviews to talk about the dangers of AI.[^14] He received the Nobel in Physics later that year and continued talking about it. One of AlexNet\u0026rsquo;s other architects, Ilya Sutskever, went on to co-found OpenAI. He and several of OpenAI\u0026rsquo;s founders have expressed concern about existential risk and AI safety, which are euphemisms for everyone dying because of AI. Hinton and Sutskever, along with most other prominent people in AI, are signatories to a one-line letter that reads:\nMitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.[^15]\nThis leaves us in a strange place. AI is very nearly the founding purpose of computer science as a whole. Automation, more broadly, is very nearly the entire point of technology. Machines can do more work, and that means humans do not have to. Machines can do things humans can\u0026rsquo;t do at all. The pull of the future is that things can get better, that we want to be in a world where we have more choices about what to be and to do. Yet the drive for progress here, or at least what\u0026rsquo;s moved the money behind it, is less a pull than a push. Either in spite of the risks or because of them, everyone is moving in the same direction. 
Companies and countries are being pushed by competitors, investors, and the fear that someone else will get there first.\nReferences # [^12] Degrave, J, Felici, F, et al, (2022) https://web.archive.org/web/20250315033337/https://www.nature.com/articles/s41586-021-04301-9\n[^13] Turing, A. M. (1951) https://web.archive.org/web/20250420104303/https://turingarchive.kings.cam.ac.uk/publications-lectures-and-talks-amtb/amt-b-4\n[^14] Hinton, G (2024) https://web.archive.org/web/20250426173021/https://www.cbsnews.com/news/geoffrey-hinton-ai-dangers-60-minutes-transcript/\n[^15] https://web.archive.org/web/20250531094402/https://safe.ai/work/statement-on-ai-risk\nShannon, C. E. (1950). Programming a computer for playing chess. Philosophical Magazine, 41(314), 256-275. https://web.archive.org/web/20250519075435/https://vision.unipv.it/IA1/ProgrammingaComputerforPlayingChess.pdf\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nTuring, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460. https://web.archive.org/web/20250530002831/https://courses.cs.umbc.edu/471/papers/turing.pdf\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nLovelace, A. (1843). Note G. In Sketch of the Analytical Engine Invented by Charles Babbage by L. F. Menabrea, with notes upon the memoir by the translator. Taylor\u0026rsquo;s Scientific Memoirs, 3, 666-731. https://web.archive.org/web/20250115143801/https://gutenberg.org/cache/epub/75107/pg75107-images.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMcCulloch, W. S., \u0026amp; Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133. https://doi.org/10.1007/BF02478259\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nRosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nShannon, C. E. (1951). 
Prediction and entropy of printed English. Bell System Technical Journal, 30(1), 50-64. https://doi.org/10.1002/j.1538-7305.1951.tb01366.x\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMcCarthy, J., Minsky, M. L., Rochester, N., \u0026amp; Shannon, C. E. (2006). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Magazine, 27(4), 12. https://doi.org/10.1609/aimag.v27i4.1904\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSamuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210-229. https://doi.org/10.1147/rd.441.0206\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nRumelhart, D. E., Hinton, G. E., \u0026amp; Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nBengio, Y., LeCun, Y., \u0026amp; Henderson, D. (1993). Globally Trained Handwritten Word Recognizer using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models. Advances in Neural Information Processing Systems 6 (NIPS 1993). https://web.archive.org/web/20240422021145/https://proceedings.neurips.cc/paper_files/paper/1993/file/3b5dca501ee1e6d8cd7b905f4e1bf723-Paper.pdf\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nKrizhevsky, A., Sutskever, I., \u0026amp; Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NeurIPS 2012) (pp. 1097–1105). 
https://web.archive.org/web/20250526023911/https://papers.nips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf\n","date":"4 June 2025","externalUrl":null,"permalink":"/posts/what-is-ai/","section":"Posts","summary":"","title":"What Is AI?","type":"posts"},{"content":" The Universal Dividend Act: Policy Rationale # Summary # The Universal Dividend Act establishes a monthly per capita payment to every citizen and national of the United States, funded as a statutorily set, escalating percentage of federal outlays. The payment begins at 10% of the five-year moving average of federal spending, rises by 4 percentage points annually, and caps at 50%. At current spending levels, this produces roughly $190/month per person in year one, growing to approximately $1,700/month at maturity as federal outlays grow over the ten-year ramp. 
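Read together, the schedule above and the per-capita division in Section 4(a) amount to a short calculation. A minimal sketch, assuming an illustrative $7.6 trillion five-year outlay average and 340 million eligible individuals (neither figure appears in the bill):

```python
def applicable_percentage(period: int) -> float:
    """Percent of the outlay base paid out in a given payment period.

    Per the bill's schedule: 10% for the first two payment periods,
    then +4 percentage points per period, capped at 50%.
    """
    if period <= 2:
        return 10.0
    return min(10.0 + 4.0 * (period - 2), 50.0)


def monthly_payment(period: int, five_year_avg_outlays: float,
                    eligible_population: int) -> float:
    """Annual dividend amount divided by the eligible population and by twelve."""
    annual_dividend = applicable_percentage(period) / 100 * five_year_avg_outlays
    return annual_dividend / eligible_population / 12


# Illustrative assumptions, not figures from the bill:
OUTLAY_AVG = 7.6e12        # assumed five-year moving average of federal outlays
POPULATION = 340_000_000   # assumed count of eligible individuals

print(round(monthly_payment(1, OUTLAY_AVG, POPULATION)))   # ≈ $186/month in year one
print(round(monthly_payment(12, OUTLAY_AVG, POPULATION)))  # ≈ $931/month at the 50% cap
```

Note that at a constant outlay base the 50% cap yields roughly $930/month under these assumptions; the approximately $1,700/month figure reflects projected growth in federal outlays over the ramp.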
Payments are non-taxable, immune from garnishment, and do not affect eligibility for existing benefit programs.\nThis document elaborates on the findings and design rationale of the bill.\nBroad Sharing of Productive Capacity # Finding (1): The productive capacity of the United States economy generates benefits that should be broadly shared among all citizens and nationals.\nThe economic output of the United States is the product of collective infrastructure (legal systems, public investment, educated workforces, shared institutions) and not solely of individual effort. A universal dividend recognizes this by returning a share of the government\u0026rsquo;s fiscal activity directly to the people. The payment is unconditional because the claim it represents is unconditional: every citizen has a stake in the productive capacity of the country.\nThe dividend is the reciprocal of the tax obligation. Citizens owe a share of their income to the government that maintains the conditions under which that income is earned. The dividend recognizes that this obligation runs in both directions: the government, in turn, owes a share of its fiscal activity to the citizens whose participation and compliance sustain it. Taxation and dividend are two sides of the same relationship between the individual and the state.\nThis principle also explains why the payment is universal rather than means-tested. Means-testing is expensive to administer, creates benefit cliffs that trap people in poverty, requires invasive verification of personal financial circumstances, and inevitably excludes eligible people through administrative burden. The populations most in need of assistance are also the populations least equipped to navigate complex eligibility requirements. 
Universality eliminates these problems and creates a political constituency for the program that includes everyone.\nConsumer Spending and Economic Growth # Finding (2): Consumer spending constitutes the largest component of the gross domestic product of the United States, and broadly distributed purchasing power supports sustained economic activity and growth.\nConsumer spending accounts for roughly two-thirds of GDP. The marginal propensity to consume is highest among lower-income households: a dollar transferred to someone with unmet needs is more likely to be spent than a dollar added to existing wealth. A universal dividend directed at every citizen therefore channels purchasing power where it is most likely to circulate through the economy, supporting demand for goods and services, sustaining employment, and generating tax revenue.\nConsumer spending also performs a critical allocative function. Individual purchasing decisions are the primary mechanism by which the economy generates quality signals: information about which goods and services are valued, which producers are meeting needs, and where resources should flow. An economy in which purchasing power is concentrated among a narrow population receives quality signals that reflect only that population\u0026rsquo;s preferences. Broadly distributed purchasing power produces more complete and representative market information, improving the efficiency of resource allocation across the entire economy.\nDirect Cash Transfers # Finding (3): Direct, unconditional cash transfers are an efficient means of distributing benefit and promoting growth.\nCash is the most efficient form of transfer. It requires no administrative apparatus to determine what recipients need, imposes no compliance burden on recipients, and allows individuals to allocate resources according to their own circumstances. 
The existing landscape of federal transfer programs is fragmented across dozens of agencies, each with its own eligibility rules, application processes, and verification requirements. A single universal payment does not replace these programs, but it provides a baseline of economic security that reaches populations the existing system systematically misses, including the homeless, those not in federal databases, and those whose circumstances change faster than administrative processes can track.\nThe payment is excluded from gross income and cannot be counted as income or resources for purposes of any federal or state benefit program. This ensures the dividend supplements existing benefits rather than displacing them. Without this protection, the dividend would effectively convert into a funding cut for the most vulnerable recipients by pushing them over eligibility thresholds for programs like Medicaid, SNAP, and SSI, which cover unpredictable, catastrophic expenses that a monthly cash payment cannot substitute for.\nFederal Outlays as the Base # Finding (4): A payment linked to the scale of federal expenditure provides a transparent and self-adjusting mechanism that connects the activities of government to the welfare of the people.\nFederal outlays reflect the full fiscal activity of the government. They are downstream of economic productivity, tax policy, and spending decisions. When the economy grows, revenue rises, spending capacity expands, and the dividend grows with it. This creates a transparent connection between the scale of government activity and the benefit each citizen receives.\nFederal spending is also naturally countercyclical. Automatic stabilizers (unemployment insurance, Medicaid, and other safety-net programs) expand during economic downturns, which means the dividend base holds relatively steady precisely when stability matters most. 
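The damping effect of the moving-average base defined in Section 3(f) is easy to see numerically. A minimal sketch with invented outlay figures:

```python
def five_year_moving_average(outlays_by_year: list[float]) -> float:
    """Arithmetic mean of the five most recently completed fiscal years of outlays."""
    if len(outlays_by_year) < 5:
        raise ValueError("need at least five completed fiscal years")
    return sum(outlays_by_year[-5:]) / 5


# Invented figures: four steady years, then a one-year crisis spike of +$2T.
steady = [7.0e12] * 5
spiked = [7.0e12] * 4 + [9.0e12]

print(five_year_moving_average(steady))  # 7.0e12
print(five_year_moving_average(spiked))  # 7.4e12: the $2T spike moves the base only $0.4T
```

A one-year $2 trillion swing shifts the payment base by only a fifth of its size, so only sustained, multi-year changes in outlays materially change the dividend.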
The five-year moving average adds further smoothing, dampening both crisis-year spending spikes and temporary contractions so that only sustained, multi-year shifts in fiscal capacity affect the payment amount.\nThe bill designates payments as direct spending, exempt from annual appropriations and from sequestration. A payment that can be zeroed out in a continuing resolution or across-the-board spending cut is not a reliable income floor. The political durability of Social Security rests in part on its mandatory spending classification; the check arrives every month regardless of the status of annual appropriations legislation. The Universal Dividend is designed to have the same character.\nFiscal Impact # At 10% in year one, total program cost is approximately $700 billion, rising over 10 years to approximately $7 trillion annually at the 50% cap as the underlying spending base grows. The bill does not specify a funding offset. The funding mechanism is the existing and future tax code. The gradual ramp provides a decade during which Congress can adjust revenue, restructure spending, or both. Whether existing transfer programs are consolidated as the dividend grows is left to future Congresses. The bill neither requires nor prohibits such consolidation.\nH.R. ___ # To provide for the issuance of monthly per capita payments to all citizens and nationals of the United States, to be administered by the Department of the Treasury. # IN THE HOUSE OF REPRESENTATIVES # M___ . __________ introduced the following bill; which was referred to the Committee on Ways and Means\nA BILL # To provide for the issuance of monthly per capita payments to all citizens and nationals of the United States, to be administered by the Department of the Treasury.\nBe it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,\nSECTION 1. SHORT TITLE. # This Act may be cited as the \u0026ldquo;Universal Dividend Act.\u0026rdquo;\nSECTION 2. 
FINDINGS. # The Congress finds the following:\n(1) The productive capacity of the United States economy generates benefits that should be broadly shared among all citizens and nationals.\n(2) Consumer spending constitutes the largest component of the gross domestic product of the United States, and broadly distributed purchasing power supports sustained economic activity and growth.\n(3) Direct, unconditional cash transfers are an efficient means of distributing benefit and promoting growth.\n(4) A payment linked to the scale of federal expenditure provides a transparent and self-adjusting mechanism that connects the activities of government to the welfare of the people.\nSECTION 3. DEFINITIONS. # In this Act:\n(a) SECRETARY. — The term \u0026ldquo;Secretary\u0026rdquo; means the Secretary of the Treasury.\n(b) ELIGIBLE INDIVIDUAL. — The term \u0026ldquo;eligible individual\u0026rdquo; means any individual who is—\n(1) a citizen of the United States; or\n(2) a national of the United States (as defined in section 101(a)(22) of the Immigration and Nationality Act),\nregardless of age, subject to Section 10.\n(c) CUSTODIAL PARENT OR GUARDIAN. — The term \u0026ldquo;custodial parent or guardian\u0026rdquo; means, with respect to any minor individual—\n(1) the parent or legal guardian with whom the minor primarily resides, as determined under applicable State law; or\n(2) in the case of joint custody, the parent or guardian designated for purposes of this section in accordance with regulations prescribed by the Secretary.\n(d) EMANCIPATED MINOR. — The term \u0026ldquo;emancipated minor\u0026rdquo; means an individual under the age of 18 who has been declared emancipated under applicable State law, or who is otherwise legally recognized as an adult under applicable State law.\n(e) FEDERAL OUTLAYS. 
— The term \u0026ldquo;federal outlays\u0026rdquo; means total outlays of the Federal Government for a fiscal year as set forth in the final audited budget results published by the Office of Management and Budget for such fiscal year.\n(f) FIVE-YEAR MOVING AVERAGE. — The term \u0026ldquo;five-year moving average\u0026rdquo; means, with respect to any payment period, the arithmetic mean of federal outlays for the five most recently completed fiscal years for which final audited budget results have been published by the Office of Management and Budget as of June 1 preceding the beginning of such payment period.\n(g) APPLICABLE PERCENTAGE. — The term \u0026ldquo;applicable percentage\u0026rdquo; means—\n(1) for the first payment period and the first full payment period following such period, 10 percent;\n(2) for each subsequent payment period, the applicable percentage for the preceding payment period plus 4 percentage points; except that\n(3) in no case shall the applicable percentage exceed 50 percent.\n(h) ANNUAL DIVIDEND AMOUNT. — The term \u0026ldquo;annual dividend amount\u0026rdquo; means, with respect to any payment period, the product of the applicable percentage and the five-year moving average for such payment period.\n(i) PAYMENT PERIOD. — The term \u0026ldquo;payment period\u0026rdquo; means, with respect to any fiscal year, the twelve-month period beginning on October 1 of such fiscal year.\n(j) REPRESENTATIVE PAYEE. — The term \u0026ldquo;representative payee\u0026rdquo; means, with respect to an eligible individual, a person or organization appointed to receive payments under this Act on behalf of such individual who, by reason of mental or physical incapacity, is unable to manage or direct the management of such payments. 
Such term shall have the same meaning, and shall be subject to the same standards, qualifications, and disqualifications, as the term \u0026ldquo;representative payee\u0026rdquo; as used in section 205(j) of the Social Security Act (42 U.S.C. 405(j)).\n(k) UNITED STATES. — The term \u0026ldquo;United States,\u0026rdquo; when used in a geographical sense, includes the several States, the District of Columbia, the Commonwealth of Puerto Rico, the United States Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands.\nSECTION 4. MONTHLY PAYMENTS. # (a) IN GENERAL. — The Secretary shall make a payment in each month of the payment period to each eligible individual. The amount of each monthly payment shall be equal to the annual dividend amount for the applicable payment period, divided by the total number of eligible individuals as of the first day of such payment period, divided by twelve.\n(b) TIMING. — Payments under this section shall be made not later than the 15th day of each calendar month, beginning with the first calendar month of the first payment period.\n(c) DETERMINATION OF ELIGIBLE POPULATION. — The Secretary, in consultation with the Commissioner of Social Security and the Director of the Bureau of the Census, shall determine the total number of eligible individuals as of the first day of each payment period. Such determination shall be made not later than September 1 preceding the beginning of such payment period.\n(d) FIXED PAYMENT AMOUNT. — The monthly per capita payment amount determined under subsection (a) for a given payment period shall remain fixed for the duration of that payment period, notwithstanding any change in the number of eligible individuals receiving payments during such period.\n(e) MID-YEAR ENROLLMENT. 
— An individual who becomes an eligible individual after the first day of a payment period, whether by birth, naturalization, or any other means, shall be entitled to payments under this section beginning with the first full calendar month following the date on which such individual becomes an eligible individual. Such payments shall be in the same monthly amount as determined under subsection (a) for the applicable payment period.\n(f) ANNUAL PUBLICATION. — Not later than September 15 preceding the beginning of each payment period, the Secretary shall publish in the Federal Register and on a publicly accessible website—\n(1) the five-year moving average for such payment period;\n(2) the applicable percentage for such payment period;\n(3) the total number of eligible individuals; and\n(4) the monthly per capita payment amount.\nSECTION 5. PAYMENTS ON BEHALF OF OTHER INDIVIDUALS. # (a) MINOR INDIVIDUALS — GENERAL RULE. — In the case of an eligible individual who has not attained the age of 18 and who is not an emancipated minor, the payment under Section 4 shall be made to the custodial parent or guardian of such individual.\n(b) EMANCIPATED MINORS. — In the case of an eligible individual who is an emancipated minor, payments under Section 4 shall be made directly to such individual.\n(c) DISPUTES REGARDING MINORS. — The Secretary shall prescribe regulations establishing a process for resolving disputes regarding the designation of a custodial parent or guardian for purposes of this section, including in cases of—\n(1) joint custody arrangements;\n(2) changes in custody during a payment period; and\n(3) the absence of a legal custodial parent or guardian, including individuals in the foster care system or in the care of a State.\n(d) FOSTER CARE AND WARDS OF THE STATE. —\n(1) IN GENERAL. 
— In the case of a minor individual who is in foster care or is otherwise a ward of a State, the payment under Section 4 shall be made to the foster parent, relative caregiver, or institutional caregiver of record for such individual, as reported by the applicable State child welfare agency.\n(2) DATA SHARING. — Each State child welfare agency shall, pursuant to an agreement with the Secretary, provide to the Secretary on a monthly basis current placement data for all minor individuals in foster care or otherwise in the custody of the State, including the identity and payment information of the foster parent, relative caregiver, or institutional caregiver of record. The Secretary shall redirect payments not later than 30 days following receipt of updated placement data.\n(3) INSTITUTIONAL CARE. — In the case of a minor individual residing in a group home, residential treatment facility, or other institutional setting, the payment shall be made to the entity responsible for the care of such individual and shall be used exclusively for the direct benefit of such individual. The Secretary, in consultation with the Secretary of Health and Human Services, shall prescribe regulations establishing reporting and accountability requirements for such entities.\n(4) NO OFFSET AGAINST STATE OBLIGATIONS. — Payments received under this section on behalf of a minor individual in foster care shall not be used by any State to offset or reduce foster care maintenance payments or any other payments or services to which such individual is entitled under title IV-E of the Social Security Act or any other Federal or State program.\n(e) REPRESENTATIVE PAYEES FOR INCAPACITATED ADULTS. —\n(1) IN GENERAL. 
— In the case of an eligible individual who has attained the age of 18 and who, by reason of mental or physical incapacity, is unable to manage or direct the management of payments under this Act, the Secretary may appoint a representative payee to receive such payments on behalf of such individual.\n(2) INCORPORATION OF SSA FRAMEWORK. — The appointment, duties, oversight, accounting, and removal of representative payees under this subsection shall be governed by the same standards, procedures, and protections applicable to representative payees under section 205(j) of the Social Security Act (42 U.S.C. 405(j)), to the extent consistent with this Act. The Secretary may, by agreement with the Commissioner of Social Security, designate the same representative payee already serving an individual under such section to serve as representative payee for purposes of this Act.\n(3) USE OF PAYMENTS. — A representative payee appointed under this subsection shall use payments received on behalf of an eligible individual exclusively for the needs and direct benefit of such individual, and shall maintain such records and submit such reports as the Secretary may require.\n(4) NO OFFSET AGAINST INSTITUTIONAL COSTS. — Payments received by a representative payee on behalf of an eligible individual residing in an institution shall not be used to offset or reduce any payment or service to which such individual is otherwise entitled under any Federal or State program.\nSECTION 6. METHOD OF PAYMENT. # (a) IN GENERAL. — The Secretary shall make payments under this Act by electronic funds transfer to an account designated by the eligible individual (or the custodial parent, guardian, or representative payee receiving payments on behalf of such individual under Section 5).\n(b) ALTERNATIVE METHODS. 
— In the case of any eligible individual who does not designate an account under subsection (a), or for whom electronic funds transfer is impracticable, the Secretary shall make payment by—\n(1) check mailed to the last known address of the individual;\n(2) prepaid debit card; or\n(3) such other means as the Secretary determines appropriate, which may include mobile payment platforms.\n(c) REGISTRATION. — The Secretary shall establish, not later than 60 days after the date of enactment of this Act, a process by which any eligible individual may register to receive payments under this Act, including individuals who have not filed a Federal income tax return and individuals who are not known to the Social Security Administration.\nSECTION 7. TAX TREATMENT AND BENEFIT INTERACTION. # (a) AMENDMENT TO INTERNAL REVENUE CODE. — Part III of subchapter B of chapter 1 of the Internal Revenue Code of 1986 (26 U.S.C. 101 et seq.) is amended by inserting before section 140 the following new section:\n\u0026ldquo;SEC. 139__. UNIVERSAL DIVIDEND PAYMENTS.\n\u0026ldquo;(a) IN GENERAL. — Gross income does not include any payment received by an individual under the Universal Dividend Act.\n\u0026ldquo;(b) DENIAL OF DOUBLE BENEFIT. — No deduction or credit shall be allowed under this chapter with respect to any amount excluded from gross income under subsection (a).\n\u0026ldquo;(c) NOT TREATED AS EARNED INCOME. — Payments excluded under subsection (a) shall not be treated as earned income for purposes of this title.\u0026rdquo;.\n(b) CONFORMING AMENDMENT. — The table of sections for part III of subchapter B of chapter 1 of such Code is amended by inserting before the item relating to section 140 the following new item:\n\u0026ldquo;Sec. 139__. Universal dividend payments.\u0026rdquo;.\n(c) EFFECTIVE DATE. — The amendments made by this section shall apply to taxable years ending after the date of the enactment of this Act.\n(d) NOT TAKEN INTO ACCOUNT FOR FEDERAL BENEFIT PROGRAMS. 
— Notwithstanding any other provision of law, any payment made to any individual under this Act shall not be taken into account as income, earned or unearned, and shall not be taken into account as resources, for purposes of determining the eligibility of such individual (or any other individual) for benefits or assistance (or the amount or extent of benefits or assistance) under any Federal program or under any State or local program financed in whole or in part with Federal funds. Amounts saved or accumulated from payments under this Act shall retain this exclusion regardless of the period for which they are held.\nSECTION 8. PROTECTION OF PAYMENTS; UNIVERSALITY. # (a) NO EXCLUSION OF ELIGIBLE INDIVIDUALS. — No eligible individual shall be denied, suspended from, or excluded from receiving payments under this Act by reason of any status, condition, or circumstance not expressly provided for in this Act.\n(b) PROTECTION FROM GARNISHMENT, LEVY, AND ASSIGNMENT. — Notwithstanding any other provision of law, no payment made under this Act shall be subject to execution, levy, attachment, garnishment, assignment, or any other legal process, or to the operation of any bankruptcy or insolvency law. No person, entity, financial institution, or governmental body may seize, offset, confiscate, or redirect any payment made under this Act, except—\n(1) as provided in Section 12 (overpayments and underpayments);\n(2) as provided in Section 10 (suspension of payments); or\n(3) as provided in Section 15 (fraud and penalties).\n(c) NO OFFSET UNDER TREASURY OFFSET PROGRAM. — Payments made under this Act shall not be subject to offset under subchapter II of chapter 37 of title 31, United States Code (the Treasury Offset Program).\n(d) WAIVER VOID. — Any agreement by an individual to waive protections under this section shall be void and unenforceable.\nSECTION 9. TERMINATION OF ELIGIBILITY UPON DEATH. # (a) IN GENERAL. 
— The eligibility of an individual to receive payments under this Act shall terminate as of the date of death of such individual.\n(b) PAYMENT FOR MONTH OF DEATH. — The full monthly payment for the calendar month in which an eligible individual dies shall be made and shall not be subject to recapture.\n(c) IDENTIFICATION OF DECEASED INDIVIDUALS. — The Secretary shall, on a monthly basis, cross-reference payment records with the Death Master File maintained by the Social Security Administration, and shall terminate payments with respect to deceased individuals not later than 60 days following the date of death as recorded in such file.\n(d) RECAPTURE OF ERRONEOUS PAYMENTS. — Any payment made under this Act with respect to any month following the month of death of an eligible individual shall be subject to recapture. The Secretary may recover such amounts—\n(1) from the estate of the deceased individual, in full; or\n(2) from the person who received such payment, subject to the limitation under Section 12(b).\nSECTION 10. SUSPENSION OF PAYMENTS. # (a) IN GENERAL. — Payments under this Act shall be suspended with respect to any eligible individual as provided in this section.\n(b) PHYSICAL PRESENCE REQUIREMENT. — An eligible individual satisfies the physical presence requirement for a payment period if such individual—\n(1) maintains a mailing address or designated financial account within the United States; and\n(2) meets the substantial presence test, as described in subsection (c).\n(c) SUBSTANTIAL PRESENCE TEST. 
— For purposes of subsection (b)(2), an eligible individual meets the substantial presence test for a calendar year if the sum of the following equals or exceeds 183 days:\n(1) the number of days the individual was physically present in the United States during the current calendar year;\n(2) one-third of the number of days the individual was physically present in the United States during the first preceding calendar year; and\n(3) one-sixth of the number of days the individual was physically present in the United States during the second preceding calendar year.\nThis test shall be applied in the same manner, and days of presence shall be determined under the same rules, as the substantial presence test under section 7701(b)(3) of the Internal Revenue Code of 1986, except that such test shall be applied to citizens and nationals of the United States for purposes of this Act notwithstanding section 7701(b)(1)(A) of such Code.\n(d) GOVERNMENT SERVICE ABROAD. — An eligible individual shall be treated as physically present in the United States for purposes of subsection (c) for any day during which such individual is absent from the United States by reason of—\n(1) service as a member of the Armed Forces of the United States;\n(2) service as a member of the Foreign Service of the United States;\n(3) employment by the Federal Government at an official duty station outside the United States;\n(4) status as the spouse of an individual described in paragraph (1), (2), or (3) who is so serving or employed, if such spouse is accompanying such individual at a duty station outside the United States; or\n(5) status as a dependent (as defined in section 152 of the Internal Revenue Code of 1986) of an individual described in paragraph (1), (2), or (3) who is so serving or employed, if such dependent is accompanying such individual at a duty station outside the United States.\n(e) CESSATION AND RESUMPTION. —\n(1) CESSATION. 
— If an eligible individual fails to satisfy the physical presence requirement under subsection (b), payments to such individual shall cease beginning with the first full calendar month following the month in which the Secretary determines that the requirement is no longer satisfied.\n(2) RESUMPTION. — Payments shall resume beginning with the first full calendar month following the month in which the Secretary determines that the individual has reestablished physical presence in the United States and satisfies the requirements of subsection (b). No back payments shall be made for any period during which the individual did not satisfy the physical presence requirement.\n(f) ANNUAL CERTIFICATION. — Each eligible individual receiving payments under this Act shall certify annually, at such time and in such form as the Secretary shall prescribe, that such individual satisfies the physical presence requirement under subsection (b). Such certification shall be made under penalty of perjury.\n(g) AUDIT AND EXAMINATION. — Certifications made under subsection (f) shall be subject to audit and examination by the Internal Revenue Service under the same authority, infrastructure, procedures, and standards of selection, review, and due process applicable to the examination of claims under section 911 of the Internal Revenue Code of 1986. For purposes of such audit and examination, the Secretary shall have the same authority to request and obtain records — including passport records, travel records, and such other records as may be relevant to the determination of physical presence — as is available to the Secretary in connection with examinations under such section 911.\n(h) REGULATIONS. — The Secretary shall prescribe such additional regulations as may be necessary to carry out this section.\nSECTION 11. IDENTIFICATION OF ELIGIBLE INDIVIDUALS. # (a) USE OF EXISTING DATA. 
— For purposes of identifying eligible individuals under this Act, the Secretary shall use—\n(1) information available from Federal income tax returns;\n(2) records of the Social Security Administration;\n(3) records of the Department of State relating to citizenship and nationality; and\n(4) such other information as the Secretary determines necessary and appropriate.\n(b) INTERAGENCY COOPERATION. — The head of any Federal agency that possesses information relevant to identifying eligible individuals or making payments under this Act shall, upon request of the Secretary, make such information available to the Secretary for such purposes, subject to applicable privacy protections.\n(c) OUTREACH. — The Secretary shall conduct outreach to ensure that eligible individuals, including those without fixed addresses, those not in existing Federal databases, and those in institutional settings, are aware of and able to receive payments under this Act.\nSECTION 12. OVERPAYMENTS AND UNDERPAYMENTS. # (a) IN GENERAL. — The Secretary shall issue such regulations or other guidance as may be necessary to provide for proper adjustments in payments, and recapture of payments, to correct underpayments and overpayments, including—\n(1) payments made on behalf of individuals subsequently determined not to be eligible individuals;\n(2) payments made to a custodial parent or guardian following a change in custody; and\n(3) payments not made to an eligible individual due to administrative error.\n(b) LIMITATION ON RECAPTURE. — No recapture of payments under this section shall reduce the monthly payment to any eligible individual below one-half of the monthly payment amount otherwise payable under Section 4 in any given month. Any overpayment amount not recovered due to this limitation shall be carried forward and recovered from subsequent monthly payments, subject to the same limitation, until fully recovered.\nSECTION 13. ADMINISTRATIVE REVIEW. # (a) RIGHT TO REVIEW. 
— Any individual who is adversely affected by a determination of the Secretary under this Act — including a determination regarding eligibility, suspension of payments, or the amount of any overpayment — may request administrative review of such determination.\n(b) ADMINISTRATIVE REVIEW. — Upon receipt of a request under subsection (a), the Secretary shall conduct a review of the determination by an officer or employee who was not involved in making the initial determination. Such review shall be completed, and the individual notified of the result, not later than 60 days after the date on which the request is received.\n(c) JUDICIAL REVIEW. — Any individual who is dissatisfied with the result of a review under subsection (b) may, within 90 days after being notified of such result, file a civil action for review in the United States district court for the judicial district in which the individual resides.\n(d) CONTINUATION OF PAYMENTS PENDING REVIEW. — Payments to an individual who has requested review under subsection (a) shall not be suspended or reduced by reason of the determination under review until such review is completed, except where the Secretary determines that there is probable cause to believe that the payments were obtained by fraud.\nSECTION 14. PAYMENTS IN TERRITORIES. # (a) IN GENERAL. — The Secretary shall ensure that payments under this Act are made to all eligible individuals residing in the Commonwealth of Puerto Rico, the United States Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands.\n(b) ADMINISTRATION. 
— The Secretary shall enter into agreements with the government of each territory described in subsection (a) under which—
(1) the territorial government shall assist in identifying eligible individuals residing in the territory and in disbursing payments under this Act, using such data systems and administrative infrastructure as may be available;
(2) the Secretary shall provide all funds necessary for payments to eligible individuals in the territory, as well as reasonable administrative costs incurred by the territorial government in carrying out its responsibilities under the agreement; and
(3) the territorial government shall comply with such reporting, auditing, and accountability requirements as the Secretary may prescribe.
(c) DIRECT ADMINISTRATION. — If the Secretary determines that a territorial government is unable or unwilling to enter into an agreement under subsection (b), or that a territorial government has failed to comply with the terms of such agreement, the Secretary may administer payments directly to eligible individuals in such territory using Federal resources.
(d) EQUAL TREATMENT. — The amount of any payment made to an eligible individual residing in a territory shall be the same as the amount payable to any other eligible individual under Section 4.

SECTION 15. FRAUD AND PENALTIES.

(a) CRIMINAL PENALTIES. — Any individual who knowingly makes a false statement or representation of a material fact in connection with obtaining or attempting to obtain a payment under this Act shall be subject to prosecution under section 1001 of title 18, United States Code, and such other provisions of Federal criminal law as may apply.
(b) CIVIL PENALTY.
— Any individual who receives payments under this Act for a period of 12 or more months to which such individual is not entitled, by reason of a knowing failure to disclose a material change in eligibility, shall be liable to the United States for the amount of such payments, plus a civil penalty equal to twice the amount of such payments. Recovery of amounts owed under this subsection may be made by offset against future payments under this Act, subject to the limitation under Section 12(b).
(c) RECOVERY OF OTHER OVERPAYMENTS. — Any individual who receives payments under this Act to which such individual is not entitled, and who is not subject to subsection (a) or (b), shall be subject to recovery of such payments under Section 12.
(d) REFERRAL. — The Secretary may refer any matter arising under this section to the Attorney General for investigation and prosecution.

SECTION 16. APPROPRIATIONS; MANDATORY SPENDING.

(a) APPROPRIATIONS FOR PAYMENTS. — There are hereby appropriated, out of any money in the Treasury not otherwise appropriated, such sums as may be necessary to make payments under this Act for each fiscal year, or for any portion of a fiscal year during which payments are made under this Act. Such sums shall include an amount equal to 5 percent of the total projected payments under Section 4 for the applicable period, to accommodate mid-year enrollment under Section 4(e). Any amounts appropriated under this subsection that are not expended during the fiscal year for which they are appropriated shall be available for payments under this Act in succeeding fiscal years.
(b) APPROPRIATIONS FOR ADMINISTRATIVE COSTS. — There are hereby appropriated, out of any money in the Treasury not otherwise appropriated, such sums as are necessary for the proper and efficient administration of this Act for each fiscal year, or for any portion of a fiscal year during which the Secretary carries out responsibilities under this Act.
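The interaction between the Section 15(b) liability and the Section 12(b) recapture floor it is subject to can be sketched as follows. This is an illustrative sketch only, not part of the Act: the function names and dollar figures are hypothetical.

```python
# Illustrative sketch (not part of the Act): how a Section 15(b) liability
# would be recovered by monthly offset, subject to the Section 12(b) floor
# that no month's payment may fall below half the normal amount.

def civil_liability(improper_payments: int) -> int:
    # Sec. 15(b): repay the improper payments plus a civil penalty equal to
    # twice that amount, i.e. three times the improper payments in total.
    return improper_payments + 2 * improper_payments

def offset_schedule(monthly_payment: int, amount_owed: int) -> list[tuple[int, int]]:
    # Sec. 12(b): withhold at most half of each monthly payment; any
    # unrecovered balance carries forward to later months.
    floor = monthly_payment // 2
    max_offset = monthly_payment - floor
    schedule, balance = [], amount_owed
    while balance > 0:
        offset = min(max_offset, balance)
        schedule.append((monthly_payment - offset, offset))  # (paid, withheld)
        balance -= offset
    return schedule

# Hypothetical example: $600 of improper payments yields a $1,800 total
# liability; against a $1,000 monthly payment, recovery takes four months.
owed = civil_liability(600)            # 1800
months = offset_schedule(1000, owed)   # [(500, 500), (500, 500), (500, 500), (700, 300)]
```

Every scheduled payment stays at or above the $500 floor, as Section 12(b) requires, and the unrecovered balance carries forward until fully recovered.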
Such sums shall be available for personnel, systems, outreach, interagency agreements, fraud prevention and detection, administrative review under Section 13, and such other administrative functions as the Secretary determines necessary to carry out this Act.
(c) DESIGNATION AS DIRECT SPENDING. — All amounts appropriated under this section — including amounts for payments under subsection (a) and amounts for administrative costs under subsection (b) — shall be classified as direct spending (as defined in section 250(c)(8) of the Balanced Budget and Emergency Deficit Control Act of 1985) and shall not be subject to annual appropriations Acts or to sequestration under such Act.

SECTION 17. REPORTING AND OVERSIGHT.

(a) ANNUAL REPORT. — Not later than March 1 of each year, the Secretary shall submit to the Committee on Ways and Means of the House of Representatives and the Committee on Finance of the Senate a report on the implementation of this Act during the preceding fiscal year, including—
(1) the total number of eligible individuals who received payments;
(2) the total amount of payments made;
(3) the number of individuals enrolled through mid-year enrollment under Section 4(e);
(4) the number and amount of overpayments identified and recovered;
(5) the number of fraud cases referred under Section 15;
(6) the number of eligible individuals residing in each territory who received payments under Section 14;
(7) the number of eligible individuals for whom payments were suspended or resumed under Section 10;
(8) the total amount of mid-year enrollment buffer funds appropriated under Section 16(a), the amount expended, and the amount carried forward;
(9) the total administrative costs incurred under Section 16(b), including a breakdown by major category of expenditure; and
(10) such other information as the Secretary determines appropriate or as may be requested by such committees.
(b) GAO AUDIT.
— The Comptroller General of the United States shall conduct an audit of the implementation of this Act not later than 2 years after the first payments are made under this Act, and every 3 years thereafter. Such audit shall include an assessment of payment accuracy, program integrity, administrative costs, and the effectiveness of outreach efforts under Section 11(c). The Comptroller General shall submit the results of each audit to the committees described in subsection (a).

SECTION 18. REGULATIONS.

(a) INTERIM GUIDANCE. — Not later than 90 days after the date of enactment of this Act, the Secretary shall issue such interim guidance as may be necessary to carry out this Act upon its effective date.
(b) FINAL REGULATIONS. — Not later than 180 days after the date of enactment of this Act, the Secretary shall issue such final regulations as may be necessary or appropriate to carry out this Act. Until such final regulations take effect, the interim guidance issued under subsection (a) shall govern.

SECTION 19. SEVERABILITY.

If any provision of this Act, or the application of such provision to any person or circumstance, is held to be unconstitutional or otherwise invalid, the remainder of this Act, and the application of such provision to other persons or circumstances, shall not be affected thereby.

SECTION 20. EFFECTIVE DATE; TRANSITIONAL PROVISIONS.

(a) EFFECTIVE DATE. — This Act shall take effect on the date that is 90 days after the date of enactment of this Act.
(b) FIRST PAYMENT PERIOD. — Notwithstanding Section 3(i), the first payment period under this Act shall begin on the first day of the first full calendar month beginning on or after the effective date of this Act and shall end on the following September 30. Each subsequent payment period shall be as defined in Section 3(i).
(c) TRANSITIONAL DETERMINATIONS.
— For the first payment period under this Act, the Secretary shall—
(1) determine the five-year moving average using the five most recently completed fiscal years for which final audited budget results have been published by the Office of Management and Budget as of the effective date of this Act;
(2) determine the total number of eligible individuals as soon as practicable before the beginning of the first payment period; and
(3) publish the information described in Section 4(f) as soon as practicable before the beginning of the first payment period.
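The transitional rule in Section 20(c)(1) can be sketched as follows. This is an illustrative sketch only, not part of the Act: the quantity Section 4 applies the moving average to is defined outside this excerpt, and the fiscal-year figures below are invented for illustration.

```python
# Illustrative sketch (not part of the Act): Sec. 20(c)(1) takes the
# five-year moving average over the five most recently completed fiscal
# years with final audited budget results published by OMB as of the
# effective date. The averaged quantity comes from Section 4, which is
# not in this excerpt; the data here is hypothetical.

def transitional_moving_average(audited_results: dict[int, float]) -> float:
    """audited_results maps fiscal year -> the audited figure to be averaged."""
    if len(audited_results) < 5:
        raise ValueError("need at least five audited fiscal years")
    latest_five = sorted(audited_results)[-5:]  # five most recent audited years
    return sum(audited_results[year] for year in latest_five) / 5

# Hypothetical audited figures for fiscal years 2019-2024; only the five
# most recent (2020-2024) enter the average: (20+30+40+50+60)/5 = 40.0.
data = {2019: 10.0, 2020: 20.0, 2021: 30.0, 2022: 40.0, 2023: 50.0, 2024: 60.0}
avg = transitional_moving_average(data)
```

Note that an older audited year (2019 above) simply falls out of the window, matching the "five most recently completed fiscal years" language.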