The Judgement Intrinsic to GenAI (and technology as such)
A friend passed along this essay, which responds to this op-ed. The issue is whether LLMs are capable of judgement. The op-ed says judgement is "uniquely human"; the essay disagrees. I, too, disagree with the op-ed, but my disagreement turns out to be quite different. The essay's argument is broader than whether LLMs can judge what to do in a situation when prompted to do so. The author's target is a more general form of pronouncement, of which this is one instance: the claim that LLMs or AI cannot do X. Usually such a claim is attached to another claim that reserves X to the remit of human beings. The counterpoint is an empirical one: actually, cutting-edge models unavailable to those making such claims do do X, and even if they didn't, they soon will. That's fine, if a bit tiresome in the wake of the deflated hype that preceded ChatGPT 5. I wish the essay had gone into detail about how certain things become possible for generative AI. But then, I don't think that's really what's at stake. The examples provided are not about capability, as in can or cannot, but rather pertain to how well. This is in keeping with how LLMs have developed: through increased scale and without any fundamental changes. And this is exactly what the op-ed says as well: despite claiming that humans are uniquely capable of judgement, the author does not even question that generative AI makes judgements. The claim turns out to be that they're just no good at it.
The op-ed defines judgement broadly as whether a decision can be reached based on multiple axes of evaluation and evidence. It adds one further, albeit tacit, requirement, which becomes obvious from the criticism: the decision reached needs to be the correct one. Correctness is taken to mean whether the course of action suggested by the LLM achieves the desired outcome, which in the example provided is a business deal. The op-ed twice performs a bait and switch. It claims that LLMs cannot make judgements, then tacitly accepts that they do, in fact, make judgements, but really they cannot judge well enough, which turns out to mean that they cannot predict the future decisions made by a company as to the price it would accept for its acquisition. This is a strange line of argument to take if you want to claim that some text-generating automatism is incapable of judgement. Isn't a judgement necessarily open to being wrong? And whether a machine, or a person for that matter, does something worse than another person, or a machine for that matter, poses a completely different question: has that ever stopped a corporation from choosing to deploy the worse option? Of course not. (And this is what's really at stake for both pieces, not the claim about judgement in their titles.) But I'm not really all that interested in this. Nor do I think that notions of the "uniquely human" are worth defending. That's immaterial, idealist hogwash that only serves to shore up exclusionary ideology and bad thinking. What I am interested in is whether LLMs make judgements.
I'm not convinced that it makes sense to ask whether an LLM is capable of passing judgement, as if some of what it produces were judgements while other outputs were not. Rather, I would first want to consider whether everything an LLM produces is a judgement. This is to ask: does every transaction from any prompt to any response involve judgement? Is an LLM intrinsically judgemental? The essay, like the op-ed to which it responds, assumes that judgement pertains to a specific subset of language responses. But I think the very capacity to respond presupposes judgement, unless we were to constrain judgement to a specific philosophical definition: asserting the truth or falsity of a statement's correspondence to something in the world. Indeed, this epistemological kind of judgement is not applicable to the bullshitter that LLM text generation is.1 Based on the time-traveling logic of the op-ed, that's basically what judgement ends up meaning, but I'm sticking with the spirit of the piece, not what it ends up saying.
Judgement, we are told in the op-ed, requires one "to weigh considerations." An LLM comprises weights, that is, valuations of a multitude of vectors. The training process is the weighting process. The heart of the matter for me lies in (1) the range of values that can possibly pertain to the production of a response and (2) the operations whereby that valuation occurs. Are the training and querying of an LLM evaluative? I think it is crucial to keep training and query response together as two moments of the technology called generative AI. I'll explain why in a moment.
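The claim that "the training process is the weighting process" can be made concrete with a toy sketch. This is a minimal illustration, not how any production LLM is trained: a miniature linear model whose weights are nudged by gradient descent to fit its training pairs, and whose querying then simply applies those frozen weights. All names and data here are invented for illustration.

```python
# Toy illustration: training as weighting. A miniature linear model
# learns weights from (input, target) pairs; querying applies them.

def train(pairs, steps=1000, lr=0.1):
    """Gradient descent on squared error for the model y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in pairs:
            err = (w * x + b) - y
            w -= lr * err * x  # each update re-weights the model
            b -= lr * err
    return w, b

def query(w, b, x):
    """Inference applies the frozen weights: the valuations baked in
    during training persist in every response."""
    return w * x + b

# The "training data" encodes a valuation: double the input.
w, b = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(query(w, b, 5.0), 2))  # close to 10.0
```

On this picture, the evaluative work is distributed: it sits in the choice of training pairs, in the update rule, and only then in the response that applies the resulting weights.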
But first, to answer the question up front: yes, certainly an LLM evaluates, and unless "judgement" is defined in the aforementioned epistemological sense, I think everything an LLM does is a judgement. Moreover, whether statistical regression calculations and their application to novel data are "running on" a human brain (and on whatever auxiliary technics are required) or a supercomputer in order to arrive at a determination makes no difference, because the determination is in the technique. The training of an LLM and how it is prompted both express judgements that contribute to how it will perform. And whether an action is taken in response to its response, which could be automated in advance, is yet another judgement. My position is that wondering whether an LLM is in and of itself capable of judgement is a nonsensical question. It seems to make sense because, in this ideological climate (the ideology of the technical object), the LLM is a black box. We are told this over and over: we don't know exactly how they work or what they do under the hood; it's a serious problem to which computer scientists devote careers. Yes, and even so, an LLM is not a black box in the sense that it exists independently and functions autonomously. It still exists as the result of decisions and of other technologies (some manifesting in forms we'd recognize as human), and its inputs and outputs function in concert with the activity of other technological processes that include judgements whose locus is a human being.
To rephrase this more succinctly and with an eye toward the point I'm driving at, I think that LLMs do one thing: they receive a prompt and then provide a response. Attaching an LLM to additional tools or giving it free rein over a system via its own terminal makes no difference, because before that, the LLM was hooked up to a human being. The LLM was already having an effect on the part of the world most capable of propagating its result further. Hooking it up to tools constrains that field of reception. It is arguably doing less of the one thing it does, but it's doing it faster and probably more than once, without a person's attention needing to supply the LLM with additional motive force. Either there is judgement in what an LLM does or there is not.
Now I want to get a bit further into why I think the training of an LLM can't be separated, except by a division of reason, from its querying function. Technology of any sort is imbued with the past experience of its creators, and thus it will project a way of orienting toward the world. I don't mean this in some spiritual sense, as if a knapped stone axe stores the ghostly glimmers of an ancestral human being's goals for chopping wood. (Bernard Stiegler would say that a tool reflects the memory of invention that it is back to whoever observes and wields it before they have a memory of it themselves. This speculative paleoanthropology is convincing from the vantage of millions of years, but less so in the specific instance as imagined here.) In the knapped stone axe persist both the succession of blows that shaped its blade and the binding of materials that affixed its handle. Its physical properties lend it certain affordances. The judgements of the crafting process (i.e., how the thing is made) transform a piece of the world into a material orientation on and of the world. In this case, the values may be expressed as hacking, slicing, hewing. The tool contributes to the judgements that constitute the activity of its wielder. (I consider this activity an extension of the technological process that began with the second blow to the stone that became the axe head.) The wielder may be a person or some device. Judgement enters into use, because without judgement nothing would happen. The interesting question is when judgement occurs: in the moment before a person acts; in the training of the person who will eventually use the tool; in the machine constructed to power the tool? Yes, in all of those moments, and before, when the technological process began.
One way to think about technology is as the coordination of a sequence of decisions, or a crystallization if you need to think in terms of solid objects, such that any enduring impact of the technology (perhaps as a physical object or a routinized mental system) is the persistence of those decisions. Technology provides a conduit for active judging, whereas unstructured traces of events do not. Behind every response generated by an LLM is a host of judgements made by the people who designed and trained it, and there are as well the recorded representations of judgements in the materials used to train it. To denigrate it as "just a tool" turns a blind eye to the judgements that constitute it and that get passed on to its users. It also misses how judgement tends to happen: in concert with a myriad of technologies (including the language in a person's thoughts) and feelings. Judgement is a process that occurs neither solely in the mind nor in a punctual stroke of decision. It propagates, taking on more evidence here, shedding the import of certain values there, until it either issues in a judgement proper or stops short, unable to obtain sufficient intensity to provoke its uptake.
1. https://blog.kagi.com/llms