What transcends the Algorithm – AG Medienphilosophie

What transcends the Algorithm – A Critique of the Notions of a Postdigital and a Subsymbolic

The theological difference between immanence and transcendence finds a counterpart in technology in the form of certain concepts which, while being the aspiration of approximative processes, can never be reached. With the continuing success of machine learning and neural networks, some authors have begun to question such boundaries, for example, as I will argue, through the notions of a postdigital and a subsymbolic. As the argument goes, by employing a new programming and computing paradigm, which radically differs from instruction-based coding, these systems reach a degree of parallel computing that renders them “quasi-analog” (Sudmann 2018, p. 66) and able to perform “sub-symbolic processing” of “sub-symbolic data” (Esser et al. 2013, p. 1). While some proponents of behaviorism, cybernetics and early AI reduced the human by claiming that humans could fundamentally be described in machinic, algorithmic terms – as advanced feedback/control loops, input/output mechanisms or “information processing systems” (Weizenbaum 1976, p. 226) – these ideas, while certainly not comparable in their radicalism, make an attempt in the opposite direction: they do not reduce the human, but ascribe something new to the technological, thereby moving it up into a realm somewhere beyond the old symbolic, rule-based, deductive, formal, digital technologies, possibly somewhat closer to human features. As a recent article in the ZfM suggests: “Secondly a neural net does not follow a paradigm of logical deduction or explicit rules […] it is the system as a whole that calculates.”; “Artificial neural networks can neither be thought of as completely atomistic nor as completely holistic. They appear to be something in between.” (Bajohr 2020, p. 177/178, my translation)¹

The notions of a postdigital and a subsymbolic, addressing this ‘in-between-ness’ between the digital and something else, between the rule-based deductive and something else, are an attempt to describe the differences between what Dreyfus called GOFAI (good old-fashioned AI) and more recent technologies, mostly statistical machine learning. I certainly do not want to dismiss differences: the performative success undeniably suggests that they exist, and attempts to capture them conceptually are well justified. I will argue in the following, however, that the particular differences suggested by these two notions do not exist. In this I also depart from Dreyfus, who himself considered the possibility that his arguments about what computers can’t do might become obsolete in the face of these new technologies.

First I am going to present what, at least in my reading, is meant by these two notions. Then I am going to argue why I think they are misleading, in the sense that they miss the heart of the matter at hand and imply that machine learning technologies might be doing things they simply cannot do. They do not enter a realm between the holistic and the atomic, a realm beyond the symbol or the digital.

1) The Terms

1.1) The Postdigital

The term postdigital has already accumulated a variety of meanings. Here, I do not talk about aesthetic practices resulting from disillusioning experiences with the digital, like enjoying again the sound of vinyl, therefore meaning a turn back to the analog (Cramer 2014). Nor do I consider thoughts about modern culture and the relationships between humans and technology as a whole as postdigital.²

Rather, I am referring to a technological instead of a cultural understanding of the postdigital. Petar Jandric in The Postdigital Challenge contrasts “older software technologies” with AI and lists “machine learning, neural networks, deep learning” as supposedly postdigital technologies or postdigital algorithms (Jandric 2018, p. 26). Sudmann describes ML algorithms as postdigital technologies because of their “massive parallelization”. In big neural networks, thousands or millions of calculating operations are carried out simultaneously, far more than in traditional von Neumann computing setups. On this account they transcend the purely digital because their weights and biases are calculated with floating point numbers instead of integers, a representation considered so fine-grained that it is called “quasi-analog” instead of just digital (Sudmann 2018, p. 66). The aforementioned article (Bajohr 2020) supports this conclusion, which is a rather surprising one, considering that floating point numbers were invented to digitize the real numbers and thereby enable digital machines to calculate with more than just integers (see below).

In business, the postdigital tends to be described as the totalization of the digital: “The post-digital world is one where technology is the fabric of reality, and companies can use it to meet people wherever they are, at any moment in time”, writes the technology consulting firm Accenture.³ This thought of the postdigital as a kind of completion of the digital also implicitly underlies the former understanding of the quasi-analog, as I will argue later. Accenture is, perhaps surprisingly, very outspoken about the strategic rather than descriptive sense in which the word is used: “Just as people no longer say they live in the ‘age of electricity,’ the days of calling something digital to insinuate that it is new and innovative are numbered.” So “digital” has simply been said too often to retain an advertising effect; there is no technological need for a new term.

1.2) The Subsymbolic

To describe the different modes of processing in classical computing architectures (Dreyfus’ GOFAI) and statistical machine learning systems (the question of how different they really are has to be treated carefully), the latter are often said to perform subsymbolic processing of subsymbolic data, while the former are said to perform symbolic processing of symbolic data (see e.g. Esser 2013). These terms likely originate from Wilma Bucci’s work in cognitive psychology (Bucci 1997). As a distinctive term describing the difference between older and newer forms of algorithms, the subsymbolic has a similar function to the postdigital in the sense I am discussing here.

Esser describes the difference using adjectives like “low-precision, synthetic, simultaneous, pattern-based” for subsymbolic and “high-precision, analytical, sequential, logic-based” for symbolic data and data processing (Esser 2013). Other descriptive distinctions are “Explicit programming vs. Bayesian/Deep Learning” and thus “Programming Languages vs. Neural Networks”, “Hardcoded Rules vs. Connectionism”. So machine learning technologies and much of what is called big data are termed subsymbolic.

What is called symbolic processing is based on a very explicit representation of inputs, operations and outputs: all of these have to be defined in a strict form of (logical) atomism. The claim now is, as e.g. Bajohr (2020) summarizes and advocates, that machine learning technologies somehow soften up this atomism, becoming a “‘mixed type’ or something else entirely” (Bajohr 2020, p. 170, my translation).⁴ That would be a momentous result, considering that Dreyfus’ classical criticism of GOFAI rests on the argument that GOFAI cannot get beyond the limitations and reductions imposed by logical atomism (he called this the “metaphysical hypothesis”): to process language, for example, terms have to be defined as atoms pointing to other atoms, and that might not be a way to reach human-level language understanding. Considering the successes of today’s natural language processing, one might be inclined to think that Dreyfus’ metaphysical hypothesis has been invalidated, but, as I will argue, it has not: statistical algorithms, no matter how impressive the things they do might be or become, do not get under or beyond symbols, and they cannot do without logical atomism.

2) Critique

2.1) Postdigital

What is misleading about the term postdigital used in the sense just described? While it is, to say the least, a multilayered term, digitalization is, at its technological core, the representation of something continuous by discrete units. What is meant to be represented is approximated with ever finer discrete granularity: a picture in pixels, a sound with a certain bitrate, etc. As technology advances and computers get faster, those units become smaller and more numerous. A screen can show images in 4K and higher resolutions, preserving more detail that was formerly lost in the process of digitalization.

Sudmann argues that, by providing a substantial leap forward in this approximation, machine learning technologies should be called postdigital. This leap is said to happen in two respects: first, the weights and biases by which neurons in a neural network are connected are floating point numbers and not discrete integers. Secondly, he stresses the interconnectedness of millions of neurons, which is said to suspend the discreteness of the network’s elements. But allowing approximating granularities to become finer, decreasing the size and increasing the number of the discrete units, cannot be called something after the digital, because it is the very essence of the digital. The ontological gap between the discrete and the continuous can never be closed this way. Better approximation allows more properties and features of the approximated to be preserved, but the two remain fundamentally different, and that cannot be changed by better approximation. But is it not possible that, by better and better approximation, one ‘shifts’ into this postulated realm ‘in-between’, leaving the purely digital behind? I don’t think it is.

This can be seen in a very practical, material way. Machine learning algorithms are, before their implementation on concrete machines, mathematical procedures. In mathematics we have the luxury of the limit: things change when we go from $\frac{1}{n}$ (may the n be as big a number as we can possibly imagine) to $\lim_{n \to \infty} \frac{1}{n}$, and they change fundamentally. At the limit, $\frac{1}{n}$ becomes zero, unmeasurably small. Everything else is something else entirely: $\frac{1}{n+1}$ may be closer to $0$, but is not any more similar to the limit than $\frac{1}{n}$. It can be measured, you are allowed to divide by it, and it can be calculated with on computers without any problems. The same goes for the infinitely large: it is a fundamental change from $n$, with an arbitrarily large n, to $\infty$. In other words: there is no post-* to be had on the ‘immanent’ side of the limit. Transcending measurability and reaching the immeasurably small or large does not happen before numbers are substituted by the infinity or limit symbol, which can only be had in the abstract realm of mathematics, or put differently: in thinking, and not in implemented technology. The discipline of numerical mathematics can help calculate such limits ever more precisely, but its implemented procedures can never bridge or soften up the gap.

A concept as basic as the real numbers is already impossible to implement in computing. As the continuous completion of the rational numbers, they have to be represented by the aforementioned floating point numbers: a combination of integers consisting of a significand, a base, and an exponent. Naturally, these can only approximate real numbers like $\pi$ as well as the storage capacity of the implementing device allows. Continuity can’t be reached; everything is always discrete, and therefore digital. If weights and biases in neural nets are represented by floating point numbers, they are nothing but digital. If n neurons on m layers are connected by k connections, the interconnectedness can be as massive as the computing hardware allows, but it can’t become anything other than digital.
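The discreteness of floating point numbers can be made visible in a few lines. The following is a minimal sketch, assuming Python's `float` is an IEEE 754 double (true on practically all current platforms, but an assumption nonetheless); the variable and helper names are chosen here for illustration only.

```python
import math
import sys


def is_power_of_two(k: int) -> bool:
    """Return True if the positive integer k is a power of two."""
    return k > 0 and (k & (k - 1)) == 0


# The gap between 1.0 and the next representable float.
epsilon = sys.float_info.epsilon

# Adding half of that gap to 1.0 changes nothing: there is no float in between.
absorbed = (1.0 + epsilon / 2 == 1.0)

# Every float is exactly a rational number with a power-of-two denominator.
tenth_num, tenth_den = (0.1).as_integer_ratio()
pi_num, pi_den = math.pi.as_integer_ratio()

# The float written as 0.1 is therefore not the number one tenth.
tenth_is_exact = (tenth_num * 10 == tenth_den)

# A well-known consequence of this rounding.
sum_is_exact = (0.1 + 0.2 == 0.3)

print(absorbed, is_power_of_two(tenth_den), is_power_of_two(pi_den))
print(tenth_is_exact, sum_is_exact)
```

Under the stated assumption, `math.pi` is not the real number π but a nearby fraction whose denominator is a power of two, which is one concrete sense in which a weight in a neural network stays a discrete value.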

All of this applies to the basic mathematical concepts used in machine learning as well. Stochastic training of algorithms is all about minimizing some calculated error. To find minima of a function, the basic approach doesn’t change from what most of us have at some point learned in school: you have to find the derivative. While things get a bit more complex, this is the fundamental task of the famous backpropagation algorithm that propelled the learning capabilities of neural networks forward (Sudmann 2018, p. 56). So in essence it is all about differential calculus, and the derivative, as one might not necessarily remember from school, is in fact defined as a limit: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$.⁵ Obviously, we are not allowed to really go there, because dividing by zero is evil. So when a computer has to calculate a derivative, in general it has to apply numerical techniques – that means using values really close to what we want, so in this case a very small h, but without ever being able to use exactly what we want. In particular, these values are necessarily discrete values (floating point numbers).
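A small numerical experiment illustrates why the limit itself is out of reach. This is a sketch of the plain difference quotient, not of backpropagation: machine learning frameworks typically obtain gradients by automatic differentiation, which applies derivative rules step by step, but the values they evaluate are floating point numbers all the same. The function name `forward_difference` and the chosen step sizes are illustrative assumptions.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) by the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


# Reference value: the derivative of sin at 1.0 is cos(1.0), itself only a float.
reference = math.cos(1.0)

# A small but representable step gives a good approximation.
good = forward_difference(math.sin, 1.0, 1e-6)

# A step below the float resolution near 1.0 is swallowed: 1.0 + 1e-17 == 1.0,
# so the numerator is zero and the "derivative" collapses.
too_small = forward_difference(math.sin, 1.0, 1e-17)

# The limit h = 0 cannot be used at all.
try:
    forward_difference(math.sin, 1.0, 0.0)
    reached_limit = True
except ZeroDivisionError:
    reached_limit = False

print(abs(good - reference), too_small, reached_limit)
```

Making h smaller helps only up to a point; beyond the resolution of the number format the approximation gets worse rather than better, and at h = 0 the computation is undefined.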

What all of this is meant to demonstrate is the following quite simple fact: a postdigital can be had. Real numbers, limits, and therefore derivatives are all concepts which can only be thought of as truly continuous. They extend the digital rational numbers to the continuous real numbers; they extend the possibilities of calculation from finite values to infinity. And they are not even that complicated; we (more or less) routinely operate with them. But they can only exist outside of technology, in pure mathematics, in pure thought. Everything else is always digital. The difference is captured in what Ernst Cassirer called “Substanzbegriff” (concept of substance) and “Funktionsbegriff” (concept of function): mathematical concepts do not refer back to any substance, any foundation in the real world, and so belong to a “reine Formwissenschaft” (pure science of form), not a “Wirklichkeitswissenschaft” (science of reality). Technology, on the other hand, has to be based on substantial concepts; it is useless as “Formwissenschaft” but has to be “Wirklichkeitswissenschaft” and therefore cannot play all of the formal games which mathematics can play (Koenig 2017, p. 165/167).

And this is nothing new which just emerged with statistical algorithms, but goes for every algorithm there ever was. The divide between theoretical-mathematical infinity and continuity and numerical digitality is itself discrete. It cannot be softened by better approximation through more computing power or faster algorithms. All they can offer is a ‘yet-more-digital’ or a ‘yet-a-step-closer-to-whatever-digital’, but never a post-digital. There is no realm in between. What is termed postdigital is rather a faster, better, more potent digitalization than anything new, so it leaves nothing behind, which would justify the use of the “post”-prefix.

One could ask: would it then not be justified to call machine learning as a theoretical, mathematical and not yet computerized endeavor postdigital, because it relies on postdigital concepts like the derivative? The answer is yes, it would be, but this would deprive the term postdigital of every sting it might possess, because almost every algorithm could be called postdigital on this understanding. So the distinction digital/postdigital still would not work as a distinction between more traditional algorithms and stochastic learning, but only between a mathematical definition and its technological implementation.

So everything described is still fundamentally digital. The mentioned consulting firm Accenture seems to at least subconsciously know that: in the cited publication they write about postdigital consumers, workers, markets, but not technologies. What all of those might be, would be a whole new topic I do not wish to enter here.

2.2) Subsymbolic

Turning to the subsymbolic, I first want to address a bit further what it could be that is “under” the symbol. So what I will talk about is not necessarily what is meant by researchers contrasting symbolic and subsymbolic data, processing, or logic, but what could be or appears to be meant if we take the term seriously.

Symbols are our means of medial communication and expressing thoughts, e.g. language, gestures, mathematics, pictures, etc. One could think that “under” the symbol refers to something that is meant, but not said: something written between the lines, the subtext of a text. But that would just be something that is subsymbolic in a special case, but could become symbolic by uttering it or writing about it. A subsymbolic in a general sense seems to address something that necessarily gets lost in the process of symbolization, which occurs in the phenomenon we try to talk or write about, but cannot be transported by symbolic means. It is lost in the fundamental medial operation of conveying something as something (as Dieter Mersch emphasizes: etwas “als” etwas). This is related to the problems of logical atomism: what is it that the atomic or symbolic representation misses? And further: can machine learning help us not miss it any longer?

Wittgenstein offers a first account of this with his distinction between “Sagen und Zeigen”, or to say and to show. Every sentence utters a meaning, which is what it says, but does so in a specific way, which is how the sentence shows or conveys its meaning. This way of showing necessarily stays unexpressed by the sentence. One can utter a new sentence which says something about the way the first sentence shows what it has to say; but inevitably it will spawn a new form of showing which is not talked about, because it is impossible to say something without a way of showing. Due to this circular recourse in every instance of symbolic communication (so in every communication) there always remains something implicit which cannot be made explicit. This is something that in some sense stays beneath the symbol and cannot be eradicated.

A recurring theme when talking about the limits of symbolic expression is the sublime or “das Erhabene”. Most famous may be Kant’s take on the sublime. For Kant, menacing cliffs, piled-up thunderclouds, storm and thunder or the seemingly endless ocean are not sublime as such, but they demonstrate the superiority of pure reason over experience, and this constitutes the sublime. Phenomena like these transcend all symbolic expression: “[…] for no sensible form can contain the sublime properly so-called. This concerns only Ideas of the Reason, which, although no adequate presentation is possible for them, by this inadequacy that admits of sensible presentation, are aroused and summoned into the mind.”⁶ This moment of self-reflectivity that constitutes the sublime is an algorithmic impossibility: the experience that one’s symbolic means are not enough to capture what is witnessed is based on something beyond symbolization, and while it is questionable how one can relate to this experience, it is at least somehow realized. No algorithm or mathematical procedure can realize anything beyond its symbolic operations.

This is the dynamical sublime. Kant’s second type, the mathematical sublime, is also sublime because it shows the superiority of reason, but in a slightly different way. It is something absolutely great, meaning great beyond any comparison. To express or represent something symbolically always means, as Lyotard puts it, to relate it, to set it in correlation, and that is of course impossible for something absolute: “The faculty of presentation, the imagination, fails to provide a representation corresponding to this idea [of an absolute, B. W.].” (Lyotard 1991, p. 98)⁷

So the sublime is not located in phenomena, but is a realization about the inner workings of one’s mind that is prompted by such phenomena: our mind can operate with the concept of a regulative large or infinite, although our senses cannot capture it and our symbolic means cannot adequately express it. We can without much difficulty compare different scales of infinity, the countable and the uncountable infinite. But we do so by formal definitions (e.g.: is there a bijection to the set of natural numbers?) and cannot lay out a countable and an uncountable set to see the difference. The sublime, again citing Lyotard, “bears witness to the incommensurability between thought and the real world.” (Lyotard 1991, p. 95)⁸ Observing this incommensurability subverts the symbolic, because this aspect of reason cannot be made visible in the real through a material sign.

Lyotard attaches great importance to the question of an unrepresentable. “Now in my view this question is the only one worthy of what is at stake in life and thought in the coming century.” (Lyotard 1991, p. 127)⁹ If these are indeed the stakes we are talking about, the notion of a subsymbolic, which is inevitably associated with the question of the unrepresentable, should not be used carelessly.

So is it used too carelessly in the described way, in relation to statistical algorithms? Having speculated about what a subsymbolic might be, we have to ask whether and how modern machine learning algorithms have a different kind of access to such domains than classical, symbolic algorithms. And the answer has to be: they do not.

Once again we have to remind ourselves that all algorithms at issue here are mathematical procedures. We are not talking about more metaphorical uses of the term, which include recipes for cooking and all kinds of structured, rule-following daily activities. Be it PageRank, recommender systems, AI assistants like Siri and Cortana, the calculation of creditworthiness, or the training of neural networks – all of those are mathematical procedures before they are implemented on whichever concrete hardware they are supposed to run on. So whatever should serve as input to an algorithm, be it language, pictures or other data structures, has to be represented by some kind of numeric value, and that means as a symbol. Nothing other than such numerical symbols is accessible to an algorithm. So to capture any kind of phenomenon in the form of algorithmic input data already means symbolizing it and eradicating anything subsymbolic from it. Of course such subsymbolic properties might be attached to it in a symbolic form, as discussed regarding the difference between showing and saying, but not without creating a new, left-out subsymbolic.
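How language or images reach an algorithm can be sketched in a few lines. The example below is only an illustration: the phrase, the toy vocabulary and the pixel value are hypothetical choices made here, and real systems use far larger vocabularies and learned encodings. The point it shows is merely that each step turns the input into integers before any processing begins.

```python
# A short phrase as it might be typed by a user.
text = "das Erhabene"

# Characters become Unicode code points, i.e. integers.
code_points = [ord(character) for character in text]

# Stored or transmitted, the same text is a sequence of bytes (integers 0..255).
utf8_bytes = list(text.encode("utf-8"))

# A hypothetical toy vocabulary maps each word to an integer index,
# loosely in the way language models map tokens to ids.
words = text.split()
vocabulary = {word: index for index, word in enumerate(sorted(set(words)))}
token_ids = [vocabulary[word] for word in words]

# One pixel of an image: three integers for red, green and blue.
pixel = (255, 128, 0)

print(code_points)
print(utf8_bytes)
print(token_ids)
```

Whatever a later model does with these values, for instance turning token ids into vectors of floating point numbers, it starts from this already symbolized input.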

From the impossibility of subsymbolic data follows the impossibility of subsymbolic processing. What would that be? At the core of all algorithms there is just one way of processing, the mathematical, using whatever tools calculus, algebra, probability theory etc. have to offer. That might or might not be a purely symbolic endeavor (as intuitionists in the philosophy of mathematics argue), but it concerns the processing of symbols and not sub-symbols, non-symbols or more-than-symbols. A quote by Bourbaki shall illustrate these relations: “The mathematician does not work like a machine […] we cannot over-emphasize the fundamental role played in his research by intuition, which is not the popular sense-intuition, but rather a kind of direct divination (ahead of all reasoning) of the normal behavior, which he seems to have the right to expect of mathematical beings, with whom a long acquaintance has made him as familiar as with beings of the real world.” (Bourbaki 1950, p. 227) It is popular to see it as the beauty of mathematical beings that they are as in the view of Bourbaki abstract, formal structures, cleansed from every real-world obligation. Mathematics is a formal game of symbols, though a mathematician might use instincts, intuition and other non-symbolic, non-formal means while playing it. A machine playing this game can, as I have argued, only play a discrete subset of this game, cleansed not only from the non-symbolic, but from everything infinite and irrational (continuous), using only symbolic means.

Because of this symbolic mathematical foundation, the impossibility of an algorithmic subsymbolic does not depend on its implementation, meaning the characteristics of the hardware it is run on. It might be debatable whether a quantum computer will still be a digital machine or whether it might in some sense justify the use of the term postdigital, but it will not transcend this mathematical background. It will operate with numerical values in floating point representation, and that means symbolic, logical atoms. So atomism cannot be overcome.¹⁰

In fairness, this is not exactly what is meant by the term. It rather alludes to not using explicit commands that encode what the algorithm should do (if-else structures etc.) but letting it figure this out by statistical optimization, thereby not really using distinct rules and facts (Bajohr 2020, p. 172). But for this purpose the term is misleading: omitting explicit coding and implementing an explicit optimization procedure (like backpropagation) instead does nothing to defy ‘symbolicality’ – again, there is no in-between. Lyotard describes the sublime as “a kind of cleavage within the subject between what can be conceived and what can be imagined or presented.” (Lyotard 1991, p. 98)¹¹ Such a split does not exist in algorithmic processing, and so in particular does not exist in those algorithms termed subsymbolic.

Perhaps one of the goals of drawing a questionable distinction between symbolic and subsymbolic systems is to avoid Dreyfus’ aforementioned famous critical remarks in What Computers Can’t Do. The book was written in 1972 and updated in 1979, and its main points apply only to what he calls GOFAI (Good Old Fashioned AI) or, in the terminology I am discussing here, symbolic AI. Dreyfus refers to machine learning and statistical approaches only in more recent introductory remarks (Dreyfus 1979, p. xxx). While calling them connectionist rather than subsymbolic systems, he still clearly separates them from symbolic AI and thereby becomes susceptible to all the imprecisions of this terminology I described above. And indeed Dreyfus does not formulate any fundamental arguments why machine learning technologies should not be able to reach human-level performance in areas like language understanding. He sees a lot of similarities, but doubts that statistical algorithms will reach the necessary complexity and size to connect pieces of knowledge in ways as diverse and flexible as humans do. So his objections are of a purely quantitative nature (though one should acknowledge that he does not approach these systems nearly as systematically as the GOFAI technologies of the 1960s). However, because machine learning does not overcome the symbolic, it should perhaps be reexamined whether some of Dreyfus’ arguments still hold (without his having been aware of it). The symbolic is still an impenetrable border for AI. Therefore, one can understand the Alpha 60 from Godard’s Alphaville, an AI that computes the fate of the city, when it says: “Es ist ein Unglück, dass die Welt real ist” (“It is a misfortune that the world is real”). A symbolically represented world would be completely accessible to it. The world’s ‘real remainders’, which cannot be eradicated by updating the residents’ dictionaries of permitted words, are a constant source of trouble.

It is perhaps important to note on which level I am making a statement and on which I am not. I am not claiming any performative limitations of algorithmic processing: the fact that a subsymbolic does not exist for algorithms implies nothing about the capabilities, or lack thereof, of those algorithms. The lack of a subsymbolic dimension does not mean that, in some kind of Turing test setup, a machine could not successfully simulate a human being who has access to a subsymbolic dimension. It merely means that inferring from such a test that the respective processes are indistinguishable, or even that the lines dividing them are blurred, would be a behavioristic fallacy.

3) Conclusion

I have argued on a technological level and even below that, meaning on the level of its mathematical foundations, without discussing social uses, cultural consequences etc. of algorithms. One might find that a tad reductionist, but the two terms at least in part refer to this level. And although some warn us that the reference to technology might be a trick to produce unambiguity¹², I think that the issues discussed are indeed not ambiguous.

My argument is not about denying differences between statistical machine learning systems and traditional computing setups, but about not carelessly inferring differences in principle from differences in performance. The notions of a postdigital and a subsymbolic, even where they do not state such differences in principle openly, at least imply them. They point in the right direction but take matters too far: machine learning technologies allow for more digitization and other modes of symbolic data processing, but do not transcend them.

There are differences between stochastic training and classical algorithms, for the description of which we lack proper vocabulary. An example is the often cited unexplainability of statistical AI-systems. On the surface (and by surface here I also mean the source code which might be read by savvy cultural scientists etc.) it is no longer just readable what a system will do and how it will achieve it. We do not get if/else-structures, explicit loops etc. which basically tell us all we need to know. But that does not mean that everything (or anything) is principally unexplainable: every single differential and algebraic calculation could still be reproduced with pen and (a lot of) paper – it’s a matter of scale and speed which makes this reproduction practically impossible, not a deep obfuscation or a step beyond symbolic representability. But these matters of scale already make understanding big classical algorithms, which might consist of millions of lines of code, basically impossible.
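That each step of training is an ordinary, reproducible calculation can be illustrated with the smallest possible case. The sketch below performs one gradient descent step for a single linear unit with squared error, using exact fractions so that every intermediate value can be checked by hand. The model, the starting values and the learning rate are assumptions chosen for illustration; real networks repeat steps of this kind over millions of parameters.

```python
from fractions import Fraction


def gradient_step(w, b, x, y, learning_rate):
    """One gradient descent step for y_hat = w * x + b with squared error."""
    y_hat = w * x + b
    error = y_hat - y
    # Derivatives of error**2 with respect to w and b (chain rule).
    grad_w = 2 * error * x
    grad_b = 2 * error
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Illustrative starting values, all exact rationals.
w, b = Fraction(1, 2), Fraction(0)
x, y = Fraction(2), Fraction(3)
learning_rate = Fraction(1, 10)

# By hand: y_hat = 1, error = -2, grad_w = -8, grad_b = -4,
# so w becomes 1/2 + 8/10 = 13/10 and b becomes 0 + 4/10 = 2/5.
new_w, new_b = gradient_step(w, b, x, y, learning_rate)

print(new_w, new_b)
```

Nothing in the step is hidden; what makes large systems hard to follow is the number of such steps, not their kind.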

The postdigital and subsymbolic are not precise enough to capture this, because they imply bigger differences than there really are. The critical borders that they imply have now been crossed, or have at least become somewhat blurred, are still impenetrable for technology. Therefore, these terms obfuscate more important differences between human and machine, or, to put it slightly less radically, between human reason, understanding and thinking on the one hand and algorithmic processing on the other. Thus they allow for an anthropomorphization of technology: something postdigital is potentially closer to us than the digital, and subsymbolic processing might be closer to human thinking than pure symbol manipulation. The authors cited thus far might not have argued in this direction, but others, like Luciano Floridi, have: “We are now slowly accepting the idea that we might be informational organisms among many agents, inforgs not so dramatically different from clever, engineered artefacts”, portraying AI as a fourth revolution after Copernicus, Darwin and Freud (Floridi 2008, p. 651). I refrain from citing some of the even more radical so-called Silicon Valley ideologues who still argue, as some already did in the 60s, that humans, like AI, are essentially nothing more than input-output processors. But perception is more than being fed pre-defined data which can only be accepted or rejected, thinking and pondering are more than data processing, and acting on the basis of giving and taking reasons (Nida-Rümelin 2020) is more than calculating a certain behavior.

Bajohr, Hannes (2020): “Die ‘Gestalt’ der KI. Jenseits von Holismus und Atomismus”. In: Zeitschrift für Medienwissenschaft 2/2020, pp. 168-181.
Bourbaki, Nicholas (1950): “The Architecture of Mathematics”. In: The American Mathematical Monthly 57. 4, pp. 221-232.
Bucci, Wilma (1997): Psychoanalysis and Cognitive Science: A Multiple Code Theory. New York: Guilford Publications.