
UNREASONABLE CHARACTERS

Tyler Shoemaker

 

Graphic muddle is the death of real characters.
– Rhodri Lewis1

The real characters of the Unicode Standard are in a class wholly of their own. Unicode’s authors follow a practice that defines modern character encoding: when they assign numbers to the individual elements of writing systems, “transform[ing] characters from being a random collection of bits into things of meaning,” they only specify the semantic information of those elements.2 They do not specify what those elements look like. Strangely, encodings like Unicode draw a basic distinction between writing’s semantic units, or characters, and the graphic forms of writing, its glyphs, and then bracket the latter. The American National Standard Code for Information Interchange, a precursor to Unicode, is exemplary of the practice. “No specific meaning is prescribed for any of the graphics in the code table […]. Furthermore, this standard does not specify a type style for printing or display of the various graphic characters.”3 A character is an “atom of information,” writes Dan Connolly, of the World Wide Web Consortium. “Note that by the term character, we do not mean a glyph […]. It is typically a symbol whose various representations are understood to mean the same thing by a community of people.”4 The systems and standards responsible for managing our fonts, our emails, and yes, our emoji, are structured by a pervasive design principle that stays agnostic about how computers render writing on screen. Encodings simply ratify a character’s existence, giving it a place in a series but little more.

By the end of the 1960s the character–glyph distinction had become standard for rendering digital text. But Unicode, far more so than other encodings, “takes the trouble” to define what it means by characters and glyphs, formalizing this distinction in a manner that had hitherto remained implicit.5 “Characters are abstract representations of the smallest components of written language that have semantic value,” the standard reads. “Glyphs represent the shapes that characters can have when they are rendered or displayed.”6 On the basis of these two definitions Unicode has grown to be immense. Maintained and published by the Unicode Consortium, a nonprofit comprised mostly of academics, tech industry workers, graphic designers, and the occasional policymaker, the standard provides support for nearly every writing system in use today, adding to this repertoire many historical scripts and a vast array of graphical symbols; its sixteenth version, released in 2024, totals 154,998 characters. The Consortium’s work extends beyond its public image as gatekeeper for new emoji to ensure that writing systems spanning alphabets and abjads, logograms and cuneiform are all representable as digital text. “Everyone in the world should be able to use their own language on phones and computers,” the Unicode website declares, updating its more imperializing slogan from the 1990s: “When the world wants to talk, it speaks Unicode.”7 And so the world does, at least judging by the fact that, since its first release in 1991, the standard has become the predominant way of managing textual interchange on information technologies around the globe.

Given Unicode’s breadth, my topic is the challenge of rendering the standard and, more importantly, what this challenge shows about the gradual accumulation of disparate writing systems into one selfsame scheme. This challenge is practical and technical as much as it is conceptual, compounded by the fact that Unicode itself is a rendering technology that makes writing digitally renderable through its “superset” approach.8 Martha Lampland and Susan Leigh Star have observed that standards tend to nest within one another, and this phenomenon is particularly evident in Unicode, for its authors populated early versions of the standard with material from old and alternative encodings.9 Much as Bernard Dionysius Geoghegan has theorized in the context of digital image formats like JPEG, Unicode relies on a “network of standards and formats” that “cloak[s] contentious matters of representation in an aura of technical neutrality.”10 The standard, in other words, is by no means revolutionary material – political, technical, or otherwise. And indeed, one of its co-creators, Joseph Becker, once likened the adoption of Unicode to a Gorbachevian perestroika (перестройка), a restructuring of systems through trade and technological negotiations that (to borrow again from Geoghegan) gradually inches its way toward the “technical proceduralism so prized by liberal democracy.”11 The challenge of rendering Unicode is to render this progression in a way that balances the Unicode Consortium’s aspirations to universality with the contradictory representations such aspirations necessarily entail. Unicode is at once locally variant and globally consistent, a network of standards and a single system, old and new, all text and none. A full rendering of the standard would therefore involve depicting how these contradictions shape the very elements of writing that Unicode encodes.

The question is how to do it. A common course of action is to counterpose plain text with text that is rich. Unicode’s authors suggest doing so; technical discussions, journalistic explainers, and historical accounts will often follow suit. Plain text “is a technical term that refers to data consisting of characters only, with no formatting information such as font face, style, or positioning.”12 Plain text is “data that is only characters of readable material.”13 It “means a file containing nothing but miles and miles of text without the slightest bit of markup and without a single binary character.”14 Rich text is everything else: style rather than content, matters beyond the purview of standards committees and their extended deliberations.15 Though the distinction between plain and rich text extends much further back into the history of computing, Unicode’s authors will themselves take the trouble to provide their own definition for the distinction, just as they do with character and glyph, and they append this discussion with a statement marked by unusual emphasis: “Plain text must contain enough information to permit the text to be rendered legibly and nothing more.”16 This minimalism is key, they explain, because Unicode “encodes plain text.”17

But the standard doesn’t encode plain text. It encodes characters. Or if, at this moment, its authors mean to draw an equivalence between plain text and characters, the complementary half of the resultant analogy – rich text paired with glyphs – can’t hold. In fact, the entire construction quickly loses its balance and folds in on itself, because plain text is always already comprised of glyphs. As Johanna Drucker might remind Unicode’s authors, “letterforms (graphical expressions) and alphanumeric code (discrete elements of a system) should not be confused.”18 If characters are pure semantic information, abstract representations, they are not to be found in plain text any more than in text that is rich.

What this analogy’s failure indicates is that text neither rich nor plain can render how Unicode works in the abstract and, what is more, how it is that Unicode concretely works atop past encodings and their own contentious matters of representation to assemble the variety of writing systems it has come to support. While Dennis Tenen has powerfully demonstrated how an appeal to plain text can wedge open the “recondite surfaces” of computers to reveal the “lattice” of hardware, data, formats, and policy decisions that shape digital inscription, the character–glyph distinction underpinning Unicode sits athwart the plain text “frame of mind” Tenen would have his readers cultivate.19 Again: the real characters of the Unicode Standard are in a class wholly of their own; and rendering them with the same scope Tenen’s frame of mind encompasses requires a separate account with a different orientation toward what is and isn’t legible in digital text.

Accordingly, this is an account that centers on character and glyph, and it looks back to how, for Unicode, the distinction between the two hinges on design principles engineers at the Xerox Corporation established during the company’s foray into early desktop publishing in the 1970s and 80s. In parallel, I look to a handful of contemporary artists making interventions into Unicode. By probing key aspects of the standard, from its technical implementation to its ghosts, its gaps, and its politics, these artists torque the character–glyph distinction in ways that challenge the apparent invisibility of characters and the lowly superficiality of glyphs. Unicode itself may not hold up as revolutionary material, but rendering the standard with reference to these interventions and its roots in Xerox will mark where, with Unicode, there remains a potential to subvert universalizing claims on reasonable representation – and where in fact the network of standards, formats, design principles, and contingencies now nested within Unicode predisposes it to this potential from the start.

 

Patched and Patchy Encodings, 1963–1991

Histories of Unicode often begin with the thicket of character encodings that predate the standard. The trajectory of these encodings follows what Brian Lennon has called a “technical allegory of postwar development.”20 When the original Unicode authors published their first version of the standard in 1991, they “aim[ed] to freeze” a proliferation of variant encodings that began in the 1950s.21 Those years saw an erosion in the dominance that extant encodings had otherwise maintained on international telecommunications networks. American and European standards setters had begun to grow dissatisfied with the relatively limited set of code points offered by standards like the International Telegraph Alphabet No. 2 (ITA2). The first stirrings of this dissatisfaction began in late 1956, and by 1958 the International Telegraph and Telephone Consultative Committee (CCITT) was openly exploring ways to extend the ITA2, or to replace it altogether.22 At the same time, electronic data processing technologies were becoming commercially available. Early computer manufacturers similarly found the ITA2 to be too limiting for their new forms of data storage and transmission, but unlike standards bodies, industry response to these limits was less deliberative than cavalier. Many manufacturers outright ignored the ITA2, preferring instead to develop their own in-house character encodings for use on their machines.

Subsequent encodings generally stem from one of these two responses, with both ultimately retrenching American English in electronic communication. In a development that Daniel Pargman and Jacob Palme term “ASCII imperialism,” an American-led effort successfully pushed ASCII, or the American Standard Code for Information Interchange, through numerous working groups and subcommittees convened by both the CCITT and the International Organization for Standardization (ISO) in the early 1960s; after 1963, the American standard was to be the model for replacements to ITA2.23 With this agreement in place, European standards setters published their ASCII variant, ECMA-6, in 1965, while the ISO followed suit with ISO/IEC 646 in 1967. But despite Europeans’ consensus around ASCII, a significant problem with the standard remained. ASCII’s original 128 code points provided insufficient coverage for European writing systems, which require additional space to support accented characters, orthographic elements, and other graphical symbols. (Writing systems that do not use the Latin alphabet hardly registered in these considerations, if at all.) Eventually ASCII would expand to cover 256 code points in the 1980s, twice its original size, but until this time standards setters would need to adapt the American standard to their own writing systems. They did so by “patching” portions of ASCII with new characters, re-encoding code points with new assignments to explode the standard into a kaleidoscope of “national use” variants.24 This, Lennon argues, ultimately made variants “bilingual” and “serial in nature.”25 Each represented one writing system plus American English, with the latter working as a point of translation from one variant to the next.

ASCII would become the backbone of global telecommunication networks, but at times the standard offered only a marginally better point of reference than the meshwork of guesses users made when testing a message against possible encodings. Without knowing a message’s encoding in advance, there was no reliable way to determine which set of code points represented which characters – beyond simply trying out encodings in succession. A successful guess would show the message as its sender had written it; an unsuccessful one would produce garbled text, or so-called mojibake (文字化け). And developments in the computing industry only made the situation worse. By the time the ISO published ISO/IEC 646 in 1967, some 60 different encodings were circulating among American computer manufacturers.26 Just as with national use variants of ASCII, these encodings led to intractable compatibility issues between different computers; machines from one brand could not easily pass data to those of another. In the American context the situation provoked a response from on high: Lyndon B. Johnson signed a mandate in 1968 requiring all computers purchased by the federal government – then a major consumer in the computing market – to be compatible with ASCII.
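The mechanics of a failed guess are easy to reproduce today. In the brief Python sketch below, two encodings that ship with the language’s standard library stand in for the era’s incompatible code tables (they are modern stand-ins, not the national use variants themselves): the same bytes read cleanly under the sender’s encoding and collapse into mojibake under any other.

```python
# A minimal sketch of mojibake: bytes written under one encoding, read back
# under another. Shift JIS and Latin-1 are modern stand-ins for the era's
# incompatible code tables, not the national use variants themselves.
message = "文字化け"                               # "mojibake", written in Japanese
raw = message.encode("shift_jis")                  # the sender's byte sequence

print(raw.decode("shift_jis"))                     # correct guess: 文字化け
print(raw.decode("latin-1"))                       # wrong guess: garbled text
print(raw.decode("ascii", errors="replace"))       # ASCII can only mark the damage
```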

From here, the predominant historiographic strategy is to make one’s way toward Unicode via further shifts in code tables. The outsize influence American computer manufacturers had on international markets further solidified ASCII’s position worldwide, and multiple accounts will follow ASCII beyond the US and Europe, doing important work to locate where the assumptions of Latin-alphabetic text fall short of the needs of other writing systems. For example, one way to narrate the intervening years between ASCII and Unicode is to reconstruct, as Jing Tsu has done, the problem of “Han unification,” a choice Unicode’s authors made to merge groups of Han script variants into single characters.27 Though the responsibility and ensuing controversy of Han unification lies with these authors – they made their decisions with little sensitivity to the historical and cultural significance of individual variants – Tsu also shows that the idea of merging variants has its origins in discussions begun by members of the US-based Research Libraries Group in the 1960s. Meanwhile, mandates like the one Johnson signed only enforced a minimum level of compatibility requirements, leaving ample room for proprietary schemes and modifications to ASCII. Other accounts of Unicode will in turn track how several national use variants are the product of industry forging ahead of national standards bodies to impose their own encodings on users.28

But for the focus these accounts train on the sociotechnical dynamics locked up in a single set of code points, they tend to miss how such dynamics also suffuse the other half of the character encoding dyad, the glyph. Unicode is far more entangled with the interfaces and rendering procedures of text technologies than a narrative told only through characters can convey. It is in this respect that computer manufacturers’ efforts to extend ASCII for want of more code points are significant: they did so because they needed encodings that could support software applications built for a growing class of graphical user interfaces (GUIs). Word processing applications were a particular focus for early GUIs, and companies were keen to market this software to multinational corporations. It is from this nexus of GUI-based word processors and corporate-driven multilingualism that key parts of Unicode’s design principles emerge.

 

Global GUIs and Corporate Multilingualism

A major point of interchange in this nexus is the Xerox Star. Released in 1981, the computer built on Xerox’s experimental Alto system to introduce the desktop metaphor to mass-market computing, using a GUI that displayed simulated icons, folders, and application windows. More than a design principle, the Star’s desktop metaphor was also literal. While, as Lori Emerson explains, Xerox designers intended the Alto’s GUI logic to act as a “metamedium” for creative thinking, they envisioned the Star to be an integrated system for office work.29 The Star was a corporate machine from the start, and its peripherals – a keyboard, a novel pointing device called a mouse, an Ethernet connection for sending email, and hookups for Xerox’s line of laser printers – were accordingly meant to work in concert with the machine’s software to suture the daily operations of corporate environments into PC technology.

Office communications were a primary selling point. Touting the machine’s word processing abilities, one user manual boasts how, with the Star, a “document ‘looks’ the same whether it is displayed on a workstation screen, typeset on a typesetter, or printed on any number of electronic laser printers.”30 “[W]e were all about getting beautiful things printed,” recounts David Liddle, head of development for the Star. “It was all about fonts and typesetting of equations and all that sort of stuff.”31 Usability was consistency was beauty, and at every turn Xerox meant to keep “anything except Xerox equipment” locked out of this ecosystem.32 The Star and its peripherals promised to ensure standardized and efficient print materials across all scales; any serious company would quickly recognize that consistent documents are vital for everything ranging from internal memos to multinational coordination. Indeed, the Star was (as one brochure puts it) designed especially with “today’s cosmopolitan business climate” in mind.33

In light of this climate, ads and manuals for the machine often remark on the necessity of multilingual communication, which the Star supported with a novel keyboard interface. A 1978 memo from Xerox proposes the idea. The Star would have a “soft” keyboard that, when activated, would appear onscreen to show a graphical version of the keyboard directly at a user’s fingertips.34 The memo suggests displaying the soft keyboard “as close to the physical keyboard as possible,” for the former would display alternative mappings for each of the latter’s keys, replacing the standard QWERTY layout and its alphanumeric entries with new glyphs from other writing systems. Letters ranging from Greek to Arabic would be supported, along with various pictorial and mathematical symbols; the Star’s soft keyboard would also make a limited set of kanji characters available and hanzi (promised the memo) was shortly on the way.35 In an operation that is now standard across most modern word processors, pressing the Star’s physical keys with its soft keyboard activated would produce these remapped glyphs in a typesetting window, rather than the alphanumerics and punctuation marks painted on the physical keyboard’s key caps.
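The operation itself is simple to model. In the toy Python sketch below, a dictionary stands in for one of the soft keyboard’s alternative mappings; the key names and the Greek layout are illustrative only, not Xerox’s actual tables.

```python
# A toy model of a "soft" keyboard: physical key presses are looked up in the
# active layout before anything reaches the typesetting window. The layout
# below is illustrative only, not Xerox's actual Greek mapping.
GREEK_LAYOUT = {"a": "α", "b": "β", "g": "γ", "d": "δ", "f": "φ"}

def type_keys(keystrokes, layout=None):
    """Return the glyphs produced by a sequence of physical key presses."""
    if layout is None:                     # soft keyboard off: keys mean their key caps
        return keystrokes
    return "".join(layout.get(key, key) for key in keystrokes)

print(type_keys("abgd"))                   # -> abgd
print(type_keys("abgd", GREEK_LAYOUT))     # -> αβγδ
```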

Fig. 1: Proposed screen layout for the Xerox Star’s soft keyboard. David C. Smith, “An Idea for Using the JDS ‘Soft’ Keyboard” (Xerox Corporation, 1978).

This functionality is a key template for Unicode. Had a playful Xerox designer wiped away every possible mapping of the soft keyboard to show its underlying grid, they would have left behind an ideal model for the standard’s relationship to glyphs. If for Liddle, the Star’s typesetting was “all about fonts,” at present the computer’s capabilities would need to accommodate the continued expansion of Unicode, which for years has outstripped the number of characters any one font file can support; it requires whole collections of subfonts to render its full repertoire instead. Failure to match character with glyph will summon an echo of the Star’s soft keyboard grid: □, or glyph not defined, a “fallback” glyph that modern fonts use when they cannot render Unicode characters.36 The newest emoji will often take the shape of this glyph until software updates sweep it away. Yet □ is increasingly a core feature of Unicode itself because every new version of the standard introduces the risk of proliferating this glyph even further. “Language is a □□□□□□,” reads the title of a piece by net artist Canek Zapata.37 Its contents ostensibly feature snippets from William S. Burroughs’s essays on what Burroughs called the “word virus.” But as the web page loads, Zapata’s code randomly remaps these snippets into boxy, unreadable glyphs, rearranging them into quasi-architectural patterns, all vaguely reminiscent of the descending sprites in the video game Space Invaders. “Hemos recibido señales intermitentes de un punto muy definido de la galaxia [We have received intermittent signals from a very definite point in the galaxy],” a comment in Zapata’s webpage source explains. “Apartir de la primera decodificación las computadoras no han dejado de generar estos mensajes encriptados [From the first decoding computers have not stopped generating these encrypted messages].”
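Whether a renderer must reach for □ is, in practice, a question of whether a font’s character-to-glyph table covers a given code point. The Python sketch below poses that question with the fontTools library; the font path is a placeholder, and coverage will vary with whatever file is actually inspected.

```python
# Check whether a font file maps a character to a glyph, or whether a renderer
# would have to fall back to .notdef (the tofu box). Requires fontTools; the
# path below is a placeholder for a locally installed font.
from fontTools.ttLib import TTFont

font = TTFont("SomeFont.ttf")              # placeholder path
cmap = font["cmap"].getBestCmap()          # best available Unicode character map

for char in "A□🫠":                        # a letter, the white square, a recent emoji
    code_point = ord(char)
    glyph_name = cmap.get(code_point)
    status = glyph_name or "no glyph: falls back to .notdef (□)"
    print(f"U+{code_point:04X} {status}")
```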

Fig. 2: Canek Zapata, “Language is a □□□□□□” (2022).

Unicode’s twin definitions of character and glyph are especially evident in the difference □ draws between an idealized code point and its graphic exemplars. Glyphs may come and go, font styles might change, Xerox designers past and present could remap the Star’s soft keyboard ad nauseam, but all the while the boxes into which the Unicode Consortium slots the world’s writing systems will remain invariant. In this sense, every Unicode character renders as □, for a character’s semantic content ultimately boils down to whether that character is addressable in and for computation, and little more.38 □ emblematizes a spreading sameness, which lies beneath any technical system built atop ASCII imperialism – making it more of an apt representation, perhaps, of Unicode’s design philosophy than the trebled sense of uni- its authors intended that prefix to convey. In a 1988 draft proposal for what would become the standard, Joseph Becker writes, “The name ‘Unicode’ is intended to suggest a unique, unified, universal encoding.”39 To which Zapata might reply, “Unicode is □□□□□□□.”

The Star’s relationship with Unicode extends beyond visual templates, however. Becker himself was employed at Xerox when the company released the Star, and the soft keyboard that Xerox shipped it with built on technology he and his colleagues developed. The 1978 memo above credits the idea of dynamically displaying key remappings to the work Becker’s team did on the Japanese Document System (JDS), a word processor. As part of Xerox’s incursion into the East Asian consumer market during the 1970s and 80s, JDS researchers devised methods for encoding and rendering thousands of Han characters on Xerox machines. They worked on the one hand from the same PC prototypes that eventually led to the Star; and on the other they drew from multi-byte alternatives to ASCII, like the two-byte Japanese Industrial Standards (JIS) encoding, JIS C 6226 (now JIS X 0208).40 As opposed to ASCII’s single-byte scheme, alternatives like the JIS encoding represented characters using sequences of bytes, which afforded designers far more room for encoding than a single byte can provide. Whereas ASCII forced engineers to develop “horrendous ‘extension’ contrivances” when shifting between writing systems, the extra space of expanded schemes could keep things simple: each character had one unique sequence, forever and always.41 Xerox would develop its own multi-byte encoding around 1979, calling it XCCS, the Xerox Character Code Standard; the Star used it, as did the network protocols Xerox entwined around the PC to keep the machine locked into a proprietary hardware stack.42 Peripherals like a keyboard may require virtual remappings for multinational corporate work, but with multi-byte encoding the underlying points of reference for managing the PC’s bitstream would never need to change.
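The byte-level difference between these schemes is still visible in the JIS-derived codecs that ship with Python’s standard library. In the sketch below, ISO-2022-JP shows the escape-sequence shifts Becker derides as “extension” contrivances, while Shift JIS and EUC-JP carry the same kanji as multi-byte sequences; XCCS itself is not available as a codec, so UTF-8 stands in here for the fixed-correspondence ideal.

```python
# One kanji, four byte-level representations. ISO-2022-JP needs escape
# sequences to shift a 7-bit channel between character sets; Shift JIS and
# EUC-JP are multi-byte encodings of the JIS X 0208 repertoire; UTF-8
# serializes the single Unicode code point (XCCS has no Python codec, so
# UTF-8 stands in for the fixed-correspondence ideal).
char = "漢"

for codec in ("iso2022_jp", "shift_jis", "euc_jp", "utf-8"):
    print(f"{codec:>10}: {char.encode(codec).hex(' ')}")

print(f"code point: U+{ord(char):04X}")    # one invariant number, however encoded
```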

The idea of multi-byte encoding stuck with Becker, and in the mid-1980s he would find common cause with software engineers Mark Davis (of Apple) and Lee Collins (first at Xerox, then Apple) to develop a new, generalized standard. The three of them “mined” the XCCS and other encodings for materials to supply the first version of Unicode, which adhered to the principle of maintaining an invariant mapping between code points and characters.43 Becker partly overviews these developments in his 1988 draft proposal, though the document is primarily a position statement that outlines the need for Unicode. The core ideas are all there: the system would expand multi-byte encodings to a global scale by splitting characters from glyphs, consolidating Han script, supersetting ASCII, and assigning fixed 16-bit sequences for the representation of characters. Since then Unicode’s bit size has expanded to 21, but in 1988 Becker touts that “65,536 distinct code points” would be available to users the world over.44 With Unicode, software engineers wouldn’t need to shift between one encoding and the next, eliminating many of the troubles that otherwise led to garbled mojibake; they could rely on a “[f]ixed one-to-one correspondence” between every code point in the standard and the “world’s writing systems” instead.45 Here, finally, is the emergence of Unicode in a form still recognizable today. Yet the whole thing only works, Becker underscores, if the standard’s authors can refrain from building it on the basis of “unreasonable definitions of ‘character.’”46
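Both the original 16-bit ceiling and its later expansion can be read directly off a character’s code point. A short sketch, using only Python’s built-in ord and codec machinery:

```python
# Becker's 1988 design fixed characters at 16 bits: 2**16 = 65,536 code points.
# Today's code space runs to U+10FFFF, which takes 21 bits; in UTF-16, anything
# beyond U+FFFF is carried as a surrogate pair rather than a single 16-bit unit.
print(2 ** 16)                                    # 65536, the original ceiling
print(hex(0x10FFFF), (0x10FFFF).bit_length())     # 0x10ffff, 21 bits

for char in ("A", "漢", "😀"):                    # the emoji sits beyond the 16-bit range
    print(f"U+{ord(char):04X}  utf-16: {char.encode('utf-16-be').hex(' ')}")
# U+1F600 comes out as the surrogate pair d8 3d de 00: two 16-bit units.
```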

 

Reasonable Characters

Just what constitutes an unreasonable character will be my primary concern from this point forward, for in many respects the distinction between reasonable and unreasonable mirrors character and glyph. But in contrast with the explicit definitions Unicode’s authors provide for character and glyph, or for that matter plain text and rich, a precise definition of reasonableness has been a matter of debate from the standard’s inception to present-day deliberations at the Unicode Consortium. This debate is most publicly evident around the question of new emoji, especially with regard to emoji skin tone and gender modifiers.47 But it also bore directly on the problem of Han unification that Tsu catalogues: making room for every single variant (so Unicode’s authors decided) would be unreasonable.48 From the earliest versions of the standard, one of the central activities at the Unicode Consortium has thus involved weighing candidate characters’ reasonableness.

Those hoping to propose new emoji might look to Adriana Ramić’s pamphlet, Unicode Power Stones: A Collector’s Guide, for examples of highly reasonable characters.49 Up in the left-hand corner of her page spreads are large, pixelated symbols ranging from the Euro and copyright signs to braille, ligatures, and logographs. Ramić renders them with GNU Unifont, a blocky, fixed-width font that at present provides the fullest possible support for Unicode within a single file. The style of GNU Unifont reads as plain text in the sense many would recognize it, though these rendered symbols contain no characters, only glyphs. Instead, the pamphlet’s characters sit below each symbol, in tables of metadata that enumerate different “character properties” defined by the Unicode Consortium when it adds new entries to its standard.50 Properties include character names, the code blocks in which characters will be located, different flags denoting their reading order and casing, and information about the script systems to which they belong. Tacit in this metadata is a definition: a reasonable character is one whose properties completely and unambiguously conform to every entry in the table. Each table counts the “semantic value” of the “smallest components of written language” – but it is their topmost entries that chisel out the animating pun of Unicode Power Stones.51 For every one of Ramić’s characters, as with all Unicode characters, there is a corresponding ‘hex’ number, a sequence of hexadecimal digits representing the code points to which characters are assigned when the Consortium introduces them into Unicode. U+0107, U+0A10, U+2318: once the Consortium assigns these sequences to characters, they are transmuted – as if by magic, as many popular narratives about Unicode would have it – into ć, ਐ, and ⌘.
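The property tables Ramić reproduces are queryable from most programming environments. A brief sketch with Python’s unicodedata module, run against the same three code points:

```python
# The character properties Ramić tabulates (name, general category, bidirectional
# class, combining class) as exposed by Python's unicodedata module.
import unicodedata

for char in ("ć", "ਐ", "⌘"):                      # U+0107, U+0A10, U+2318
    print(
        f"U+{ord(char):04X}",
        unicodedata.name(char),
        unicodedata.category(char),
        unicodedata.bidirectional(char),
        unicodedata.combining(char),
    )
# e.g. U+0107 LATIN SMALL LETTER C WITH ACUTE Ll L 0
```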

Figs. 3a and 3b: Adriana Ramić, Unicode Power Stones: A Collector’s Guide (2015).

A hex of this sort is forever binding, because once the Consortium assigns a character that character’s place is effectively set in stone. Its place, however, will be a wholly abstract one; any legible traces of a character will instead take the form of a glyph. Ramić literalizes this with an eponymous exhibition piece, which features rows of speckled pebbles, each polished and engraved with a symbol. These glyphs ballast her abstract characters. In her pamphlet, Ramić gives the latter further weight by printing found art examples to the right of every character table. If the metadata in these tables spell out the semantics of Unicode characters, these examples and their stony manifestations embed characters within history and language use: as glyphs on pages, advertisements, coins, and computer screens. Visual distortions underscore the divide. The compromise for GNU Unifont’s extensive coverage is its small, low-resolution bitmaps, which constrain its glyphs to coarse outlines and rough spatial distinctions. In Unicode Power Stones Arabic and Bengali glyphs suffer especially from this, some so much so that their digital renditions hardly align with the examples Ramić provides. Such is the direct result of reasonable characters: their properties only promise that they will be computationally addressable and make no guarantee of reasonable representation.

So too are GNU Unifont’s coarse glyphs suggestive of the typographic biases that stretch back to ASCII and beyond. As Tsu’s account, as well as work by both Thomas Mullaney and Fiona Ross, shows, coercing various writing systems into the assumptions of typographic print often involves lossy compression and noise.52 Multiple artists working with Unicode have clearly caught on to these typographic constraints as well, Zapata and Ramić included. But in a key way, text rendering with Unicode is also modulated by another paradigm of print: xerography. Zapata’s use of Burroughs and the cut-up, as well as Ramić and her collages, are both in resonance with xerography, and the word processing conventions of the Xerox Star borrow directly from it. If Unicode Power Stones shows a collection of characters at their most reasonable, returning once more to this machine and its xerographic logics traces a path from that reasonableness over to the many unreasonable characters in Unicode.

 

Xerographic Encoding

The Star featured an early version of WordArt. To “alleviate the ‘white space’ feeling” of a blank page when starting on corporate documents and conference papers, the machine came with an inventory of graphical “transfer symbols.”53 Once a user had selected a symbol from the inventory they could move it anywhere in a typesetting window and then alter its size, proportions, and “appearance properties” to fit the stylistic conventions of their documents; cut/copy/paste functionality enabled duplication. By these means (brochures again promised) a corporate office could eliminate its printing house. With the Star, any employee would be equipped to make “arbitrarily beautiful typeset documents.”54

While a transfer symbol’s stylistic modulations do share much with typography, the Star’s designers conceptualized this functionality in terms of xerographic media. In a retrospective about the machine, members of the original design team tie transfer symbols back to dry-transfer sheets, like Letraset. The symmetry is highly fitting. Starting in the 1960s, professional and amateur designers turned from traditional typographic materials to dry-transfer lettering, which could be applied anywhere on a page by burnishing the back of a symbol to activate its adhesive. Requiring little overhead and no special printing house machinery, dry-transfer offered an extensive range of typefaces and other graphical symbols, and designers used these materials to style text on documents spanning ad mock-ups and technical drawings, concert fliers and concrete poetry. Much as Kate Eichhorn has written that xerography “changed who could be an active participant in the making of culture” by opening print production to anyone with access to a copy machine, dry-transfer opened the practice of graphic design to a broad swath of amateur practitioners.55 Marketing materials for the Star are the corporate double of this ethos, a doubling made especially over-determined by the point of dissemination for “liberated” Letraset.56 For when it came time to publish dry-transfer designs, the parallel between dry-transfer lettering and xerography became literal: designers would use Xerox machines to copy and mass-produce their work.

From the vantage of Unicode, the Star’s transfer symbols introduce another tacit distinction between character and glyph, one Unicode’s authors would absorb and formalize when designing the standard. Much Unicode-enabled text continues to reflect this deep connection between xerographic media and modern character encoding. The connection inheres, for example, in what Drucker has called the “menu bar” mentality of computer graphics. She argues that free-form pastiches styled on a whim enforce the idea that characters can be freed from glyphs and given “ontologically self-sufficient autonomy.”57 But the connection abides even more strongly in glyphs. Layout composition with Letraset was highly variable, and people leveraged this variability to render text with composite forms. Speaking in an interview about his use of Letraset, one graphic designer recalls “[r]unning out of [dry-transfer] letters late at night and having to make new letters out of existing letters” by splicing the latter apart and re-composing them into new glyphs.58 “Frankenstein” letters, the designer calls them; the concrete poet Kate Siklosi, whose practice involves extensive use of Letraset, might name them “newly created knots.”59 Either descriptor would be apt for the way contemporary word processing software renders Unicode characters into glyphs – something upon which a subset of artistic interventions into Unicode has also seized.

Works in this subset set out from the free play of graphic re-composition. Unlike how, as Matt Applegate argues, digital image-texts like ASCII art “conscript” text “into a visual signifying regime” of the plain text grid, these pieces echo the free-form pastiches of dry-transfer in a combinatoric play with the plenitude of scripts in the standard.60 The focus is on de-picting glyphs as such, shorn from semantics. Another of Zapata’s pieces, “cuatro caracteres,” is exemplary of the practice.61 It retrieves random characters from Unicode and then relies on nearly 80 different fonts to render them in a web browser. In a subversion of the plain text grid, Zapata positions these glyphs in each of the four corners of a small HTML text container, where the fonts take care of the rest: every one follows its own rendering instructions to hint, kern, shift, and slide glyphs into an erratic, multi-scriptural paste-up. Here, the fixed-width squares that latch ASCII art into pixelated images, or that constrain GNU Unifont to its coarse renditions, dissolve into the misaligned edges of digital dry-transfer.
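The retrieval step behind a piece like “cuatro caracteres” can be approximated in a few lines. How Zapata’s own code samples the standard is not specified here, so the Python sketch below is an approximation only: it draws code points at random and keeps those that Unicode has assigned and named.

```python
# An approximation of the sampling behind a piece like "cuatro caracteres":
# draw code points at random, reject surrogates and anything unassigned or
# unnamed, and return four characters from anywhere in the standard.
import random
import unicodedata

def random_assigned_characters(n=4):
    chars = []
    while len(chars) < n:
        code_point = random.randint(0x21, 0x10FFFF)
        if 0xD800 <= code_point <= 0xDFFF:            # surrogates are not characters
            continue
        char = chr(code_point)
        if unicodedata.name(char, None) is None:      # unassigned (or unnamed) code point
            continue
        chars.append(char)
    return chars

print(random_assigned_characters())                   # four glyphs; varies on every run
```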

Figs. 4a and 4b: Canek Zapata, “cuatro caracteres” (2021).

A similar technological setup forms the basis of Daniel Temkin’s series, Unicode Frenzy.62 In an echo of the technique of transfer drawing, the code artist queries Unicode to paste handfuls of random, semi-opaque glyphs atop one another in a web browser, where the curves of Tamil script arc into the corners of Hangul and various accents pepper Basic Latin. Other combinations abound. For the first piece of the series Temkin keeps every glyph aligned to center. Amid their darkened overlaps a hint of some new, gestaltist shape emerges, a Frankenstein glyph as yet unassigned in Unicode. A separate piece, “Unicode Compressure,” works in a similar fashion but adds a fast-running counter.63 Ticking glyphs’ resolution down pixel by pixel, Temkin forces web browsers to render them in an ever-dwindling space; quickly enough they succumb to aliasing rot, looking like they’ve passed through too many runs in a Xerox copier. If, as Eichhorn remarks, degenerative xeroxing “forc[es] the eye out of its print-culture-induced trance” and “cools” “hot” copies into an alien medium, at some point the pressure becomes too much for the glyphs in “Unicode Compressure”: they heat back up, until they ultimately explode into the glow of an unreadable image.64

Figs. 5a and 5b and 5c: Daniel Temkin, “Unicode Compressure” (2014).

Temkin’s implosions intimate a tendency towards unreadability in Unicode, something two other pieces especially draw out. Laimonas Zakas ran Glitchr from 2012 to 2014 on Facebook and Twitter (now X).65 The project exploits the many “combining characters” standardized by the Unicode Consortium, which are normally meant to modify the glyphs of other “base” characters; they are entered after a base character, whereupon word processors render the two as one by pasting them together.66 But Zakas isolates combining characters and stacks them together into noisy, asignifying patterns. When rendered on their own, the resultant composites corrupt the text rendering capabilities of social media platforms, “reintroduc[ing] the inherent variability of linguistic products” into the neat boxes of posts by injecting xerographic smears and speckles.67 So-called “Zalgo” text is Glitchr’s pop variant. It crops up in horror memes to signify glitch and communicative breakdown with asemic compositions. And after the manner of dry-transfer’s wide applicability, numerous text generators are available online to create, copy, and paste these composites wherever one pleases. □ may emblematize the spreading sameness of Unicode’s prioritization of characters, but Zalgo text and Glitchr inject word processors with its counter, a Frankensteinian word virus that spawns perpetual differentiation.
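Producing such composites requires no exploit at all: a run of combining marks appended to a base character is enough, and the renderer does the stacking. A minimal Python sketch, with an arbitrary choice of marks:

```python
# Zalgo-style stacking: append combining marks to each base character and let
# the text renderer pile them onto a single glyph. The marks and counts here
# are arbitrary.
import random
import unicodedata

COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]   # Combining Diacritical Marks

def zalgo(text, marks_per_char=8):
    return "".join(
        base + "".join(random.choices(COMBINING_MARKS, k=marks_per_char))
        for base in text
    )

print(zalgo("Unicode"))                               # renders as a smeared stack of accents

# The same mechanism underlies ordinary composition: é can be one code point
# or a base letter plus a combining acute accent.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)
print(len(precomposed), len(decomposed))              # 1 2
```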

Fig. 6: Zalgo text from Max Woolf’s Big List of Naughty Strings, a repository of Unicode characters and other sequences that tend to break text interfaces. “Big List of Naughty Strings” (2021).

The resemblance these pieces share with xerographic artifacts and noise is at once superficial and deeply confluent with Unicode. The errant renderings of glyphs on computer screens merely look like exceptionally messy xerographic prints – yet they also animate some of the oldest graphical dynamics of the standard, rooted as those dynamics are in the word processing conventions of the Xerox Star and its transfer symbols. In this sense the transfer symbol is paradigmatic for the way Unicode has made individual elements of writing systems digitally renderable: one of a vast inventory of disparate entries, many formerly print-bound, every character in the standard is now addressable, modifiable, open to anyone to copy and paste at their leisure and then spread on screen after screen in a continued proliferation of “culture degree Xerox.”68 To the extent that the standard’s abstract characters are zero-graphic, the glyphs that render them are xerographic.

 

Graphic Muddle, Real Characters

What this re-rendered distinction between character and glyph should make clear is that the one-to-one mapping of code point and character need not “preclude the potential for enjoyable ambiguities or complete misreadings,” as the asemic writer Tim Gaze once complained.69 Zero-graphic characters and xerographic glyphs still very much afford this potential; that many encodings prior to Unicode tacitly defined glyph by shunting it into the category of anything “understood by the users” unintentionally suggests as much.70 But nor should this potential preclude a view of the nested sets of technologies, standards, design principles, and politics that have shaped Unicode – often with a normative edge – from the start. Again: the question is how best to render all such dynamics of the standard at once.

For my purposes, a final consideration remains regarding whether certain characters in Unicode might themselves render what I have outlined. Which (if any) Unicode characters capture the aesthetic effects of the character–glyph distinction and, at the same time, index the technological history of graphical computing within the standard? □ is one. But □ is also something of a midway point in the broader dynamics of Unicode, the extremes of which are better exemplified by two other classes of characters. It is with reference to these exemplary characters that I will complete my rendering.

First: a backtrack to the combining characters exploited by Glitchr and Zalgo text. While net artists may use these characters to torque Unicode’s underlying logic, Frankensteinian composites are actually the norm for many scripts in the standard, not the exception. When Unicode’s authors assign code points to non-typographic writing systems, they often break characters into smaller, non-lexical pieces and encode only those, leaving word processors to piece characters back together in a combinatoric fashion. Asemiosis, in other words, is the price of entry to Unicode for many writing systems. Becker’s draft proposal for the standard partly advocates for this approach, and when it has been implemented by the Consortium it has resulted in open graphic forms, quasi-semantic units that are all perfectly renderable without the glitchy exploits of Temkin or Zakas. Language artist Sujin Lee, for instance, simply shows them amid halting frame changes. Her ongoing video series, Ah Ahk Ahk Aht Ahn, works its way through every consonant–vowel combination in Hangul, pasting them together like so many Letraset symbols. The keyword is every: this is Hangul as Unicode’s authors have encoded it in the Hangul Jamo table, a set of code points that represent the base elements of Korean syllable blocks. Over the course of several videos, Lee intends to display all 11,172 combinations, their time onscreen corresponding to how long it takes her to voice them. But the majority of these glyphs are “theoretical combinations for a written language,” never used by writers to represent Korean.71 With the code points of Hangul Jamo making no distinction between extant and imagined couplings, Lee is free to create new, not quite sensical glyphs from the decomposed characters supported by Unicode.
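The arithmetic behind this exhaustiveness is fixed in the standard itself: 19 leading consonants, 21 vowels, and 28 trailing positions (27 final consonants plus none) yield 11,172 precomposed syllable blocks, laid out algorithmically from U+AC00 onward. A short Python sketch:

```python
# The combinatorics of Hangul syllables in Unicode: 19 leading consonants x
# 21 vowels x 28 trailing positions (27 final consonants + none) = 11,172
# blocks, arranged algorithmically from U+AC00.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_BASE = 0xAC00

print(L_COUNT * V_COUNT * T_COUNT)                    # 11172

def syllable(lead, vowel, trail=0):
    """Compose a precomposed Hangul syllable from jamo indices."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)

print(syllable(11, 0), syllable(11, 0, 1))            # 아 and 악
```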

Figs. 7a and 7b and 7c: Sujin Lee, Ah Ahk Ahk Aht Ahn (2015).

In doing so, Ah Ahk Ahk Aht Ahn makes my first class of exemplary characters legible. They are otherwise asemic – and in any case highly xerographic. Unlike the components in Lee’s videos, however, these ‘ghost characters’ (幽霊文字) do not represent any language, nor are they in any writing system.72 The Unicode Consortium added them to the standard in 1993 while expanding the standard’s repertoire to include JIS X 0208 (the very encoding that influenced Xerox’s XCCS). The JIS encoding featured all commonly used kanji, and to these it added hundreds of unique proper names from across Japan. Its developers sourced these names from photocopies of insurance records and municipal documents – some blurry, darkened, or otherwise marked by xerographic artifacts. In one instance, 山 and 女 were glued together to represent 𡚴 in a record, but the overlapping edges of paper on which these glyphs were printed created a line in the photocopy.73 This led to 妛, a non-lexical glyph after the manner of Lee’s videos, only renderable by word processors because JIS developers misinterpreted the photocopy as a real character and added it to their standard, after which it eventually ended up in Unicode as well; the Unicode Consortium would not encode 𡚴, the intended character, until some years later. But 妛 and other ghost characters would also remain in Unicode to “haun[t]” the standard “with the literary inefficiency of writing,” for again, if a proposed character passes through a successful vote at the Consortium, the promise of Unicode is that this character will occupy a place in the standard all but indefinitely.74 Engraved in Unicode, like the pebbles in Unicode Power Stones, are the following perfectly reasonable characters: 妛挧暃椦槞蟐袮閠駲墸壥彁.75

Fig. 8: The origins of 妛. Documented by Kohji Shibano for the 1997 version of JIS X 0208. The broken line in the middle of the top glyph is a xerographic effect.

In one sense, ghost characters are glyphs in their purest form, altogether detached from semantics. And yet in another, they embody the differentiation to which Unicode’s definition of character aspires; none of them may stand for other characters in the standard. They are unique in the way that Becker, in his proposal for Unicode, wants all entries in the standard to be, but they are also the graphic residuum that necessarily results from a design principle that renders character from glyph. Call these ghost characters one bookend that props up the logic by which Unicode makes writing computationally tractable.

The other bookend: my second and last class of characters. These are ones that are rendered by ‘homoglyphs.’ Homoglyphic characters have representations that are near photocopy duplicates of other characters supported by Unicode. Each of these “confusables” (as the Consortium calls them) stands for a unique, abstract entity in the same way that all Unicode characters stand for unique, abstract entities. But the semantic differences between homoglyphic characters render as minute, merely stylistic variation – if these differences render at all.76 Any digital font that supports them will consequently feature a conglomerate of font-like styles within a single typeface: A, 𝐀, 𝒜, 𝔸, 𝙰.77 Characters like these invert the graphic individuation of their ghostly siblings to embody, instead, the arbitrariness of representation that Unicode’s character–glyph distinction enforces. Unlike □, however, the arbitrary relationship between character and glyph in this context only leads to over-encoding and polysemy. The homoglyphic characters in my second class are not asemic so much as they are suffused with semantics.
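Each of these look-alikes is a distinct character with its own name and code point. Compatibility normalization (NFKC) folds the mathematical variants back onto plain Latin, though cross-script confusables (a Cyrillic А for a Latin A, say) are untouched by normalization and are tracked instead in the Consortium’s confusables data. A brief sketch:

```python
# Five 'A'-shaped homoglyphs (U+0041, U+1D400, U+1D49C, U+1D538, U+1D670), each
# a distinct character. NFKC normalization folds the mathematical variants back
# to LATIN CAPITAL LETTER A; cross-script confusables such as Cyrillic А
# (U+0410) are left alone and tracked in confusables.txt instead.
import unicodedata

for char in ("A", "𝐀", "𝒜", "𝔸", "𝙰"):
    folded = unicodedata.normalize("NFKC", char)
    print(f"U+{ord(char):04X}", unicodedata.name(char), "->", folded)

print(unicodedata.normalize("NFKC", "\u0410") == "A")  # False: Cyrillic А stays Cyrillic
```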

And they are everywhere. For years people have used homoglyphs to circumvent the rendering constraints of web forums and social media platforms, which tend to lock digital text into a single style with no additional options for formatting. This means no boldface, italics, or other rich text features that have been common to word processors since the early days of GUI computing with the Xerox Star. But with Unicode, this constraint is partial. Users intending to render their text with a desired look need only select a homoglyphic character from elsewhere in the standard, much as one would leaf through old Letraset sheets. In this way, users may transform their characters into “𝓑ӭ𝘢ʋⱦįᵮůɫ” glyphs (beautiful glyphs).78 Put another way, every time you read superficially 𝒶𝔢𑣁𝚝հ𝙚𝕥ιⅽ 𝔱ℯ᙮𝘵 (aesthetic text), you read writing that collapses the difference between rich text and plain. No special rendering tricks or glitches needed, just access to a font that supports even a modest fraction of the thousands of assigned code points in Unicode.

Otherwise, there is always □, a reliable fallback for when the rendering fails, glyph not defined. It best typifies the character–glyph distinction, that key design principle Unicode’s authors have employed to expand a half century’s worth of prior standards and encode writing systems the world over. Every other entry in the standard sits to one side of □ or another, augmenting the relation between character and glyph up to the limit points of ghosts and homoglyphs, bookends of the entire set. Between the latter two classes of characters there lies all of Unicode: the many other encodings nested in its ranks, the GUIs and corporate interests that first necessitated it, the contingent decisions that have granted entry to some characters and withheld it from others, plain text and rich text, reasonable characters, unreasonable ones, and ultimately the potential for aesthetic torsion that abides throughout. Rendered together, these elements are what put the real characters of the Unicode Standard in a class wholly of their own.


  1. Rhodri Lewis, “The Publication of John Wilkins’s ‘Essay’ (1668): Some Contextual Considerations,” Notes and Records of the Royal Society of London 56, no. 2 (2002): 142. 

  2. James Gosling, quoted in Unicode Consortium, “Acclaim for Unicode,” September 2017, https://www.unicode.org/press/quotations.html#gosling

  3. National Bureau of Standards, “Federal Information Processing Standards Publication: Code for Information Interchange” (Gaithersburg, MD: National Bureau of Standards, 1977), 11. 

  4. Dan Connolly, “‘Character Set’ Considered Harmful” (Internet Engineering Task Force, May 1995), https://www.w3.org/MarkUp/html-spec/charset-harmful.html

  5. Yannis Haralambous, Fonts & Encodings: From Advanced Typography to Unicode and Everything in Between, trans. P. Scott Horne, 1st ed. (Sebastopol, CA: O’Reilly Media, 2007), 54. 

  6. Unicode Consortium, The Unicode Standard 15 (Mountain View, CA: Unicode Consortium, 2022), 15. 

  7. Unicode Consortium, “Unicode Homepage,” Unicode, 2022, https://home.unicode.org/; Unicode Consortium, “The Unicode Standard, Second Edition,” January 1998, https://web.archive.org/web/19980126155948/http://www.unicode.org/unicode/uni2book/u2.html

  8. Unicode Consortium, The Unicode Standard 15, 3. 

  9. Martha Lampland and Susan Leigh Star, eds., Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life (Ithaca: Cornell University Press, 2009), 5–7. 

  10. Bernard Dionysius Geoghegan, “The Bitmap Is the Territory: How Digital Formats Render Global Positions,” MLN 136, no. 5 (December 2021): 1099. 

  11. Joseph D. Becker, “Unicode 88” (Mountain View, CA: Unicode Consortium, 1998): 4; Geoghegan, “The Bitmap Is the Territory”: 1093. 

  12. Jukka K. Korpela, Unicode Explained, 1st ed. (Sebastopol, CA: O’Reilly, 2006), 54. 

  13. Everest Pipkin, “The Fuzzy Edges of Character Encoding,” Running Dog, 2020, https://rundog.art/issues/automateme/the-fuzzy-edges-of-character-encoding/

  14. Haralambous, Fonts & Encodings, 59. 

  15. This distinction is in many respects typographic at its core. See Kate Brideau’s discussion of styled letters and what she calls the “typographic medium” in Kate Brideau, The Typographic Medium (Cambridge, MA: The MIT Press, 2021). 

  16. Unicode Consortium, The Unicode Standard 15, 19; original emphasis. The full definition of plain and rich text in the standard runs as follows: “Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes. In contrast, styled text, also known as rich text, is any text representation consisting of plain text plus added information such as language identifier, color, hypertext links, and so on.” Unicode Consortium, The Unicode Standard 15, 18; original emphasis. 

  17. Unicode Consortium, The Unicode Standard 15, 19. 

  18. Johanna Drucker, “From A to Screen,” in Comparative Textual Media: Transforming the Humanities in the Postprint Era, ed. N. Katherine Hayles and Jessica Pressman (Minneapolis, MN: Minnesota University Press, 2013), 77. 

  19. Dennis Tenen, Plain Text: The Poetics of Computation (Stanford, California: Stanford University Press, 2017), 5. 

  20. Brian Lennon, In Babel’s Shadow: Multilingual Literatures, Monolingual States (Minneapolis: University of Minnesota Press, 2010), 168. 

  21. Lennon, In Babel’s Shadow, 169. 

  22. Eric Fischer, “The Evolution of Character Codes, 1874-1968” (Internet Archive, June 2000), http://archive.org/details/enf-ascii: 9. 

  23. Daniel Pargman and Jacob Palme, “ASCII Imperialism,” in Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life, ed. Martha Lampland and Susan Leigh Star (Ithaca: Cornell University Press, 2009). See also discussions in Fischer, “The Evolution of Character Codes, 1874-1968”: 11-18; Charles E. Mackenzie, Coded Character Sets: History and Development (Reading, Mass: Addison-Wesley Pub. Co, 1980), 423-434. Rita Raley’s work on Global English is also indispensable for tracking the wider reverberations of ASCII imperialism as Pargman and Palme formulate it. See Rita Raley, “Machine Translation and Global English,” The Yale Journal of Criticism 16, no. 2 (2003). 

  24. Pargman and Palme, “ASCII Imperialism”, 158. See also Nicholas A. John, “The Construction of the Multilingual Internet: Unicode, Hebrew, and Globalization,” Journal of Computer-Mediated Communication 18, no. 3 (April 2013): 321–38: 324. Pipkin, “The Fuzzy Edges of Character Encoding” links to several example variants. 

  25. Lennon, In Babel’s Shadow, 169. 

  26. Dongoh Park, “The Korean Character Code: A National Controversy, 1987–1995,” IEEE Annals of the History of Computing 38, no. 2 (2016): 42. 

  27. Jing Tsu, Kingdom of Characters: The Language Revolution That Made China Modern (New York: Riverhead, 2022), 256-261. See also Lisa Gitelman, “Emoji Dick and the Eponymous Whale, An Essay in Four Parts,” Post45: Peer Reviewed, July 2018. 

  28. Andrew Hardie, “From Legacy Encodings to Unicode: The Graphical and Logical Principles in the Scripts of South Asia,” Language Resources and Evaluation 41 (2007); Park, “The Korean Character Code”. For a more general account of this phenomenon in the context of international standards, see JoAnne Yates and Craig Murphy, Engineering Rules: Global Standard Setting Since 1880 (Baltimore: Johns Hopkins University Press, 2019). 

  29. Lori Emerson, Reading Writing Interfaces: From the Digital to the Bookbound (Minneapolis: University of Minnesota Press, 2014), 61. For a discussion of how engineers at Xerox PARC used the Alto’s capabilities for early word processing software (which would later serve as the template for the Star’s software), see Matthew G. Kirschenbaum, Track Changes: A Literary History of Word Processing (Cambridge, MA: Harvard University Press, 2016), 122-127. 

  30. Xerox Corporation, Xerox Network Systems Architecture: General Information Manual (Palo Alto, CA: Xerox Corporation, 1985), http://www.bitsavers.org/pdf/xerox/xns/XNSG_068504_Xerox_System_Network_Architecture_General_Information_Manual_Apr85.pdf. Jacob Gaboury argues that establishing such continuity is a key contribution that the field of computer graphics makes to the object-oriented paradigm of computing. Jacob Gaboury, Image Objects: An Archaeology of Computer Graphics (Cambridge, MA: The MIT Press, 2021), 131. 

  31. David Liddle, “Oral History of David Liddle” (Computer History Museum, Mountain View, CA, February 2020), 10, https://www.computerhistory.org/collections/catalog/102792010

  32. Liddle, “Oral History of David Liddle,” 10. 

  33. Xerox Corporation, “The Xerox 8010 Speaks Your Language” (Xerox Corporation, 1982), https://www.digibarn.com/friends/curbow/star/3/index.html

  34. David C. Smith, “An Idea for Using the JDS ‘Soft’ Keyboard,” (Xerox Corporation, June 1978), http://20.69.243.200/pdf/xerox/sdd/memos_1978/19780602_An_Idea_For_Using_The_JDS_Soft_Keyboard.pdf

  35. Smith, “An Idea for Using the JDS ‘Soft’ Keyboard,” 2. 

  36. Bob Hallissy, “Unicode BMP Fallback Font,” SIL International, March 2012, https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=UnicodeBMPFallbackFont

  37. Canek Zapata, “Language Is a □□□□□□,” 2022, https://canekzapata.net/language/

  38. Ranjodh Singh Dhaliwal has recently generalized the concept of addressability to a fundamental condition of all computation. Unicode would be an apt site for identifying this condition in digital text. Ranjodh Singh Dhaliwal, “On Addressability, or What Even Is Computation?” Critical Inquiry 49, no. 1 (September 2022): 1–27. 

  39. Becker, “Unicode 88,” 3; see also Unicode Consortium, “Early Years of Unicode,” March 2015, https://www.unicode.org/history/earlyyears.html. 

  40. Becker, “Unicode 88,” 2. 

  41. Becker, “Unicode 88,” 4. 

  42. Joseph D. Becker, “Re: Swift from Joe Becker on 2014-06-04 (Unicode Mail List Archive),” June 2014, https://www.unicode.org/mail-arch/unicode-ml/y2014-m06/0098.html

  43. Ken Whistler, “Unicode Mail List Archive: Re: Questions about Unicode History,” January 2002, https://unicode.org/mail-arch/unicode-ml/y2002-m01/0611.html

  44. Becker, “Unicode 88,” 4. 

  45. Becker, “Unicode 88,” 2. 

  46. Becker, “Unicode 88,” 2, original emphasis. 

  47. Rose Roscam Abbing, Peggy Pierrot, and Femke Snelting, “Modifying the Universal,” in Executing Practices, ed. Helen Pritchard, Eric Snodgrass, and Magda Tyżlik-Carver, DATA Browser 6 (London: Open Humanities Press, 2018). 

  48. The TRON encoding is one example of a system that does not make the elisions that Han unification does, though such a system is not without its own pitfalls. Providing a code point for every variant tasks an encoding with expressing context with code points. Shigeki Moro, “Surface or Essence: Beyond the Coded Character Set Model.” January 2003, https://www.researchgate.net/publication/228985177_Surface_or_Essence_Beyond_the_Coded_Character_Set_Model

  49. Adriana Ramić, Unicode Power Stones: A Collector’s Guide, 2015. 

  50. Unicode Consortium, The Unicode Standard 15, 159–94. 

  51. Unicode Consortium, The Unicode Standard 15, 115. 

  52. Tsu, Kingdom of Characters; Thomas S. Mullaney, The Chinese Typewriter: A History (Cambridge, MA: The MIT Press, 2017); Fiona Ross, “Historical Technological Impacts on the Visual Representation of Language with Reference to South-Asian Typeforms,” Philological Encounters 3, no. 4 (November 2018): 441–68. 

  53. David C. Smith et al., “The Star User Interface: An Overview,” in Classic Operating Systems: From Batch Processing to Distributed Systems, ed. Per Brinch Hansen (Berlin, Heidelberg: Springer-Verlag, 2001), 485. 

  54. Liddle, “Oral History of David Liddle,” 10. 

  55. Kate Eichhorn, Adjusted Margin: Xerography, Art, and Activism in the Late Twentieth Century (Cambridge, MA: The MIT Press, 2016), 94. 

  56. Jane Lamacraft, “Rub-down Revolution,” Eye Magazine 22, no. 86 (2013). 

  57. Johanna Drucker, What Is? Nine Epistemological Essays (Berkeley, CA: Cuneiform Press, 2013), 61–63. 

  58. Steven Heller, “When Type Was Dry,” PRINT Magazine, February 2018, https://www.printmag.com/daily-heller/when-type-was-dry-letraset/

  59. Kate Siklosi, “Handle with Care,” Jacket2, June 2019, https://jacket2.org/article/handle-care

  60. Matt Applegate, “Glitched in Translation: Reading Text and Code as a Play of Spaces,” Amodern 6 (2016). 

  61. Canek Zapata, “cuatro caracteres,” 2021, https://canekzapata.net/palimpsesto/4caracteres/index.html

  62. Daniel Temkin, “Unicode Frenzy,” 2011, http://danieltemkin.com/UnicodeFrenzy

  63. Daniel Temkin, “Unicode Compressure,” 2014, https://danieltemkin.com/UnicodeCompressure

  64. Eichhorn, Adjusted Margin, 100. 

  65. Laimonas Zakas, “Glitchr  (@Glitchr_) / Twitter,” Twitter, accessed December 26, 2022, https://twitter.com/glitchr_

  66. Unicode Consortium, The Unicode Standard 15, 54–55. 

  67. Applegate, “Glitched in Translation”. C. Namwali Serpell finds a similar irruptive capacity in emoji “stacking” and “staggering”: “brief bursts of affective intensity and interpretive pleasure” evoked through repetitious uses of the pictographs. C. Namwali Serpell, “😂; or, The Word of the Year,” Post45: Peer Reviewed, no. 2 (April 2019). 

  68. Jean Baudrillard, “After the Orgy,” in The Transparency of Evil: Essays on Extreme Phenomena, trans. James L. Benedict, Paperback edition reprinted, Radical Thinkers (London: Verso, 1993), 9. 

  69. Tim Gaze, “A Few Persistent Thoughts about Asemic Writing,” Utsanga, June 2015, https://www.utsanga.it/gazea-few-persistent-thoughts-about-asemic-writing/

  70. National Bureau of Standards, “Federal Information Processing Standards Publication,” 11. 

  71. Sujin Lee, Ah Ahk Ahk Aht Ahn, 2015, http://www.sujinlee.org/work/ah_ahk_ahk_aht_ahn.html

  72. Ken Lunde, CJKV Information Processing, 2nd ed. (Sebastopol, CA: O’Reilly, 2009), 178. 

  73. This information is based on the findings of Kohji Shibano and his team of JIS researchers, who investigated the origins of ghost characters during the development of the 1997 version of JIS X 0208. See “7ビット及び8ビットの2バイト情報交換用符号化漢字集合 (7-Bit and 8-Bit Double Byte Coded Kanji Sets for Information Interchange)” (Tokyo: Japanese Industrial Standards Committee, 1997). For an overview of JIS X 0208’s proper names and location names, see Tatsuo Kobayashi, “情報交換用符号化文字集合と人名用漢字使用の実情 [Actual Use Scene of Han-Character for Proper Name and Coded Character Set],” Journal of Information Processing and Management 55, no. 3 (2012): 147–56. 

  74. Lennon, In Babel’s Shadow, 170. 

  75. I owe this sequence to Paul O’Leary McCann. See Paul O’Leary McCann, “A Spectre Is Haunting Unicode,” Dampfkraft, July 2018, https://www.dampfkraft.com/ghost-characters.html

  76. Unicode Consortium, The Unicode Standard 15, 246. A full list of characters that the Unicode Consortium considers to be homoglyphs may be found in its confusables database. See Unicode Consortium, “Unicode Confusables,” August 2022, https://www.unicode.org/Public/security/latest/confusables.txt

  77. The code points for these glyphs are U+0041, U+1D400, U+1D49C, U+1D538, U+1D670. 

  78. Though this is a welcome subversion of web writing’s constraints, accessibility advocates have rightly discouraged this practice, as screen readers will often spell out the full Unicode character names when they encounter certain homoglyphs. “𝓑ӭ𝘢ʋⱦįᵮůɫ” glyphs (beautiful glyphs), for example, reads “MATHEMATICAL BOLD SCRIPT CAPITAL B, CYRILLIC SMALL LETTER E WITH DIAERESIS, MATHEMATICAL SANS-SERIF ITALIC SMALL A, […]” glyphs. 


Article: Creative Commons NonCommerical 4.0 International License.

Article Image: Rafael Lozano-Hemmer, "Encode/Decode", 2020. Used with permission from the artist.