While carrying out the interviews for this dissertation, I was surprised when a number of interviewees referred to conflicts over the best way to build websites in Hebrew. They were not talking about aesthetics or ease of use, but rather the more fundamental matter of how to display Hebrew letters on the screen. Why should this be an issue? In a nutshell, because Hebrew uses non-Latin characters and is written from right to left. In this chapter I explain in some detail why putting Hebrew on the Internet posed a problem in the first place and discuss the now dominant technologies for doing so. My analysis places the issue of Hebrew on the Internet within the context of globalization, and draws from the toolkit assembled by scholars from the Science and Technology Studies (STS) approach.

I had known that in the early years of the Internet in Israel there had been a number of ways for working around the problems faced by websites wishing to publish Hebrew-language content; and like most surfers, I had vaguely registered that in recent years I was no longer having any trouble viewing sites in Hebrew, but I gave the matter no thought. However, my interest was piqued by some of my interviewees, who placed issues of Hebrew in the Internet within the central flow of their narratives. As they variously explained how a number of competing technologies gave way to one dominant standard, it became clear that they were talking about a process of closure, defined by Misa as “the process by which facts or artifacts in a provisional state characterized by controversy are molded into a stable state characterized by consensus” (Misa, 1992, p. 110).

On the face of it, devoting an entire chapter to the somewhat quirky matter of how to present Hebrew letters on a web page might seem like a diversion from the central issues of this study. There are two main reasons for believing otherwise, however. Firstly, while this chapter does indeed venture into new aspects of the Internet, tensions between the global and the local surface here too. Specifically, I explore what happens when a global technology meets particular local conditions that impede its spread. Also, I use the case of the Hebrew language on the Internet as a prism through which to explore assertions about cultural imperialism. Such an inquiry throws light on the complex ways in which the Internet is involved in processes of localization and globalization, particularly in relation to concerns about threats to minority languages purportedly posed by developments in global telecommunications.

Secondly, the emphasis of this project as a whole is on infrastructural issues to do with the import of the Internet to Israel and its subsequent spread throughout the country. The previous three chapters have analyzed the Internet in terms of its infrastructure and organizational institutionalization, and have not asked what uses people put the Internet to, for instance. Similarly, in this chapter I do not ask what people were reading and writing in the Internet in Israel, but rather I inquire into the very technology that has made it possible to read and write in Hebrew on the Internet in the first place.

In theory, any technology or artifact whatsoever is amenable to the kind of deconstruction proposed by STS, and case studies in the field are extremely varied, ranging from electric cars (Callon, 1987) and ballistic missiles (MacKenzie, 1987), to hydraulic doorstoppers (Latour, 1992) and the color of microwave ovens (Cockburn & Ormrod, 1993). In such a context, it would seem acceptable to research any technology simply because it is there. However, beyond “being there” as a technological artifact, I believe that a study of how Hebrew is dealt with on the Internet is of interest for two main reasons.

Firstly, settling these issues–or bringing about their closure–has been an important part of the infrastructural institutionalization of the Internet in Israel. That is, according to interviewees, the difficulties–now resolved–of publishing Hebrew-language content that could be read by people around the world (and not just in Israel) impeded the growth of the Internet in Israel.[1] Nowadays, though, it is as easy to publish a blog in Hebrew as it is in English, for instance. Moreover, given that developments in Israel were tightly tied in to global processes, as I shall explain below, and given that Hebrew is of course not the only language with special complications in computing, one would expect to find that parallel controversies have also been part of the institutionalization of the Internet in other countries. Findings regarding Hebrew might thus be expected to apply to other languages as well.

This leads us into the second reason for taking an interest in this subject–the so-called “multilingual Internet”. As it was emerging as a global phenomenon, commentators often noted that the Internet appeared to be a primarily English-language domain. Some critics even saw this as part of an ongoing colonialist attempt to dominate peripheral, non-English-speaking parts of the world. For instance, Phillipson asserted that,
In the next phase of imperialism, neo-neo-colonialism, Centre-Periphery interaction will be increasingly by means of international communications. Computer technology will obviate the need for the physical presence of the exploiters. New communications technology will step up the Centre’s attempt to control people’s consciousness. This will play an ever-increasing role in order to strengthen control over the means of production. For this to be effective requires the Centre’s cultural and linguistic penetration of the Periphery (Phillipson, 1992, p. 53).

In the early 1990s, when the proportion of English-language websites was extremely high (around 80%, see Figure 2), concerns about “language death” and the ascendancy of English might not have seemed too far off the mark (Brenzinger, 1992; Skutnabb-Kangas, 2000). Not only was the Internet in English, but, as I explain below, the Roman alphabet seemed hard-wired into its very architecture. Writing in 1993 about Japanese exclusionism on the Internet, Jeffrey Shapard argued:
Narrow vision, one-byte seven-bit ASCII biases,[2] the assumptions about character coding that arise from them, inadequate international standards, and local solutions that disregard what international standards there are and that pay no heed to the ramifications for others–all these are serious related problems that inhibit, rather than enhance, increased connectivity and communication (Shapard, 1993, p. 256)

For a while, then, it seemed that the Internet was set to be hugely dominated by English, with severe repercussions for those not literate in that language, as well as for languages written in a different alphabet: as the world became digitized, it was feared that languages left off the Internet would soon disappear from the world altogether.

This has been the subject of intense political concern, especially in the divisions of UNESCO that deal with linguistic and cultural diversity. UNESCO’s Universal Declaration on Cultural Diversity, for instance, states:
While ensuring the free flow of ideas by word and image, care should be exercised that all cultures can express themselves and make themselves known. Freedom of expression, media pluralism, multilingualism, equal access to art and to scientific and technological knowledge, including in digital form, and the possibility for all cultures to have access to the means of expression and dissemination are the guarantees of cultural diversity (UNESCO, 2001, emphasis added).
Of course, there would be no need to have made such a declaration unless the opposite state of affairs was held to pertain.

In 2003, UNESCO even adopted a “Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace”, in which it explicitly states its belief in “[p]romoting linguistic diversity in cyberspace” (UNESCO, 2003). Again, this would not have been said unless there were fears of a lack of linguistic diversity in cyberspace.

Since then, however, trends have shown that there are more and more non-English speaking surfers (see Figure 7) and non-English websites (see Figure 8). More than half of those surfing the web do not speak English as their mother tongue, and significantly less than half of all websites are written in English (see Crystal, 2001; and the collection of articles in Danet & Herring, 2003).


Figure 7 – based on http://global-reach.biz/globstats/evol.html
Figure 8 – proportion of websites in English: 1998 vs. 2003

At the same time, a strong case has been made against the assumption that English would colonize the Internet. “[T]he forces of economic globalization do not have a vested interest in the global spread of English”, wrote Daniel Dor, rebuffing Phillipson, though perhaps with the advantage of hindsight. Instead, “[t]hey have a short-term interest in penetrating local markets through local languages and a long-term interest in turning these languages into commodified tools of communication” (Dor, 2004, p. 98).

Although the effect of the Internet on the world’s linguistic diversity is an issue to which I shall return, feeding in as it does to key issues in theories of globalization, I am more interested in the technology behind the multilingual Internet than its cultural ramifications per se. Thus, while some researchers have asked how people use the Internet in a language that is not their mother tongue (Kelly Holmes, 2004; Koutsogiannis & Mitsikopoulou, 2003), this chapter asks instead how it is that they can use the Internet to communicate in their mother tongue at all, especially when that language is written in a different script and in a different direction to English. These are conditions that pertain to CJK (Chinese, Japanese and Korean) languages (Shapard, 1993) and to Arabic (Allouche, 2003; Peel, 2004), but also to Hebrew, the case around which this chapter is organized.

The primary theoretical motivation for enquiring into what would seem to be a banal subject, comes, as mentioned, from the field of Science and Technology Studies (STS). STS teaches us that we should never take the consensual status of any technology for granted, and we should certainly not treat it as the inevitable outcome of purely technological developments. Instead we should problematize it and ask questions about “relevant social groups” (Pinch & Bijker, 1987), social context, or coalitions of human and non-human actors (or “actants” (Latour, 1987, 1992)). In other words, we should resist technologically determinist accounts of technology, accounts based on “the idea that technology develops as the sole result of an internal dynamic, and then, unmediated by any other influence, molds society to fit its patterns” (Winner, 1999, p. 29). This resistance has two aspects: the first relates to the process by which a technology has come to be where it is, while the second refers to consequences of that technology. Simply put, neither the development of a technology nor its purported impacts should be seen as inevitable (Bijker, Hughes, & Pinch, 1987; Bijker & Law, 1992b; MacKenzie & Wajcman, 1999).

As well as offering the chance to contribute to the refinement of this body of theory, a case study guided by STS should also bring up other theoretical issues. That is, while opening the “black box” of a technology can be interesting in and of itself, it is even more theoretically valuable when the findings apply to bodies of theory external to the STS paradigm. However, it cannot be known in advance which bodies of theory these will be; rather, this must be extrapolated from the empirical investigation of the particular case. Accordingly, this chapter is both an exercise in the implementation of a certain research program and an argument about the way the technological development under discussion suggests a theoretical contribution to globalization studies.

In what follows I shall outline the problems of working with Hebrew on the Internet before presenting what has become the consensual way for dealing with them. Both the problems and the solution have two aspects, one more specific to Hebrew (directionality), and one more generally applicable to non-English scripts (encoding). After explaining the problems and their solution, I return to the problematic of the “multilingual Internet”. Towards the end of the chapter, I propose that the way that different languages have come to be accommodated in the Internet indicates a process of localization through globalization.

1.2 Background: The problem of Hebrew and the Internet

The problems of working with Hebrew in the context of the Internet are a subset of the problems of working with Hebrew in computing more generally. Indeed, the technological history of Hebrew in computers is long and involved, most of which lies beyond the scope of this study. However, the common themes running through the shared history of Hebrew and computers concern two simple facts about Hebrew. Firstly, it is written with non-Latin letters, and secondly, it is written from right to left.

1.2.1 Encoding

The first problem–that Hebrew is written with non-Latin letters–is known as the problem of encoding. I shall come to the historical source of this problem presently, but first we need to understand exactly what the problem is. When computers store text, they do not store the letters per se, but rather encode them into numbers (which are rendered in binary form, that is, in ones and zeroes). If another computer wants to put those letters on its screen (as is the case whenever I read any document on my computer), it converts the numbers back into letters. It does this by consulting a map, which tells it, for example, that the code number 97 represents the letter “a”. For many years, the dominant map was known as ASCII (American Standard Code for Information Interchange) (Jennings, 1999). This map was constructed in seven-bit code, meaning that it had room for 128 characters, which was plenty for anyone wishing to use a computer to write text in English.[3] However, this caused a problem for languages with extra letters and symbols. How were Scandinavian languages, or even French and German, to cope?
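The mechanics of this map can be illustrated in a few lines of Python (a minimal sketch for illustration only; the systems described here long predate Python):

```python
# The ASCII "map": each letter is stored as a number, and a seven-bit code
# offers only 2**7 = 128 positions in total.
print(ord("a"))    # 97  -- the code number that represents the letter "a"
print(chr(97))     # a   -- converting the number back into a letter
print(2 ** 7)      # 128 -- the total number of positions on the map

# A Hebrew letter simply has no position on this map:
try:
    "נ".encode("ascii")
except UnicodeEncodeError as err:
    print(err)     # 'ascii' codec can't encode character '\u05e0' ...
```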

Fortunately, it was not too difficult to turn the seven-bit ASCII code into an eight-bit code,[4] thus doubling the space available for new letters and characters. Different countries began exploiting the new empty spaces by filling them with their own specific characters. This created a situation whereby each country had its own map, or code sheet, where positions 0-127 reproduced the regular ASCII encoding, but where positions 128-255 were occupied by their own alphabet, accented letters, and so on.

This meant that my computer could no longer just consult a map in order to work out which letters it should be showing me on my screen. It must also make sure that it is looking at the correct map, of which there were now a large number (up to 16). For instance, the code number 240 could mean the Greek letter π (pi) (according to the Greek version of the map, see Figure 9), or the Hebrew letter נ (nun) (according to the Hebrew version of the map, see Figure 10). This implies that if I wish to publish an Internet site, I must add something to the code of that site that tells other computers which map to consult. Not to do so would be like telling you to meet me at a street map grid reference, but without telling you which city’s street map I was referring to. Without directing the computer to the correct map, or, in other words, without telling it which encoding was used in writing the original document, a surfer might receive a page of gibberish, a very common experience for Israeli Internet users until recently (see Figure 11).
Figure 9 – the ISO character map for Greek

Figure 10 – the ISO character map for Hebrew

Figure 11 – A Hebrew website in which the Hebrew letters are represented as gibberish
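The ambiguity of code number 240 can be reproduced directly: the same stored number decodes into different letters depending on which map the computer is told to consult. The following Python sketch uses the standard names of the ISO maps shown in Figures 9 and 10 (ISO-8859-7 for Greek, ISO-8859-8 for Hebrew) and is offered purely as an illustration of the principle:

```python
# Code number 240 (byte 0xF0) means different things under different maps.
raw = bytes([240])

print(raw.decode("iso-8859-7"))   # π  -- the Greek map (Figure 9)
print(raw.decode("iso-8859-8"))   # נ  -- the Hebrew map (Figure 10)

# A web page therefore has to declare which map was used to write it,
# for example with a tag along the lines of:
#   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-8">
# Omit the declaration and the browser may guess wrongly, producing the
# kind of gibberish shown in Figure 11.
```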

This system–whereby different countries and alphabets had their own specific map–bore other potential problems: for instance, a document encoded with the Hebrew map would not be able to include Greek characters–you could not refer to both maps at the same time. In addition, if you did not have the necessary map installed on your computer, then you simply would not be able to read documents encoded according to that map.[5] Although it is not a problem that faces Hebrew, it should be mentioned at this juncture that there are of course languages with far more characters than the 128 “vacant” positions in the second half of the eight-bit map can hold. For such languages (especially East Asian languages), which might have thousands of different characters, the ad hoc eight-bit method adopted by European countries for encoding their special characters is obviously inadequate.

The first problem, then, is one of encoding. The tiny ASCII map quickly turned out to be inadequate with regard to other languages’ letters and characters, resulting in the ad hoc creation of a large number of language-specific maps, a solution which in itself created new problems.

1.2.2 Directionality

The second problem is that of directionality, and is much more specific to Hebrew (and Arabic, of course). Computers were mainly developed in the United States, and just as this explains why ASCII was considered a good encoding system (Shapard, 1993), it also explains why computer code is written from left to right; after all, this was what came naturally to western computer programmers. However, we can see how this creates problems for Hebrew if we consider Internet pages. When we surf the net, our browser reads a source page and does what that page tells it to do (for instance, put this picture here, make this word link to another site, and so on). This source page, which includes the text that we see on our screen, is written from left to right, and our computers read it from left to right. This creates special problems for languages that are written from right to left. Specifically, it poses the question of how to write them in the source page.

Less significant variations notwithstanding, there have been two main ways for working with the directionality of Hebrew on the Internet.[6] The first method is known as visual Hebrew, and involves writing the Hebrew in the source page back to front so that the browser will display it the right way round. This method was dominant for the first few years of the Internet, and received governmental backing in June 1997, which was not formally rescinded until 2003.[7] The second method is known as logical Hebrew, and allows Hebrew to be written from right to left in the source page. It is this method that has become the standard, as I shall explain below.

Experts have pointed out the technical disadvantages of visual Hebrew as compared to logical Hebrew. They note that the kind of line breaks that word processors automatically put in text that we type do not work with visual Hebrew. Instead, one has to insert “hard” line breaks. If a surfer reading such a site then changes the size of either the font or the browser’s window, this can have the undesirable consequence of jumbling up the order of the lines and rendering the text quite illegible. More significantly, however, they note that creating visual web pages is more costly than creating logical ones, as some kind of flipping mechanism is required to turn the text back to front before it can be put in the source page. Similarly, copying text from a visual Hebrew web page into another program, such as a word processor, requires the same flipping process, otherwise the copied text appears back to front (see, for instance, Dagan, n.d.).
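The difference between the two conventions, and the “flipping” step that visual Hebrew requires, can be sketched in a few lines of Python. This is a deliberately simplified illustration: real visual-Hebrew tooling also had to cope with line breaks, numerals and embedded Latin text, all of which are ignored here.

```python
# Logical Hebrew stores the letters in reading order and leaves right-to-left
# display to the browser; visual Hebrew stores them back to front so that a
# left-to-right renderer happens to show them correctly.
logical = "שלום"          # "shalom", stored first-letter-first

def flip(text: str) -> str:
    """The 'flipping' step a visual-Hebrew workflow must perform."""
    return text[::-1]

visual = flip(logical)     # stored last-letter-first, ready for the source page

# Copying text out of a visual page needs the same flip in reverse,
# otherwise the pasted text comes out back to front.
assert flip(visual) == logical
```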

Given that logical Hebrew predates the popularity of the Internet, these serious problems with visual Hebrew raise the question of how it remained dominant for so long. If, as is common knowledge among programmers and web designers in Israel,[8] logical Hebrew is so much better than visual Hebrew, why did it take so long for it to establish itself as the standard? By way of a partial answer we can offer Wajcman’s comments made against technological determinism: “it is not necessarily technical efficiency but rather the contingencies of socio-technical circumstances and the play of institutional interests that favour one technology over another” (Wajcman, 2002, p. 352). It is to those socio-technical circumstances that we shall return.

1.3 The two-fold solution

Nowadays, there is little controversy over how to deal with Hebrew on the Internet. The consensual solution has two institutionally interrelated but technologically separate parts, each of which deals with a different aspect of the problems presented above. As I shall explain below, the solution involves using Unicode to solve the problem of encoding and logical Hebrew to solve the problem of directionality.

As mentioned, anti-determinist arguments have two aspects–one referring to the process of technological development, and the other referring to the impacts of technology. The following discussion of the dominance of logical Hebrew illustrates the first aspect, while the discussion of Unicode lays more emphasis on the second.

1.3.1 Directionality

The recognized solution for working with the problem of directionality in Hebrew websites is to build them using logical Hebrew, and the first years of the millennium saw a number of high profile Israeli sites make the move from visual to logical Hebrew.[9] As mentioned, this is now the standard used in government websites, and, in addition, it was incorporated into HTML 4.01[10] as far back as 1999.[11]

The implementation of visual Hebrew for the Internet was largely developed by Dudi Rashty in his capacity as an employee of the Hebrew University’s Computing Authority.[12] The dominance of logical Hebrew nowadays, however, has a lot to do with Microsoft, a fact that led to the expression of opprobrious opinions in a number of Internet-based forums. The rhetoric in these forums was similar to the anti-Microsoft diatribes often repeated with regard to Microsoft’s software,[13] marketing, and so on.[14] I shall not here discuss this rhetoric at length, though it would appear that Microsoft’s involvement was less sinister than its detractors would have us believe: Microsoft wanted to enter the Middle Eastern market and quickly learnt that there were a number of ways of dealing with Hebrew directionality issues. It understandably wanted one standard to be agreed upon that it could subsequently integrate into its operating system, but expressed no opinion as to which standard that should be. Indeed, this was Microsoft’s policy in every country that it entered. The standard that was decided upon in Israel was that of logical Hebrew, which was subsequently integrated within Windows. The fact that around 90% of the world’s desktop computers have some version of Windows installed[15] is part of the explanation for the dominance of logical Hebrew as the consensual standard. In other words, while Microsoft may have brought the issue to a head, it did not offer a priori support to one standard over another. Because the standard adopted was given the nickname “Microsoft Hebrew”, Microsoft was more closely associated with it than its actual involvement warranted.[16]

Moreover, as powerful as Microsoft has become, its success has depended on the personal computer (PC) replacing mainframe computer systems. This suggests that the dominance of logical Hebrew can be partly explained in terms of actor-network theory (ANT), and particularly its recognition that non-human actors, termed actants, may play as important a role in constituting technology as human ones. As John Law puts it, a successful account of a technology is one that stresses “the heterogeneity of the elements involved in technological problem solving” (Law, 1987). ANT also teaches us to view technologies as contingent ways of dividing up tasks between humans and objects, an idea illustrated by Latour in his playful piece on hydraulic doorstops (Latour, 1992). In that paper, Latour describes how a hydraulic doorstop ensures that a door is closed (quietly) after it has been opened (thus keeping out the cold). His point is that although this is a task that a person could carry out, it has been allocated to a machine instead. Accordingly, it seems appropriate to introduce into this analysis two objects–the dumb terminal of a mainframe computer system, and the personal computer–a move which requires a small amount of explication.

Simply put, representing visually stored text on a screen demands much less computer power than representing logically stored text. Indeed, dumb terminals–display monitors attached to mainframes with no processing capabilities of their own–were barely capable of doing so. Visual Hebrew was an apt solution for the world of mainframes; indeed, it was quite possibly the only solution. Personal computers, however, offered much more computer power and were more than capable of putting logical Hebrew on screen (Figure 12 shows the growth in PC sales from 1980 to 2004). Because using logical Hebrew saves time at the programming end–it cuts out the “flipping” stage because you do not need to turn the Hebrew back to front–programmers preferred it.[17] In terms taken from ANT, therefore, we can understand the shift from mainframes to PCs as also involving the redistribution of a certain task, namely, “flipping” Hebrew text: the dumb terminal was not able to do so, and so a human had to do it; the PC, however, is able to take on that task by itself.
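In today’s terms, this division of labour can be made visible with Python’s unicodedata module: each character carries a directionality property, and it is the displaying machine (a PC or browser rather than a dumb terminal) that is expected to act on it. The sketch below is purely illustrative of the principle; the mainframe-era systems described by interviewees obviously used neither Python nor Unicode.

```python
import unicodedata

# Each character carries a bidirectional class that a capable renderer uses
# to decide which way the text should run on screen.
for ch in ["a", "נ", "1"]:
    print(ch, unicodedata.bidirectional(ch))
# a L   -- left-to-right letter
# נ R   -- right-to-left letter
# 1 EN  -- European number; its direction is resolved from context
```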

Figure 12 – based on Personal Computer Market Share: 1975-2004, http://www.pegasus3d.com/total_share.html, retrieved 20/2/2006

This, however, merely raises another question to do with the timing of the dominance of logical Hebrew in the Internet: if “everyone knew” that logical Hebrew was better (apart from a few anti-Microsoft writers), and if Microsoft began integrating it into its operating system in the early 1990s, how do we explain the Israeli government’s decision in 1997 that all government sites must be written in visual Hebrew? Why was logical Hebrew not adopted for use on the Internet at this stage?

Again, the answer does not lie with the technological superiority of one solution over another, but rather with a particular aspect of computing at the time, namely, the political economy of web browsers. In 1997, the most popular Internet browser by far was Netscape Navigator (see Figure 13), a browser that did not support logical Hebrew. In other words, sites written in logical Hebrew would simply not be viewable in the browser that was installed on most people’s computers. Netscape Navigator did support visual Hebrew, however.[18] Therefore, if you wanted to write a website that would be accessible to the largest number of people–users of Netscape Navigator–you would do so using visual Hebrew. Netscape Navigator only introduced support for logical Hebrew in version 6.1, which was released in August 2001. By this time Microsoft’s browser, Internet Explorer, had attained supremacy, and it, of course, had full support for logical Hebrew (by virtue of it being produced by Microsoft).

Previously, then, there had been a trade-off between ease of website design and the number of potential surfers to that site–you would want to make the site in logical Hebrew, but most people were using a browser that would not let them see that site. With the emergence of Internet Explorer, however, there was no longer any need to make the trade-off. If in the past Hebrew readers outside of Israel would only be able to read Hebrew websites written in logical Hebrew if they had an operating system specially designed to do so, now they could read websites in logical Hebrew without having to carry out any special actions (such as downloading and installing Hebrew fonts, a task that many found taxing).
Figure 13 – Source: http://en.wikipedia.org/wiki/Usage_share_of_web_browsers

In summary, for quite understandable technical reasons, programmers and web designers have long preferred to produce sites using logical Hebrew. However, following an STS research agenda, this section has shown that the dominance of that standard has more to do with changes in computer hardware, Microsoft’s interest in the late 1980s and early 1990s in global expansion, and the success of that company’s browser, at the expense of Netscape Navigator, than purely technical aspects.

1.3.2 Encoding

1.3.2.1 The origins of Unicode

The consensual solution to the problem of encoding has been provided by the Unicode Consortium, whose website declares: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language”.[19] In other words, instead of each set of scripts and alphabets requiring their own map, as explained above, Unicode provides one huge map for all of them. It offers a standardized way of encoding all documents in all languages. As Gillam writes in his guidebook on the subject, Unicode solves the problem of encoding
by providing a unified representation for the characters in the various written languages. By providing a unique bit pattern for every single character, you eliminate the problem of having to keep track of which of many different characters this specific instance of a particular bit pattern is meant to represent (Gillam, 2003, p. 7).

Put even more simply, Unicode makes it impossible that the same number might refer to more than one character, as we saw with the example of code number 240 being able to represent both the Greek letter pi and the Hebrew letter nun.
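A brief Python sketch makes the point concrete: under Unicode, the Greek pi and the Hebrew nun that shared slot 240 in their respective eight-bit maps each receive a single, unambiguous number, and both can sit in one document encoded in a single scheme such as UTF-8 (the example is illustrative only).

```python
# Unicode assigns each character its own code point, so the old ambiguity
# of "number 240" disappears.
print(hex(ord("π")))   # 0x3c0  (U+03C0 GREEK SMALL LETTER PI)
print(hex(ord("נ")))   # 0x5e0  (U+05E0 HEBREW LETTER NUN)

# Both characters can appear in one document, with no national map to declare.
mixed = "π and נ"
print(mixed.encode("utf-8"))   # b'\xcf\x80 and \xd7\xa0'
```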

The appearance and increasing popularity of Unicode have been met with widespread (but not universal) approval across the computing industry. Indeed, articles in newspapers and trade magazines greeted Unicode extremely warmly (for instance, Ellsworth, 1991; Johnston, 1991; Schofield, 1993). For example, one article in a trade magazine talks about how Unicode is “bringing more of the world in” (P. Hoffman, 2000). The Unicode Consortium itself claims that “[t]he emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends” (Unicode Consortium, 2004). In a book on Unicode for programmers, Tony Graham deploys Victor Hugo to describe it as “an idea whose time has come” (T. Graham, 2000, p. 3), and says that for him it is a “dream come true” (p. x).

However, there are three elements of the discourse surrounding Unicode which call for sociological attention. The first is the determinist tendency to represent Unicode as the next stage in a natural process. The second is the existence of alternatives to the dominant technology. The third is a discussion of the technology’s purported “impacts”.

The first element–the tendency to see Unicode as the outcome of an almost natural process of development–can be seen, for instance, in an article by a programmer and writer who terms Unicode “the next evolution” for alphabets (Celko, 2003). In their books, both Graham and Gillam present Unicode as the natural and obvious solution to a problematic state of affairs. In keeping with technological determinist ways of thinking, they talk as if Unicode was out there waiting to be discovered.

One way of presenting an alternative to this discourse is to use the concept of “relevant social groups”, and particularly Pinch and Bijker’s assertion that “a problem is only defined as such, when there is a social group for which it constitutes a ‘problem'” (Pinch & Bijker, 1987, p. 414). To this we could add that the relevant social group also needs to have the means of formulating and disseminating a solution. Unicode expert Tony Graham, for instance, offers an explanation of the origins of Unicode:
The Unicode effort was born out of frustration by software manufacturers with the fragmented, complicated, and contradictory character encodings in use around the world. The technical difficulties that emerged from having to deal with different coded character sets meant that software had to be extensively localized before it could be released into different markets. This meant that the “other language” versions of software had to be significantly changed internally because of the different character handling requirements, which resulted in delays. Not surprisingly, the locale-specific character handling changes were regrafted onto each new version of the base software before it could be released into other markets (T. Graham, 2000, p. x).

This is borne out by other histories of Unicode, which locate the origins of its invention in the difficulties of rendering a piece of English-language software into an Asian language. And indeed, the Unicode project was initiated by programmers at Xerox and Apple.[20] The most obvious relevant social group, then, is clearly that of computer programmers working in multi-lingual environments.

Their superiors also had a clear interest in Unicode insofar as it can dramatically reduce the time it takes to turn software from English into another language.[21] For instance, the American version of Microsoft Windows 3.0 was released in May 1990, but the Japanese version was shipped only 18 months later. Partly thanks to the technology being discussed here, the English and Japanese versions of Windows 2000 were released on the same date. More generally, the time spent reprogramming software for foreign markets costs businesses money. Unicode is thus represented as aiding in computer companies’ internationalization efforts.

However, another social group can be identified which is enjoying the spread of Unicode, but which could not have developed it itself. This group is made up of librarians, who have been working with large numbers of scripts ever since the profession was created.[22] To take a random example, the Ohio State University Libraries contain over 45,000 records in Arabic, Hebrew and Japanese, and over 150,000 volumes in Chinese. The implementation of Unicode means that students of Chinese, for example, can search for titles and authors using Chinese characters, and without having to guess at how they might have been transliterated into Latin text.[23] It also means that libraries with multi-lingual holdings can maintain them all in a single database, with all the implications for more efficient information management and searching.

The social group of librarians, then, suggests a refinement to the original formulation of the concept of the relevant social group. The same kinds of problems that drove computer programmers to rethink encoding issues had been troubling librarians for many years previously. Unicode was created, however, when a social group with the means of developing a solution defined it as a problem; in other words, it is not enough for a problem to be defined as such–it must be so defined by a relevant social group that has the knowledge and the capital to come up with a solution. It is in this sense that the timing of the appearance of Unicode should be seen as contingent on its social context, and not as the inevitable result of internal technological developments.

The concept of relevant social groups can also be deployed in order to understand the limits of Unicode, whose focus has largely been on scripts used in business. For the core membership of Unicode, the problem that it purports to solve is that of internationalization. However, alternative relevant social groups, such as UNESCO and the Script Encoding Initiative,[24] attribute different meaning to the project. For the latter, the importance of successfully integrating a minority language into Unicode has nothing to do with business; rather, it “will help to promote native-language education, universal literacy, cultural preservation, and remove the linguistic barriers to participation in the technological advancements of computing”.[25] Moreover, the reasons given by such groups for the absence of minority languages from Unicode are explicitly political and include references to the relative poverty of speakers of minority languages, the obvious barriers they face in attending standardization meetings and drawing up proposals, and the fact that they do not constitute a large consumer base. UNESCO devotes a large section of its website to the issue of multilingualism on the Internet, and its framing of the issue is patently clear. A page entitled Multilingualism in Cyberspace opens with the following sentence: “Today various forces threaten linguistic diversity, particularly on the information networks”,[26] thus locating their interest in encoding issues in the field of language preservation. However, it is also framed as pertaining to the digital divide:
Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development. Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one’s language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society. Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues. Huge sections of the world’s population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development. Unchecked, this will contribute to a loss of cultural diversity on information networks and a widening of existing socio-economic inequalities.

Likewise, linguistics expert and UNESCO consultant John Paolillo writes that, “[f]or the Internet to allow equivalent use of all of the world’s languages, Unicode needs to be more widely adopted” (Paolillo, 2005, p. 73).

This is an example of what the literature terms “interpretive flexibility” (Kline & Pinch, 1996; Pinch & Bijker, 1987). For one group Unicode is a way to simplify software internationalization and thus increase profit margins, while for another it is a means of preserving endangered languages and narrowing the digital divide. The former interpretation is currently dominant, though organizations such as the Script Encoding Initiative and UNESCO are trying to impose their interpretation as well.[27]

In short, the concepts of a relevant social group and interpretive flexibility can help to explain the emergence of Unicode and the current struggles over which character sets are to be included in it and how.

The second aspect that a student of technology must highlight is that of alternatives to the dominant technology. That is, we must avoid the tendency to see the “victorious” technology as the only one in the field (Pinch & Bijker, 1987) and make room in our analyses for competing, though less successful technologies too. Crucially, both successful and unsuccessful technologies must be analyzed on the same terms and with the same tools. With the case of Hebrew, we saw how visual Hebrew constituted competition to logical Hebrew; with Unicode, the competition, such as it is, would seem to be coming from what is known as the TRON project, a Japanese-based multilingual computing environment.[28] Raising a theme to which I shall return below, a senior engineer from that project sees the ascendancy of Unicode as directly linked to its support from leading US computer manufacturers and software houses, who promoted Unicode for reasons of economic gain, not out of consideration for the end user. Indeed, the full members of the Unicode Consortium include Microsoft, Apple, Google, Yahoo!, HP, IBM, Oracle, Adobe, and Sun Microsystems.[29] Their economic gain, it is argued, lies in the development of a unified market, especially across East Asia, which would “make it easier for U.S. firms to manufacture and market computers around the world”.[30] Programmer Steven J. Searle, leading representative of the competing TRON project, makes the point, therefore, that Unicode did not become the dominant standard on account of its technological superiority alone (indeed, that in itself is questioned), but rather because of the alliance of US firms supporting it.


1.3.2.2 The impacts of Unicode

Having discussed some of the processes in the formation of Unicode, I turn now to a discussion of its impacts, the third component of arguments against technological determinist views, and in particular, to arguments that Unicode is yet another instance of western cultural imperialism. This kind of argument is essentially a sub-set of claims made about English becoming a global language at the expense of local languages, now faced with an ever greater danger of extinction (Phillipson, 1992, 2003; Skutnabb-Kangas, 2000).[31] In response to such assertions, I shall argue that Unicode in fact enables counter-processes of localization and the strengthening of diasporic ties.

To say that an effect inheres in a technology is seen as an example of technological determinist thinking, often given the lie by empirical research. For instance, early commentary on the Internet saw it as an isolating technology because one sits alone in front of one’s computer (see especially, Kraut et al., 1998; Nie & Erbring, 2000). However, research over the last decade has not ceased to point to ways in which the Internet is a social phenomenon that enables, and does not detract from, social capital (for example DiMaggio et al., 2001; Kavanaugh & Patterson, 2001; Wellman et al., 2001). Equally, utopian perspectives that saw the Internet as bearing the potential to bring an end to war because it is a technology that connects people (attitudes most strongly associated with Al Gore in the early 1990s (Gore, 1994a)), have also been shown by the passage of time to be somewhat unrealistic. On the other hand, it has been argued, most notably by Langdon Winner (1999) by means of Moses’ bridges,[32] that artifacts have politics, that is, that the design of objects has political effects–seen quite clearly in buildings that lack wheelchair access, for example (and see also the hydraulic doorstoppers in Latour, 1992). In this sense, the effects of the technology are determined by its design (see also B. Friedman, 1997). This, then, is a kind of technological determinism, but not a kind deemed sinful by STS researchers.

Most criticisms of Unicode are made in this spirit. That is, they point to a certain structural feature of the Unicode project and argue that it has negative political consequences. In general, critics of Unicode argue that it is bad for East Asian languages, and this for a number of reasons. In what is to the best of my knowledge the only article to offer a social science-based analysis of Unicode, David Jordan (2002) makes the fundamental point that any fixed encoding system will necessarily be insensitive to linguistic entities such as the Hokkien dialect of Chinese spoken in Taiwan, which creates new words and characters all the time.[33] Moreover, displaying sensitivity to the political context, and recognizing the importance of the Chinese mainland in influencing Unicode decisions relative to Taiwanese input, Jordan makes the plausible argument that, “[a]s Unicode becomes the ‘de facto’ standard for writing human languages, script innovations will presumably become less and less likely to receive wide use” (p. 111). With respect to the technology, then, Jordan makes two points: the first is a general one about the inability of a fixed and stable character set to deal with dynamic languages not based on an unchanging 26-letter alphabet; the second is a more specific one about the particular alignment of forces within Unicode and the broad social context of its diffusion, and particularly the rise of China, which is having deleterious effects on a minority disapproved of by the Chinese regime.

This critique is linked to another one which sees the attempt to “simplify” Han Chinese by “unifying” CJK languages as a project of cultural imperialism. Unicode embarked on a project to fit East Asian languages onto a character map by unifying the thousands of Chinese characters that comprise their scripts. Critics of this project claim that while it may be beneficial for the (American) manufacturers of computers and their software, it will throw up irrationalities for the users of such languages, especially the fact that the Unicode system does not let you know which character belongs to which language. This is problematic for creating sortable databases. Nor does the Unicode system enable many personal and place names to be written in Japanese, for example, which renders it unsuitable for use in government and other offices. Technological specifications deemed convenient for U.S.-based firms are thus seen as having unwelcome consequences for the everyday use of computers in East Asian countries. Similarly, criticism has emerged from Japan based on fears that Unicode will do away with many thousands of Japanese Kanji, with a Japanese academic terming Unicode a “cultural invasion” (Myers, 1995).

1.3.2.3 Localization through globalization

While the abovementioned arguments made against Unicode attribute certain unwanted effects to it, they do not do so in a simplistic fashion. They do not assume that Unicode, “unmediated by any other influence [than the technology itself], molds society to fit its patterns” (Winner, 1999, p. 29). Instead, they are arguments that point to the politics of this particular technological artifact while drawing on its broader social and political context. It is in this spirit that I suggest another possible consequence of Unicode, though one with a less critical tone, and which touches on the abovementioned interpretations of Unicode by UNESCO and other groups.

A journalist for a trade magazine once compared Unicode with Esperanto (Ambrosio, 1999). This kind of comparison would appear to reinforce accusations of cultural imperialism; at the very least it portrays Unicode as a force for homogenization across the globe. However, the comparison itself is actually entirely misplaced. The idea behind Esperanto was to enable people the world over to speak one common language; the idea behind Unicode, however, is to allow people the world over to speak in their own language, no matter what that language may be. Returning to a term introduced in the first paragraph of this chapter, it is in this sense that Unicode enables the multilingual Internet, making it easier to create Internet content in languages other than English, and, what is particularly crucial, enabling that content to be viewed all around the world.

This is especially pertinent in a world in which migration is increasing and diasporic communities are growing.[34] Miller and Slater (2000), for instance, stress the role of the Internet in shaping relations between members of Trinidad and Tobago’s large diaspora and friends and family in the homeland,[35] as do Graham and Khosravi (2002) in the Iranian context. As we have seen, the ability to transfer text in non-Latin scripts between computers is no trivial issue, as Hebrew-language Internet experts were well aware in the early 1990s. Unicode, it would seem, makes such transfers far simpler, and is thus a crucial component of the development of the multilingual Internet.

So whether or not we accept the abovementioned criticisms of Unicode, it is clear that part of the politics of this particular artifact is multilingualism, and in a way that would seem to instantiate Robertson’s well-known characterization of globalization, “the particularization of universalism and the universalization of particularism” (Robertson, 1992, p. 178). Indeed, Unicode could be seen as much more emblematic of this process than the examples usually given for it. Representing the mainstream of such research, theoreticians arguing for the limits of globalization refer to local translations and adaptations of western cultural goods–classic examples include local variations in McDonald’s restaurants (Watson, 1997) and culturally-dependent readings of Dallas (Liebes & Katz, 1990)–which means they are focusing on the first half of Robertson’s definition alone (see also Hannerz, 1992; Nederveen Pieterse, 1995). In this context, the Internet could also be seen as a universal form that undergoes particularization in its various localities. Studies embodying the second half of the definition are well represented by Appadurai’s (1990; 1996) work, for example, and contain references to heightened ethnic awareness and the deconstruction of imagined communities at the nation-state level.

Unicode is singular in this regard: it is not a cultural good in the way that a Hollywood movie is, and nor is it a kind of identity. However, its paths of diffusion are similar to those of such global cultural forms, and it assists in the computer-mediated communication of local identities. Indeed, its success has been dependent on precisely those mechanisms so often critiqued as transmitting cultural imperialism, while enabling the use of non-English languages based on a quintessentially global technology. It simultaneously promotes the interests of global capital while providing a tool for resisting one of its purported cultural consequences (the domination of the English language)–thus while online shopping has become increasingly popular in Israel since 1999,[36] it has been found that people are three times as likely to purchase from a site in their language (Scanlan, 2001). In other words, the global and the local merge; each is a part of the other.

It is in this sense, I argue, that the Unicode project is an excellent example of processes of globalization. What is more, the fact that it does not itself have any content (in the way that a book or a back-lit menu in a fast-food restaurant does) means that it is even more transportable and subsists at a deeper level of abstraction than the usual examples of cultural globalization. It is a globalizing technology–in both of Robertson’s senses of the term–and not just a cultural instantiation of it.

1.4 Conclusion

This chapter has looked at one of the major technological underpinnings of the Internet in Hebrew and of the multilingual Internet in general. Just as an earlier chapter explored in detail the physical connections made between Israel and countries overseas, so this chapter has examined another infrastructural aspect of the diffusion of the Internet in Israel. Going behind the scenes, it has inquired into the ways in which Hebrew, a difficult language to work with on computers, has been dealt with by programmers. My analysis shows how what would at first glance appear to be a banal and local technical problem is actually part of global efforts at standardization, and even ties in to Microsoft’s far-reaching business ambitions. This is quite in line with the ways that STS researchers approach the resolution of a technological problem. In particular, I used the concepts of the relevant social group, interpretive flexibility, and actants to explain the technological development of logical Hebrew and Unicode.

The research strategy adopted here, however, is merely a tool: it enables the researcher to uncover the social context of a technological development, but it does not offer a way for theorizing that context per se. At best, the researcher can only have a broad a priori idea about the theoretical field in which he may find himself (a study of kitchen appliances will probably require theoretical reference to gender relations in society (see for example Cockburn & Ormrod, 1993 and their analysis of the microwave oven)).

The case analyzed here clearly falls within the remit of globalization theories. The technologies discussed (both logical Hebrew and, more obviously, Unicode) are patently connected to globalization through their institutional affiliations; however, they have enabled the Internet to shed its derogatory nickname, the “Western Wide Web”, and are behind the growing rates of non-English usage of the Internet. Given that people spend most of their time online in environments that use their own language (14 out of the 18 most popular sites among Israeli surfers are in Hebrew, for instance (Bein, 2003)), the ability to easily create content in languages other than English, and, what is just as crucial, for that content to be accessible around the world, is part of the globalization of the Internet. Put differently, it is what enables the local adoption of this global technology. Unicode, however, is not a cultural form in the same way that McDonald’s is. While the latter is altered in various ways as it is absorbed by new countries creating hybrid forms, the globalizing power of the former is that it is exactly the same all over the world. Indeed, with the case of Unicode, the further it spreads across the globe and the more stringent the standardization that it brings with it, the greater the extent of the localization of the Internet.


[1] Similar problems were reported in Japan (Shapard, 1993).
[2] I explain the meaning of this below.
[3] In slightly more technical terms: a seven-bit code is one that uses binary numbers of up to seven digits in length. Such a code will therefore have 2x2x2x2x2x2x2=128 different binary numbers.
[4] Again, the reason is technical, but simple: bits always come in groups of eight. Therefore, a seven-bit piece of code always starts with a 0. If you turn that 0 into a 1 (thereby making it an eight-bit code), you have another 128 numbers at your disposal. (See also Haible, 2001).
[5] In Russia, where there were several different ways for encoding Russian characters, this could be particularly problematic. Similar problems are reported with Hindi (Paolillo, 2005, pp. 54-55).
[6] Here too it is worth noting that these problems have a longer history than I can deal with in this chapter.
[8] I base this statement on interviews with actors involved in the development of technologies for dealing with Hebrew on the Internet, and on extensive reading of websites on the issue.
[9] Such as ynet.co.il and haaretz.co.il.
[10] HTML is a language in which web pages are written, and is overseen by the World Wide Web Consortium, the body that maintains HTML and publishes HTML standards (see www.w3.org).
[11] Though not at the expense of visual Hebrew, which is still supported by HTML 4.01, primarily to ensure backward compatibility with older sites.
[12] I interviewed Dudi Rashty on 25/11/2004.
[13] See sites such as http://www.microsuck.com/ and the site of the Anti-Microsoft Association (http://users.aol.com/machcu/amsa.html).
[15] It would be fair to presume similar numbers in Israel as in the rest of the world.
[16] This was related to me in interview by Jonathan Rosenne, an Israeli programmer and expert on issues of Hebrew on the Internet. This expertise is reflected in his participation in pertinent committees in Israel and overseas (his website can be found at http://www.qsm.co.il/Hebrew/index.html). I interviewed Rosenne on 26/2/2006.
[17] This was explained to me in interview by senior IBM employee, Matitiahu Allouche, with whom I met on 15/2/2006.
[18] More precisely, Netscape Navigator would show sites written in visual Hebrew but without realizing it was doing so–this required the development of a special font that the end user had to install. By using this font, the browser would be made to think it was dealing with Latin text, while the surfer would see letters in the Hebrew font.
[20] See http://www.unicode.org/history/ for a more detailed account.
[21] A process known as “internationalization”, “the process of planning and implementing products and services so that they can easily be adapted to specific local languages and cultures” (http://whatis.techtarget.com/definition/0,,sid9_gci212303,00.html).
[22] A search for “Unicode” in the ISI Web of Knowledge, for example, returns 62 results, 20 of which are for articles in journals to do with librarianship and information retrieval (search conducted June 29, 2006).
[27] An attempt to include the Klingon script (from the Star Trek series) failed, showing how a certain interpretation of Unicode was rejected by that organization’s institutions.
[29] See http://www.unicode.org/consortium/memblogo.html for a full list of the consortium’s members.
[31] See UNESCO’s site, http://portal.unesco.org/culture/en/ev.php-URL_ID=8270&URL_DO=DO_TOPIC&URL_SECTION=201.html. UNESCO claims that 50% of the world’s 6,000 languages are endangered, and that a language “dies” every two weeks.
[32] Very briefly, Winner argued that Robert Moses’ bridges in Long Island were intentionally too low for buses to pass underneath so that the poor could not reach the beaches that the white middle class wished to keep for itself. (However, see also Joerges, 1999; Woolgar & Cooper, 1999 for a discussion of the historical accuracy of Winner’s example.)
[33] A point also made by Unicode proselytizer Tony Graham in his book (T. Graham, 2000).
[34] The International Labor Organization reports that “[t]he rate of growth of the world’s migrant population more than doubled between the 1960s and the 1990s” (http://www.ilo.org/public/english/protection/migrant/about/index.htm, accessed 3/7/2006).
[35] Trinidad and Tobago has one of the highest rates of net emigration in the world (at -11.07/1,000 population), according to the CIA World Factbook (http://www.cia.gov/cia/publications/factbook/fields/2112.html, accessed 3/7/2006).