Lexicography, an adventure not a job
Sep. 22nd, 2011 10:22 pm
We tend to think of Information Technology in terms of motherboards, hard drives, wireless routers, iPhones and iPads. In fact, Information Technology predates the integrated circuit and even the vacuum tube. Of course, the term Information Technology only goes back to 1958, when the Harvard Business Review coined it. Even though IT originally referred to a collection of new technologies that didn't fall under a single name or category, I'll use it to mean any technology which is used to organize, produce or disseminate information en masse. While some people would make an argument for the alphabet being the original information technology, that invention only allowed the manipulation of information on very small scales. We don't see economies of scale come into play until the invention of the first printing presses, and even then, production capacities were limited. The earliest industrial printing presses didn't arrive until 1800, at which point print took off. Like any good technology, the printing press led to the development of other, more advanced technologies. In fact, the printing press kicked off a revolution in scholarship which broadened the base of literacy and ultimately led to the creation of the modern nation state.
Nations as we know them are largely Imagined Communities, to use Benedict Anderson's term. That is to say that nations are held together by little more than the glue of conviction that we, whoever we are, comprise a community. That conviction may derive from a shared linguistic and cultural tradition or from a civic commitment to follow the rules and practices of governance laid out by community founders. France and Germany are examples of the first sort of nationalism. The United States and modern India exemplify civic nationalism. As Americans (of the US sort), we retain so many language and faith traditions that all we really have in common is a shared understanding of governance as encoded in the US Constitution (though that seems to have become a bit frayed of late). In both cases, nations survive and thrive through the development and promulgation of what Ernest Gellner called a High Culture. This is to say that all nations create a literate culture which exemplifies national virtue and embodies the values underlying the national identity. In almost all cases this entails the development of a standard language which usually, in some form, becomes an official language. The United States has a de facto official language in English. While many municipalities and states within the larger union may provide a great deal of official literature in a variety of languages, they do so as a courtesy to new arrivals, as a way to help them find their feet in a totally new land. The fact remains that immigrants operate at a severe disadvantage in the US until they master the English language. India has a different experience. In that nation, many of the languages spoken have been adopted as official languages. India faces a dilemma not shared by the US. In the US, almost everyone comes from an immigrant background.
India's languages predate recorded history in many cases, and many can claim the mantle of authority owing to their role in the development of religion and culture through the history of the region. So India compromises by standardizing on a set of dominant languages, including the language which all ethnicities find equally distasteful, English. Hey, if you can't please everyone, why not make them all equally pissed off?
So nations create literate high cultures and promulgate them to the masses. Why on Earth would they do that, either consciously or unconsciously? This goes back to the revolution of the printing press. Before the advent of the printing press, books were laboriously created and meticulously copied by hand and passed around within a small community of clerks and scholars. In Europe, this was done largely under the auspices of the Catholic Church. In fact, the term 'clerk' is an alternate spelling of 'cleric'. The church ministered to many distinct peoples distinguished by their languages. Before the 1800s, languages varied widely by locality and mastering any more than a few was futile. The Catholic Church solved the problem by standardizing on the Latin spoken in the Roman Empire. This was no one's language but the church's. While this standardization did allow for the transfer of knowledge across a large sphere of influence, it impeded the spread of literacy by forcing anyone who wanted to learn to read to learn a completely foreign language even as they mastered what were, at the time, very unfamiliar skills and ways of thinking. Because we have been born into a print-based culture, we forget just how profoundly print changes our way of thinking. We 'look things up'; we engage in an internal dialog with our information and process it piecemeal before adding up the parts. Oral cultures must know things in their totality. We, by contrast, retain the salient ideas, confident that we can look up the details by consulting our collection or borrowing a text from a library.
Printing changed all that. Once a master copy of a work was made, a printer could duplicate it at whim. Now, why would a printer print something in Latin when all his neighbors and patrons speak, say, German? The crafty printer will develop an orthography to represent the spoken language in print form. That creates a problem. What happens when the printer's journeyman takes his product to the next big city? He may well find that the people there speak a mutually intelligible dialect but that the orthography his master developed makes little sense to the people in that city. Not to mention, he may find the local printers have other ideas for the proper representation of the spoken language. These print forms of a spoken language constitute print vernaculars. The term vernacular refers to the version of a language spoken in a region or locality. For a print vernacular, the locality is the printed page. Print vernaculars created huge markets for printed works and encouraged reading, especially of practical works in fields such as Engineering, Science, Geography and Mathematics.
The standardization of language confers huge advantages on the communities which go this route. In Pre-Industrial Societies, trades pass down through families. A son often fills the same occupation as his father whether or not the local market demands those skills. The knowledge is passed down by example and by oral tradition. Literate Societies which undertake the education of their children in standard languages create mentally agile workers able to train and retrain for skilled work as the current conditions of their economies demand. In early industrial societies, a son or daughter often holds a very different occupation from their parents. In mature industrial societies, individuals may change occupations or even fields entirely during their lives. In modern societies, individuals can expect to hold eight or more distinct jobs in a variety of fields. For instance, I myself have held numerous jobs spanning at least four distinct fields. Standardization of language enables this flexibility by providing universality in educational works. In other words, once I possess a high degree of literacy in English, I can pick up a textbook in Computer Science and, with instruction and evaluation, learn to be a programmer as easily as I can learn to be an Accountant. The worker thus becomes as interchangeable a part as a machine tool.
Dictionaries, then, seem like the original software project. Languages cannot be standardized without a common reference that lists the spelling, meaning and proper uses of their words. Without such a reference, a common pedagogy cannot be produced and children cannot be educated with the literacy skills that make knowledge easily transportable. Because they embody a national language, dictionaries become political documents of immense power. Beyond that, the existence and availability of authoritative dictionaries make it possible to educate large numbers of highly literate citizens who have the ability to adapt as the economy changes. By standardizing language, dictionaries help standardize education throughout a nation. This means, among other things, that students graduating from secondary schools in a community may enter any college and undertake advanced studies in any field, or they may choose to study a trade. This degree of self-determination is extremely new in human history. As dictionaries standardize language and open up self-determination in education, they facilitate commerce within a nation by providing a common language for businesses. Indeed, institutions will often take this systematization even further with the development of Controlled Vocabularies, which speed decision making and streamline description. Like all software projects, dictionaries automate processes which used to require laborious and painstaking effort to accomplish.
As documents, dictionaries also record a fair amount of history. Take English, for example. Look up the entry for just about any word. The entry will detail its origin, in many cases going back to antiquity. Why is this important? For two reasons. First, as most students cramming for standardized tests know, a command of roots and affixes will boost scores. Or, rather, a deeper understanding of the structure and history of one's language allows one to infer the meaning and use of unfamiliar words. Second, word histories play a critical role in the construction of the dictionary. The authors do not have the authority to simply invent words. That's best left to the sort of person who invents constructed languages like Esperanto or Middle Earth Elvish. Authors of dictionaries compile vocabularies from words actually in use or which have been used in the past. To do this, they painstakingly develop concordances, or lists of words in literature which include the context in which they were used. That brief origin you see in a dictionary is the end result of years of hard research in the literary tradition of a language. The first appearance of a word can often mark momentous events in history. Take, for instance, the English loanword honcho. Growing up, I thought the word was Spanish in origin. In fact, the word came into the American English lexicon during the occupation of Japan. It comes from the Japanese hanchō, literally 'squad chief', or group leader.
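For the programmers in the audience, the concordance idea mentioned above can be roughed out in a few lines of Python. This is only a toy sketch, not how any of the dictionary projects described here actually worked: the function name `build_concordance`, the `width` parameter and the sample sentence are all made up for illustration, and the tokenizer naively assumes plain English text.

```python
import re
from collections import defaultdict


def build_concordance(text, width=3):
    """Map each lowercased word to the contexts in which it occurs.

    `width` is the number of neighboring words kept on each side.
    (Hypothetical helper for illustration only.)
    """
    # Naive tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, token in enumerate(tokens):
        start = max(0, i - width)
        context = " ".join(tokens[start:i + width + 1])
        concordance[token.lower()].append(context)
    return dict(concordance)


# Invented sample text, not a real citation.
sample = "The head honcho spoke. Nobody argued with the honcho that day."
concordance = build_concordance(sample, width=2)
for context in concordance["honcho"]:
    print(context)
```

A real concordance would also record where each citation came from (author, work, date), since it is the dated citations that let a lexicographer pin down when a word first appeared.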
In my career, I have had the fortune to work on five dictionary projects. Usually, I played a role in producing them in their final representation. My work was typically akin to that of a production editor, though I work mainly with electronic documents. Dictionaries are monstrous projects undertaken only by the committed or the insane, often both. We all know that one of the most prolific contributors to the Oxford English Dictionary was an institutionalized madman. That probably explains a lot about the English language. The degree of commitment required borders on monomania because even small dictionaries often take twenty-five years of continuous field work to complete. On one dictionary I worked on, I made the mistake of complaining that I had been working on my part for nearly a year and was still not done. The author smiled and told me he had just spent thirty years collecting the material for what he considered to be a first and very rough draft.
So, what is it like putting the finishing touches on a project like this? Nerve-wracking. First, you're acutely aware that you hold someone's life work, their legacy, in your hands. That never gets old for me. Second, you wind up dealing, one way or another, with every decision made in the project's lifespan and with the politics of language. In short, you have to come to grips with history in a very real sense. At the end of the project, you have to deal with the ways in which a language's orthography may have changed over time. This was very much the case with a Tamil dictionary I worked on. The rules for Latinization (the process of converting foreign scripts to the Roman Alphabet) had been in flux nearly the whole time the project was collecting words. At the end of it all, it fell to one computer geek who is, at best, fluent in two languages, neither of which is Tamil, to bring order out of linguistic chaos. It's amazing what proficiency in Perl and a rudimentary knowledge of Regular Expressions can accomplish. At other times, working on a dictionary can be uproariously funny. While working on a multilingual dictionary of Yoruba dialects, I discovered that there was a whole section on farting, including a word for 'ensemble farting'. One of my favorite words in that language is a word that describes 'stumbling while drunk'. It's onomatopoeic in that it mimics the rhythm of an uncoordinated walk. While struggling to pronounce that one, I beat out the time and discovered that the syllables hit in between the beats. You also discover that, even after thirty years of hard work, the dictionary may be incomplete. This can happen when you're only forty-eight hours from publication. This happened with the Yoruba dictionary I worked on a couple of years ago. Yoruba is not one language; linguists describe it as a dialect continuum.
That is to say that there is a multitude of languages in the group which are mutually intelligible at the center but diverge the farther out one gets from the central one. Yoruba dialects have been flung far and wide. One such was Lucumi, which was once spoken widely in the Spanish colony that would become Cuba. Today Lucumi is the liturgical language of the Afro-Caribbean religion, Santeria. At the eleventh hour, I discovered that while the translators had translated from Lucumi to Cuban Spanish, they had apparently refused to translate some words and phrases all the way to English. These words and phrases tended to have a deep religious significance. I went into a sort of academic hyper-drive. I quickly rounded up a coworker fluent in most forms of Caribbean Spanish and headed off to the library to check out every single book I could find on Cuban Spanish, Santeria, and Voodoo. This followed closely on the heels of a foreword I wrote for an obscure publication about Information Theory. That project led me to strip the library's collection of books on Information Theory, Number Theory and Cryptography, and I still had those on my desk. I'm surprised I'm not on a 'no-fly' list following that. After a marathon effort of translation and anthropological research, along with calls to 'Rick's Occult Shop' in Philly, we managed to complete most of the missing entries. We brought the dictionary to publication status with a good three hours to spare.
That project, more than any other, brought home the power of language to record history. In writing what amounted to a 'jacket description', I had to tell the tale of the spread of these languages. The linguistic diaspora of Yoruba includes the east coasts of North and South America. The tale of the spread of Yoruba dialects recounts the tale of the Triangular Trade and all its horrors. The tale told is one of redemption and the human struggle for self-determination. I have not been able to look at a dictionary as a simple list of words since I wrote that description. I now see dictionaries for the historical documents they are and am aware of their immense power as political instruments.