Learning how to read a language is fundamentally different from learning how to speak it. As children, we pick up languages simply by being exposed to people speaking them. But when it comes to reading, we don't learn simply by being exposed to symbols – we have to be taught the system that links each symbol to a specific sound. This system is called a “writing system”, and there are many different kinds in use all over the world.
Some writing systems are alphabetic, some are syllabic, and others make use of both sound and meaning. With this great diversity in systems of writing, we might expect there to be variation in how easily children (and adults) can learn how to read. In this article, I explore what factors can make a written language easy (or difficult) to learn how to read. I also rank several world languages on a scale of difficulty using the factors that I've discussed.
Sounds & Symbols
Before we can begin talking about reading and writing, I need to make a very important distinction. Whether we’re reading English, Chinese or Swahili, when we read letters or characters on a page, we’re not looking at sounds or words – we’re looking at representations of sounds or words. This point cannot be overemphasized. So when we read English, what we’re actually doing is decoding symbols. And similarly, when we write, we are actually encoding sounds with symbols that will later be decoded. But what “sounds” are we encoding, exactly? The simple answer is that we’re encoding (information about) the sounds of our language. For the sake of brevity in this article, I’m going to be extremely general and call these language sounds “phones”.
In most varieties of American and British English, there are around 36 of these “phones”, or unique sounds that can be combined together in all sorts of ways to form words. The English writing system is used to encode these “phones” and the English words that are composed of them. Take for example the word “cat” – intuitively, we can think of this word as having three unique sounds strung together: “c” makes the sound [k], “a” makes the sound [ae], and “t” makes a unique sound [t]. In this case, each written letter in “cat” corresponds to one unique sound, and probably all speakers of English would agree with this.
In another example, for the word “weigh”, there are several silent letters that don’t get pronounced. If asked, most English speakers would probably identify two or possibly three unique sounds here, but certainly no one would claim that every letter in “weigh” is assigned an individual sound.
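To make the contrast between “cat” and “weigh” concrete, here is a minimal sketch that counts how many letters are spent on each sound. The alignments are hand-made for illustration, the sound labels are informal, and I assume “weigh” has two sounds (treating the vowel as a single unit); the function name is my own.

```python
# Hypothetical, hand-made alignments: each word is a list of
# (written chunk, sound) pairs. These are assumptions for illustration,
# not the output of any pronunciation dictionary.
ALIGNMENTS = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],
    "weigh": [("w", "w"), ("eigh", "ei")],  # "eigh" spells a single vowel sound
}


def letters_per_phone(word):
    """Return the average number of written letters used per sound."""
    pairs = ALIGNMENTS[word]
    letters = sum(len(chunk) for chunk, _ in pairs)
    phones = len(pairs)
    return letters / phones


print(letters_per_phone("cat"))    # 1.0
print(letters_per_phone("weigh"))  # 2.5
```

A value of 1.0 means every letter carries exactly one sound; anything higher means some letters are silent or share a sound with their neighbors.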
Alphabets & the sounds they encode
These sounds that make up the words we speak and the sentences that we utter are encoded using the written (or typed) symbols in our writing systems. Any intuitions you might have about how this process works are probably fairly sharp. The writing system for English, for example, is alphabetic, but to be more specific, I’m going to call alphabetic writing systems phonographic.
This label simply means that the most basic unit of language encoded (“graphed”) by the writing system is a “phone”, which is what I’m using to refer to the smallest unique sound unit in a language. Generally speaking, if not always, languages that use alphabets can be said to use a phonographic writing system. And to be dangerously more technical: though I speak of the writing system as belonging to the language, it’s actually the speakers of the language that use the writing system. (There’s no requirement that a language must be written, or that it must be written using a particular writing system.)
So far we’ve established that languages like English use phonographic writing systems, where each written symbol (ideally) encodes one “phone” -- otherwise known as an individual sound. Even among languages that use alphabetic orthographies (“rules of writing”) like English, there is a great degree of variation in what is called the transparency of the writing system, or orthography. To understand what transparency refers to, let’s return to the two examples I considered earlier: “cat” and “weigh”.
When we read “cat”, it’s fairly easy to decode what sounds correspond to what letters; however, for “weigh” the process is a little less clean. Now, if you can imagine a language where the reading process is very straightforward because there’s very little uncertainty about what sounds are being represented by the symbols, you’ll have a good sense of what a “transparent orthography” is. Similarly, if you can imagine a language (like English, for example) where there is not a clean one-to-one correspondence (match) between sounds and symbols, then you’ll have a good sense of what an “opaque” (non-transparent) orthography is.
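One way to picture the difference is to count how many readings a written word allows. The sketch below uses two invented symbol-to-sound tables (they are toy assumptions, not real descriptions of any language) and lists every pronunciation each table permits.

```python
from itertools import product

# Toy symbol-to-sound tables, invented for illustration only.
TRANSPARENT = {"k": ["k"], "a": ["a"], "t": ["t"]}
OPAQUE = {"c": ["k", "s"], "a": ["ae", "ei", "a"], "t": ["t"]}


def candidate_readings(word, table):
    """Return every pronunciation the table allows for a written word."""
    options = [table[symbol] for symbol in word]
    return ["-".join(phones) for phones in product(*options)]


print(len(candidate_readings("kat", TRANSPARENT)))  # 1
print(len(candidate_readings("cat", OPAQUE)))       # 6
```

With the transparent table the reader has exactly one option; with the opaque table the same three-letter word has six candidate readings, and the reader has to know which one is right.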
Which alphabetic languages are more difficult to learn how to read compared to others?
If we take a moment to think about transparency, we’ll probably develop this intuition: it’s easier to learn how to read languages with more transparent orthographies, and more difficult to learn how to read languages with less transparent orthographies. And indeed, this is generally the case, at least for children learning to read their first language.
In a study of several European languages, Seymour (2005) reported the scores of children who took a standardized test of reading proficiency in their native language. These reading tests were standardized for word frequency and other factors that allowed them to be directly comparable across languages. Seymour found that the scores seemed to relate directly to what he called the “depth” of the orthography used for writing each language. (What Seymour meant by “depth” is similar to what I mean by “opacity” here – the “deeper” the orthography, the harder it is to guess how a letter is supposed to be pronounced.)
The scores Seymour reported for each language, and the language’s associated orthographic depth, are given below.
Before we proceed, I’d like to point out just a couple of things here. First, the depth (or transparency) of a writing system is not a statement about how good or bad a writing system is, nor is it a claim about a language’s level of sophistication. It’s simply an observation about how directly the writing relates to the sounds of the language it’s used to encode. A second important point here is that Seymour does not tell us exactly how he calculated orthographic depth.
Although it seems like we can look at how a language is written and possibly assign a rating between 1 and 5, to make a more compelling argument for which alphabetic languages are most difficult to learn to read, we will want a slightly more objective way to measure transparency. As they stand now, Seymour’s depth ratings seem intuitive enough, but they also seem a little arbitrary.
To give a more straightforward way of evaluating how hard an alphabet is to read, I’m going to appeal to a measure used by Bernhard and Altmann (2008), which is called “orthographic uncertainty”. Orthographic uncertainty basically measures how “uncertain” you are (or how many possible options you have) about pronouncing a letter when you see it. I won’t go into detail about how this measure is calculated, but to summarize, orthographic uncertainty measures how many possible pronunciations are linked to each written symbol in a language’s alphabet.
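For readers who want a feel for this kind of measure, here is a deliberately simplified sketch. It is not the published formula; it only follows the summary above by averaging the number of distinct pronunciations seen for each symbol. The function name and the toy data are my own assumptions.

```python
from collections import defaultdict


def mean_pronunciations_per_symbol(pairs):
    """Average number of distinct sounds attested for each written symbol.

    `pairs` is an iterable of (symbol, sound) observations. This is a
    simplified stand-in for orthographic uncertainty, not the published measure.
    """
    sounds_by_symbol = defaultdict(set)
    for symbol, sound in pairs:
        sounds_by_symbol[symbol].add(sound)
    counts = [len(sounds) for sounds in sounds_by_symbol.values()]
    return sum(counts) / len(counts)


# Invented toy observations, for illustration only.
transparent_pairs = [("a", "a"), ("k", "k"), ("t", "t"), ("a", "a")]
opaque_pairs = [("a", "ae"), ("a", "ei"), ("a", "a"),
                ("c", "k"), ("c", "s"), ("t", "t")]

print(mean_pronunciations_per_symbol(transparent_pairs))  # 1.0
print(mean_pronunciations_per_symbol(opaque_pairs))       # 2.0
```

A score of 1.0 corresponds to a fully transparent toy alphabet; the higher the score, the more guessing a reader has to do.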
If we do this for all of the languages reported in Seymour’s study, we find a significant and robust correlation between Seymour’s depth ratings and my orthographic uncertainty calculations. (This isn’t surprising, given the similarity between the two concepts.) The figure below illustrates this correlation, and also allows us to visualize the relationship between reading proficiency and orthographic uncertainty. The pattern here is quite striking: the greater the orthographic uncertainty (that is, the less transparent a writing system is), the lower the reading score.
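The kind of relationship described here is usually summarized with a correlation coefficient. The sketch below shows how a Pearson correlation could be computed by hand; the numbers are made-up placeholders, not Seymour’s scores or my calculations, and serve only to show the direction of the pattern.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values, invented for illustration: one entry per
# hypothetical language, NOT data from any study.
uncertainty = [1.0, 1.5, 2.0, 2.5]
reading_score = [95, 85, 60, 40]

r = pearson(uncertainty, reading_score)
print(r < 0)  # True: higher uncertainty goes with lower scores
```

A coefficient close to -1 would mean that reading scores fall almost in lockstep as uncertainty rises.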
I suggest that we can leverage this relationship to make conclusions about how difficult languages are to learn how to read – both for adults and for children. The languages at the top left of the figure are the ones that children can learn to read earlier and to a higher level of proficiency than the languages at the center or towards the bottom right of the figure. From this analysis, I think it’s safe to conclude that languages like French, Danish and English present a much greater challenge to learners when it comes to attaining reading proficiency. And this challenge may be directly related to how relatively non-transparent a language’s writing system is.
This article looked at reading as a process of decoding information about sounds and words from symbols, and at writing as a process of encoding linguistic information with symbols. Here we looked specifically at languages that use alphabets, and discussed orthographic transparency – one important factor that influences how easy (or difficult) a language is to learn how to read. We considered how transparency related to children’s reading proficiency scores in a study, and used this information to make conclusions about which languages are more or less difficult to learn how to read in general. In concluding this article, I’d like to open up a point for further discussion.
As it turns out, the alphabet isn’t the only type of orthography out there. There are other types of writing systems with different properties and different relations to the languages whose information they encode. There are syllabaries, systems that use the syllable as the most basic written unit. There are logographies, systems that use the word as the most basic unit. There are also mixed writing systems, which combine phonographic, syllabic and logographic writing (Japanese is perhaps the most iconic example).
Although this article explored only languages written with alphabets, we can still generalize to other kinds of orthographies by leveraging orthographic transparency together with something called graphemic inventory – a measure of how many unique symbols a writing system makes use of. Some may argue that languages that don’t use alphabetic writing are harder to learn because readers have to remember more symbols. And indeed, if we look at a language like Chinese, we’ll see that readers need a command of about 2,500 different symbols for the most common reading activities. (I calculated this number from a database of hundreds of Chinese movie subtitles.)
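A count like this can be estimated from any body of text by asking how many of the most frequent symbols are needed to cover a chosen share of everything written. The sketch below is one possible way to do that, not a record of my exact procedure; the function name, the coverage threshold and the tiny sample string are all assumptions for illustration.

```python
from collections import Counter


def symbols_needed(text, coverage=0.95):
    """Smallest number of most-frequent symbols whose occurrences make up
    at least `coverage` of all non-space symbols in `text`."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    running = 0
    for needed, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= coverage * total:
            return needed
    return 0  # empty text: nothing to cover


# Tiny invented sample: "a" occurs 4 times, "b" 3, "c" 2, "d" 1.
sample = "aaaa bbb cc d"
print(symbols_needed(sample, coverage=0.5))  # 2
```

Run over a large corpus, the same idea gives a rough size for the working inventory a reader needs, which is smaller than the full set of symbols the writing system contains.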
But an important observation is this: generally speaking, the larger the number of symbols a writing system makes use of, the more transparent the writing system is.
Although Chinese has a large symbol inventory, with very few exceptions each symbol has only one pronunciation. (I’m not taking dialectal variation into account – different languages/dialects can certainly share a common writing system.) This tendency is intuitive. Imagine what would happen if you wrote English using one unique symbol for every unique syllable – you’d end up with a huge collection of symbols, but each would have only one pronunciation.
In the discussion I’ve presented here, I’ve suggested that orthographic uncertainty – or how “uncertain” you are about the pronunciation of a letter, character or symbol you encounter while reading – is strongly related to how difficult a language is to learn how to read. In reality, however, any language can be written with any writing system – it’s just that some writing systems are more suitable for certain languages than others.
For instance, Japanese has vastly fewer possible syllable types than, say, English, so using a syllabary for Japanese makes sense, while using one for English would prove too cumbersome to be practical. So the question of which language is most challenging to learn to read is really a matter of the language, the writing system, and how transparent the relation is between the two. And ultimately, no matter which languages you choose to study, learning how to read is going to require a considerable amount of patience, time, and practice.
Bernhard, G. & Altmann, G. (2008). The phoneme-grapheme relationship in Italian. In Analyses of Script: Properties of Characters and Writing Systems. Berlin: Mouton de Gruyter, 13-24.
Seymour, P. H. K. (2005). Early reading development in European orthographies. In The Science of Reading: A Handbook. Blackwell, 296-315.