The story of a humble Papadum

Etymology, according to Google, is the “study of the origin and history of words and their meanings”. Words in languages don’t emerge out of thin air; there is always a source for every word, and simple everyday words can hide very interesting histories within. In this post, I’ll use the Tamil word அப்பளம் appaḷam ‘Papadum’ as a case study, and trace its history to explain four historical sound changes in Indo-Aryan and Dravidian. This post will be rather involved with respect to phonetic jargon, so I advise the layperson to first read my earlier post explaining much of this jargon.

Our story starts at the Sanskrit word पर्पट parpaṭa, meaning the same thing that the Tamil word does today – a thin, crispy cake of rice. The first step is to trace how this word underwent two sound changes to become pappaḍa in Middle Indo-Aryan.

First, let me explain what Middle Indo-Aryan is. The history of the Indo-Aryan language family is categorised into three stages: Old Indo-Aryan (OIA), which includes Vedic Sanskrit and Classical Sanskrit (though it’s more complicated than this simple statement); Middle Indo-Aryan (MIA), which includes the descendants of Sanskrit, including the various Prakrits and Apabhramsas like Pali, Magadhi, Sauraseni, Maharashtri, Elu Prakrit, etc.; and New Indo-Aryan (NIA), the modern descendants of those Middle Indo-Aryan languages, including modern Hindustani, Gujarati, Punjabi, etc.

Now that we know what Middle Indo-Aryan is, we can move on to the sound changes that occurred as OIA moved to the MIA stage, and that turned the OIA word parpaṭa into पप्पड pappaḍa. The first is consonantal assimilation. Assimilation, as its name suggests, is simply one consonant becoming more like another. Prakrits are characterised by large-scale assimilation of consonant clusters (two consonants occurring right next to each other). In this case, the –rp– cluster became –pp–, with the /r/ becoming more like /p/. Other examples of assimilation in MIA abound: OIA पत्र patra ‘leaf’ to MIA पत्त patta, OIA धर्म dharma ‘righteousness’ to MIA धम्म dhamma, OIA कर्म karma ‘work, deed’ to MIA कम्म kamma ‘work, deed’, OIA अश्रु aśru ‘tear’ to MIA अस्सु assu ‘tear’ (as in ‘teardrop’), OIA भक्त bhakta ‘share, portion’ to MIA भत्त bhatta ‘share, portion, meal’, OIA सर्प sarpa ‘snake’ to MIA सप्प sappa ‘snake’, OIA सप्त sapta ‘seven’ to MIA सत्त satta. In each case, one consonant in the cluster becomes more like the other, or in other words assimilates to the other, so as to form a geminate.

The second sound change from OIA to MIA that we are concerned with is intervocalic voicing of stops. “Intervocalic” means “between vowels”. “Voicing” and “stops” are both terms that are explained in my earlier post that I linked to at the beginning of this post. MIA made voiceless stops that existed between vowels into voiced stops. In this case, the voiceless retroflex stop between vowels, –ṭ–, was voiced to –ḍ–, a voiced retroflex stop. Another example is OIA पठति paṭhati ‘he reads’ becoming पढदि paḍhadi ‘he reads’ in a certain MIA language of which I’m not entirely sure (could be Pali, could be Sauraseni). Hence we arrive at pappaḍa, the attested MIA form of OIA parpaṭa.
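The two OIA-to-MIA changes can be written as small string rewrites. What follows is only a rough sketch: the cluster table is copied from the examples in this post rather than derived from any general rule, the function names are made up for illustration, and the voicing rule covers just the two stops that appear in the examples.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that letters such as ṭ are single code points."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aāiīuūeēoō"))

# Cluster -> geminate pairs copied from the examples in this post.
# This is a lookup table for illustration, not a general rule of Prakrit.
ASSIMILATION = {
    "rp": "pp", "tr": "tt", "rm": "mm",
    "śr": "ss", "kt": "tt", "pt": "tt",
}

# Voiceless stop -> voiced stop, again limited to the examples here.
VOICED = {"ṭ": "ḍ", "t": "d"}


def assimilate(word):
    """Replace each listed consonant cluster with its geminate."""
    word = nfc(word)
    for cluster, geminate in ASSIMILATION.items():
        word = word.replace(nfc(cluster), nfc(geminate))
    return word


def voice_intervocalic(word):
    """Voice a listed stop that stands between two vowels.

    An aspirated stop such as ṭh is treated as one stop, so the h is
    skipped when looking for the following vowel.
    """
    word = nfc(word)
    voiced = {nfc(k): nfc(v) for k, v in VOICED.items()}
    out = []
    for i, ch in enumerate(word):
        after = word[i + 1:]
        if after.startswith("h"):
            after = after[1:]  # skip the aspiration
        between_vowels = i > 0 and word[i - 1] in VOWELS and after[:1] in VOWELS
        out.append(voiced[ch] if ch in voiced and between_vowels else ch)
    return "".join(out)


def oia_to_mia(word):
    """Assimilation first, then intervocalic voicing."""
    return voice_intervocalic(assimilate(word))


print(oia_to_mia("parpaṭa"))          # pappaḍa
print(assimilate("dharma"))           # dhamma
print(voice_intervocalic("paṭhati"))  # paḍhadi
```

Note that the geminate –pp– is left alone by the voicing step, because neither p stands between two vowels.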

Now, we are much closer to the Tamil word, appaḷam. But two challenges remain. How did the initial p– disappear into thin air, and how did -ड- –ḍ– become -ळ- –ḷ–, a retroflex lateral approximant (ळ, the “hard” l, in Marathi, Haryanvi, Gujarati, etc.)? The latter is easily explained. The intervocalic voiced retroflex stop, –ḍ– that is, does have a history of becoming –ḷ– in Indo-Aryan. There are other examples of this. The most common example would be the Hindustani word for the number 16, solah. This word is from Sanskrit षोडश ṣōḍaśa, which became सोळह sōḷaha, and then eventually सोलह् sōlah. Another example is the Sanskrit word तडाग taḍāga, which became तळाअ/तळाव taḷāa/taḷāva, which eventually became modern Hindustani तलाव् talāv. And thus, pappaḍa became pappaḷa.

Now, we are at pappaḷa. The only puzzle left is the initial p– seemingly magically disappearing into thin air. The secret to solving this puzzle is that there is in fact an intermediary between Sanskrit and Tamil, through which this word passed before coming to Tamil. That intermediary is Kannada. This word, pappaḷa (along with the pappaḍa variant) was loaned to Old Kannada; perhaps it was Maharashtri Prakrit that Old Kannada borrowed this word from. Old Kannada, while on its way to the Middle Kannada stage, underwent the sound change of transforming all p- in the beginning of a word to h-. This occurred in all words of the language that began with a p-, whether they were native Dravidian or borrowed from Indo-Aryan. One could go on forever listing examples of this, just as one could for consonant assimilation in Middle Indo-Aryan, but here are just a few of them: Kannada ಹಲ್ಲು hallu vs. Tamil பல் pal ‘tooth’, Kannada ಹೆಬ್ಬರು hebbaru ‘the great one, a caste’ vs. Tamil பெரியவர் periyavar ‘the great one’, Kannada ಹೆಸರು hesaru vs. Tamil பெயர் peyar ‘name’, Kannada ಹೊಸ hosa vs. Tamil புது pudu ‘new’, Kannada ಹುಡುಗ huḍuga ‘boy’ vs. Tamil பொடியன் poḍiyan, and so on. As for words borrowed from Old Indo-Aryan that have undergone this change, you have Kannada ಹಸು hasu ‘cow’ from Sanskrit paśu, Kannada ಹಬ್ಬ habba ‘festival’ from Sanskrit parva, and so on. In our case, the word borrowed from Prakrit as pappaḷa became happaḷa. In fact, the Kannada word for a papadum is indeed ಹಪ್ಪಳ happaḷa today.

But we need to go a step further. This Kannada word was then borrowed into Tamil. Tamil, unlike Kannada, doesn’t have a /h/ sound, and Tamils couldn’t reproduce it. They took the simple solution – simply deleting it. Therefore, Kannada happaḷa when borrowed into Tamil became appaḷam, with the typical –am suffix in Tamil added, marking the non-human gender (papadums are non-human objects, after all).
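The route from the MIA form to the Tamil form can be sketched as a chain of three rewrites. This is a toy illustration with hypothetical function names. One assumption of mine is flagged in the code: the Tamil –am ending is taken to replace the final short –a of the Kannada word.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that letters such as ḍ are single code points."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aāiīuūeēoō"))
D_RETROFLEX, L_RETROFLEX = nfc("ḍ"), nfc("ḷ")


def d_to_l(word):
    """Intervocalic ḍ becomes ḷ (pappaḍa -> pappaḷa)."""
    word = nfc(word)
    out = []
    for i, ch in enumerate(word):
        between_vowels = (
            0 < i < len(word) - 1
            and word[i - 1] in VOWELS
            and word[i + 1] in VOWELS
        )
        out.append(L_RETROFLEX if ch == D_RETROFLEX and between_vowels else ch)
    return "".join(out)


def kannada_initial_p_to_h(word):
    """Word-initial p- becomes h- on the way from Old to Middle Kannada."""
    return "h" + word[1:] if word.startswith("p") else word


def borrow_into_tamil(word):
    """Drop the initial h-, then attach -am.

    Assumption for this sketch: -am replaces a final short -a.
    """
    if word.startswith("h"):
        word = word[1:]
    if word.endswith("a"):
        word = word[:-1]
    return word + "am"


def derive_tamil(mia_word):
    """Return every stage from the MIA form to the Tamil form."""
    stages = [nfc(mia_word)]
    for step in (d_to_l, kannada_initial_p_to_h, borrow_into_tamil):
        stages.append(step(stages[-1]))
    return stages


print(" > ".join(derive_tamil("pappaḍa")))
# pappaḍa > pappaḷa > happaḷa > appaḷam
```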

We have thus traced the history of the Tamil word appaḷam from Sanskrit parpaṭa through four sound changes and one intermediary. But let’s go one step further. The MIA form of parpaṭa, as I mentioned, was pappaḍa. Also keep in mind the fact that Middle Indo-Aryan became New Indo-Aryan. So what did the MIA word pappaḍa become in New Indo-Aryan languages? For this I must introduce another sound change, and another phonetic phenomenon in Indo-Aryan (and Dravidian for that matter). The sound change in question is the reduction of geminates to singleton consonants and the consequent compensatory lengthening of vowels in New Indo-Aryan. This is a pervasive sound change affecting a large variety of New Indo-Aryan languages.

What does it mean? Examples should make it clearer. I’ll use the same examples I used earlier to demonstrate assimilation. MIA bhatta भत्त ‘share, portion, meal’ became Hindustani भात bhāt ‘cooked rice’, MIA सप्प sappa to Hindustani साँप sā̃p ‘snake’, MIA satta सत्त to Hindustani सात sāt, MIA कम्म kamma to Hindustani काम kām, MIA धम्म dhamma to Hindustani धाम dhām (as in कामधाम kāmdhām), MIA अस्सु assu ‘tear’ to Hindustani आँसु ā̃su ‘tear’. As you see, when a short vowel was followed by a doubled (geminate) consonant, the geminate consonant reduced to a singleton, and the preceding short vowel lengthened. This was sometimes accompanied by the vowel gaining nasalisation, a phenomenon called spontaneous nasalisation.

That’s the sound change. What about the phonetic phenomenon? That is the fact that retroflex consonants, when pronounced between vowels (intervocalic retroflex consonants), are flapped. Now the consonant ड in Hindustani, at the beginning of a word, is pronounced by curling the tongue to behind the alveolar ridge in the mouth, placing the tongue tip at the roof of the mouth, generating pressure and then instantly releasing that pressure. It is therefore a plosive or a stop. In contrast, ड़ is a flap consonant – it is pronounced at the same region where ड is, but instead of the tongue tip building up pressure that is then released, the tongue tip hits the top of the mouth once. There is no pressure buildup and release. The tongue tip flaps against the roof of the mouth.

Retroflex consonants in Indo-Aryan, whether they be stops, nasals, or lateral approximants, are all flapped when they occur between vowels. So –ḍ– in pappaḍa, since it occurs between vowels, was pronounced as –ṛ–. This ड़, in Hindustani, became a phoneme. That is, it became a fundamental speech unit in its own right, and is for this reason written separately from ड.

Now, having considered both of these phenomena, what do you think happened to pappaḍa? First let’s apply the sound change. papp– therefore becomes pāp–. Then –ḍ– becomes –ṛ–. Therefore you land at पापड़ pāpaṛa. Run that through schwa syncope, and you arrive at पापड़् pāpaṛ, the modern Hindustani term for Pappadam.
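The Hindustani side of the story can be sketched the same way. This is a simplification under stated assumptions: the function names are hypothetical, spontaneous nasalisation (as in sappa to sā̃p) is not modelled, aspirated geminates are not handled, and schwa syncope is reduced to dropping a word-final short a.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that letters such as ā are single code points."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aāiīuūeēoō"))
LONG = {"a": nfc("ā"), "i": nfc("ī"), "u": nfc("ū")}
D_STOP, R_FLAP = nfc("ḍ"), nfc("ṛ")


def lengthen_and_degeminate(word):
    """Short vowel + doubled consonant -> long vowel + single consonant."""
    word = nfc(word)
    out, i = [], 0
    while i < len(word):
        ch = word[i]
        doubled = (
            i + 2 < len(word)
            and word[i + 1] == word[i + 2]
            and word[i + 1] not in VOWELS
        )
        if ch in LONG and doubled:
            out += [LONG[ch], word[i + 1]]
            i += 3  # skip the vowel and both halves of the geminate
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def flap(word):
    """Intervocalic ḍ is realised as the flap ṛ."""
    word = nfc(word)
    out = []
    for i, ch in enumerate(word):
        between_vowels = (
            0 < i < len(word) - 1
            and word[i - 1] in VOWELS
            and word[i + 1] in VOWELS
        )
        out.append(R_FLAP if ch == D_STOP and between_vowels else ch)
    return "".join(out)


def drop_final_schwa(word):
    """Very reduced stand-in for schwa syncope: delete a word-final short a."""
    return word[:-1] if word.endswith("a") else word


def mia_to_hindustani(word):
    return drop_final_schwa(flap(lengthen_and_degeminate(word)))


print(mia_to_hindustani("pappaḍa"))  # pāpaṛ
print(mia_to_hindustani("satta"))    # sāt
```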

We began at the Sanskrit word पर्पट parpaṭa. After going through four sound changes and one intermediary, we arrived at Tamil அப்பளம் appaḷam. Meanwhile, after going through three sound changes and one phonemicization, we arrived at Hindustani पापड़ pāpaṛ. Each word in a language is a microcosm of the numerous sound changes that have occurred as that language has evolved over the centuries and millennia. As you have seen in this post, a humble Papadum contains enough linguistic phenomena within it that I’ve written an entire blog post on it. This is why I love etymology, and historical linguistics, so much.

PS – in my dialect of Tamil, Papadams are அப்பளாம் appaḷām, and not அப்பளம் appaḷam. I don’t know why.

Srivijaya, and Nagapattinam

This post is a continuation of the post I’d written a few weeks ago about the Ashokan edict at Nalasopara, near Mumbai. This post will be about another exhibit I saw at the museum when I went there two months ago.

Apologies for the horrible image, the statue was behind a glass shelf and I couldn’t take a picture without everything in front of it being reflected.

When I first saw this statue, I didn’t think much of it. It looks like a rather mundane statue of the Buddha, nothing out of place in a museum. But the tag below it surprised me.

This statue was found in Nagapattinam of all places, which, as I’d learn later, was once a major centre of Buddhist learning. But that’s not the amazing thing about this statue. This statue was found in a Buddhist monastery in Nagapattinam, called Chudamani Vihara. Chudamani Vihara was built in Nagapattinam, Cōḻa territory at the time (10th–11th centuries), by the Srivijayan king Sri Vijaya Maravijayattungavarman with the patronage of Rājarāja Cōḻan.

The rest I’m paraphrasing from whatever I managed to note down: the hands, topknot, broad face, long eyes and ears of the statue are South East Asian in style. The statue incorporates South East Asian and South Asian art, as you’d expect for a statue constructed by a Srivijayan king in Nagapattinam. Srivijaya, for the uninitiated, was a city state in modern-day Indonesia which dominated much of South East Asia and beyond for several centuries. And, of course, the statue showcases the close ties between South Asians (especially Tamils) and South East Asians in the days of yore.

Tamil Sandhi

Sandhi is a term most Indians only hear with reference to Sanskrit, but sandhi is in fact a cross-linguistically common phenomenon. Tamil has its own native sandhi, governed by its phonotactics. In this post I’ll describe and explain one type of Sandhi in Tamil, and present a problem regarding it which I’ve been pondering on for a while and which I have discussed with some others.

Before I go on, let me clarify the notation, as IPA is rather inconvenient for this purpose. ல் is l, த் is t, ன் is ṉ, ற் is ṟ, ள் is ḷ, ண் is ṇ, and ட் is ṭ.

In short, the problem is that two instances of Sandhi in Tamil produce two results each. ல் + த l + t gives ன்ற ṉṟ in some cases, but gives ற்ற ṟṟ in others. Also, ள் + த ḷ + t gives ண்ட ṇṭ in some cases but ட்ட ṭṭ in others. Why two results for the sandhi? What is the conditioning environment when you get one result versus the other? This is an open question. I’ve asked actual linguists and have only received guesses so far.

That was the TLDR. Those with knowledge of Tamil and Dravidian linguistics would understand what I’m talking about and that the issue is one of voicing assimilation. For those that don’t, I’ll explain what exactly is happening.

Old Tamil phonotactics had several strict rules. It allowed only a few types of consonant clusters in the middle of the word. One rule relevant to this matter is that each consonant cluster must have uniform voicing – both consonants either voiced or voiceless. The other relevant rule is that when it comes to voiced clusters, Old Tamil disallowed a lateral approximant (ள் and ல்) appearing right before a stop, without the lateral approximant morphing in some way.

Hence, whenever a lateral approximant did occur before a stop, Old Tamil changed the lateral approximant in some way – either by devoicing it, or by changing its manner of articulation.

With that in mind, let’s look at what is happening in ல் + த l + t that makes it ன்ற ṉṟ. The /l/ there was a voiced alveolar lateral approximant, and since the t was preceded by an alveolar consonant, it assimilated and became ற ṟ. So now, we are at lṟ. This ṟ was an alveolar stop consonant in such positions in Old Tamil, which inherited this phoneme from Proto-Dravidian. Now, let’s consider which phonotactic rule might apply to lṟ. ṟ is a stop, we know that, and l is a lateral approximant, and in this case, the entire cluster is voiced, which means that the ṟ is also allophonically pronounced as a voiced alveolar stop. Since lateral approximants aren’t allowed in such positions, the voiced alveolar lateral approximant becomes a voiced alveolar nasal – by doing so, it maintains the voicing in the cluster and also the place of articulation, but morphs only in terms of manner of articulation.

Hence, l + t → lṟ → ṉṟ. The first step is assimilation of t to the place of articulation of l, and the second step is l changing its manner of articulation, since lateral approximants are disallowed before stops.

The exact same thing happens with ள் + த ḷ + t that makes it ண்ட ṇṭ. The t assimilates to the retroflex place of articulation of ḷ, and then the ḷ changes its manner of articulation to prevent a lateral approximant occurring before a stop. Hence, ḷ + t → ḷṭ → ṇṭ.

Let’s look at examples of these.

  1. வெல் vel ‘win’ + து tu ‘past tense marker’ → வென்று veṉṟu ‘having won’
  2. செல் cel ‘go’ + து tu ‘past tense marker’ → சென்று ceṉṟu ‘having gone’
  3. கொல் kol ‘kill’ + து tu ‘past tense marker’ → கொன்று koṉṟu ‘having killed’
  4. நில் nil ‘stand’ + து tu ‘past tense marker’ → நின்று niṉṟu ‘having stood’
  5. முயல் muyal ‘exert’ + து tu ‘past tense marker’ → முயன்று muyaṉṟu ‘having exerted’
  6. நீள் nīḷ ‘length, to lengthen’ + து tu ‘past tense marker’ → நீண்டு nīṇṭu ‘having lengthened’ (intransitive)
  7. கொள் koḷ ‘hold’ + து tu ‘past tense marker’ → கொண்டு koṇṭu ‘having held’
  8. இருள் iruḷ ‘darken’ + து tu ‘past tense marker’ → இருண்டு iruṇṭu ‘having darkened’
  9. ஆள் āḷ ‘rule’ + து tu ‘past tense marker’ → ஆண்டு āṇṭu ‘having ruled’
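The two-step derivation behind this list (the t takes on the place of the lateral, then the lateral becomes the nasal at that same place) is mechanical enough to write down. Here is a minimal sketch working on the transliteration; the function name and table are my own illustration, and it deliberately models only this nasal outcome.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that letters such as ḷ are single code points."""
    return unicodedata.normalize("NFC", text)


# lateral -> (nasal at the same place, stop that the suffix t turns into)
NASAL_OUTCOME = {
    nfc("l"): (nfc("ṉ"), nfc("ṟ")),  # alveolar
    nfc("ḷ"): (nfc("ṇ"), nfc("ṭ")),  # retroflex
}


def nasal_sandhi(root, suffix):
    """Join a root ending in l or ḷ to a suffix beginning with t."""
    root, suffix = nfc(root), nfc(suffix)
    lateral = root[-1:]
    if lateral in NASAL_OUTCOME and suffix.startswith("t"):
        nasal, stop = NASAL_OUTCOME[lateral]
        # Step 1: t assimilates to the place of articulation of the lateral.
        # Step 2: the lateral changes manner and becomes the matching nasal.
        return root[:-1] + nasal + stop + suffix[1:]
    return root + suffix  # no sandhi modelled for other combinations


print(nasal_sandhi("vel", "tu"))  # veṉṟu
print(nasal_sandhi("koḷ", "tu"))  # koṇṭu
```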

Alright, so far so good. We have proper explanations for what is happening in these instances of sandhi. However, the story doesn’t end here. As I mentioned, the whole issue is that these sandhi have two possible results; this is one result, what about the other? What is happening when ல் + த l + t becomes ற்ற ṟṟ, or when ள் + த ḷ + t becomes ட்ட ṭṭ?

What is happening is that, instead of the lateral approximant in the cluster maintaining its voicing but changing its manner of articulation to a nasal as in the first pair, the lateral approximant here loses its voicing and becomes a stop, hence resulting in a geminate stop. Examples:

  1. வில் vil ‘sell’ + து tu ‘past tense marker’ → விற்று viṟṟu ‘having sold’
  2. கல் kal ‘learn’ + து tu ‘past tense marker’ → கற்று kaṟṟu ‘having learned’
  3. கேள் kēḷ ‘hear, ask’ + து tu ‘past tense marker’ → கேட்டு kēṭṭu ‘having heard, asked’
  4. வெல் vel ‘win’ + தி ti → வெற்றி veṟṟi ‘victory’
  5. இருள் iruḷ ‘darken’ + து tu → இருட்டு iruṭṭu ‘darkness’
  6. நீள் nīḷ ‘length, to lengthen’ + து tu ‘causative marker’ → நீட்டு nīṭṭu ‘to extend’

See, there are two possible results for each of the two sandhi. The same suffix, the –t– past tense marker, triggers two kinds of sandhi in different roots. In addition, in the roots vel ‘win’ and iruḷ ‘darken’, sandhi in the verbal derivations goes one way (lateral approximant becoming a nasal), while sandhi in the nominal derivations goes another (lateral approximant losing voice to become a stop). This is precisely the problem. Why exactly is this? Why two results, and what are the causes of one result occurring over the other?

Going beyond this problem, all that I’ve written so far has only concerned lateral approximants occurring before coronal consonants – dentals, alveolars and retroflexes. But what about lateral approximants occurring before the bilabial and velar stops, and the palatal affricate? In most such cases, the latter kind of sandhi occurs, where the lateral approximant becomes a stop. Examples:

  1. கல் kal ‘learn’ + பி pi ‘causative marker’ → கற்பி kaṟpi ‘teach’
  2. முயல் muyal ‘exert’ + சி ci → முயற்சி muyaṟci ‘effort, perseverance’
  3. ஆள் āḷ ‘rule’ + சி ci → ஆட்சி āṭci ‘rule’ (noun)
  4. நீள் nīḷ ‘length, to lengthen’ + சி ci → நீட்சி nīṭci ‘length, elongation, extension’
  5. இயல் iyal ‘to be possible’ + கை kai → இயற்கை iyaṟkai ‘nature’
  6. ஆள் āḷ ‘person’ + கள் kaḷ ‘plural marker’ → ஆட்கள் āṭkaḷ ‘people’
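The second outcome, where the lateral approximant becomes a stop, can be sketched in the same style. Again this is only an illustration on the transliteration with a made-up function name. It says nothing about when this outcome is chosen over the nasal one before t, which is exactly the open question of this post.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that letters such as ṟ are single code points."""
    return unicodedata.normalize("NFC", text)


# lateral -> stop at the same place of articulation
STOP_OUTCOME = {nfc("l"): nfc("ṟ"), nfc("ḷ"): nfc("ṭ")}

# Suffix-initial consonants covered by the examples in this post.
FOLLOWING = {"t", "p", "c", "k"}


def stop_sandhi(root, suffix):
    """Join a root ending in l or ḷ to a suffix, turning the lateral into a stop."""
    root, suffix = nfc(root), nfc(suffix)
    lateral, first = root[-1:], suffix[:1]
    if lateral in STOP_OUTCOME and first in FOLLOWING:
        stop = STOP_OUTCOME[lateral]
        if first == "t":
            # A following t also takes the place of the lateral,
            # which is what produces the geminates ṟṟ and ṭṭ.
            suffix = stop + suffix[1:]
        return root[:-1] + stop + suffix
    return root + suffix  # no sandhi modelled for other combinations


print(stop_sandhi("vil", "tu"))    # viṟṟu
print(stop_sandhi("kal", "pi"))    # kaṟpi
print(stop_sandhi("āḷ", "kaḷ"))    # āṭkaḷ
```

Feeding the same root and suffix to a nasal rule and to this stop rule gives two different answers for l + t and ḷ + t, so a complete model would still need some extra condition to pick between them.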

You may notice that முயல் muyal ‘exert’ + து tu → முயன்று muyaṉṟu above, but முயல் muyal + சி ci → முயற்சி muyaṟci ‘effort, perseverance’ here. Similar instances occur with ஆள் āḷ ‘rule’ and நீள் nīḷ ‘length, to lengthen’. However, I’m not considering these a part of the problem of two results for Sandhi, as here it’s the palatal affricate c coming second in the consonant cluster, which predictably leads to the lateral approximant losing voice and becoming a stop.

But, as always, there are exceptions. Is there any natural language without exceptions? This exception has to do with Modern Tamil, which contains the word பசங்க pasaṅga(ḷ) ‘children, boys’. This word is inherently plural, and doesn’t have a singular of the same root. According to the Madras University Tamil Lexicon, pasaṅga(ḷ) < pacaṅkaḷ < pacalkaḷ. The –kaḷ is the plural suffix. What is striking here is that pacal + kaḷ did not become pacaṟkaḷ but rather pacaṅkaḷ, perhaps through an intermediate step of pacaṉkaḷ.[1] The word pacal/pasal is not in use in Modern Tamil, but what is in use is the word payal ‘boy’. This hearkens back to the common -c-/-y- alternation in Tamil (and South Dravidian), which is also seen in words such as ucir/uyir, ucaram/uyaram, vācal/vāyal, and in Kannada cognates of Tamil words, such as Kannada hesaru ‘name’ for Tamil peyar ‘name’, Kannada mosaru ‘curd’ for Tamil mōr (< *moyar) ‘curd’, and Kannada basiru ‘stomach’ for Tamil vayiṟu. The word with the –c– → –y– sound change has continued to be in use, while the older word with the –c– intact remains in a fossilized expression in the plural. There is probably a reason for this, perhaps to do with dialectal intermixing, but that’s beyond my knowledge.

Finally, I should also mention that one of the aspects which differentiates Malayalam and Tamil is that Malayalam does not make lk into ṟk (Govindankutty, 1972).


Since writing this post I have found more information on this topic. According to Krishnamurti (2003), these l/ṉ/ṟ and ḷ/ṇ/ṭ stem alternations can be reconstructed back to Proto-Dravidian itself. He cites the Tamil verb niṟuttu ‘stop’, which can be analysed as a causative of nil ‘to stand’, as niṟ (< nil) + -ttu. He suggests that the origin of these stem alternations is at an even earlier stage of Proto-Dravidian than current reconstructions. Once the internal Sandhi phenomena had altered stems in certain verbal forms in Pre-Proto-Dravidian, these Sandhi-fied (altered) stems were reanalysed as unbound stems in their own right, and new verb stems and derivations (both verbal and nominal) were formed using them. Or otherwise, and this part is my hypothesis, if one result of the Sandhi was already “occupied”, then the other result would be used to prevent homophones from occurring. For example, if iruḷ ‘to darken’ became iruṇṭu ‘having darkened’ with the past suffix –tu added, then the nominal derivation with a homophonous –tu would become iruṭṭu ‘darkness’ to prevent them from being homophones.

References

  1. Tamil Lexicon
  2. A. Govindankutty. (1972). From Proto-Tamil-Malayalam to West Coast Dialects. Indo-Iranian Journal, 14(1/2), 52–60.
  3. Bhadriraju Krishnamurti. (2003). The Dravidian Languages. Cambridge Language Surveys. Cambridge: Cambridge University Press.

International Phonetic Alphabet – Consonants

This post is meant as an introduction to consonants in a phonetic context, along with their IPA representations.

Before moving on to the IPA itself, note that there are two types of IPA transcriptions: phonemic and phonetic. Phonemic transcriptions transcribe the underlying phonemes (read up on Wikipedia if you don’t know what phonemes are), while phonetic transcriptions transcribe exactly what is spoken. // denotes phonemic transcription, and [], phonetic.

For instance, ‘cat’ is /kæt/ but [kʰæt]. In phonemic transcription, aspiration, which in English is not a fundamental aspect of the consonant and hence not phonemic, is not written.

With that sorted, we move to consonants. The table below has only the pulmonic consonants, but for all beginner purposes, these are enough. Once you learn these, learning further on your own should be easy. Also, I will not be explaining what each and every IPA character represents – only those that are not intuitive or obvious.

[Image: IPA chart of pulmonic consonants]

Consonants are defined by three things: voicing, place of articulation and manner of articulation. The phonetic terms given to consonants have these three components, in that order. For instance, /l/ is the voiced alveolar lateral approximant. By the end of this post, you should be able to understand what each of those three components means.

Voicing

Voicing is simple. It means the vibration in your larynx while pronouncing a consonant. Consonants pronounced with a vibrating larynx are voiced. Those without it vibrating are voiceless. Many consonants come in pairs where one is voiced and one is voiceless: /v/ and /f/, /b/ and /p/, /d/ and /t/, /g/ and /k/. You can see for yourself which one is voiced and which is voiceless, by touching your larynx (the Adam’s apple) while articulating them. In a conventional IPA table, like the one above, voiced and voiceless pairs are written together in the same cell. You can see for yourself, from the note below it, that the ones on the right are voiced.

Place of articulation

Place of articulation is also simple. It’s just where in the mouth that sound is articulated, using the various components of the mouth: lips, teeth, tongue, uvula, pharynx, etc. In conventional IPA tables, the columns contain the places of articulation. So let’s go column by column.

Labial consonants involve the lips, in one way or another. Bilabials involve both lips. Pronounce /p/, /b/ and /m/, and see how you do it. The tongue is not involved at all, is it?

Labio-dentals involve the lower lip and the upper teeth. Pronounce /f/ and /v/ (/v/, not /w/), and again see how you do it.

Next, all the coronal consonants. These are all consonants articulated with the front part of the tongue; they include all dental, alveolar, post-alveolar and retroflex consonants. The Arabic ‘sun letters’ are all coronal consonants, while the ‘moon letters’ are non-coronal.

Dental consonants are articulated with the tongue at the teeth. Pronounce /t̪/ and /d̪/ – these are just the Indo-Aryan and Dravidian “dentals”. The IPA notations of dental ‘t’ and ‘d’ are ‘t’ and ‘d’ with a square upside-down U diacritic underneath. Do not confuse this for a similar looking square rightside-up U; that’s a different diacritic.

Other dental characters are /θ/ and /ð/. /θ/ is the ‘th’ in English ‘thick’, or Arabic ‘hadiith’. /ð/ is the ‘th’ in English ‘then’, or ‘dh’ in Arabic ‘Riyadh’. All of them are pronounced with the tongue at the teeth. Both of these are voiceless-voiced pairs. Pronounce /t̪/ and /d̪/ one after the other. Do you see that the only difference is in the larynx’s vibration? Now guess which one is voiced and which one voiceless among these two, and among /θ/ and /ð/.

Alveolar consonants are pronounced with the tongue at the alveolar ridge, which is right behind the teeth. English /t/ and /d/, Hindustani /r/, English /l/, they are all alveolar. The General North American /r/ is also alveolar; they are all pronounced at the alveolar ridge. The difference between alveolar /t/ & /d/ and dental /t̪/ & /d̪/ is all about the place of articulation. The latter are pronounced at the teeth; the former behind the teeth, at the alveolar ridge. Again, voiced-voiceless pair, but that should be obvious by now.

Post-alveolar is just as it sounds – behind the alveolar ridge. /ʃ/ is the standard ‘sh’ in English ‘shirt’, and /ʒ/ is ‘s’ in ‘measure’ or ‘Asia’. Pronounce both one after the other and hold your hand to your throat. Again, do you see that the only difference is vibration, AKA voicing? These two are post-alveolar consonants. Now pronounce /tʃ/ (‘ch’ in ‘child’) and /dʒ/ (‘j’ in ‘jump’). These two are affricates, which are represented in IPA using dual characters for a reason which we will get to. These are again pronounced behind the alveolar ridge, and once again another pair of voiced-voiceless consonants. Your turn to figure out which is voiced and which is not.

Two more frequent terms when speaking of alveolars, post-alveolars, and dentals are apical and laminal. Apical consonants are coronal consonants in which the tip of the tongue touches the roof of the mouth or the teeth. All Indo-Aryan and Dravidian ‘retroflexes’ are apical: the tip of the tongue touches the roof of the mouth. Meanwhile, laminal consonants are those in which the body, and not the tip, of the tongue touches the roof of the mouth. Indo-Aryan and Dravidian ‘dentals’ are all laminal. Does the body of the tongue not touch the teeth?

Retroflex consonants are pronounced with the tongue literally bending back (or flexing back, hence retro-flex) to touch the roof of the mouth behind the alveolar ridge, or maybe even at the hard palate, which is the dome of the mouth. /ʈ/, /ɖ/ are the characters for the retroflex ‘t’ and ‘d’ in most Indo-Aryan languages. /ɭ/ is the character for the retroflex ‘l’ in Marathi and in Dravidian languages, and /ɳ/ is the retroflex ‘n’ in Gujarati, Punjabi and in Dravidian. Retroflex characters are formed by adding a curl moving to the right of the ordinary dental/alveolar character. Again, the first two are voicing pairs, do you see?

At this point I will again caution you to be very careful about what type of IPA transcription you are reading or writing: phonemic or phonetic. To reiterate, a phonetic transcription transcribes exactly the sounds that are being pronounced, i.e., the surface-level pronunciations. Meanwhile, phonemic transcriptions transcribe the underlying phonemes, and the same phonemic symbol can stand for different sounds in different contexts. For instance, the phoneme /t/ when used in the context of English is alveolar; but when the same character is used for French, /t/ is dental. When speaking of French, one does not need to specify that /t/ is actually [t̪] (dental) because there is no other form of /t/ to contrast it with; if you see French and /t/, it must be dental, no other option. In contrast, in Indo-Aryan and Dravidian, /t/ alone could be either [t̪] or [ʈ] (dental or retroflex), hence the need for clarifying by adding the dental diacritic.

Another example is /r/. /r/ when used for English, French and Hindi means very different things in each language. In English it is [ɹ], in French it is [ʁ], and in Hindi it is [ɾ]. Note that I write phonetic transcriptions with [], and phonemic with //.

Now that we are on this topic, though in both Indo-Aryan and Dravidian the retroflexes are notated with /ʈ/ and /ɖ/, they are pronounced differently in Indo-Aryan and Dravidian. The so-called Indo-Aryan “retroflexes” are actually post-alveolar. Try pronouncing them and pay close attention to where the tongue touches the roof of the mouth. It is behind the alveolar ridge, the zone right behind the teeth. Most of the Indo-Aryan retroflexes are apical post-alveolar: they are pronounced with the tip of the tongue (apical) touching the region behind the alveolar ridge (post-alveolar). In contrast, in Dravidian, the retroflexes involve curling of the tongue farther back into the mouth, almost near the hard palate.

Palatal consonants are articulated at the hard palate, the dome of the mouth. /j/ (‘y’ in ‘young’) is the most common palatal consonant. Note that in IPA /j/ represents ‘y’, and /y/ is an entirely different sound. /y/ is a vowel, in fact, and we’ll get to it in due time. Do not confuse /j/ and /y/.

There are consonants pronounced between the hard palate and the alveolar-ridge; the post-alveolar consonants I spoke of are these. They may also be called alveolo-palatal and palato-alveolar. They simply mean that the consonant is articulated somewhere between the alveolar ridge and the hard palate. Alveolo-palatals are pronounced closer to the hard palate than the alveolar ridge, while palato-alveolars are pronounced closer to the alveolar-ridge. The previously mentioned /ʃ/, /ʒ/, /tʃ/ and /dʒ/ are all palato-alveolars.

Velar consonants are articulated at the velum. /k/, /g/, /x/ (‘kh’ in Arabic ‘akhbaar’, or ‘ch’ in German ‘nacht’), /ɣ/ (‘gh’ in Arabic ‘ghariib’ and ‘ghalat’), /ŋ/ (‘ng’ in Bengali ‘bangali’) are all velar consonants. Again, /k/ and /g/ are voicing pairs, and so are /x/ and /ɣ/. I’ll leave it to you to figure out which is voiced and which, voiceless.

Uvular consonants are at the uvula, no surprises there. /q/ (‘q’ in Arabic ‘qalb’), /ɢ/ (‘q’ in Persian ‘qand’) and /ɴ/ (the second ‘n’ in Japanese ‘Nihon’) are uvular.

Glottal consonants are articulated at the glottis in the throat. /h/ (standard ‘h’ in ‘hello’) and /ʔ/ (Arabic hamza and the pause in ‘uh-oh’) are both glottal.

Pharyngeal consonants are, well, articulated at the pharynx. /ħ/ (Arabic ‘h’ in ‘Muhammad’) and /ʕ/ (‘3’ in Arabic ‘3arab’) are both pharyngeal.

Manner of articulation

The third fundamental aspect of every consonant is how it is articulated at its place of articulation. Take the English consonants – /t/, /d/, /s/, /z/, /r/ and /l/. All of them are alveolar – they are all pronounced at the alveolar ridge. So what’s the difference? Alright, you have voicing, so let’s consider voicing pairs as one entity. You then have {/t/, /d/}, {/s/, /z/}, /r/ and /l/ – still four entities. The answer is that each of these four are articulated at the alveolar ridge in different manners. Similarly, /p/, /b/ and /m/ are all bilabial – with both lips. The first and second are voicing counterparts, while the third is different. Different how? We’ll get to it shortly…

Conventional IPA tables indicate manner of articulation in the rows. So let’s go row by row.

One way of categorising manners of articulation is by how the flow of air from the lungs is affected by the articulation of the consonant. On one side you have obstruents, which obstruct the flow of air in some way or the other. On the other, you have sonorants, which do not obstruct air at all and maintain a constant flow of air from the lungs.

Under obstruents, there are firstly plosives/stops. ‘Stops’ and ‘plosives’ are for all practical purposes synonyms. Stops are pronounced by completely obstructing all air flow from the lungs, creating pressure in the mouth, and then releasing the air along with the pressure in one burst. Consider: /p/, /b/, /k/, /g/, /t̪/, /d̪/, and /ʈ/, /ɖ/. They are all articulated in this manner, and are hence stops. /p/ is a voiceless consonant, as it does not involve vibration of the larynx. It is bilabial, since it involves both lips and nothing else. Finally, it is a stop (or plosive), as it involves complete obstruction of air. Hence, it is a voiceless bilabial stop/plosive.

What about /g/? It is voiced, velar, and a stop. Hence, voiced velar stop. It’s really as simple as that: the three components combined together. /ʈ/ is the voiceless retroflex stop, and so on. I’m sure you’ve gotten the idea. /ʔ/ (the pause in ‘uh-oh’) is the voiceless glottal stop; it involves complete obstruction of airflow at the glottis, build up of pressure, and instantaneous release of the pressure.
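The naming scheme above (voicing, then place, then manner) can be sketched in a few lines of Python. This is only an illustration: the `Consonant` class, the `describe` helper and the tiny inventory are made up for this example and cover just the stops named in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    voicing: str  # "voiceless" or "voiced"
    place: str    # e.g. "bilabial", "velar"
    manner: str   # e.g. "stop"

    def name(self) -> str:
        # The full name is just the three components joined together.
        return f"{self.voicing} {self.place} {self.manner}"


# A hypothetical mini-inventory, limited to stops discussed above.
INVENTORY = {
    c.symbol: c
    for c in [
        Consonant("p", "voiceless", "bilabial", "stop"),
        Consonant("g", "voiced", "velar", "stop"),
        Consonant("ʈ", "voiceless", "retroflex", "stop"),
        Consonant("ʔ", "voiceless", "glottal", "stop"),
    ]
}


def describe(symbol: str) -> str:
    """Return the descriptive name for a symbol in the mini-inventory."""
    return INVENTORY[symbol].name()


for symbol in INVENTORY:
    print(f"/{symbol}/ is the {describe(symbol)}")
```

The same three fields work for every other manner discussed below; only the `manner` value changes.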

Next, fricatives. Fricatives are articulated by restricting most of the air from the lungs, and then forcing it through a narrow channel in the mouth or the throat. Consider: /s/, /z/, /f/, /v/, /ʃ/, /ʒ/, /x/, /ɣ/, /χ/, /ʁ/, /θ/, /ð/ and /h/. They are all fricatives, as they all involve forcing air to pass through a narrow channel that is created at the place of articulation. In /s/, the channel is created by the tongue pressing against the alveolar ridge, and it doesn’t involve vibration of the larynx; it is a voiceless alveolar fricative. /z/ is the voiced alveolar fricative. What about /v/? It is a voiced labio-dental fricative, as the channel for /v/ is between the lower lip and the upper teeth. /ʃ/, /ʒ/ are voiceless and voiced post-alveolar (palato-alveolar) fricatives, respectively. /x/ (Arabic ‘kh’) and /ɣ/ (Arabic ‘gh’) are voiceless and voiced velar fricatives, as they are pronounced by pushing air through a channel created by pressing the body of the tongue to the velum, the same region where /k/ and /g/ are pronounced.

/θ/, /ð/ are dental fricatives, as, again, the channel through which air passes is created between the tongue and the teeth. /χ/ and /ʁ/ are pronounced by creating that same channel at the uvula (the same place as Arabic /q/); they are voiceless and voiced uvular fricatives. Finally, /h/ is a glottal fricative, as here the channel is at the glottis. /h/ is a voiceless glottal fricative.

Thirdly, affricates. Affricates can be understood as sequences of a stop and a fricative. The stop and fricative must be pronounced at the same place of articulation. The IPA representation of affricates is composed of, therefore, two characters: one for the stop and the other for the fricative. They may be linked by a curve above, but might not be as well. /t͡ʃ/ (English ‘ch’ in ‘choice’), /d͡ʒ/ (English ‘j’ in ‘jump’), /t͡s/ and /d͡z/ are affricates. Try saying /t/ and /ʃ/ (‘sh’) in a sequence, and you’ll get something like ‘ch’. Similarly for the other three. If you’re not familiar with the latter two, either think of them as sequences of /t/ & /s/ and /d/ & /z/, or, for /t͡s/, think of Kashmiri /t͡sɨ/ ‘you’ and Marathi /pat͡s/ ‘five’.

/t͡ʃ/ and /d͡ʒ/ are affricates that are pronounced at the region behind the alveolar ridge. Hence, they are voiceless and voiced post-alveolar affricates, or palato-alveolar affricates. As for /t͡s/ and /d͡z/, they are pronounced at the alveolar ridge, like /t/ and /s/. By now you should be able to guess that they are respectively the voiceless and voiced alveolar affricates.

Now we come to sonorants. Among sonorants, firstly there are nasals or nasal stops. They are pronounced similarly to stops, except that air is allowed to pass through the nose while pronouncing them. In contrast, ordinary stops completely block air coming from the lungs from escaping through either the mouth or the nose. /m/, /n/, /n̪/, /ɳ/, /ɲ/, /ŋ/, /ɴ/ are all nasals. /m/ is a bilabial nasal, /n/ an alveolar nasal, /n̪/ a dental nasal, and /ɳ/ a retroflex nasal. Note the square upside-down U diacritic (denoting dental consonants) underneath the dental nasal, and the rightward hook from the bottom of the retroflex nasal.

/ɲ/ is the palatal nasal (note the leftward hook from the bottom) – for those familiar with Devanagari, it is ञ, or it is ñ in Spanish, or ‘gn’ in French (e.g., in French ‘magne’). It is pronounced by the blade of the tongue touching the roof of the mouth at the hard palate (hence a palatal consonant), completely blocking air from escaping the mouth, but then allowing it to move through the nose (hence a nasal). /ŋ/ (note the inward hook at the bottom) is the velar nasal. It is the ‘ng’ in Bengali ‘bangali’, or ङ to those familiar with Devanagari. It is velar because it is pronounced at the same region where /g/ and /k/ are, and it is a nasal; hence velar nasal. /ɴ/ is the uvular nasal – pronounced at the same region where /q/ (‘q’ in Arabic) is. One thing to note is that in most situations when talking of nasals, they are voiced. Voiceless nasals do exist, but they are relatively rare when compared to their voiced counterparts.

Next we come to approximants. Approximants are articulated by bringing the articulators (tongue, roof of the mouth, lips, teeth, etc.) close, but by never interrupting the airflow through the mouth. Stops fully stop airflow, fricatives let a stream of air pass through a narrow channel, and approximants do not block the airflow at all. Once again, approximants are most often voiced; voiceless approximants are rare.

/j/ (‘y’ in English ‘young’) is a palatal consonant, and an approximant: it is a palatal approximant. /w/ (‘w’ in English ‘water’) is also an approximant, but is more complicated. First, pronounce /w/. Do you see that your lips are rounded when you do so? Hence it involves the lips, and is a labial consonant. At the same time, while pronouncing /w/, your velum (region of pronouncing /k/) also constricts, making it a velar consonant at the same time. Finally, /w/ is a voiced consonant. Hence, /w/ is a voiced labio-velar approximant, because /w/ is articulated at two places, the lips and the velum. This introduces to you the idea of co-articulated consonants, consonants that are articulated at more than one location in the mouth. /w/ is one of them.

/ɹ/, the General North American English ‘r’, is also an approximant. It is pronounced at the alveolar ridge: hence the alveolar approximant. /ɻ/, the Tamil ‘zh’, is an approximant that is pronounced by curling the tongue way back to the hard palate; it is therefore a retroflex approximant.

Thirdly, there are trills. Trills are articulated by one component of the mouth (one articulator) vibrating against another. The Hindi /r/, when held for longer than a moment, is a trill that is pronounced at the alveolar ridge. Hence, an alveolar trill. The alveolar trill is also the sound used to mimic the sound of engines. /ʀ/, the sound that is produced when trying to gargle without any liquid in the mouth, is also a trill that is pronounced at the uvula – hence it is a uvular trill. There’s an IPA symbol for the sound of blowing raspberries too – /ʙ/. This sound is pronounced with both the lips, by vibrating them against one another, and is therefore a bilabial trill.

Fourthly, flaps/taps. Flaps, also known as taps, are pronounced by hitting one articulator (tongue, lips, etc.) against another (roof of the mouth, etc.). [ɾ] is the alveolar flap; note the curved ‘r’ in contrast to the full ordinary ‘r’. In Hindi, [r], the alveolar trill, and [ɾ], the alveolar tap, are not two different sounds; they’re both one fundamental sound unit (phoneme) – /r/. A singular /r/ is a tap, while a long or elongated /r/ is a trill. /ɽ/ is the retroflex flap – it is the ड़ sound. /ɽ/, the retroflex flap (ड़), is different from /ɖ/, the voiced retroflex stop (ड), in that in the former, the tongue collides with the roof of the mouth for just an instant, without any build up of pressure. The latter is however a stop, which involves obstruction of air such that it generates pressure, which is then released in an instant.

Finally we come to lateral consonants. Lateral consonants are of many different types, but where they are all the same is that they allow air to flow through both sides of the tongue, but the tongue blocks air from flowing through the middle of the mouth. There can be lateral approximants, lateral fricatives, lateral affricates, and many more. If any consonant is pronounced by blocking airflow in the middle of the mouth, it is lateral. /l/ is a lateral approximant – observe yourself as you pronounce it how air flows only through the sides, and not the middle. Since it is pronounced at the alveolar ridge, it is a voiced alveolar lateral approximant. Here we go back to the beginning of the post. /ɭ/ (note the rightward hook at the bottom) is the voiced retroflex lateral approximant, for reasons that should be clear by now.

Aspiration, breathy voice, and palatalisation

Aspiration is marked by the marker ʰ, ‘h’ in superscript. /kʰ/ (‘kh’ in Hindi khol ‘open’) is the voiceless aspirated velar stop. Now, the voiced aspirated consonants in Indo-Aryan languages (actually breathy voiced consonants, but let’s not go too deep into that) are marked with the marker – ʱ. /bʱ/ (‘bh’ in Hindi bhaluu ‘bear’) is the breathy voiced bilabial stop. The breathy voice marker is a superscript ‘h’ which is curved at the top – /ɦ/.

Palatalisation is a phrase one would hear often in phonetics and when one talks of consonants. It can refer to two phenomena – one being a secondary articulation, and the second being a type of a sound change. Let’s leave the second aside for now. In the context of secondary articulations, palatalised consonants are consonants that are pronounced with some part of the tongue raised towards the hard palate. Such consonants thus have the hard palate as a secondary place of articulation, along with their primary one.

Palatalisation is marked with a superscript ‘j’ – ʲ. For example, palatalised /k/ is just /kʲ/. /kʲ/ here has two places of articulation – the velar, which is primary; and the hard palate, which is secondary. /kʲ/ is pronounced with the tongue blocking air at the velum, but at the same time with the body of the tongue raised towards the hard palate. It may also be described as a very quick sequence of /k/ and /j/ (/j/ is ‘y’, remember). /kʲ/ therefore is a palatalised voiceless velar stop. /bʲ/ is a palatalised voiced bilabial stop.
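Since the three markers in this section are just characters added after a base symbol, here is a small Python sketch of how the transcriptions are put together. The constant names and the `modify` helper are my own, invented for illustration; the code points are the standard Unicode modifier letters for ʰ, ʱ and ʲ.

```python
import unicodedata

# Unicode modifier letters used as IPA markers.
ASPIRATED = "\u02B0"    # ʰ, superscript h
BREATHY = "\u02B1"      # ʱ, superscript h with a hook at the top
PALATALISED = "\u02B2"  # ʲ, superscript j


def modify(base: str, *markers: str) -> str:
    """Append one or more markers to a base consonant symbol."""
    return base + "".join(markers)


print(modify("k", ASPIRATED))    # aspirated voiceless velar stop
print(modify("b", BREATHY))      # breathy voiced bilabial stop
print(modify("k", PALATALISED))  # palatalised voiceless velar stop

# The official character names confirm which marker is which.
for marker in (ASPIRATED, BREATHY, PALATALISED):
    print(f"U+{ord(marker):04X}", unicodedata.name(marker))
```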

Syllables, consonant length and word stress

The full stop marks syllable boundaries, and a small straight mark at the top before a syllable (similar to an apostrophe, but not curved like it) marks word stress. Take the word ‘consonant’ as an example: /ˈkɒn.sə.nənt/ (ignore the vowels for now, we’ll get to it). The syllables are ‘con’, ‘so’ and ‘nant’, separated by full stops. The first syllable is stressed, hence the apostrophe-like marker before it.

The length marker for both consonants and vowels is ː. ː looks like a colon, but it is composed of two triangles which are more easily visible when you enlarge the font. Consonant length is relevant in geminate consonants, or doubled consonants. ‘Geminate’ comes from ‘gemini’, the twin constellation; geminate consonants are just doubled consonants, like the ‘tt’ in Hindi kutta ‘dog’. The ‘tt’ here may be written either as /t̪t̪/ or as /t̪ː/. The two forms do have slightly different connotations, but let’s not go into that.
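To make the notation concrete, here is a rough Python sketch that reads a transcription like the one above and reports its syllables, which one is stressed, and whether a syllable contains a long segment. The function name `parse_transcription` is hypothetical, and the sketch assumes a single word with full stops between all syllables.

```python
STRESS = "\u02C8"  # ˈ, the primary stress mark
LENGTH = "\u02D0"  # ː, the length mark


def parse_transcription(transcription: str):
    """Split a transcription into (syllable, stressed, has_long_segment) tuples."""
    body = transcription.strip("/[]")  # drop the surrounding slashes or brackets
    result = []
    for chunk in body.split("."):      # full stops mark syllable boundaries
        stressed = chunk.startswith(STRESS)
        syllable = chunk.lstrip(STRESS)
        result.append((syllable, stressed, LENGTH in syllable))
    return result


for syllable, stressed, long_segment in parse_transcription("/ˈkɒn.sə.nənt/"):
    print(syllable, "stressed" if stressed else "unstressed",
          "long" if long_segment else "short")
```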


Ashoka the Great, and Nalasopara

A few days ago I’d visited the Mumbai museum, the former Prince of Wales museum in South Mumbai. I went inside not knowing what awaited me – I hadn’t read anything about the exhibits on display – but I was excited to discover for myself. And as I entered the Sculpture Gallery, the first section of the museum, right by the entrance to the building, I was not disappointed, not at all.

The first “sculpture” in the gallery (though I’m not sure if it is even a sculpture, hence the quotes) was a block of stone, with text written on it. I immediately had an inkling as to what this block of stone was, and my inkling turned out to be correct when I looked up to see the explanation pasted on the wall. That simple-looking block of stone is nothing other than a part of an Ashokan edict found in Nalasopara, a town about 47 kms north from Mumbai.

Ashoka, the grandson of Chandragupta Maurya and the third Mauryan emperor, created more than thirty such inscriptions not only throughout South Asia but also outside it, during his reign that stretched from 269 BC to 233 BC. These edicts are found in four scripts and four languages – Magadhi Prakrit in the Brahmi script in Central and Eastern India, Gandhari Prakrit in the Kharoshti script in the Northwest of South Asia, Greek in the western regions of Ashoka’s kingdom which neighboured the Greek-speaking Greco-Bactrian kingdoms, and Aramaic, the official language of the Achaemenid Empire, which ruled the western regions of Ashoka’s empire in centuries prior.

The sheer historical significance of this mere block of stone is not to be underestimated. This edict, along with the thirty others, represents the earliest inscriptions in India following the Indus Valley civilization. Ashoka in these inscriptions refers to himself as Devānaṃpiya Piyadasi, the Pali version of Sanskrit Devānaṃpriya Priyadarśi. The title means, “Beloved of the Gods”, and “He who treats/looks at others with love/kindness”. The language of this particular edict from Nalasopara is Magadhi Prakrit, and the script is, as the picture above says, Ashokan Brahmi. This Brahmi is the same script that would, in the millennia to come, give birth to the numerous scripts in use to this day throughout South and South East Asia.

It is hard to overstate the significance of this piece of rock. An artisan, more than two thousand years before this day today, using a chisel inscribed those words on that piece of rock. Not a decade or a century, or even a millennium, but two whole millennia ago. As I observed the inscription and took further pictures, I just wondered what life must have been like those many years ago. How must people have lived in the town then called Shurparaka? Did that artisan know that his handiwork would stand the test of time, and that someone at a distance of 2000 years from him would still be able to observe his work? We can never know the answers to these.

And, of course, the edict also made me lament the state of history education in India. How many Mumbaikars are aware of the fact that an Ashokan edict (an Ashokan edict!) exists so close to their city that they could visit the site and return in a single day? Very few, I would say. Our country has an ancient history. What is the point of it all, if we don’t teach it in our schools?

Lisān al-Arwī, or Arabic Tamil

When one thinks of the scripts that have historically been used to write Tamil, one generally thinks of the various Southern Brahmic scripts, Tamil Brahmi and its descendants that have been used to write the language for the past two millennia. Very few are aware that another, completely unrelated script has also been used to write Tamil.

Arwi, known also as Arabic Tamil or Arabu Tamiẕ in the language itself, is a literary register of Tamil written in the Arabic script, developed and used primarily by, as one may expect, Tamil Muslims, starting sometime in the 14th century. According to Muthiah (2008), “The oldest Tamil Muslim literature available in Tamil is the Palchandamalai which dates to the 14th century. However, the golden period of Tamil Muslim literature is considered to be from the 16th to the 18th century, when 12 major classics and 5 minor classics were produced by the community”.

Tschacher (2001) suggests two reasons for the formation of Arwi. The first, that Muslims, having learnt to read the Qur’an, found it more convenient to use the same script for their language, than learn a separate script for it as well. The second, that Tamil Muslims, reluctant to translate religious terms from Arabic or transliterate Arabic in Tamil, preferred to rather adopt the Arabic script for Tamil than do the reverse.

When one looks at Arwi text (some samples are shown below), the first suggestion begins to look unlikely. The writers of Arwi clearly knew the Tamil script – Arwi follows the literary conventions of Tamil, including but not limited to vallinam migudal, the doubling of word-initial stops following certain suffixes in the preceding word. They were not educated first in the Arabic script (and were certainly not Arabs who learnt Tamil, for that matter); they were educated in both Tamil and Arabic, perhaps in Tamil first and then Arabic, but that could have been either way.

The second suggestion is much more likely – that they chose to write Tamil in the Arabic script to accurately transliterate Arabic liturgical terms, but were already educated in Tamil. The Tamil script is really the last script one would pick to transliterate Arabic. Just to give you an example, four Arabic characters, kaf (ك), qaf (ق), ghayn (غ) and kha (خ) would all be transcribed by one single grapheme (க) in Tamil. Ambiguities such as this would make transcribing Arabic with Tamil a very unwise idea.
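The ambiguity in that example is easy to see if you write the mapping down and then try to reverse it. The following Python sketch does just that; the dictionary contains only the four letters mentioned above, and the names `ARABIC_TO_TAMIL` and `candidates` are made up for illustration.

```python
from collections import defaultdict

# The four Arabic letters from the example, all landing on Tamil க.
ARABIC_TO_TAMIL = {
    "\u0643": "\u0B95",  # kaf   -> க
    "\u0642": "\u0B95",  # qaf   -> க
    "\u063A": "\u0B95",  # ghayn -> க
    "\u062E": "\u0B95",  # kha   -> க
}

# Reverse the mapping: each Tamil grapheme collects every possible source letter.
TAMIL_TO_ARABIC = defaultdict(list)
for arabic, tamil in ARABIC_TO_TAMIL.items():
    TAMIL_TO_ARABIC[tamil].append(arabic)


def candidates(tamil_grapheme: str) -> list:
    """Return every Arabic letter that a Tamil grapheme could stand for."""
    return TAMIL_TO_ARABIC.get(tamil_grapheme, [])


# A reader who sees க cannot tell which of these was intended.
print(len(candidates("\u0B95")), "possible Arabic letters for one Tamil grapheme")
```

Going from Arabic to Tamil is a simple lookup, but the way back has four answers for one input, which is exactly the information loss described above.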

There are several characteristics about Arwi that immediately hit one upon seeing it. First and foremost, Arwi is written in Naskh, not in Nastaʿlīq (and when I first found Arwi I rejoiced at that, Nastaʿlīq is a mighty pain to read, especially for a beginner). Secondly, it does not use the Persianate or Urdu-specific character modifications to represent sounds not in the original Arabic alphabet (like to write /p/, /g/ or the retroflexes): it uses, for the most part, its own set of innovations. And thirdly, it uses diacritics nearly always, making it more of an abugida-like alphabet than a typical Semitic abjad.

The first two observations are very interesting from a historical perspective. The primary Islamic influence in northern South Asia was Persianate, due to Persianate Turkic rulers. However, initial Islamic influence in southern India, especially in modern Tamil Nadu, Sri Lanka and Kerala, was from the Arabs themselves, the Arab traders during the Middle Ages and beyond. Arwi therefore does not follow Persianate traditions of writing; it follows Arabic, at least with respect to the script alone.

The third observation, meanwhile, may be used as another argument that the writers of Arwi were initially and primarily educated in Tamil, and only subsequently in Arabic. The Tamil script is an abugida – every single vowel is written, and consonants without the base /a/ vowel are marked with a dot above the consonant, called the puḷḷi. Arwi writers, already used to writing all vowels in Tamil, were uncomfortable with not writing even a single short vowel, and hence used diacritics much more than other alphabets using the Arabic scripts do. In fact Arwi even makes heavy use of the sukūn, which performs the same function as the puḷḷi. It struck me how the two not only perform the same function, but also look very similar.

Moving on to the script then, I should start with the fact that while there is a generally agreed orthography, there are no rigorous rules. Tschacher says that different authors might write differently, and sometimes the same author will write the same word in the same poem in two different ways. Complications in matching each Tamil script character to an Arabic script one often arise due to the fact that one Tamil character may map to multiple Arabic characters. For instance, consider the transcription of ச c. ச c is particularly notorious in that it is written with four separate characters in Arwi, depending on where it appears in a word: چ, and س, and ش, and ج. Even in the same position in a word, more than one character may be found: for instance, word-initially, one can find چ, and س, and ش.

That last point is an important one I wish to elaborate upon. As you will see below, many Tamil characters, even in the same position of a word, are written differently in different texts and by different people. This, I suspect, is due to either dialectal differences or the influence of spoken Tamil in Arwi writing. In the particular case of the transcription of ச c, this phoneme in Tamil, in the word-initial position alone, is pronounced differently in different dialects. In modern standard common Indian Tamil, c– is pronounced /s/, in certain dialects (such as mine) as /t͡ʃ/, and in a few as /ʃ/. If the reader is familiar with the Arabic script, these three pronunciations correspond exactly with the three Arabic characters that are used interchangeably word-initially; they likely indicate such dialectal differences.
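For readers who do not know the Arabic script, the correspondence can be spelled out as a small lookup. This Python sketch only encodes the suspicion stated above, namely that a writer picks the Arabic letter matching their own pronunciation of word-initial ச; it is not an established rule of Arwi, and the names in it are invented for the example.

```python
# Word-initial pronunciation of ச -> Arabic letter with the closest value.
WORD_INITIAL_C = {
    "s": "\u0633",        # seen, Arabic /s/
    "t\u0361\u0283": "\u0686",  # tcheh, used for /t͡ʃ/
    "\u0283": "\u0634",   # sheen, Arabic /ʃ/
}


def spell_initial_c(pronunciation: str) -> str:
    """Pick the Arwi letter a writer with this pronunciation would plausibly use."""
    return WORD_INITIAL_C[pronunciation]


for pronunciation, letter in WORD_INITIAL_C.items():
    print(f"/{pronunciation}/ -> U+{ord(letter):04X}")
```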

Alright then, with all that covered, here’s the table of Arwi characters corresponding to each Tamil script character. The vowels are shown with both ‘alif and kaf. This is mostly taken from Tschacher (2001).

Tamil character & romanisation | Arwi character
க – k | Written with either ك kaf or a modification: ࢴ. The latter may be used intervocalically or after a nasal to indicate [g].
ங – ṅ | Almost always as ڠ
ச – c | Written with چ word-initially or in intervocalic geminate position, س sometimes word-initially or intervocalically, ش often word-initially and sometimes intervocalically, and ج after a nasal.
ஞ – ñ | Either ن or a modification: ݧ
ட – ṭ | ڊ intervocalically or after a nasal, and ڍ in intervocalic geminate position.
ண – ṇ | ڹ
த – t | ت word-initially and sometimes intervocalically and after a nasal; otherwise ث intervocalically and after nasals.
ந – n | ن
ப – p | ڣ word-initially, in intervocalic geminate position and sometimes also intervocalic single consonant; otherwise ب intervocalically and after nasals.
ம – m | م
ற – ṟ | ر, but sometimes تّ used in intervocalic geminate positions: see note below for details.
ன – ṉ | ن
ய – y | ي
ர – r | ڔ most often; sometimes ر is also seen.
ல – l | ل
வ – v | و
ள – ḷ | صٜ/ڝ
ழ – ḻ | ۻ
அ – a | اَ and كَ
ஆ – ā | آ and كَا
இ – i | اِ and كِ
ஈ – ī | اِيْ and كِيْ
உ – u | اُ and كُ
ஊ – ū | اُوْ and كُوْ
எ – e | ࣣا and كࣣ
ஏ – ē | اࣣيْ and كࣣيْ
ஐ – ai | اَيْ and كَيْ
ஒ – o | اٗ and كٗ
ஓ – ō | اٗوْ and كٗوْ
ஔ – au | اَوْ and كَوْ

In addition to whatever I’ve written above, there are some more interesting points I’d like to note here. Firstly, as you might have noticed above, the dental stop த intervocalically is written with ث, the voiceless dental fricative in Arabic. This perhaps points to a pronunciation of த as such a voiceless dental fricative intervocalically. Generally, it is well-known that த becomes a voiced dental fricative in such a position, but had it been so in the speech of the Arwi writers, it would have been written with ذ, which does write a voiced dental fricative in Arabic. So it must indeed have been a voiceless fricative in the speech of the writers of Arwi.

In a similar vein, the character used for the retroflex stop is not a modification of the Arabic character for /t/, ت, but is rather a modification of د, /d/ in Arabic. This may perhaps be due to the perception of the retroflex flap (the intervocalic allophone of the retroflex stop phoneme) as being more similar to Arabic’s /d/ than /t/.

Secondly, entirely new diacritics have been invented for /e/ and /o/; the /e/ diacritic is especially difficult to find and type. I cannot even read /e/ on my phone, so I’ve had to make some modifications to Arwi so as to make it easier to type and viewable on my phone. I talk about these modifications below. Again, these additional diacritics point to Arwi writers being educated in Tamil, and making Arwi practically an abugida in the model of Tamil.

Thirdly, while the characters for ள ḷ and ழ ḻ are noted as being different in one of the major texts in Arwi that Tschacher references, in practice both of them are often written with the same character: ۻ, which prescriptively writes only ழ. However, ர r and ற ṟ are written with separate characters most often – the former with ڔ and the latter with ر. This is indeed very, very interesting. If one assumes that using the same character for what are prescriptively two different phones implies that the phones have merged for the writers, then this leads to interesting conclusions. It implies that ḷ and ḻ were merged in the speech of the Arwi writers, but r and ṟ weren’t – they were separate phones. In addition, it is ṟ that is written with the regular Arabic character for /r/, and it is r that is written with a modification. From this one can perhaps draw the conclusion that the Tamil Muslims found the former to be a better match to Arabic’s /r/ as compared to the latter. These are very interesting points to consider in the matter of sound changes in Tamil.

Further, the actual character prescribed for ḷ (ள) is صٜ, ṣād with a dot underneath. But this doesn’t exist in Unicode; I’ve typed it by composing ṣād and a dot diacritic I found somewhere. I prefer to use ڝ, which does exist in Unicode, and achieves the same purpose just fine. I’m not looking too deeply into how ḷ and ḻ are transcribed in Arwi. I don’t think there was any particular reason in using variants of ص for the two – most likely it was just a matter of using an Arabic character that wasn’t already taken up in transcribing some other Tamil character.

Fourthly, the use of a modification of ع ʿayn (the voiced pharyngeal fricative/approximant in Arabic) to write ங ṅ, the velar nasal, is curious. Firstly I should mention that the actual Arwi character has three dots below the ع, but since such a character doesn’t exist in Unicode I have to use Jawi’s ڠ. And that gets me to the point I wish to make – I believe there is, or was, a tradition in South East Asia of pronouncing Arabic ʿayn as a velar nasal. This tradition might have existed in South Asia as well, or the South East Asian tradition influenced Arwi. I do not know any more details on this at the moment.

And finally, while Arwi is generally written in the high register of Tamil, the literary register, there are instances in Arwi literature of the low register, colloquial spoken language, being written. I’ve already discussed how dialectal influences may have crept in. In addition to that, as I mentioned above in the table, geminate ற்ற ṟṟ is sometimes written not as رّ, but as تّ. The reason is simple: ṟṟ, in the low register, becomes tt (the low register being tadbhava Modern Tamil, so to speak, while the high register is tatsama Middle Tamil). Tschacher suggests that if more such instances of the low register creeping into Arwi exist, Arwi literature might provide an insight into the development of tadbhava Tamil over the past three centuries.

Finally now, here are some writings in Arwi that I found on the internet:

From this Quora answer
This one might be difficult to read, especially the Tamil equivalents that have been provided, but it is the longest Arwi piece of writing that I’ve found, so this functions as my Rosetta Stone for Arwi. I should note I have found two or three instances of what I think are mistakes in the Arwi writing in the text above.
This is the translation of the text above, in case any one wants it.
This is Tirukkuṟaḷ verse 391:
கற்க கசடறக் கற்பவை கற்றபின்
நிற்க அதற்குத் தக
The previous images have all been taken from the Quora answer linked above

Now I’ll move on to the modifications that I’ve made to make Arwi easier to type and so that my phone can read the script. I had to do this because the /e/ diacritic and the [g] character (kaf with a dot below it) are not readable on my phone at all. Among the modifications I’ve made are:

  • I write [g] (intervocalically and post-nasally) either with base kaf itself, or with Jawi’s ڬ, kaf with a dot above.
  • For ḷ, the standard according to Tschacher is صٜ, but this I believe doesn’t exist in Unicode. So I use ڝ instead. I also always write ḷ and ḻ separately, as per prescriptive Arwi, as in my dialect they are entirely different phonemes.
  • For r, I sometimes use ڕ, part of the Sorani Kurdish alphabet. It’s available in the Kurdish Gboard keyboard. I continue to write r and ṟ separately, though they’ve merged in my dialect and indeed in most Indian Tamil dialects.
  • For /e/, I use the tanwīn diacritic instead (the one used for the –in case ending in fuṣḥā, or Modern Standard Arabic), as I cannot view the Arwi diacritic on my phone. I use this one because it is added below the character, as /e/ is in Arwi.

As for typing the other characters and diacritics, the /o/ diacritic is available on Urdu Gboard. ڠ is available in Gboard Jawi, and ڊ and ڍ in Sindhi Gboard. For the other ones approximations exist, but I’m not too fond of them. I generally just copy paste those characters that I can’t type.
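If you want to check which characters you are actually typing or pasting, Python’s standard `unicodedata` module will tell you the code point and official name of any character. The sketch below looks up the substitutes mentioned in the list above; the code points are my own reading of those characters, so treat them as an assumption, and the `describe` helper is just a name I picked.

```python
import unicodedata

# Substitute characters from the list above (code points are my assumption).
SUBSTITUTES = {
    "velar nasal, from Jawi": "\u06A0",
    "[g], from Jawi": "\u06AC",
    "retroflex lateral": "\u069D",
    "r, from Sorani Kurdish": "\u0695",
}


def describe(char: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, 'UNNAMED')}"


for purpose, char in SUBSTITUTES.items():
    print(f"{purpose}: {describe(char)}")
```

The same `describe` call works on any character copied from an Arwi text, which helps when two letters look alike on screen but are encoded differently.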

Using some of those approximations, here’s some Arwi with transliterations and a translation:

اڔْوِيْ، اَلَّثُ عَرَبُ تّمِۻْ، تّمِۻْ مُسْلِمْكڝَالْ عَرَبُ اٍۻُتُّكَّڝِلْ اٍۻُثَڣَّڍَّ تّمِۻِنْ اٗڔُ وَكَيْ آكُمْ
அர்வீ, அல்லது அரபுத் தமிழ், தமிழ் முஸ்லிம்களால் அரபு எழுத்துக்களில் எழுதப்பட்ட​ தமிழின் ஒரு வகை ஆகும்.
Arvī, allatu araput tamiḻ, tamiḻ muslimkaḷāl arapu eḻuttukkaḷil eḻutappaṭṭa​ tamiḻiṉ oru vakai ākum.
Arwi, or Arabic Tamil, is a variety of Tamil written in the Arabic script by Tamil Muslims.

Today, Arwi is barely known. From what I’ve read, the printing press dealt it quite a blow, as typing Arabic back then was not as feasible. The fact that there are few written samples of it on the internet is testament to this, as is the fact that I’ve had to find approximations to a few characters due to the inability to type and view them easily. Some websites in fact mention that many existing works in Arwi are rotting due to lack of maintenance, and considering how little known Arwi is, I wouldn’t be surprised at all if these claims were true. While it would be very nice if Arwi once again came to be used more widely, I know that realistically this is not going to be. It is sad, but it is what it is.

PS: It tickled me how the syntax of the English translation in my Arwi sentence sample is almost exactly the opposite of the Tamil sentence. Left-branching vs. right-branching contrast right there.

References:

  • Muthiah, S. (2008). Madras, Chennai: A 400-year Record of the First City of Modern India, Volume 1. Palaniappa Brothers, Chennai.
  • Tschacher, Torsten (2001). Islam in Tamilnadu: Varia. Halle (Saale).

I will write like this only

Anyone familiar with Indian English would know that it is characterized by not only the conservation of several archaisms that have disappeared in native dialects of English (I’m looking at you, thrice**), but also by certain unique Indianisms that originate from the influence of native Indian languages on the English that Indians speak. One famous Indianism is the peculiar use of the word only by Indians to add emphasis to a sentence. For instance, the title of this post means that I will write like this, specifically, and not in any other way: it adds emphasis on the way in which I will write and refutes that I will write in any other way aside from it. Interesting, isn’t it? Where does this usage come from?

The simple answer is that the use of the word only in this manner is a calque of similar words used in Indian languages for the same purpose. Now, what is a calque? A calque is a literal, word-to-word translation of a phrase or word in one language, into another language. A calque is contrasted against a loanword, which is simply a word of another language directly borrowed into another. Calques, however, are words that are not borrowed directly, but rather translated word-to-word. An example would be translating “bookworm” into Hindi by literally combining the translations of the words “book” and “worm” in Hindi: किताब-कीड़ा kitāb-kīṛā.

What is only a calque of, then? Indian languages (both Indo-Aryan and Dravidian) have, as I mentioned, words that add emphasis to the words that they are placed after. Consider the following examples in Hindi and Tamil:

मैं ऐसे ही लिखूँगा।
ma͠i aise hī likhū̃gā
I like.this EMPH will.write
"I will write like this only."

நான் இப்படி-தான் எழுதுவேன்
nān ippaḍi-dān eẕuduvēn
I like.this=EMPH will.write
"I will write like this only."

The words ही hī and தான் dān act as emphasizers: they emphasize the word or phrase that they immediately follow. In this case, they emphasize ऐसे aise and இப்படி ippaḍi, both of which mean “this way”.

English has no such native emphasizer. Conveying emphasis in English involves other strategies, including but not limited to intonation and stress. Indians therefore calque (or translate literally) the usage of these emphasizers with the word only in English. Now, you may be curious; why the word only? The reason is again simple. Consider the following:

ஒன்று-தான்
oṉṟu-dān
one-EMPH
"only one"

एक ही
ek hī
one EMPH
"only one"

You see, the emphasizers in Indian languages can also, in certain contexts, be translated with the English word only. Indians have generalized the translation of the emphasizers with only, to all contexts, not just in this one specific context.

And that is why we will speak like this only. Please adjust.

** – for Indians who are surprised, yes, the word thrice is archaic outside India.


That is all that I was initially planning on writing on this post, but while I was thinking about this, my mind wandered to the fact that Tamil has not one emphasizer, but two, and I decided to elaborate further on this as an addendum. The following section is going to be more jargon-laden, but I’d encourage the layperson to still read the examples that I provide below.

As I was saying, Tamil has two emphasizers. The word above, -தான் –dān, is one; -ஏ –ē is another. To use some jargon, these aren’t words; they are clitics. Enclitics, to be specific. –ē is an emphasizing enclitic that is reconstructed all the way back to Proto-Dravidian, and all Dravidian subfamilies have descendants or reflexes of it. The usage of –dān as an emphasizing enclitic is restricted to the South Dravidian I subfamily, which includes Tamil, Malayalam, Kannada, Tulu, Kodava, Kota, Toda, and several other languages. It originates from tān, a reflexive pronoun meaning “self”. The pronoun itself is found in all Dravidian subfamilies, but its use as an emphasizer is restricted to the South Dravidian I subfamily. The extension of a reflexive pronoun into an emphasizer is not uncommon. English does it too, by using “myself”, “himself”, etc. as emphasizers.

The origin of these enclitics is relevant, and impacts their usages and their meanings. For one, –dān cannot be used for verbs, as it originates as a reflexive pronoun. –ē can be used with verbs. Further, while both of these enclitics are broadly called emphasizers, they assign different kinds of emphasis, and can, in fact, both be used at the same time. Here I will use different combinations of the two clitics on different verbs to display how they can change the meanings of sentences.

First, simple sentences.

நான் பார்த்தேன்
nāṉ pārttēn
nāṉ pārttēn
"I saw."

நானே பார்த்தேன்
nāṉē pārttēn
nāṉ-ē pārttēn
"I did see." (emphasis on I)

நான்-தான் பார்த்தேன்
nāṉ-dāṉ pārttēn
"It was I who saw it."

நானே-தான் பார்த்தேன்
nāṉē-dāṉ pārttēn
nāṉ-ē-dāṉ pārttēn
"It was INDEED I who saw it, not anybody else."

நான்-தானே பார்த்தேன்
nāṉ-dāṉē pārttēn
nāṉ-dāṉ-ē pārttēn
"It was I who saw it, wasn't it?"

நான் பார்த்தேனே
nāṉ pārttēnē
nāṉ pārttēn-ē
"I did see, didn't I?" / "I did see" (emphasis on the seeing)

As you may realize, the possible combinations of the two enclitics add different kinds of emphasis, which have to do with the origins of these enclitics. In addition, –ē can also function as a sort of marker of relative clauses. Consider:

நான் ஒருத்தனை நேற்று பார்த்தேனே, அவனை இன்றும் பார்த்தேன்.
nāṉ oruttaṉai nēṟṟu pārttēṉē, avaṉai iṉṟum pārttēṉ.
I one.man-ACC yesterday saw=EMPH, him today=also saw.
"The man I saw yesterday, I saw him today too."

This is a display of a few of the many ways in which Tamil makes heavy use of enclitics. Other enclitics in Tamil, aside from these two, are –um, which roughly means “also”, –ā, which makes any word it is added to into a yes-no question, and –ō, which has several uses, including the formation of an irrealis mood, as Tamil does not conjugate for moods in its verbal system.

“Medicine milī?”

I had gone to the Kala Ghoda festival here in Mumbai the day before yesterday, and while standing in the line for something I overheard the man behind me talking on the phone. He was talking to his mother, asking her if she had gotten some medicines. To a casual observer, it would indeed be mundane, everyday stuff. But two words that he said struck me: “Medicine milī?”. Think about the question for a minute.

This question translates to “Did you find the medicine?”. The man was inserting an English word in a Hindi conversation, as urbanites in Mumbai often do. Hell, we do it in every conversation, not just often. What intrigued me, however, was not the English word, but the Hindi word. milī is the past tense, feminine gender, form of the verb milnā, which in this case would be translated as “to be found”. Why did the man use the feminine form of the verb when he was using an English word, given that English words are generally taken to be masculine?

My hypothesis is that he was subconsciously thinking of the Hindi word for ‘medicine’, while consciously using the English word. The most common Hindi word for ‘medicine’ is दवाई davāī, a feminine word. In Hindi, “दवाई मिली?” davāī milī is grammatically correct: using the feminine form of the verb for the feminine noun. But the man wasn’t using the Hindi word at all, he was using the English word! So again, why the feminine form of the verb?

Hence my hypothesis that, while he spoke the English word, he subconsciously thought of the Hindi word. This is an amazing example of code-switching. What is code-switching, you ask? Wikipedia defines it as a speaker alternating between two or more languages, or varieties of a language, inside a single conversation. It’s the use of elements of multiple languages inside one conversation. It is what speakers of Hinglish or Spanglish do, basically.

But, as this example indicates, it is not merely one language with words of another language sprinkled in. There is something much deeper going on, with both (or all) languages existing in the subconscious simultaneously and affecting the speech in much more intricate ways than they would if they were just words of one language sprinkled in the sentences of another language.

This is the kind of stuff that excites me about languages: the fact that they are so complicated and intricate, yet we speak them instinctively, without even thinking about it. Multi-linguals code-switch continuously but don’t ever realize what an incredible thing it is that they’re doing!

Of course, this analysis that the man was simultaneously thinking of two languages while speaking is just a hypothesis of mine. It could be that there is a simpler explanation, in which case this post is moot. But even so, the more important point I’m trying to make is that languages are fascinating, and we don’t even have to learn linguistic jargon and whatnot to realize that. Just observe people speaking, or yourself speaking, more closely, and you’ll realize how amazing they are.

The mysterious ‘ne’

I first became interested in languages when I began learning French in 9th standard, at the age of 13. Language is instinctive in us, we never have to consciously make an effort to construct words and phrases and sentences. As long as we don’t try to learn a new language, that is. Learning French gave me the first glimpse into the wonderful (and mind-boggling!) complexity of human language. I began to pay closer attention to the way I speak, analysing every word and sentence I speak or hear. One of the first curiosities that piqued my interest was the mysterious ne in Hindi. Here are two examples to display this mysterious word.

मैं किताब लिखता हूँ।
ma͠i kitāb likhtā hū̃
I write a book.
मैंने किताब लिखी।
ma͠ine kitāb likhī
I wrote a book.

You see, the translations in English are simple. To convert the sentence from present to past tense, you simply convert the present tense form of the verb, “write”, to past tense, “wrote”. (By the way, I initially planned on using the verb “read”, but that is spelled the same in both tenses in English, so…) But in Hindi, as you see, not only does this mysterious ने, or ne, appear out of thin air, but the verb, लिखी likhī, is now in the feminine form! The masculine form of the same verb would be लिखा likhā, and last I checked I was still a man. Why am I using the feminine form of the verb then?

These questions made a 13-year-old boy very curious and excited. I asked my parents, friends, and teachers; nobody knew. I kept getting one answer: “That’s just how it is”. Not one for accepting such a cop-out, I kept looking. I eventually forgot about it, until I finally learnt and, more importantly, understood, the answer years later, by which point I had learnt much more about linguistics and morphosyntax. If that second word scares you, it should; it is very, very confusing for a beginner, Wikipedia notwithstanding. I didn’t understand it for years, myself. But here I’ll try to explain why Hindi inserts the weird ne and converts the gender of the verb, in as simple language as possible.

To explain, I will start with verbs. I’m assuming the reader is familiar with verbs: ‘action words’, basically. For our purposes, verbs are of two types: ‘transitive’ and ‘intransitive’. If you remember what these are from your days in school reading Wren and Martin, great! Skip the following paragraph. Otherwise, here’s a catch-up for you. Consider the following sentences:

I eat the food.
You write the book.
We see the man.
He hears the sound.

There is one thing common in all of them. There are ‘doers’ to the action: I, you, we, and he. And there is the ‘doee’, or the ‘experiencer’ of the action: the food, the book, the man, and the sound. Each of these four verbs involves two entities: the doer, and the experiencer. Compare that to these four sentences:

I sleep.
We fall.
You walk.
They run.

In these four verbs, there is space for only one entity: the sole entity performing the action. Can someone run something else, or walk something else? Well, yes, in English it turns out that you can, but ‘walking something else’ and ‘yourself walking’ are different things, aren’t they? Think about it, you’d use different verbs in Hindi for them! Why they’re described with the same verb in English is a story for another day, but quite related to the topic I’m explaining at the moment.

Anyway, in the first example, verbs could take both a doer and an experiencer. Such verbs are transitive verbs. In the second example, the verbs could only take one single sole entity. Such verbs are intransitive. This difference is key to understanding the ne.

Now, let’s move onto the entities that are involved in the action of the verb. In English, we say:

I saw him.
I sleep.
He heard me.

and not:

Me saw he.
Me sleep.
Him heard I.

Why? Do you see a trend here? The trend is that we treat the sole entity of an intransitive verb, in the same way as we treat the doer of a transitive verb. In “I saw him”, I is the doer. In “I sleep”, I is the sole entity of the action. If you think a bit, doing an action upon something else, and being the sole entity of the action, are quite different. Yet, in English, we treat them the same, and use the same form of the pronoun. We treat the experiencer of the transitive verb differently: “I saw him“, “He heard me“, and not “I saw he“, and “He heard I“.

And that is also what we do in Hindi, but only in the present tense.

मैं किताब लिखता हूँ।
ma͠i kitāb likhtā hū̃
I write a book.

मैं सोता हूँ।
ma͠i sotā hū̃
I sleep.

The doer of the transitive verb, and the sole entity of the intransitive, are both treated in the same way: मैं ma͠i. The experiencer, किताब kitāb, is treated differently. But what happens when we move to the past tense?

  
मैंने किताब लिखी।
ma͠ine kitāb likhī
I wrote a book.

मैं सोया।
ma͠i soyā
I slept.

किताब फटी।
kitāb phaṭī
The book tore.

Now, we finally come to the mysterious ne! A very weird thing (well, not that weird, but anyway…) is that in the past tense in Hindi, the doer of the transitive verb, मैंने ma͠ine, is not treated the same way as the sole entity of the intransitive verb, मैं ma͠i. For the doer, the extra ne is added. Meanwhile, the experiencer of the transitive verb, and the sole entity of the intransitive verb, are treated the same: किताब kitāb. What does this mean?

It means that, while in the present tense in Hindi the sole entity of an intransitive verb and the doer of a transitive verb are treated the same way and the experiencer of a transitive verb is treated differently, in the past tense it is the sole entity of the intransitive verb and the experiencer of the transitive verb that are treated the same way, and the doer of the transitive verb that is treated differently.

That is why, in the first transitive sentence, not only do you have to add ne to the doer, to mark that this is a doer of a transitive sentence in a different way, you also have to agree the verb in the feminine gender, as the word किताब kitāb in Hindi is feminine. Why? Because this word is the experiencer, and the experiencer of a transitive verb is the same as the sole entity of an intransitive verb, but only in the past tense!

This system, where the sole entity is treated the same as the experiencer, and the doer is treated differently, is called the Ergative-Absolutive Morphosyntactic Alignment (remember the word morphosyntax in the beginning?). It contrasts with the usual “sole entity = doer, with the experiencer being different” system of the Nominative-Accusative Alignment. Since Hindi only has the Ergative-Absolutive alignment in the past tense, Hindi has what is called split-ergativity: ergativity in only one section of the language, i.e., only in the past tense. Well, it is really the entire perfective aspect, but that’s getting a bit too technical.
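
For readers who think better in code, here is a rough sketch of the rule as a tiny Python function. It is only a toy model of what this post describes, not a real Hindi grammar: the function name align, the GENDER table and the "present"/"past" labels are all made up for illustration, and it ignores everything the post leaves out.

```python
# Toy model of the alignment described in this post; not a full Hindi grammar.
# The names below (GENDER, align) are hypothetical, chosen for illustration.
# Genders follow the post's examples (the speaker is male, kitāb is feminine).
GENDER = {"ma͠i": "m", "kitāb": "f"}


def align(tense, subject, obj=None):
    """Return (form of the subject, gender the verb agrees with).

    subject: the doer (transitive verb) or sole entity (intransitive verb).
    obj: the experiencer, or None when the verb is intransitive.
    """
    transitive = obj is not None
    if tense == "past" and transitive:
        # Ergative pattern: the doer gets ne,
        # and the verb agrees with the experiencer instead.
        return subject + "ne", GENDER[obj]
    # Everywhere else the doer / sole entity is unmarked
    # and the verb agrees with it.
    return subject, GENDER[subject]


print(align("present", "ma͠i", "kitāb"))  # doer unmarked, masculine agreement
print(align("past", "ma͠i", "kitāb"))     # doer takes ne, feminine agreement
print(align("past", "ma͠i"))              # sole entity unmarked, masculine
print(align("past", "kitāb"))            # sole entity unmarked, feminine
```

The only branch that behaves differently is the past transitive one, which is exactly the "split" in split-ergativity.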

Anyway, split-ergativity doesn’t exist in a whole lot of languages in the world. Some other languages (other than Hindi and Urdu, that is, which are basically the same language, and let’s not get into that right now) which have this feature include Pashto, which is spoken in Pakistan and Afghanistan, and Sorani, a variety of Kurdish spoken in Iraq. Several other languages have complete ergativity throughout the language, such as Basque (spoken in Spain and France), Tibetan (spoken in, well, Tibet) and the Mayan languages (spoken in Central America). So Hindi is in quite an exclusive club for having split-ergativity as a feature.

To summarize the post, in Hindi, in the past tense, the sole entity of an intransitive verb is treated in the same way as the experiencer of a transitive verb, and the doer of the transitive verb is treated grammatically differently. Why is it so? That is to do with how Hindi evolved from Sanskrit and the innovations it adopted along the way, and is another story for another day. I’ve also cut down on a lot of details and jargon to make this post as accessible to the layman as I can. For now, just know that the ne is one of the things that makes Hindi very, very unique.

Feedback on the post, especially on how accessible you found it if you were a layperson, would be very helpful.