Clitics in Tamil, Part 2

This post is a sequel to my earlier post on the clitics =dān and =ē in Tamil. Since writing that post, I’ve thought about these two quite a bit, and have also come across more literature, so this post is intended to put my thoughts not only on these two but also on =ō in one place for future reference. The previous post was more about putting together descriptions – this one is more on the analysis and “interesting linguistic tidbits” side of things.

The first and most important finding since initially writing that post is Susan Herring’s 1991 paper on the grammaticalization of rhetorical questions in Tamil. Grammaticalization is probably my favourite linguistic phenomenon (as of now, anyway) and I’ve come to be really intrigued by =ē and =dān, so something that combines both immediately piqued my interest. I made an edit to my original post discussing her paper briefly. The summary is that Herring calls =ē a “rhetorical tag marker”, and argues that “its role has shifted from that of a clause-final particle to a particle which may relate an attribute to a nominal head within a clause”. Essentially, two clauses such as “I said (something) to you then, didn’t I? Did you understand it?” have fused together to become “Did you understand whatever I said to you then?”. One argument she uses is that =ē when used as a tag question involves a particular intonational pattern which I mentioned in my previous post – in these relativizing constructions, that intonation does not occur, meaning that this construction is grammaticalized as a relativizing one.

So that’s one thing. Another is a refinement of the descriptions of the functions of =dān and =ē. Firstly, one of the functions of =dān as Schiffman (1999: 192) describes is that it “often functions in a discourse to indicate that new information is related to old information; it therefore functions as a communicative device that speakers use to establish solidarity”. Now, this description is fine by itself, but it can (again, I think) be better described as =dān marking topics. I can’t say much of what the proper textbook definition of a ‘topic’ is, but as I understand, it often boils down to topic being information already mentioned in the discourse, while the focus/comment is new information. So =dān is both a marker of contrastive focus, and a topicalizer. What led me to realizing this is reading that the morph -(n)un in Korean also performs a multifunctional role as a topicalizer and marker of contrastive focus (Rhee, 2014). There is probably something between these two roles that leads one to develop into the other; I have to look into that sometime and see whether there are parallels in other languages (I’m sure there are some).

I will also mention that Rhee (2014) discusses much more about the grammaticalization of discourse strategies in Korean, many of which have parallels in Tamil and elsewhere in Dravidian. Rhee himself compares Tamil with Korean data briefly, but it would be interesting to look into it in more depth at some point. Herring (1991) is a very good paper, but it’s still just one paper and I don’t know if there have been further studies on this.

Another addition to my first post is a function of =ē that I missed completely, which is that of a vocative. =ē on nouns can, among other things, function as a vocative. Whether this has developed from its role as an “emphasis” (as vague as that word is) marker, or whether this vocative =ē is entirely separate, is something I don’t know.

An attempt at a generalization of the functions of =ē and =dān when added onto everything but finite verbs is that =ē isolates the entity onto which it is added from other (unspecified) entities; consider the examples ‘I come personally, alone’, ‘I write with my fingers directly without a pen’, and ‘I went there solely on foot’. Meanwhile =dān either contrasts it against another entity or topicalises it: ‘I am the one who came, not anyone else’, ‘It is with my hand that I wrote, not with my foot/I didn’t type’, ‘I went there on foot, not by car or train’. =ē performs several other functions when added to finite verbs, but since =dān cannot be added to those, there is no question of distinguishing them in that case. Based on this, I tentatively suggest calling =ē “isolating focus” and =dān “contrastive focus”. I’m not convinced of this myself, it’s just an idea. I think mapping out their functions is more important than naming them.

That is it for =dān and =ē for now. Let me turn to =ō. =ō is yet another clitic which has quite interesting functions. Very abstractly, it can be defined as a dubitative marker, but it has several grammaticalized functions. First, some examples:

(1)    avan innikki varuvan=ō...
GLOSS: he today he.will.come=Ō
TRANS: "I wonder if he will (perhaps) come?"

(2) vēlaya paṇṇiṭṭiy=ō?
GLOSS: work you.finished.doing=Ō
TRANS: Have you finished the work, by any chance/perhaps?

(3) vēlaya paṇṇiṭṭiy=ā?
GLOSS: work you.finished.doing=QUES
TRANS: Have you finished the work?

(4) varuvaḷ=ō vara māṭṭāḷ=ō enakku teriyādu
GLOSS: she.will.come=Ō come she.will.not=Ō to.me is.not.known
TRANS: I don't know whether she might (perhaps) come or not.

(5) varuvaḷ=ā vara māṭṭāḷ=ā enakku teriyādu
GLOSS: she.will.come=QUES come she.will.not=QUES to.me is.not.known
TRANS: I don't know whether she will come or not.

(6) ōhō, vēlaya paṇṇiṭṭiy=ō?
GLOSS: Oh, work you.finished.doing=Ō?
TRANS: Oh, you've finished the work? (surprise, new discovery)

The =ō clitic in these examples indicates that the speaker is unsure of the veracity of the statement, and doubts it (hence ‘dubitative’, which derives from the same Latin source as the word ‘doubt’). It’s performing that role in (1) and (2). In (2), the speaker thinks that the listener could have finished the work, but is unsure and hence asks the question. (3) is provided to contrast the dubitative =ō with the question clitic =ā, which marks pragmatically neutral yes-no questions. (3) is a simple yes-no question, with no implication as to what the speaker thinks of the likelihood of the listener having finished the work. (4) and (5) are a similar minimal pair; both are embedded indirect questions, with the difference being that (5) implies that the speaker is neutral as to the likelihood of the woman coming or not, while (4) implies that the speaker is decidedly unsure of it. One can think of =ō as marking an irrealis/subjunctive mood, to draw an analogy with Indo-European.

In (6), =ō expresses the surprise of the speaker, and that the speaker has only recently discovered the information. This interpretation requires a marked intonational pattern – the finite verb rises in pitch, then falls, then rises again at the end of the phonological word. This is reminiscent of the differing interpretations of =ē in intonationally marked conditions.

Those are the functions that can be readily explained by the definition “dubitative”. Here are examples of its more grammaticalized uses:

(7)    koẓandai taṇṇi.y=ō pāl=ō kuḍikkum
GLOSS: child water=Ō milk=Ō it.will.drink
TRANS: The child will drink either water or milk.

(8) nān [nēttikki eṅga pōnēn=ō] aṅga innikkum pōvēn
GLOSS: I yesterday where I.went=Ō there today=also I.will.go
TRANS: I will go today wherever I went yesterday.

(9) nān [nēttikki oru eḍattakku pōnēn=ē] aṅga innikkum pōvēn
GLOSS: I yesterday one place.to I.went=Ē there today=also I.will.go
TRANS: I will go today where I went yesterday.

In (7), =ō acts as an ‘or’ conjunction. The child can drink either water or milk, but it is not known which the child will drink. It is cross-linguistically common for ‘or’ conjunctions to develop from dubitative or irrealis mood markers (Mauri, 2008), so this isn’t surprising at all. What’s more interesting is (8). In (8) the =ō clitic, when added to a clause with an interrogative (in this case ‘where’), makes that clause an indefinite relative (in this case, ‘wherever’): I will go to the place I went yesterday, and it does not matter where it may have been. Compare this to (9), which is the relativizing =ē that I mentioned earlier and which Herring (1991) analyses. Don’t they look remarkably similar? I think that the same processes by which the =ē relative developed led to the =ō indefinite relative too; i.e., an initial paratactic construction was over time fused together into a hypotactic one: ‘I wonder where I went yesterday. I will go there today too’ > ‘I will go today too wherever I went yesterday’.

The same syntactic and intonational arguments for this process in the =ē relative can also be made for the =ō indefinite relative. The only difference between them is that while the =ē relative, as far as I know, is an innovation that has occurred only in Tamil, the =ō relative has a wider distribution. It is present at least in Kannada and Telugu (Schiffman, 1983; Krishnamurti & Gwynn, 1985), and appears to be a feature of South Dravidian. There is in fact contention on whether it is a construction that has developed under Indo-Aryan influence, given that it superficially resembles the Indo-Aryan correlative (the =ō relative must always have an /e/-initial interrogative in the subordinate clause and an /a/-initial deictic in the matrix clause, reminiscent of correlatives). For instance, consider Hock (2008).
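
The /e/-initial and /a/-initial pairing can be sketched in a few lines of Python. This is only an illustration of the surface pattern: the function name `matrix_deictic` is made up for this post, and the only pair attested here is eṅga ‘where’ and aṅga ‘there’ from (8).

```python
def matrix_deictic(interrogative: str) -> str:
    """Return the /a/-initial deictic paired with an /e/-initial interrogative.

    Hypothetical helper for illustration: it only swaps the initial vowel,
    which is the surface pattern described in the text.
    """
    if not interrogative.startswith("e"):
        raise ValueError("the =ō relative needs an /e/-initial interrogative")
    return "a" + interrogative[1:]


# The pair from example (8): eṅga 'where' in the subordinate clause,
# aṅga 'there' in the matrix clause.
print(matrix_deictic("eṅga"))
```

The check at the top of the function mirrors the constraint that the subordinate clause must contain an /e/-initial interrogative. Whether every interrogative in the language pairs up this neatly is not something the sketch claims.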

Here are two final examples:

(10)   nān nāḷɛkki eṅgay=ō pōvēn
GLOSS: I tomorrow where=Ō I.will.go
TRANS: I will go somewhere tomorrow (some specific, if undecided place)

(11) nān nāḷɛkki eṅgay=āvadu pōvēn
GLOSS: I tomorrow where=ĀVADU I.will.go
TRANS: 'I will go somewhere or other tomorrow (I don't know & don't care where)'

=ō when added to an interrogative makes it an indefinite pro-form (‘someone’, ‘something’, ‘somewhere’). (10) implies that tomorrow I will go to some specific place, even if I haven’t decided exactly where. By contrast, (11) implies that tomorrow I will go to some unspecific place: not only have I not yet decided where, I also don’t care where I go. This use of =ō in (10) is not restricted to Tamil – Kannada and Telugu have it too (Schiffman, 1983; Krishnamurti & Gwynn, 1985). It may have originated in South Dravidian through a process similar to the one that produced the =ō relative and, later in Tamil, the =ē relative.

A comment I’d like to make is that it’d make an interesting study to look into what triggers the use of the =ē relative vis-a-vis the usual participial one. The participial relative is more syntactically constrained, so one primary use of the =ē relative (and the =ō relative for that matter) is to circumvent those constraints; but what about cases where both could potentially be used? What drives the use of the more pragmatically marked =ē relative in such cases? One would probably need a corpus to study that.

And finally, while I am on this topic, I also realized something about Tamil-English contact. Consider the following dialogue and its translation into Tamil:

(12) A: You saw that movie ā? It's good ā?
     B: Ya, it's good only.

(13) A: nī anda paḍatta pāta.y=ā? nannā irukk=ā?
     B: āmām, nannā=dān irukku

GLOSS: A: you that movie you.saw=QUES? good is=QUES?
       B: yes, good=DĀN is

=dān being calqued with <only> is not new information (reference unintended), but note how the question clitic =ā is borrowed into English. In Tamil, in pragmatically neutral cases, it is added to the finite verb of the clause, which in pragmatically neutral cases appears at the end of the clause (Tamil being strongly left-branching). But note that when =ā is borrowed into English, the pragmatically neutral behaviour is for =ā to be added to the object, not the finite verb. What seems to be more important is that the clitic continue to be clause-final. This probably speaks to the process by which =ā was borrowed into English in the first place, as a clause-final yes-no question marker rather than a clitic. I find this quite fascinating.


  1. Rhee, S. (2014). “I know you are not, but if you were asking me”: On emergence of discourse markers of topic presentation from hypothetical questions. Journal of Pragmatics, 60, 1-16.
  2. Herring, S. C. (1991). The grammaticalization of rhetorical questions in Tamil. In E. Traugott & B. Heine (Eds.), Approaches to Grammaticalization, Vol. 1 (pp. 253-284). Amsterdam: John Benjamins.
  3. Schiffman, H. (1999). A Reference Grammar of Spoken Tamil. Cambridge: Cambridge University Press.
  4. Schiffman, H. (1983). A Reference Grammar of Spoken Kannada. Seattle: University of Washington Press.
  5. Krishnamurti, B., & Gwynn, J. P. L. (1985). A Grammar of Modern Telugu. Delhi: Oxford University Press.
  6. Hock, H. H. (2008). Dravidian syntactic typology: A reply to Steever. In Annual Review of South Asian Languages and Linguistics: 2008 (pp. 163-198). De Gruyter Mouton.
  7. Mauri, C. (2008). The irreality of alternatives: Towards a typology of disjunction. Studies in Language, 32(1), 22-55.

ற்ற and ன்ற in Modern Literary Tamil

In this post, I will be using <> to transcribe Tamil orthography, // for phonemic representations, and [] to represent allophonic realizations that are relevant to this topic. I’ll use ISO 15919 to transcribe the Tamil script.

In Modern Literary Tamil, the sequence ற்ற <ṟṟ> is prescribed to be pronounced as [t̠r], where [t̠] is an apico-alveolar stop, and [r] is an apico-alveolar trill. In practice, it is most often articulated [ʈr], with a retroflex stop and a trill. However, in Spoken Tamil, none of this exists. All instances of ற்ற <ṟṟ> in Literary Tamil (LT) become த்த <tt>, a geminate dental stop, in Spoken Tamil (ST). For instance, literary <kāṟṟu> ‘wind’, <māṟṟu> ‘to change’, <kaṟṟu> ‘having learned’, <ciṟṟāy> ‘mother’s younger sister’, and <paṟṟi> ‘about’ are <kāttu>, <māttu>, <kattu>, <citti>, and <patti> in ST. Now, the usual assumption is that LT reflects an older stage of the language. Going by this, are we to accept that [t̠r] has become [tt] today, and that [ʈr] is an attempt at maintaining the older pronunciation of [t̠r]? At face value, there is nothing wrong with this. But, when we take a look at comparative and typological evidence, this claim becomes more suspect.
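
The regular part of this LT-to-ST correspondence can be written as a simple string rewrite. The sketch below is only an illustration in Python: the function name `lt_to_st` is made up for this post, it works on the ISO 15919 transliteration rather than on Tamil script, and it covers nothing beyond the <ṟṟ> to <tt> rule itself.

```python
import unicodedata


def lt_to_st(word: str) -> str:
    """Rewrite Literary Tamil <ṟṟ> as Spoken Tamil <tt> (transliterated input).

    Hypothetical helper: it applies only this one correspondence and ignores
    every other difference between the two registers.
    """
    # Normalise so that precomposed and decomposed spellings of ṟ compare equal.
    word = unicodedata.normalize("NFC", word)
    geminate = unicodedata.normalize("NFC", "ṟṟ")
    return word.replace(geminate, "tt")


# Literary forms taken from the paragraph above.
for literary in ["kāṟṟu", "māṟṟu", "kaṟṟu", "paṟṟi"]:
    print(literary, "->", lt_to_st(literary))
```

<ciṟṟāy> is left out of the loop on purpose: its ST counterpart <citti> also differs in the ending, so this one rewrite would not produce it.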

Let me take a step back and situate Tamil in a Dravidian perspective. Proto-Dravidian had obstruents at six places of articulation: bilabial, laminal dental, apico-alveolar, retroflex, palatal, velar. Five of these are standard for South Asia, but the sixth sticks out as very unusual: the apico-alveolars contrasting against the laminal dentals and the retroflexes, which were presumably sub-apical palatals. The most important aspect when it comes to PDr obstruents is that they lenited intervocalically and post-nasally, but did not do so word-initially and as intervocalic geminates: /at̪a/ was [aða], /nt̪/ was [n̪d̪], and /at̪t̪a/ was [at̪t̪a]. For the alveolar stop, *ṯ, the reconstructable intervocalic allophone is [d̠], a voiced alveolar stop, given that it has reflexes of [d̪] or [ɖ] in Central Dravidian. In South Dravidian and South-Central Dravidian, however, the intervocalic allophone was phonetically a rhotic, transcribed as [r̠] for convenience. This rhotic remained distinct from *r, which is the phonemic rhotic reconstructable for Proto-Dravidian. Its post-nasal allophone remained [d̠], and its geminate allophone remained [t̠t̠], in all subfamilies. The retroflex stop /ʈ/ also behaved similarly. The intervocalic allophone was likely [ɖ] or [ɽ], its post-nasal allophone [ɖ], and geminate allophone [ʈʈ]. Intervocalically, both apical stops had rhotic realizations.
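
The allophonic distribution just described amounts to a lookup from stop and environment to realization. Here is a minimal Python sketch of that idea. The dictionary layout, the environment labels and the function name `realize` are my own choices for illustration, and the alveolar row uses the South Dravidian rhotic value [r̠] rather than the reconstructed [d̠]. Word-initial position, where the stops did not lenite, is left out.

```python
# Allophones by environment, using the values given in the paragraph above.
# The keys ("dental", "intervocalic", ...) are illustrative labels only.
ALLOPHONES = {
    "dental": {"intervocalic": "ð", "post_nasal": "d̪", "geminate": "t̪t̪"},
    "alveolar": {"intervocalic": "r̠", "post_nasal": "d̠", "geminate": "t̠t̠"},
}


def realize(stop: str, environment: str) -> str:
    """Return the allophone of a stop in the given environment."""
    return ALLOPHONES[stop][environment]


for stop in ALLOPHONES:
    for environment in ("intervocalic", "post_nasal", "geminate"):
        print(stop, environment, realize(stop, environment))
```

Keeping the table as data rather than as a chain of conditions makes it easy to swap in the values of a different subfamily for comparison.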

The various allophones of the Proto-Dravidian alveolar stop have not developed together, given how unstable this phoneme is in the family. In many languages of South and South-Central Dravidian, [r̠] has merged with [r] (< *r). [r̠] remains as a distinct rhotic in Konda, Malayalam, Toda, the Kurumba languages and other languages of the Nilgiri tribes, and the Kanyakumari and Jaffna dialects of Tamil. Meanwhile, [t̠t̠] has developed into [t̪t̪] (Kannada, Tulu), [ʈʈ] (Telugu), and [t͡ʃ] (Kuvi, Kui). [n̠d̠] has developed into [n̪d̪] in Kannada, [n̪d̪] or [ɲd͡ʒ] in Tulu, [ɳɖ] in Telugu, and [ɲd͡ʒ] in Kui. Once this comparative data is taken into account, the Tamil picture starts to become clearer. It is a reasonable assumption that Old Tamil had the alveolar stop as a distinct phoneme, with an allophonic distribution identical to what we reconstruct for Proto-Dravidian. Modern Malayalam is conservative in this respect – it retains this exact allophonic pattern.

Given that Malayalam retains genuine alveolar stops as reflexes of Proto-Dravidian *-t̠t̠- and retains a distinct rhotic [r̠] as the reflex of *-t̠-, one can safely assume the same to be the case in Old Tamil. Malayalam does not have alveolar reflexes of etymological *-n̠t̠-, but it does have [n̠d̠] in its phonology. The details of Malayalam synchronics are immaterial at the moment, so I won’t elaborate on that. If we forget Literary Tamil for the moment, the developments in Modern Spoken Tamil are very clear. Tamil has merged [r̠] with [r], has merged [t̠t̠] with [t̪t̪], and has developed [n̠d̠] into either [n̠n̠] or [ɳɳ], depending on the dialect. We know the starting point, which is Old Tamil of the first half of the first millennium CE. We know the end point, which is the Spoken Tamil of today. What comes in the middle, is the question.

Proto-Dravidian    Old Tamil/Malayalam    Modern Literary Tamil    Modern Spoken Tamil
*-ṯ-               [r̠] <ṟ>                [r̠]/[r] <ṟ>              [r]
*-ṯṯ-              [t̠t̠] <ṟṟ>              [t̠r] <ṟṟ>                [t̪t̪]
*-nṯ-              [n̠d̠] <ṉṟ>              [n̠d̠r] <ṉṟ>               [n̠n̠]/[ɳɳ]
Summary thus far.

Can we take LT to be a middle stage between Old Tamil (OT) and ST? I don’t think so. Of course, one can never be absolutely certain that LT is not a naturally evolved middle stage, but I don’t have to point out that these LT pronunciations of <ṟṟ> and <ṉṟ> are certainly typologically (in a Dravidian perspective, I mean) very unusual, which itself makes them suspect. The second [t̠] in [t̠t̠] never becomes a trill across Dravidian. So, what’s going on?

As the reader may have suspected, I believe the LT pronunciations of <ṟṟ> and <ṉṟ> to be artificial, prescribed so that these two glyphs remain distinct from <tt> and <ṉṉ>/<ṇṇ>, respectively. Why, and how did this happen? Let me first talk about <ṟṟ>. There is reason to believe that the merger of [t̠t̠] into [t̪t̪] happened before the merger of [r̠] into [r], through inscriptional evidence. Note that this is anecdotal (I was told by someone, who has read sources), since I don’t have a background in Tamil epigraphy, and neither do I have access to sources that discuss this. But this is a blog post by an amateur, not a journal article, so I don’t mind talking of anecdotal stuff (a reminder that I’m not to be construed as an expert on these matters…). Secondly, Old Kannada displays exactly this. By the time of our records of Old Kannada, [t̠t̠] has already merged with [t̪t̪], but only much later did [r̠] merge with [r]. This is our first clue. I propose that in Tamil too, [t̠t̠] merged with [t̪t̪], while [r̠] remained distinct until much later.

This merger of the alveolar stop with the dental, however, while it had no impact on the realization of [r̠], had a major impact on its phonology. It was no longer perceived as an allophone of a stop phoneme, but as a rhotic phoneme in its own right, distinct from the other rhotic phoneme [r]. At this point, the Literary Tamil tradition was faced with a situation in which it had to continue to distinguish between <ṟṟ> and <tt> somehow, even though they had merged in the living, spoken language. Note that at this point, intervocalic <-ṟ-> denoted a distinct rhotic phoneme [r̠]. This rhotic realization was assumed to be the unchanging, only realization of the glyph <ṟ>, and was artificially prescribed on to geminate <ṟṟ>, leading to its modern prescribed pronunciation of [t̠r]. Since it is an artificial, prescribed pronunciation, speakers of Tamil who do not have the alveolar stop in their native varieties often tend to pronounce <ṟṟ> as [ʈr]. I believe that no naturally evolved Tamil speech would have ever had [t̠r] as a realization at all (disregarding loans from the high register).

I believe much the same happened in the case of <ṉṟ> as well. The rhotic realization of <ṟ> was taken to be the only realization, and was extended to a post-nasal position in <ṉṟ>. An epenthetic voiced stop between a nasal and a rhotic is a cross-linguistic phenomenon (by that I mean, across the world, not just in Dravidian) and so unremarkable that I don’t have to even mention it. However, while I am relatively confident of my account of <ṟṟ>, I have to say that I am less confident of <ṉṟ>. According to B Krishnamurti’s description of Konda, which does retain a distinct alveolar stop phoneme, the reflex of PDr *-nṯ- is realized as [n̠d̠r̠]. Further, while the Kannada-Tamil comparisons regarding *-ṯṯ- remain robust, the two languages have developed *-nṯ- differently. Kannada remains consistent in merging alveolars with dentals: it has made *-nṯ- into [n̪d̪]. In Tamil, it has become either [n̠n̠] or [ɳɳ], depending on the dialect. While these are certainly complications that I’m not entirely sure what to make of, I believe that my proposal is reasonably likely to be what has actually happened.

Now, are there parallels to this? I have two potential parallels to offer, one in Tamil itself, and another in Arabic. Both are cases that came to my notice through Twitter, so I will link the tweets in question for reference as I discuss them. Firstly, the other Tamil case (link to tweet). Just as for all other obstruents, the phoneme /c/ in Old Tamil also had a similar allophonic distribution: [t͡ʃ] word initially, [t͡ːʃ] as geminate, [d͡ʒ] post-nasally, and [s] intervocalically. In Old Tamil, when a word with initial /c/ occurred as the second element of a compound whose first element ended in a nasal, the /c/ took the post-nasal allophone [d͡ʒ], as is to be expected. That is, <pun> + <cirippu> = <puncirippu> /puncirippu/ [puɲd͡ʒirippɪ̈].

This has changed in many dialects of Modern Tamil. In particular, word-initial /c/ has become [s], and it is only some dialects, particularly the western ones, that retain the affricate realization of initial /c/. Hence, <cirippu> is [sirippu]. Add to this the influx of [s] through borrowings, and this word-initial [s], historically /c/, is no longer perceived to be an allophone of /c/, but a distinct phoneme /s/: /cirippu/ has become /sirippu/, and the glyph <c>, when it occurs word-initially at least, is taken to represent /s/ [s]. As a result, when people read <puncirippu> today, they see /pun/ + /sirippu/, and not /pun/ + /cirippu/, and hence pronounce it as [punsirippu]. While this is not exactly the same as what I’ve proposed for <ṟṟ>, it did strike me as being quite a similar process.
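
The two readings of <puncirippu> differ only in the order of operations, which a short Python sketch can make explicit. Both function names are hypothetical, the output is a rough phonetic spelling rather than a careful transcription (the quality of the final vowel is ignored), and the rules are hard-coded for exactly the nasal plus <c> case discussed above.

```python
def old_tamil_compound(first: str, second: str) -> str:
    """Join two elements the Old Tamil way, in a rough phonetic spelling."""
    # After a nasal, initial /c/ takes its post-nasal allophone [d͡ʒ],
    # and the nasal is written here as palatal [ɲ], as in the text's example.
    if first.endswith("n") and second.startswith("c"):
        return first[:-1] + "ɲd͡ʒ" + second[1:]
    return first + second


def modern_reading(first: str, second: str) -> str:
    """Join two elements after first reading initial <c> as /s/."""
    if second.startswith("c"):
        second = "s" + second[1:]
    return first + second


print(old_tamil_compound("pun", "cirippu"))
print(modern_reading("pun", "cirippu"))
```

In the first function the allophone is chosen after the elements are joined. In the second, the initial consonant is fixed as /s/ before joining, so the post-nasal rule never gets a chance to apply.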

The second case, as I mentioned above, is in Arabic (link to tweet). In Old Arabic, the glyph <ḍ> was pronounced [ɮˤ], a voiced pharyngealized apico-alveolar lateral fricative. The glyph <ḏ> was [ð], and <ẓ> was [ðˤ]. In varieties of Arabic spoken in the Gulf, which retained the interdentals, <ḍ> and <ẓ> merged into [ðˤ]. In varieties spoken in the Egypt-Levantine regions, where the interdentals were lost, <ḍ> and <ẓ> both became [dˤ]. Thus, both glyphs had merged, but the product of the merger was different in different dialects. These two dialectal variants were “reappropriated to make a distinction between the two sounds”, so that in Modern Standard Arabic, <ẓ> is pronounced [ðˤ] and <ḍ> is [dˤ].

Neither of these is a perfect parallel. The case of Tamil <ṟ>, as I propose, is one where a formerly allophonic realization of a phoneme, that was written with a phonemic glyph, has been prescriptively generalized to being a general realization of the glyph in all cases, because some of the allophones of the old phoneme have been lost due to mergers. But this is such a specific case that a complete parallel is very unlikely to exist. These two are the closest that we can get.

As a conclusion, I will say that even if this particular proposal doesn’t manage to convince you (and it’s not primarily intended to convince you, it’s intended to put my thoughts to text), please do take seriously the notion that Literary Tamil is not necessarily an older stage of the modern language. As Jean-Luc Chevillard put it in a tweet, “Literary Tamil is a language kept alive through voluntary training, and exists only as an ideal, living a half-life on top of one or several living natural languages. I call it an Embedded Language. The features of that Embedded Language are ideal and live only through the specification implicit in training (as it is available for the Divyaprabandham), or the cultivation of a grammar like the Tolkāppiyam, etc. Descriptive linguists must decide whether they want to study an Ideal Object or to study the substrate language on top of which the Embedded Language lives its half-life.” This is an absolutely beautiful way to describe the position of Literary Tamil. Linguistics, which purports to be descriptivist, must study the substrate living natural language, that is, it must study Spoken Tamil, and not the Idealized prescribed language that is Literary Tamil.

Finally, sources. Information on *ṯ and Proto-Dravidian reconstruction can be found in B Krishnamurti’s The Dravidian Languages (2003), and cognate pairs can be found in the Dravidian Etymological Dictionary (Revised). Just google “Dravidian Etymological Dictionary” and you’ll find the digitised database. For spoken Tamil data, you’ll just have to trust me, a native Tamil speaker.

The clitics =ē and =dān in Tamil

Tamil has two clitics, =ē and =dān (or =tān, the exact shape of the clitic is immaterial to the subject at hand) that are often called “emphatic clitics”, i.e. clitics that mark some sort of emphasis/focus. In this post I will try to synthesize all that I’ve read about the differences between these two, and add some of my own observations. I will mention right at the outset that all examples here will be from my own idiolect. I do not claim that my idiolect/dialect is the standard, nor that what I describe here necessarily holds for all varieties of Tamil. There will certainly be variations; please do inform me of them. Also, unless necessary, I will not be doing a morpheme-by-morpheme gloss, because the focus here is solely on the two clitics and extra information distracts. As for the two clitics in question, I will gloss them as =DĀN and =Ē.

Let me start with =dān, because it’s the easier of the two to describe. Jean-Luc Chevillard (1997) has an excellent paper on this very topic and he details the primary function of this clitic, which is to mark contrastive focus. In his words, it marks “expressivity, contrast and inversion” (“l’effet d’expressivité, de contraste, d’inversion”). Suppose that according to the intuitions of the speakers in a conversation, something A implies the occurrence/existence of something B, but in this particular case it is in fact C that followed A. The clitic =dān would then be added on C to add focus on the fact that it is inverting the expectations of the speakers (“it is not B, but C that follows A”). Some examples:

(1)    nāḷɛkki nān vara māṭṭēn, en tambi=dān varuvān
GLOSS: tomorrow I come I.will.not, my brother=DĀN he.will.come
TRANS: I will not come tomorrow, it's my brother (emphasis) who will come.

(2) vēlai paṇradu nān=dān, nī illai
GLOSS: work doing I=DĀN, you not
TRANS: It is me (emphasis) doing the work, not you.

(2) displays a typical cleft sentence construction, where the noun is a gerund (i.e., a verbal noun). In such constructions, the clitic =dān is optional, but is most often used. One can read more about these constructions in Slade (2018) (it’s cool stuff but entirely off-topic here).

But this is not the full story (when is it ever?). Schiffman (1999, pg. 192) describes another function of =dān: “[It] often functions in a discourse to indicate that new information is related to old information; it therefore functions as a communicative device that speakers use to establish solidarity”. Schiffman provides the following example:

(3) A: nīṅga yāru? 'Who are you?'
    B: nān tamiẓ āsiriyaru. 'I'm a Tamil teacher.'
    A: ō, nān=um tamiẓ āsiriyaru=dān. 'Oh, I'm a Tamil teacher too.'

Here, =dān on āsiriyaru serves to mark that this new information (of A being a Tamil teacher) is related to information that has already appeared in the discourse (of B being a Tamil teacher). Here, it is not being used to add contrastive focus – there is no implication that A was assumed to be in a different profession. A is not emphasizing that no, they are not, say, a doctor, but they are a teacher – they are connecting this new information to something that has already been said. In my view, in many of Chevillard’s (1997) examples where he attributes =dān to be performing a contrastive role, it is actually performing this second role. Some examples (retranscribed):

(4)    atai.t=tān nān=um colrēn
GLOSS: that.ACC=DĀN I=also I.am.saying
TRANS: That's what I'm also saying.

(5) atukku.t=tān anācin koṭuttirukkēn=ē
GLOSS: that.for=DĀN aspirin I.have.given=Ē
TRANS: "That's what I've given aspirin for, you know."

In (4), I think it’s clearly used in the second manner that Schiffman describes – someone in the conversation has already said something, and this speaker is relating that which he is saying to the information already said. In (5), it could be adding a contrastive focus as well, but I can easily imagine (5) being a response to a sentence such as “I have a headache” – the giving of the aspirin is related to the already mentioned information of the first speaker having a headache. As for =ē on the verb, I’ll get to that after this.

A third use of this clitic is on conditionals. When added to a conditional (‘if X VERBs’), it makes it an exclusive conditional, ‘only if X VERBs’. For instance (note =ttān and not =dān):

(6)    nī vandā=ttān nān pōvēn, illɛ=nā pōva māṭṭēn
GLOSS: you if.come=DĀN I I.will.go, if.not go I.will.not
TRANS: I will go only if you come [with me], otherwise I will not go.

Finally, it behaves idiomatically with adverbs of time. Consider:

(7)    nān nētikki=dān iṅga vandēn
GLOSS: I yesterday=DĀN here I.came
TRANS: "I came here just yesterday."

(8) nān nāḷɛkki=dān aṅga pōvēn
GLOSS: I tomorrow=DĀN there I.will.go
TRANS: "I'll go there tomorrow."

I find it difficult to see what contrastive focus =dān is adding here. To me, (7) has the implication that I could have come before yesterday, but I managed to (due to whatever reason) come here only yesterday. (8) has the implication that I can go there even before tomorrow, but I can only manage to go there tomorrow, no earlier. Note that in both these cases, I cannot use the completive aspect (is it an aspect? I’ll keep calling it that here, but I’m not sure):

(9)    **nān nētikki=dān iṅga vanduṭṭēn
GLOSS: I yesterday=DĀN here I.finished.coming
Intended: "I came here just yesterday."

(10) **nān nāḷɛkki=dān aṅga pōyiḍuvēn
GLOSS: I tomorrow=DĀN there I.will.finish.going
Intended: "I'll go there tomorrow."

I believe this is it for this clitic alone. A final thing that I shall mention is that =dān cannot be added to finite verbs, which I believe is due to its origin as the 3rd person reflexive pronoun, tān. Turning now to =ē, =ē is harder to define as clearly as is =dān. It is subject to a greater degree of idiomaticity, something which Chevillard also notes. Here are some examples of its use (and some of =dān to contrast them):

(11)    nān=ē eẓudinēn
GLOSS: I=Ē I.wrote
TRANS: I wrote it myself.

(12) nān=dān eẓudinēn
GLOSS: I=DĀN I.wrote
TRANS: I'm the one who wrote it.

(13) nān kaiyāla.y=ē eẓudinēn
GLOSS: I with.hand=Ē I.wrote
TRANS: I wrote it with my hand itself.

(14) nān kaiyāla=dān eẓudinēn
GLOSS: I with.hand=DĀN I.wrote
TRANS: It's with my hand that I wrote.

As Chevillard (1997) also notes, in (11), =ē adds the implication that I performed the action of writing alone by myself, without anyone else helping me. (12) of course implies that it was me who performed the action, not anyone else. Similarly, while (14) implies that it is with my hand that I wrote, not, say, with my foot or by typing, (13) implies that I wrote directly with my hand, without anything else (I suppose I could have dipped my nails into ink and written without a pen, but that would be terribly messy).

Importantly, =ē can be added on to finite verbs, unlike =dān. Some examples:

(15) nān nēttikki eẓudinēn=ē
GLOSS: I yesterday I.wrote=Ē
TRANS: I did write yesterday.

(16) A: uṅgaḷukku tamiẓ teriyum=ā? 'Do you know Tamil?'
B: ō, teriyum=ē! 'Oh, of course I know Tamil!'

(17) nān nēttikki eẓudalai.y=ē
GLOSS: I yesterday did.not.write=Ē
TRANS: I did not write yesterday, don't you know?

=ē added to verbs has three interpretations. It can indicate emphasis on the content of the verb – very similar to emphasis in English using ‘do’ support, shown by (15). This is frequently accompanied by a rising pitch on the verb. It can also imply ‘of course, as you know’, as in (16). The third interpretation is of indicating that the content of the verb is thought of by the speaker to be something that the addressee ought to already know. This occurs when the verb has a “special intonation pattern that falls, rises again, then falls on the last syllable” (Schiffman, 1999, pg. 193). This is shown by (17), where according to me, the fact that I did not write yesterday ought to be already known to the addressee. The =ē in (5) is also being used in the same way.

=ē also behaves idiomatically with adverbs of time ((7) and (8) repeated for easy comparison).

(18)   nān nētikk=ē iṅga vandēn
GLOSS: I yesterday=Ē here I.came
TRANS: I came here yesterday itself.

(19) nān nāḷɛkk=ē aṅga pōvēn
GLOSS: I tomorrow=Ē there I.will.go
TRANS: I'll go there tomorrow itself.

(7) nān nētikki=dān iṅga vandēn
GLOSS: I yesterday=DĀN here I.came
TRANS: I came here just yesterday.

(8) nān nāḷɛkki=dān aṅga pōvēn
GLOSS: I tomorrow=DĀN there I.will.go
TRANS: I'll go there tomorrow.

(7) implies that I could have come earlier, but for whatever reason I could only come here yesterday (it emphasizes the lateness of my arrival). (18), on the other hand, emphasizes the earliness of my arrival – I arrived yesterday itself, unlike someone else who came only today. The same happens in the future – (8) emphasizes the tardiness of my departure (I’m only going there tomorrow, not today), while (19) emphasizes the earliness (I’m going there tomorrow itself, unlike someone else who’s leaving next week). And unlike =dān, =ē can be used with the completive.

(20) nān nētikk=ē iṅga vanduṭṭēn
GLOSS: I yesterday=Ē here I.finished.coming
TRANS: I had already come here yesterday.

(21) nān nāḷɛkk=ē aṅga pōyiḍuvēn
GLOSS: I tomorrow=Ē there I.will.finish.going
TRANS: I'll have gone there tomorrow.

This is a natural consequence of what each clitic emphasizes. =ē emphasizes the earliness, that I have already performed the action, and the completive, well, implies that the action is completed. The two are complementary. =dān, on the other hand, emphasizes the tardiness; the emphasis is away from my completion of the action. That’s how I view this.

=ē can also be used on finite verbs to emphasize polarity or the aspectual content.

(22) nān eẓudalai
GLOSS: I did.not.write
TRANS: I did not write.

(23) nān eẓuda.v=ē eẓudalai
GLOSS: I write.INF=Ē did.not.write
TRANS: I did not write at all.

(24) nān eẓudinadillai
GLOSS: I having.written.not (have.not.written)
TRANS: I have not written.

(25) nān eẓudinad=ē.y-illai
GLOSS: I having.written=Ē-not
TRANS: I have never written.

(26) nān eẓudiṇḍu-irkēn
GLOSS: I writing-I.am
TRANS: I am writing.

(27) nān eẓudiṇḍ=ē.y-irkēn
GLOSS: I writing=Ē-I.am
TRANS: I keep writing./I continuously write.

In (23), the construction implies “absolute impossibility of the occurrence of the action, event state” (Lehmann, 1993, pg. 157). Kannada and Telugu also utilize such reduplicative constructions with their cognate =ē clitics (Krishnamurti, 2003, pg. 415). In (24), without the clitic, the verb is a perfect/resultative negative; in (25), it implies that I have never written. (27) emphasizes the durativity of the action, that I’m continuously writing.

These are all the functions of bare =ē to my knowledge (except one, which I will discuss later). Besides being used singly, the two clitics can be chained, as =ē=dān and =dān=ē. Let me tackle =dān=ē first. It has some idiomatic functions, such as in conditionals as in (28), where it has a meaning of ‘if only you had VERBed!’, with the additional implication of ‘I told you so!’ (Schiffman, 1999, pg. 163). Aside from this, its primary function is to combine contrastive focus and the implication that the addressee ought to know the information, as (29) demonstrates. With a rising pitch, it forms tag questions, as in (30).

(28)   nī eẓudinā=ttānē!
GLOSS: you if.write=DĀN=Ē
TRANS: If only you write/had written/will write!

(29) nī=dān=ē eẓudināy.
GLOSS: you=DĀN=Ē you.wrote
TRANS: You're the one who wrote it, you know.

(30) nī=dān=ē eẓudināy?
GLOSS: you=DĀN=Ē you.wrote (rising pitch)
TRANS: You wrote it, didn't you?/It was you who wrote it, right?

With adverbs of time, =dān=ē functions similarly to =dān, with the additional aforementioned (and intonation-dependent) implication.

Finally, there is =ē=dān. I use =ē=dān the least out of these four possibilities (=dān more than =ē more than =dān=ē more than =ē=dān), so I’m less sure of its implications than for the rest. I believe that it combines the emphases in (11) and (12). Examples of all four possibilities together:

(31) nān=ē eẓudinēn
GLOSS: I=Ē I.wrote
TRANS: I wrote it myself.

(32) nān=dān eẓudinēn
GLOSS: I=DĀN I.wrote
TRANS: I'm the one who wrote it.

(33) nān=dān=ē eẓudinēn
GLOSS: I=DĀN=Ē I.wrote
TRANS: I'm the one who wrote it, you know./I'm the one who wrote it, right?

(34) nān=ē=dān eẓudinēn
GLOSS: I=Ē=DĀN I.wrote
TRANS: I'm the one who wrote it myself.

I think that (34) has the implication that it is I who wrote (and not anyone else), and also that I wrote it personally, without any help from anyone else. I’m unclear on the functions of =ē=dān on adverbs of time: I use it very little in ordinary speech, and I think I would never add it to an adverb of time.

Finally, for something completely different, there is the one function of =ē that I have left for last. I did so since this function is very different from the rest that I have described here, which broadly deal with various kinds of focus. =ē can also be used as a relativizer. (35) demonstrates the unmarked relativizing strategy in Tamil using hypotaxis, with a non-finite verb form often called a ‘participle’. (36) demonstrates another strategy using parataxis – this is usually used for indefinite relatives (‘whoever’, ‘whichever’, ‘whatever’, etc.). Finally, (37) demonstrates the relativizing strategy involving =ē.

(35)   nān [nēttikki pātta] paiyana innikk=um pāttēn
GLOSS: I [yesterday see-PAST-RELATIVE] boy-ACC today=also I.saw
TRANS: I saw the boy [whom I saw yesterday], today as well.

(36) nān enda paiyana nēttikki pāttēn=ō, avana innikk=um pāttēn.
GLOSS: I which boy-ACC yesterday I.saw=Ō, he-ACC today=also I.saw
TRANS: Whichever boy I saw yesterday, I saw him today as well.

(37) nān nētikki oru paiyana pāttēn=ē, avana innikk=um pāttēn
GLOSS: I yesterday a boy-ACC I.saw=Ē, he-ACC today=also I.saw
TRANS: The boy whom I saw yesterday, I saw him today as well.

It isn’t difficult to see how this might have developed. As mentioned earlier, =ē adds an implication that the addressee ought to know the information already, as in ‘you know?’. The sentence in (37) might have originated as two sentences, “You know the boy I saw yesterday? I saw him today too”. In fact, I think it can be interpreted even now as such a construction, but I’ll defer to Lehmann’s (1993, pg. 353) syntactic description of this, since I know very little of formal syntax.

[edit: later addition to post]

Just today (16th July 2020, about a week after posting this originally), I came across a very interesting book chapter titled ‘The Grammaticalization of Rhetorical Questions in Tamil’, by Susan Herring (1991). She calls =ē a “rhetorical tag marker”, and provides an example where “its role has shifted from that of a clause-final particle to a particle which may relate an attribute to a nominal head within a clause” (reglossed in the same way I’ve glossed everything here, see her chapter for her gloss).

(38) nāṉ pōy, [avaḷ niṉṟiruntāḷ=ē] anta iṭattil at=ē mātiri niṉṟu, kaṭalai veṟittu pārkkiṟēn
GLOSS: I having.gone, [she was.standing=Ē] that place-LOC that=Ē manner having.stood, ocean-ACC having.stared I.see
TRANS: I went, and stood in the same place [where she had stood], and stared at the ocean in the same way as her.

Quoting Herring (1991): “Here the finite clause ‘she had stood’ modifies the noun phrase ‘that place’, with the suffix –ē indicating the subordinate relationship of the former to the latter; i.e. –ē translates the English relative pronoun ‘where’. We may further observe that the relativized clause is entirely embedded within, rather than simply preceding, the matrix clause. Behavior of this sort is associated with the participial RC [Relative Clause] type, but not with the tag type, which tends to preserve the order ‘old information’-‘new information’, and to present information one clause at a time. This fact alone indicates that –ē has undergone a qualitative shift in function in the direction of increased grammatical autonomy.”

She provides more evidence for this being a legitimate relativizing strategy through an argument involving prosody. Consider the example:

(39)   nāṉ appō coṉṉēn=ē uṅgaḷukku puriñcat=ā?
GLOSS: I then I.said=Ē to.you is.understood=QUES
TRANS: Do you understand (now) what I said then?

Again quoting Herring: “As in the case of WH- conjunctions, prosodic cues provide additional support for the grammaticalized status of –ē. In example [(39)] above, there is no break between coṉṉēn=ē and uṅkalukku, whereas if we were to literally interpret the first half as a tag question, we would expect either a pause or deceleration at that juncture. Moreover, the utterance is characterized by a single, rather than a two-part, intonation contour. While normal intonation for the tag –ē is rising-falling, the intonation for [(39)] is mid-high and level throughout, rising only at the end of the sentence to signal the Yes-no question. On the basis of this and the other types of evidence mentioned, it is clear that the suffix –ē must be accorded the status of a full-fledged relativizer.”

These are convincing to me. Her entire chapter is truly fascinating, I encourage the interested to read it! Much of it is unrelated to the two clitics that I’m concerned with in this post so I won’t go over it here.

[edit over, continuation of original post]

One final use of =ē as a complementizer is with the postposition tavara ‘except, though’.

(40)   nān COVID-nāla mūnu māsamā nāḷ muẓukka vīṭṭula.y=ē okkāṇḍirukkēn=ē tavara, onn=um perisā sādikkalai
GLOSS: I because.of.COVID three months-ADV day entire in.home=Ē I.am.sitting=Ē though, one=also big-ADV did.not.achieve
TRANS: Though I've been sitting at home the entire day for three months because of COVID, I haven't achieved a single thing (sad emoji face).

The =ē added on to vīṭṭula ‘in the house’ emphasizes that I have been completely inside my house, I haven’t gone out at all (well, except for groceries). =dān here would be emphasizing the fact that I’m sitting in my house, not in someone else’s (though I should be emphasizing that too, visiting someone is dangerous right now).

Those are all the functions of =ē and =dān/=tān that I know of. Do let me know if I have missed something, or if there are variations in whatever I have described here, in your varieties. Most of the above is drawn from a combination of Chevillard (1997), Schiffman (1999) and Lehmann (1993); very little of it is my own addition. I was very confused about a lot of this until I read these sources, hence the decision to combine it all into one place for future reference.

Finally, there is the matter of diachronics. Chevillard briefly discusses this. According to him, Old Tamil had only the clitic =ē; =dān had not yet grammaticalized from the reflexive pronoun. Interestingly, =ē in Old Tamil more often has the meaning that is today ascribed to =dān. Old Tamil grammatical tradition names five functions of =ē: tēṟṟam ‘emphatic affirmation’, viṉā ‘interrogation’, pirinilai ‘contrast’, eṇ ‘enumeration’ and īṟṟacai ‘final punctuation’. In Modern Tamil, the newer =dān has taken over two of the functions, specifically tēṟṟam ‘emphatic affirmation’ and pirinilai ‘contrast’.


  1. Chevillard, Jean-Luc. (1997). Les particules énonciatives -ee et -taa en tamoul. Faits de langues n°10, 201-208.
  2. Herring, S. C. (1991). The grammaticalization of rhetorical questions in Tamil. In E. Traugott & B. Heine (Eds.), Approaches to Grammaticalization, Vol.1 (pp. 253-284). Amsterdam: John Benjamins.
  3. Krishnamurti, Bhadriraju. (2003). The Dravidian Languages. Cambridge: Cambridge University Press.
  4. Lehmann, Thomas. (1993). A Grammar of Modern Tamil. Puducheri: Pondicherry Institute of Linguistics and Culture.
  5. Schiffman, Harold (1999). A Reference Grammar of Spoken Tamil. Cambridge: Cambridge University Press.
  6. Slade, Benjamin. (2018). History of focus-concord constructions and focus-associated particles in Sinhala, with comparison to Dravidian and Japanese. Glossa: a journal of general linguistics, 3(1): 2, 1–28.


The dative, locative and ablative in Tamil

A couple of weeks ago, while thinking of dialectal differences between the variety of Tamil I speak at home and the dialect that I learnt in school in Chennai from my peers, I realized that there was a systematic difference in how the so-called “locative” works between these two varieties. This post is born out of those thoughts. I’ll describe the differences between my home variety (through my idiolect) and the common variety that Schiffman (1999) describes in his grammar of spoken Tamil (he claims to describe what he calls ‘Standard Spoken Tamil’).

Let’s start with Schiffman’s description (Standard Spoken Tamil = SST). The “dative” case suffix is –kku; it is used for motion towards (as an allative), the beneficiary of an action (the English preposition ‘for’), and to mark the experiencer/subject in dative-subject constructions, which are rather common in South Asian languages. However, its use to mark an indirect object is more complicated, and I’ll get to that in just a moment.

The locative is where most of the complication lies. According to Schiffman, SST displays Differential Argument Marking for the locative – i.e., human and non-human nouns take different suffixes. For non-human nouns, the locative is a simple suffix –la, which performs the usual functions that locatives generally do. For human nouns, according to him, the locative suffix is –kiṭṭa, and means ‘in the possession of, on the person of’. Also according to him, the human locative –kiṭṭa and the dative –kku can both be used to mark human possessors and human indirect objects.

In possession, a sentence such as “I-HUM.LOC money is” translates to “I have money (temporarily on my person)”, while “I-DAT money is” translates to “I have money (I am a rich person)”. The human locative implies “temporary possession or actual real-time possession, while use of dative implies permanent, habitual, or inalienable possession”.

To mark human indirect objects to a verb such as ‘to give’, the human locative implies that the object is being given back to a person who originally owned it, while the dative implies that the ownership is being transferred irrevocably. For verbs such as ‘to say’, according to Schiffman, the human locative implies that the speaker is deferential to the addressee, while the dative implies a more “direct, blunt, brusque” tone.

Note that these are only for human possessors and indirect objects: non-human indirect objects are marked always with the dative.

Finally, the ablative is marked by an affix added on to the simple locative (for non-human nouns) and on to the human locative (for human nouns). This is due to diachronic reasons; the ablative in Modern Tamil is grammaticalised from a construction of ‘having been at X’, where ‘having been’ is a converb form of the verb ‘to be, stay’. So, ‘having been at my house’ has grammaticalized into ‘from my house’: it’s this converb of ‘to be’ that has phonetically reduced into an affix added to the locatives to form the ablative. The exact form of this suffix differs between varieties, and it’s not important for this post, so I’m not going to mention its form.

Now, I agree with Schiffman’s analysis of the dative and also of the ablative, but I believe he misanalyses the locative. In my view, the supposed non-human locative –la and the supposed human locative –kiṭṭa are entirely different suffixes; the latter cannot be considered a locative at all. For one, –kiṭṭa as a suffix can also appear on non-human nouns, in which case it means ‘near, next to’: house-LOC ‘at/in the house’ & house-kiṭṭa ‘near/next to the house’. I believe that in SST, the locative –la can only appear on non-human nouns, while –kiṭṭa is an entirely distinct case which marks proximity. For non-human nouns it marks physical proximity as ‘next’ or ‘next to’, while for human nouns it marks possession, as in ‘on my person’. Since the locative cannot be used for human nouns, –kiṭṭa acts as its functional counterpart for humans. Hence, the ablative is formed on top of -la for non-humans and of –kiṭṭa for humans.

That’s the analysis of SST. Now on to my idiolect (MI). MI has the same dative, –kku, which performs all the same functions as SST, with the exception that it is never used to mark a human indirect object, unlike in Schiffman’s SST which uses it to mark human IOs in certain semantic conditions.

The locative in MI is very different from that of SST. The locative on non-humans is –la here as well, with no differences from SST. For humans, I argue that MI actually does have Differential Argument Marking in the locative, with the suffix –ṭṭa (diachronically from *-kiṭṭa with loss of the –ki– syllable). While the suffix -kiṭṭa can appear on both humans and non-humans in SST, its counterpart in MI, –ṭṭa, can only appear on human nouns; using -ṭṭa on a non-human noun implies that the object has some sort of animacy. Human indirect objects must all be marked with –ṭṭa, while non-human IOs continue to be marked with the dative –kku. Finally, the ablative is formed in the same way, by adding an element to the locative –la for non-humans and –ṭṭa for humans.

The situation in MI has clearly developed from the situation that still exists in SST – MI is more innovative. One can view SST as in the transitional process of extending –kiṭṭa as the marker of indirect object for all human nouns. MI (aside from reducing *-kiṭṭa to –ṭṭa) has completed that process. It has made –ṭṭa the human counterpart of the locative –la; since it cannot be used on non-humans, it behaves as a true human locative. MI has replaced –kiṭṭa on non-humans with another postposition –pakkattula meaning ‘close, near by’.

To summarize, here’s the situation in SST as described by Schiffman:

| Functions | Human nouns | Non-human nouns |
|---|---|---|
| Motion towards, experiencer verbs | kku | kku |
| Indirect object | kiṭṭa (returning object); kku (ownership transfer) | kku |
| Possessor | kiṭṭa (alienable); kku (inalienable) | |
| ‘near’/‘on person’ | kiṭṭa (on X’s person) | kiṭṭa (near, close by) |
The situation in SST

And here’s the situation in my dialect:

| Functions | Human nouns | Non-human nouns |
|---|---|---|
| Motion towards, experiencer verbs | kku | kku |
| Indirect object | ṭṭa | kku |
| Possessor | ṭṭa (alienable); kku (inalienable) | |
| Locative | ṭṭa (‘on X’s person’) | la (‘in/at/on’) |
| ‘near/close by/next to’ | pakkattula | pakkattula |
The situation in MI

This is to me one of the most noticeable differences between my dialect and SST.
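The two summary tables can also be read as a lookup from (variety, function, humanness) to the suffixes available. The following is only a rough sketch of that reading: the keys "SST" and "MI", the function labels and the helper name `suffix_options` are hypothetical names chosen for illustration, and cells that the tables leave empty simply return an empty list.

```python
# Suffix options per variety, keyed by (function, is_human).
# The entries are transcribed from the two summary tables; the key names
# are illustrative assumptions, not established terminology.
SUFFIXES = {
    "SST": {
        ("indirect_object", True): ["kiṭṭa", "kku"],  # returning object vs. ownership transfer
        ("indirect_object", False): ["kku"],
        ("possessor", True): ["kiṭṭa", "kku"],        # alienable vs. inalienable
        ("proximity", True): ["kiṭṭa"],               # 'on X's person'
        ("proximity", False): ["kiṭṭa"],              # 'near, close by'
    },
    "MI": {
        ("indirect_object", True): ["ṭṭa"],
        ("indirect_object", False): ["kku"],
        ("possessor", True): ["ṭṭa", "kku"],          # alienable vs. inalienable
        ("locative", True): ["ṭṭa"],                  # 'on X's person'
        ("locative", False): ["la"],                  # 'in/at/on'
        ("proximity", True): ["pakkattula"],
        ("proximity", False): ["pakkattula"],
    },
}


def suffix_options(variety, function, human):
    """Return the suffixes listed for this combination, or [] if none is listed."""
    return SUFFIXES[variety].get((function, human), [])


# A human indirect object has two options in SST but only one in MI.
print(suffix_options("SST", "indirect_object", True))
print(suffix_options("MI", "indirect_object", True))
```

Laid out this way, the main contrast is easy to spot: the human indirect object row has two competing suffixes in SST and a single one in MI.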

The numeral 9 in Dravidian

The numeral 9 in Dravidian is rather strange, both in the base numeral 9, and also its multiples by 10 and 100, i.e., 90 and 900. Before I explain how it is strange, I should mention that reconstructing numerals in Dravidian is very difficult because outside South Dravidian and Telugu, all the other languages, spoken by underprivileged tribes, have borrowed higher numerals from the neighbouring Indo-Aryan and in some cases Munda languages. The highest native numeral varies from language to language, but the highest native numeral in any tribal language not in South Dravidian is 7. Data for reconstructing the numeral 9 is therefore restricted to South Dravidian and Telugu, a fact that must always be kept in mind while dealing with this.

First and foremost, let me present the data (data obtained from Andronov (2003), Emeneau (1984) and Krishnamurti (2003)):

‘9’ in Dravidian: Old Tamil oṉpatu/toṇṭu, Tamil ombadu, Malayāḷam ombadu, Kota onbād/orbād, Toda wïnboθ, Koḍava oymbadï, Kannaḍa ombattu, Tuḷu ormba, Telugu tommidi

‘90’ in Dravidian: Tamil toṇṇūru, Malayāḷam toṇṇūṟu, Kota tombat, Toda ēṇboθ, Koḍava tombadï, Kannaḍa tombattu, Tuḷu soṇpa/sonpa, Telugu tombhay [1]

‘900’ in Dravidian: Tamil toḷḷāyiram, Malayāḷam toḷḷāyiram, Toda wïnbonūṟ, Koḍava ombainūrï, Kannada ombhaynūru, Tuḷu ormbanūdu, Telugu tommannūru/tommidi vandalu

‘10’ in Dravidian is reconstructed as *paḥ/*paḥ-tu, with the derivative -tu suffix added. ‘100’ is reconstructed as *nūṯu, which becomes *nūṟu in South and South-Central Dravidian (Krishnamurti, 2003). [2]

From this, three groups of languages emerge. The first group is Tamil-Malayalam, which forms the numeral 9 using a “one less than 10” construction: *onpaḥtu, where *on– can be related to *on– in *onṯu ‘1’, and *paḥtu is ‘10’. This group constructs ‘90’ by prefixing *toḷ– to *nūṟu ‘100’, and ‘900’ by prefixing *toḷ– to āyiram ‘1000’. [3] However, note that Old Tamil does attest another basic numeral 9, toṇṭu, which appears to be derived from the root *toḷ-. [4]

The second group is of Kota, Toda, Koḍava, Kannaḍa, and Tuḷu. These languages also form ‘9’ using the construction of “one less than 10”. The exact form is reconstructable to *onpaḥtu for Kota, Koḍava and Kannaḍa. Toda ēṇboθ is difficult to explain; Emeneau (1984) leaves it unexplained. In the case of Tuḷu, the initial orm– may be related to the adjectival oru ‘1’, and the final –ba is clearly from *paḥ ‘10’. The alternative word for ‘9’ in Kota, orbād, may also have the same *oru prefixed. In any case, while the precise construction is not common to all of them, the pattern is visibly the same. However, the major difference between the first group and this one is in their constructions of ‘90’. 90 in this group is constructed by prefixing *toḷ– to *paḥtu ‘10’, or *paḥ ‘10’ in the case of Tuḷu. Notice that while Tamil-Malayalam form ‘90’ by prefixing *toḷ– to the numeral 100, the Kota-Tuḷu group prefixes *toḷ– to the numeral 10! Finally, this group constructs 900 through transparent compounds of the basic numerals and the numeral 100.

The third group consists of only Telugu, which constructs the basic numeral 9 with what appears to be a compound of *toḷ and padi ‘10’: *toḷpVdi > *toṇbVdi > *tombVdi > Modern Telugu tommidi ‘9’. [5] It then constructs 90 by prefixing what appears to be the same *toḷ– to *paḥ to arrive at tombhay ‘90’. While Telugu 90 is identical to Kota-Tuḷu 90, its 9 is markedly different. Finally, Telugu 900 is also identical to Kota-Tuḷu 900 in that it is also a transparent compound of the basic numeral 9 and 100.
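The chain of stages given for Telugu tommidi can be replayed as a sequence of ordered string rewrites. This is only a toy sketch: the three rules are read directly off the stages listed above, the comments describing them are my own interpretation, and the names `STAGES` and `derive` are hypothetical. Nothing here is meant as a general statement of Telugu historical phonology.

```python
import unicodedata

# Ordered rewrites read off the chain *toḷpVdi > *toṇbVdi > *tombVdi > tommidi.
# "V" stands for the unspecified vowel of the reconstruction.
STAGES = [
    ("ḷp", "ṇb"),    # *ḷ nasalises before the plosive, which surfaces voiced
    ("ṇb", "mb"),    # the nasal takes on the labial place of the following plosive
    ("mbV", "mmi"),  # the cluster ends up as a geminate nasal; V is realised as i
]


def nfc(text):
    """Normalise to NFC so precomposed and combining diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


def derive(form):
    """Apply each stage in order and return the list of intermediate forms."""
    form = nfc(form)
    history = [form]
    for old, new in STAGES:
        form = form.replace(nfc(old), nfc(new))
        history.append(form)
    return history


print(" > ".join(derive("toḷpVdi")))
```

Because each rule applies only to the output of the previous one, the order of the list matters, which mirrors the way the stages are presented as a chain.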

Two puzzles emerge from these patterns. The first is the difference in the construction of 90 between the first group on one hand and the second and third groups on the other, and the second is the difference in the construction of 9 between the first and second groups on one hand and the third group on the other.

I’ll first look at the first puzzle. How do earlier studies try to explain it? Zvelebil (1977) merely reconstructs *toḷ/*toṇ as the basic numeral ‘9’, and does not try to explain the difference in the numeral 90 between Tamil-Malayalam and Kota-Tuḷu. Krishnamurti (2003) reconstructs for the root *toḷ/*toṇ two meanings – ‘nine’ and ‘nine-tenths’. However, this is unsatisfactory. If indeed *toḷ was originally ‘9’, what could motivate a semantic shift from a basic numeral such as 9 to a fraction such as ‘nine-tenths’? I imagine such a semantic shift is cross-linguistically very uncommon. If one proposes the reverse, that *toḷ was ‘nine-tenths’ and that it underwent semantic shift to mean ‘9’, why would a language contain a root for a relatively complex fraction such as ‘nine-tenths’ but not for similarly complex fractions such as ‘eight-tenths’ or ‘seven-tenths’? Finally, Andronov (2003) does not attempt to explain these patterns at all.

I shall now present my hypothesis. I believe that the original root for the numeral 9 was indeed *toḷ, with 90 being constructed by a transparent compound of *toḷ and *paḥ or *paḥtu ’10’. In South Dravidian (encompassing the aforementioned first and second groups), 9 was replaced by a subtractive construction of “one less than 10”, which in all languages except Tuḷu can be reconstructed to *on-paḥ-tu, as mentioned earlier. In Tulu also, the final –ba can be identified as *paḥ. The original numeral still survived in Old Tamil as toṇṭu, though likely only in a literary register, having mostly been replaced by oṉpatu.

In Tamil-Malayalam, I believe that a hypothetical earlier **toṇpaḥtu was replaced by a new construction of *toṇṇūṟu due to analogy with oṉpatu ‘9’. By this point, the original basic numeral toṇṭu might have become archaic or disappeared entirely; while it does survive in Old Tamil, its use is restricted and not as widespread as that of oṉpatu, so it may have been a conservatism in poetry but no longer a feature of regular speech. As a result, while the other multiples of 10 such as *aympaḥtu ‘50’, aṟupaḥtu ‘60’, eṇpaḥtu ‘80’ etc. were morphologically transparent, this hypothetical **toṇpaḥtu would have been opaque, as the numeral 9 at this stage was *onpaḥtu. Given that *onpaḥtu ‘9’ is constructed by prefixing an element to the next higher power of 10, here *paḥtu ‘10’, this pattern was extended to 90 as well. Hence, **toṇpaḥtu was replaced by an analogical *toṇṇūṟu, constructed by prefixing an element to the next higher power of 10, here *nūṟu ‘100’. During this process of analogical replacement, the prefix element *toṇ– was maintained.

Why did this occur? Perhaps because 90 was the only multiple of 10 that was morphologically opaque, and hence anomalous: 9 was *onpaḥtu and 90 was **toṇpaḥtu. To bring some order, the pattern in 9 (a prefix added to the next higher power of 10) was extended to 90 as well. Subsequently, this pattern was extended to 900, constructing *toḷḷāyiram ‘900’. In the Kota-Tuḷu group, there was no such productive (or even semi-productive) pattern, and hence 900 was formed using a transparent compound of 9 and 100, which for Toda, Koḍava, Kannaḍa and Tuḷu can be reconstructed as *onpaḥnūṟu.
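The analogy proposed here for Tamil-Malayalam amounts to one rule applied three times: take the numeral (9, 90, 900), find the next power of ten above it (10, 100, 1000), and prefix an element to the word for that power. A minimal sketch of just that arithmetic follows. The function names and the `BASES` and `PREFIXES` tables are hypothetical, the forms are the ones cited in this post, and no attempt is made to model the sandhi that produces the actual surface forms.

```python
def next_power_of_ten(n):
    """Return the smallest power of ten strictly greater than n."""
    power = 1
    while power <= n:
        power *= 10
    return power


# Base words for the powers of ten, as cited in the post.
BASES = {10: "*paḥtu", 100: "*nūṟu", 1000: "āyiram"}

# Prefix elements under the hypothesis: *on- for 9, *toḷ- for 90 and 900.
PREFIXES = {9: "*on-", 90: "*toḷ-", 900: "*toḷ-"}


def subtractive_parts(n):
    """Return (prefix, base word) for 9, 90 or 900 under the Tamil-Malayalam pattern."""
    return PREFIXES[n], BASES[next_power_of_ten(n)]


for n in (9, 90, 900):
    prefix, base = subtractive_parts(n)
    print(n, "=", prefix, "+", base)
```

In the Kota-Tuḷu group, by contrast, the base for 90 stays at ‘10’, so the same function would not describe those forms.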

Now we come to the second puzzle, that of Telugu tommidi ‘9’. I think that the replacement of the earlier numeral *toḷ– by “one less than 10” occurred only in South Dravidian, while Telugu is a South-Central Dravidian language which has undergone adstrate influence from South Dravidian, primarily from its literary languages (Krishnamurti, 2003, pp. 497, 211). In Telugu, I think that it was South Dravidian influence that caused padi ‘10’ to be suffixed to *toṇ ‘9’, in analogy to South Dravidian *on-paḥtu with a (perhaps morphologically opaque) prefix added to *paḥtu ‘10’. This hypothesis is supported by the fact that the numeral 8 in Telugu is enimidi – the pattern of suffixing padi ‘10’ to the original numeral was extended not only to 9, but also to 8 in Telugu. Note that 8 in Dravidian is reconstructed to the root *eṇ-, cf. Tamil eṭṭu, Kannada eṇṭu, Tulu eṇma/enma. [6] The original 90 in Telugu, tombhay, survived untouched. 900 in Telugu was constructed by the same process as it was in the second group (Kota-Tuḷu).

I believe that this explains all of the data. One explanation that I have come across for Telugu having tommidi as 9, versus South Dravidian (mostly) having *onpaḥtu, is that these prefixes are the same: specifically that in South Dravidian, *ton– > *con– > *on– (note lack of retroflexes). However, both the stages of this proposed change are suspect. Firstly, it requires that initial *t– was affricated to *c– throughout Proto-South-Dravidian. While *c– > *t– is indeed known in Dravidian, *t– > *c– is a much more sporadic change. Emeneau (1988) devotes a two-page appendix to this in his study of Proto-Dravidian *c. In summary, most changes of *t– to *c– in Dravidian are either assimilations to a succeeding palatal in the word, or due to a following front vowel (/i/ or /e/). Unexplained instances of *t– > *c– are sporadic, with individual words undergoing the change in at most one or two languages. Note that Tuḷu sonpa ‘90’ ( < *toṇpaḥ) is due to a regular merger of *t– and *c– in Common Tuḷu; it is unrelated to these sporadic instances of *t– > *c-.

Secondly, while lenition of *c– is indeed well-documented in South Dravidian, it is an isogloss that also extends to Telugu; all words with this change in the former also undergo it in the latter. If *c– ( < *t– ) was lenited in South Dravidian, why wasn’t it in Telugu? There is also no evidence for even a single instance of a chain shift of *t– > *c– > ∅. Further, under this explanation, the differences in the numeral 90 are even harder to explain.

While I am relatively confident of my explanation for the Tamil-Malayalam 90, I’m much less sure of Telugu tommidi ‘9’. If we had data from other South-Central Dravidian languages, this would be much easier, but alas they’ve all borrowed their numerals even for 9, let alone 90.


  1. Telugu tombhay indeed has a breathy voiced /b̤/ there, at least in the standard language. Krishnamurti explains this as being due to the laryngeal in *paḥ; it is in fact one of his primary arguments for the existence of the laryngeal /ḥ/ in Proto-Dravidian.
  2. Proto-Dravidian intervocalic *-ṯ- became a rhotic *-ṟ- in South and South-Central Dravidian, distinct from *-r-. *-ṟ- remained distinct from *-r- in older stages of Tamil, Kannada and Telugu, but has merged with the reflex of *-r- today in these three. /ṟ/ and /r/ remain distinct in Malayalam and some dialects of Tamil (hence why I transcribe ‘100’ as <nūṟu> for Malayalam and <nūru> for Tamil). In Tuḷu, *-ṟ- becomes either -d- or -j- (cf. Tuḷu nūdu ‘100’), I do not know the conditioning environments.
  3. Clusters of *ḷ and a plosive lead to the *ḷ nasalising into *ṇ (similarly for *l > *n before plosives). *toṇ- and *toḷ- are therefore the same root.
  4. A beautiful example of Old Tamil toṇṭu is in Paripāṭal 3:79 (Rajam, 1992). The entire poem in which it occurs is translated here (one has to CTRL + F for “75”, the first result is this poem).
  5. Telugu padi seems to be an extension of the oblique stem of pattu ( < *paḥtu), padi-n- to the nominative, cf. Tamil pad-in-ēẓu ’17’ and Kannada had-in-āru ’16’.
  6. Intervocalic -ṇ- and -ḷ- deretroflex to become -n- and -l- in Telugu; they do survive in other circumstances.


  1. Andronov, Mikhail. (2003). A Comparative Grammar of the Dravidian Languages, Moscow: Institute of Oriental Studies, Academy of Sciences, pp. 143-146.
  2. Emeneau, M. B. (1984). Toda Grammar and Texts, American Philosophical Society, p. 105.
  3. Emeneau, M. B. (1988). Proto-Dravidian *c- and Its Developments. Journal of the American Oriental Society, 108(2), 239-268.
  4. Krishnamurti, Bhadriraju. (2003). The Dravidian Languages, Cambridge University Press.
  5. Rajam, V. S. (1992). A Reference Grammar of Classical Tamil Poetry, American Philosophical Society, p. 445.
  6. Zvelebil, Kamil. (1977). A Sketch of Comparative Dravidian Morphology: Part One, Mouton, p. 33.


Dr. Shahzaman Haque has translated a French song, Le déserteur, written by Boris Vian, into Urdu. To read the lyrics of the original French song and Dr. Haque’s Urdu translation, see his tweet here. All credits for the translation go to Dr. Haque. I wanted to transcribe the translation into Devanagari, so here it is:


मैं आप को लिख रहा हूँ ख़त
आप शायद पढ़ लें
गर हो फ़ुर्सत।

अभी अभी मुझे
मिले फ़ौजी हुक्म नामे
कि जाऊँ महाज़-ए-जंग पर
बुध रात से पहले।

मैं चाहता नहीं जंग लड़ना
मक़्सद नहीं मेरा
मासूमों को मारना।

आप ख़फ़ा ना हों
मगर लाज़िम है बताना
है पुर-ए-अज़्म-ओ-पुख़्ता-ए-इरादी
कि मैं बनूंगा फ़िरारी।

जब से हुई मेरी पैदाइश
बाप को मरते हुए देखा
भाईयों को बिछड़ते हुए देखा
और बच्चों को रोते हुए देखा।

मेरी माँ मर गयी रंज में
अब लेटी है क़ब्र में
कोई परवाह ना फ़र्क़ है उसे
बम फटे या बिजली गिरे।

जब मैं बना क़ैदी
वो मेरी बीवी को ले गये
वो मेरी रूह को ले गये
और मेरे पूरे माज़ी को।

कल सुबह सवेरे
मैं अपना दर्वाज़ा करूंगा बन्द
साल-ए-ख़ुर्दा के मुँह पर
और सड़कों पे लगाऊँगा चक्कर।

मैं फ़क़ीरी करूंगा
फ़्रान्स की सड़कों पर
ब्रिटनी से प्रोवेन्स तक
लोगों को आगाह करूंगा,

हुक्म मत मानो
साफ़ इन्कार करो
जंग मत जाओ
घर छोड़ कर ना जाओ।

गर किसी को देना है ख़ून
तो आप अपना ख़ून दें
आप हैं सच्चे तक़ी।

गर हुआ मेरा तआक़ुब
तो कह दें सिपाहियों से
मैं रहूंगा निहत्था
और वो गोली चला दें।

Here’s a glossary for some Perso-Arabic vocabulary that may be unfamiliar:

फ़िरारी: deserter
जनाब-ए-सद्र-ए-आली: Dear esteemed President
फ़ुर्सत: reprieve, free time
महाज़-ए-जंग: battlefield
ख़फ़ा: displeased, irritated
लाज़िम: necessary
पुर-ए-अज़्म-ओ-पुख़्ता-ए-इरादी: full of conviction and strong of intention
रंज: grief, sorrow
क़ब्र: grave
माज़ी: past
साल-ए-ख़ुर्दा के मुँह पर: on the face of the departing year
आगाह: aware, alert
तक़ी: pious
तआक़ुब: pursuit, chase
निहत्था: unarmed, without weapons

Yes, I know निहत्था is not a Perso-Arabic borrowing but I included it in the glossary anyway.

I wanted to make a comment on the last paragraph in the translation. In the original French, the connotation is one of pacifism and the speaker intending to not take up arms against the state, but he never comes off as suicidal. Here in the translation, it comes off more as him wanting to die. So, I suggest a slight modification to the last paragraph, which now rhymes as well:

गर हुआ मेरा तआक़ुब
तो सिपाहियों से कह दें
मैं रहूंगा निहत्था,
और वो गोली चला सकें।

I’m unable to type this in the Perso-Arabic script here due to line order issues (it being right-to-left and all).

யாதும் ஊரே; யாவரும் கேளிர்

This is my attempt at a gloss and line-by-line analysis and explanation of Puṟanāṉūṟu 196, more commonly known by its first line, Yātum ūrē yāvarum kēḷir.

I’ll analyse the poem not line-by-line but phrase-by-phrase, since lines in poems are decided by metre and such, which is not important when trying to understand it. I use the translation and interpretation from Chenthil Nathan at Old Tamil Poetry. All credits to him for the translations; only the glosses and morphological analyses are by me. I couldn’t do any of this were it not for his excellent translation, anyhow.

I use the ISO standard transcription except for one change – I use ẓ instead of ḻ.

Line 1: yātu-um ūr-ē, yāvar-um kēḷir
Translation:  Every town’s our home town; every man, our kinsman
Explanation: The behaviour of the coordinative clitic -um is difficult to explain. When added to an interrogative pronoun/adverb, as here, it makes it 'any', or here rather translated as 'all'. -ē is an emphatic clitic, used often in Old Tamil on predicates. The noun ūr is very versatile; it means 'city' or 'town', but it has the connotation of 'hometown'. You see cities across southern India with the Dravidian suffix -ūr (e.g. Bengaḷūru, or Bangalore).

Note: From now on, the coordinative clitic is glossed as "COORD" because it'll occur a lot.

Line 2: tī-tu-um naṉ-ṟu-um piṟ-ar tara vār-ā
Translation: good and evil happen not because of others
Explanation: Think of it as "evil and good do not come given by others". naṉṟu is nal 'good' (cf. Modern Tamil nalla) + -tu 'non-human suffix'. naṉṟu can mean 'a good thing' or the concept of goodness itself.

Line 3: nō-tal-um taṇi-tal-um ava-ṟṟu ōr aṉṉa
Gloss: to.feel.pain-VERBAL.NOUN-COORD they.NONHUMAN-OBLIQUE one like
Translation: pain and relief happen on their own 
Explanation: Literally more like "suffering and relief are also similar" (similar to good and evil, in the previous line in not being due to others). avaṟṟu is the oblique of avai, the non-human plural third person pronoun. ōr is 'one' & aṉṉa is a particle of comparison (Rajam, 1992), and together they mean 'similar'.

Line 4: cā-tal-um putu.v-atu aṉ-ṟu-ē
Translation: dying isn't something unknown
Explanation: "Dying also isn't something new". aṉṟē = product of Sandhi from al-tu-ē, where al- is the other negative copula alongside il-, and -tu- is the 3p singular non-human marker. il- and al- are two different verb roots.

Line 5: vāẓ-tal iṉi-tu eṉ-a makiẓ-ntu-aṉ-ṟu-um il-am-ē
Translation: neither do we rejoice that life is a joy
Explanation: "We do not rejoice/are not happy that (just because) life is joyous/sweet". Eva Wilden (2018, p. 151) calls this manner of negation "negation of fact": it literally means "we were without rejoicing" = "we did not rejoice". The -aṉ- affix there is a euphonic particle in Old Tamil; more on that below. Also, the use of the verb eṉ- 'say' as a complementizer is a Dravidian feature.

Line 6: muṉivu-iṉ, iṉ.ṉ-ā-tu eṉ-ṟal-um il-am-ē
Translation: nor in disgust, do we call it a misery
Explanation: "we do not say that living won't be joyous, out of disgust". muṉivu can have several meanings. Chenthil Nathan translates it as 'disgust' and I keep that; this stuff is beyond me. See his blog post for comments on this word. Also, Eva Wilden (2018, p. 151) calls the manner of negation here, the negation of a verbal noun, "negation of action".

The next few lines describe a metaphor. Note that here, as is common in Old Tamil poetry, most of the work is done by syntax and there is not much morphology. Also, Chenthil Nathan translates this entire section in exactly the opposite order to the original: the first line of the original is the last bit of his translation of the metaphor. See Chenthil's post for comments on his translation of the metaphor.

Line 7: miṉ.ṉ-oṭu vāṉam taṇ tuḷi talai-i
Gloss: lightning-SOCIATIVE sky cool drops to.rain-CONJUNCTIVE
Translation: after a downpour from lightning slashed skies
Explanation: "cool drops having rained (down) from lightning-streaked skies". talai here is not the word 'head' as is common, but the verb 'to rain'. The -i here is a conjunctive suffix, also known as adverbial participle.

Line 8: āṉātu kal poru-tu iraṅku-um mallal pēr yāṟ-ṟu nīr vaẓi-p.paṭu-ū'um puṇai pōl
Gloss: be.content-NEGATIVE.CONJUNCTIVE stone collide-CONJUNCTIVE descend-PARTICIPLE strong large river-OBLIQUE water make.way-PARTICIPLE boat like
Translation: like rafts following the course of a mighty river clattering over rocks
Explanation: "like rafts making their way across a ferocious strong, large river that descends, colliding with rocks". VS Rajam (1992, pp. 889) analyses āṉātu as being a negative adverbial participle/conjunctive of a verb 'to be content/calm', i.e., 'without being content/calm'. Chenthil translates it in another poem as 'incessant'. I understand it as the river being ferocious - the opposite of a calm and slow-moving river. -tu in poru-tu seems to be a Conjunctive affix, perhaps an alternative form of -ttu, degeminated for metrical purposes. The -ū'um form of the participial suffix is for metrical purposes, called aḷapeṭai in Tamil grammatical tradition. 

Line 9: ār uyir muṟai vaẓi-p.paṭu-ū'um
Gloss: ??? life order makes.way-NONPAST.THIRD.PERSON.NONHUMAN
Translation: Our precious lives follow their destined course
Explanation: "??? life makes its way in the (same) manner". I'm not sure what ār means here. 

Line 10: eṉ-p-atu tiṟa.v-ōr kāṭci.y-iṉ teḷi-nt-aṉ-am āk-al-iṉ
Translation: since we know from words of the wise
Explanation: "since we know, from what the wise have shown us, that...". As Tamil is left-branching, this complementizer refers to the entire metaphor to the left. Also, the -aṉ- affix in the verb is a "euphonic particle" (like the one above) in Old Tamil, which appears in no other Dravidian language. It is not clear whether this affix and other such affixes meant something, or what they did mean. I myself like Sanford Steever's derivation of -aṉ- from the Dravidian root *man, but I'm digressing.

Line 11: māṭci.y-iṉ peri.y-ōr-ai viya-ttal-um il-am-ē
Translation: we are not impressed by the mighty
Explanation: "we do not admire the large ones due (only) to their greatness". Again you see negation of action, as in Line 6. You'll see it in the next line also.

Line 12: ciṟi.y-ōr-ai ikaẓ-tal atu-aṉ-iṉ-um il-am-ē
Translation: more importantly, we do not scorn the lowly
Explanation: "we do not scorn the small ones even more than that". The "equative/ablative" case in Old Tamil is used for comparisons, among other things. Here, it compares scorning the lowly to admiring the mighty: scorning the lowly just for their "lowliness" is worse than admiring the mighty just for their "mightiness".

3rd Person Non-Human conjugations in Tamil

Tamil, as with almost all Dravidian languages with a major exception of Malayalam, conjugates its finite verbs for person (first, second and third), number (singular, and plural), and gender (male human, female human, and non-human). Gender is marked only in the third person, and in Third Person (3P henceforth) Non-Human (NH), singular and plural numbers are not differentiated in Modern Tamil. 3P NH singular is used for both singular and plural. But this is not very surprising or unusual. What is unusual, is the morphology of the third person non-human, across all tenses.

Let me begin with the future. In the future tense, the other Person-Number-Gender (PNG) combinations are formed with a future tense suffix (-v-, –pp-, or –p-, depending on the verb class) and a PNG marker. In the 3P NH, however, a single suffix –uṁ is added on to the infinitive stem of the verb. This –uṁ not only marks the future tense but also 3P NH. For instance,

  • pēsuvēṁ ‘I will speak’, but pēsuṁ ‘it will speak’
  • āvēṁ ‘I will be, become’, but āguṁ ‘it will be, happen’
  • pā-pp-ēṁ ‘I will see’, but pā-kk-uṁ ‘it will see’

The second morpheme in the 3P NH forms, with allomorphs of –∅–, –g– and –kk–, marks the infinitive stem of the verb. Don’t confuse this –kk– with the markers of the present tense or the self-benefactive mood.

This single suffix combining both PNG and tense in the 3P NH is a remnant of an Old Tamil non-past tense, which had similar suffixes marking both tense and PNG for all PNGs. For instance, for the verb cey ‘to do’, you get the forms ceyku ‘I do’, ceyti ‘you do’, ceykum ‘we do’, ceyyum ‘it does’, etc. Out of these only ceyyum has survived in Modern Tamil.

While we remain on the topic of the future tense, 3P NH future tense is also the only PNG + Tense combination in Modern Tamil which retains a fusional negative form in a finite verb. In all other combinations, negative forms are constructed periphrastically. For instance,

  • pēsa māṭ-ēṁ ‘I will not speak’, but pēsādu ‘it will not speak’
  • ā-g-a māṭ-ēṁ ‘I will not be, happen’, but ā-g-ādu ‘it will not happen’
  • pā-kk-a māṭ-ēṁ ‘I will not see’, but pā-kk-ādu ‘it will not see’

Again, the second morpheme in the negative future 3P NH form is not a tense marker, but marks the infinitive stem.
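The two 3P NH future patterns above can be put into a few lines of Python. This is only a sketch of the description in this post: the little lexicon, the function name and the way I split each verb into root and infinitive-stem marker are my own assumptions for illustration, not a full model of Tamil verb classes.

```python
# Hypothetical mini-lexicon: verb root -> marker of its infinitive stem.
# The three markers are the allomorphs mentioned above (zero, -g-, -kk-).
INFINITIVE_MARKER = {
    "pēs": "",    # 'speak'
    "ā": "g",     # 'be, become, happen'
    "pā": "kk",   # 'see'
}

def nonhuman_future(root, negative=False):
    """Build the 3P NH future (or fusional negative future) of a verb.

    The suffix attaches to the infinitive stem, not to a tense-marked stem:
    -uṁ for the positive form, -ādu for the negative.
    """
    stem = root + INFINITIVE_MARKER[root]
    return stem + ("ādu" if negative else "uṁ")

for root in INFINITIVE_MARKER:
    print(root, nonhuman_future(root), nonhuman_future(root, negative=True))
```

The point of the sketch is that both forms are built on the same stem, which is why the –g– and –kk– show up in the positive and the negative alike.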

Now we move to the present tense. In the present tense, in Modern Standard Literary Tamil (High Tamil), the present tense markers are –kiṟ-/-kkiṟ-, and 3P NH marker is –adu. This –kiṟadu/-kkiṟadu becomes a combined -(g)udu/-kkudu in Modern Common (spoken, or tadbhava) Tamil. In other PNGs, the present tense suffix is –ar/-kkar. For instance,

  • pēsarēṁ ‘I speak’, but pēsudu ‘it speaks’
  • ārēṁ ‘I become’, but ā.v-udu/ā-gudu ‘it becomes’
  • pā-kkar-ēṁ ‘I see’, but pā-kkudu ‘it sees’

This is for the present and future tenses. In the past tense, the morphology exhibits some puzzling developments. But before I move on to the past, let me mention that in my dialect the present tense 3P NH is regular, taking the expected –aradu and –kkaradu suffixes.

Moving on to the past tense, then. In Old Tamil, in the 3P NH past tense, one specific verb class is irregular: the class which takes the past suffix –iṉ–. Given this past marker and the –adu 3P NH singular marker, one would expect the two together to give –iṉadu. But in fact the result is –iyadu. For instance,

  • pēciṉēn ‘I spoke’, but pēci.yadu ‘it spoke’
  • ōṭiṉēn ‘I ran’, but ōṭi.yadu ‘it ran’

Such irregularity continues in Modern Tamil, but in a different form. In Modern Tamil, 3P NH in the past tense is formed by adding a suffix –ccu to the conjunctive form of the verb, not to the verb with the past tense suffix added. While the conjunctive form is identical to the verb with past tense suffix for almost all verb classes, they’re different for one class, Arden’s Class III. These verbs have the –i ending in the conjunctive, but their past suffix is –in-, and –ccu is added on to –i, and not –in. For instance,

  • va-nduccu ‘it came’
  • kudiccuccu ‘it drank’
  • āḍiccu ‘it danced’
  • ōḍ-iccu ‘it ran’

This –ccu derives from the Old Tamil suffix –iṟṟu, which itself results from the sandhi of –iṉ– and –tu (or –ttu), a 3P NH marker: the combined –iṉtu became –iṟṟu. As I mentioned earlier, originally Old Tamil had only one suffix for 3P NH in the past tense: –i.yadu. However, in Late Old Tamil, the –iṉ– was extended to 3P NH as well, from the other PNGs. Hence, –iṉtu > –iṟṟu is the product of analogy in Late Old Tamil. This –iṟṟu then became –ittu through a regular sound change, and then –iccu by /i/ palatalising the following geminate /t/. This –iccu was then reanalysed as –i-ccu, with –ccu being analysed as a past tense suffix. This reanalysed –ccu was subsequently extended by analogy to all verbs, not just those of Class III, creating a general –ccu past tense suffix.
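The chain of changes is easier to follow when each stage is spelled out, so here is a small Python sketch that applies the three steps in order. The rule list is just my own encoding of the changes described above as plain string replacements; it ignores conditioning environments beyond what is stated there, and the later reanalysis and analogical spread of –ccu are not modelled.

```python
# Ordered sound changes, encoded as (old, new) string replacements.
# This is an illustrative simplification, not a full historical phonology.
SOUND_CHANGES = [
    ("ṉt", "ṟṟ"),    # sandhi of -iṉ- + -tu
    ("ṟṟ", "tt"),    # regular change of the geminate
    ("itt", "icc"),  # /i/ palatalises the following geminate /t/
]

def derive(form):
    """Return every stage of the derivation, starting with the input form."""
    stages = [form]
    for old, new in SOUND_CHANGES:
        form = form.replace(old, new)
        stages.append(form)
    return stages

print(" > ".join(derive("iṉtu")))     # iṉtu > iṟṟu > ittu > iccu
print(" > ".join(derive("ōḍiṉtu")))   # ends in ōḍiccu 'it ran'
```

The order of the rules matters: the palatalisation step only has something to apply to once the geminate –tt– exists.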

In my dialect, matters are a bit different. My dialect has not partaken in the generalisation of the –iṟṟu to verbs beyond Class III. Other verbs remain regular, while Class III verbs take the –itu, which clearly comes from –ittu < –iṟṟu.

The whole point of this post is to highlight how, out of all Person-Number-Gender combinations in Modern Tamil (and in one occasion, in Old Tamil as well), the Third Person Non-Human (Singular and Plural) alone tends to exhibit irregularities in morphology. Now that we know what has happened, the natural question that arises is, why did this happen? Why is it only this combination that is irregular? I cannot think of anything concrete.

ஏமாந்துட்டேன் – ēmānduṭṭēn

Recently, I came across the word ஏமாந்துட்டேன் ēmānduṭṭēn ‘I got cheated’, and upon thinking for a few seconds about it, I realized how truly strange this verb is. Let me explain why.

Before I get to this verb itself, I have to establish some pre-requisite knowledge.

Firstly, many verbs in Tamil, 60% of them according to Steever (1984), occur in pairs. Many sources, including Schiffman (1999), describe the contrast between the elements of each pair as one of transitivity – that one of the pair is intransitive, and the other transitive. Paramasivam (1979), however, argues that the contrast is one of affectivity and effectivity: “an affective verb is one the subject of which undergoes the action described by the verb stem”, and “an effective verb can be negatively defined as one the subject of which does not undergo the action described by the verb stem”. Either way, whether it is transitivity or effectivity that distinguishes the elements of each pair, is immaterial to our discussion of the verb ēmānduṭṭēn. I mentioned all this as I will be referring to effective & affective verbs throughout this answer.

Secondly, there are the morphological methods by which effective verbs are constructed from affective verbs. In particular, two methods are relevant to this answer. One is gemination of the final stop.

  1. āḍu ‘to move, shake’ & āṭṭu ‘to shake (something else)’.
  2. māṟu ‘to change’ & māṟṟu ‘to change (something else)’.
  3. ūṟu ‘to ooze, percolate’ & ūṟṟu ‘to pour’.
  4. paravu (< *parapu) ‘to spread’ & parappu ‘to spread (something else)’

A notable characteristic of all such verbs is that all of them belong to the same verb class, which Arden calls Class III (Agesthialingom, 1971). This class takes the -in– past tense suffix, and the -v- suffix in the future tense. These suffixes are used by both effective and affective verbs.

The other method relevant here will be clear with examples:

  1. tirumbu ‘to turn around’ & tiruppu ‘to turn (something else)’
  2. tirundu ‘to correct (oneself)’ & tiruttu ‘to correct (someone else)’
  3. iṟangu ‘to descend, reduce’ & iṟakku ‘to lower, reduce (something else)’

Essentially, the stem with the -NB (nasal + voiced stop) ending forms the affective verb, and the -PP (geminate voiceless stop) forms the effective. Diachronically, this mechanism originates from the same source as the previous one. For instance, tirumbu comes from *tirumpu and tiruppu from *tirumppu, with gemination of the final stop. *-mp- and *-mpp- became –mb– and –pp– in Tamil (Bh. Krishnamurti, 2003). Hence, these verbs also fall under Verb Class III, taking the –in- past and the –v– future.
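Both methods can be summarised as a lookup on the ending of the affective stem. The following Python sketch does exactly that for the seven pairs listed above; the table of endings and the function name are my own illustrative assumptions, and the table covers only the endings that occur in these examples.

```python
# Affective ending -> effective ending, for the examples in this post only.
# The nasal + voiced stop endings are listed first; the rest are the plain
# gemination cases (paravu goes back to *parapu, hence -vu -> -ppu).
EFFECTIVE_ENDINGS = [
    ("mbu", "ppu"),
    ("ndu", "ttu"),
    ("ngu", "kku"),
    ("ḍu", "ṭṭu"),
    ("ṟu", "ṟṟu"),
    ("vu", "ppu"),
]

def effective(affective):
    """Derive the effective member of a pair from its affective member."""
    for old, new in EFFECTIVE_ENDINGS:
        if affective.endswith(old):
            return affective[: -len(old)] + new
    raise ValueError(f"no rule for the ending of {affective!r}")

for verb in ["āḍu", "māṟu", "ūṟu", "paravu", "tirumbu", "tirundu", "iṟangu"]:
    print(verb, "->", effective(verb))
```

Note that the mapping is not reversible without extra information: an effective verb in –ppu could correspond to an affective in –mbu or in –vu, and it is this kind of ambiguity that leaves room for reanalysis.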

Now finally, on to the verb in question. One pair of affective and effective verbs in Literary Tamil (referred to as LT henceforth) is ēmāṟu ‘to be cheated’ & ēmāṟṟu ‘to cheat (someone else)’, respectively. As an aside, this verb seems to derive from a noun, ēmām ‘bewilderment, perplexity’ [DEDR 898].

In Modern Spoken Tamil (ST henceforth), ēmāṟṟu becomes ēmāttu ‘to cheat (someone else)’, following the regular change of intervocalic geminate –ṟṟ- to –tt-. Now, you’d expect ēmāṟu to become ēmāru ‘to be cheated’, also following the regular change of intervocalic single –ṟ- to –r-. And indeed, both ēmāttu ‘to cheat’ and ēmāru ‘to be cheated’ do exist in ST. Where, then, does the conjunctive form ēmāndu– come from, in ēmānduṭṭēn ‘I got cheated’? Considering a verb stem ēmāru, one would expect, given regular morphology, that the form would be ēmāriṭṭēn ‘I got cheated’. Indeed, ēmāriṭṭēn exists in speech, but at the same time ēmānduṭṭēn also exists. Where does it come from?

The most likely answer has to do with reanalysis. The effective verb, ēmāttu, though belonging to the category of verbs that geminate the final stop (from a diachronic perspective, anyway), was reanalysed as a verb that forms its effective verb through -NB- and -PP- alternations. Hence, an affective verb, ēmāndu ‘to be cheated’, was backformed as a result of the reanalysis of ēmāttu.

Now, this couldn’t have happened in Old or Middle Tamil. In Middle Tamil, both –ṟ- and -ṟṟ- remained intact. Had Middle Tamil ēmāṟṟu been similarly reanalysed and a hypothetical verb ēmāṉṟu been backformed, this ēmāṉṟu would then have become ēmānnu or ēmāṇṇu in Modern ST, again as per regular sound changes. But that is not what we see (or rather, hear) in Modern ST; we instead see ēmāndu, which implies that this backformation happened more recently, after the Middle Tamil period.

Another question that may arise is, why has this backformation happened in this particular verb and not in other verbs, like say ārambi ‘to start’ (< Sanskrit ārambha)? The answer to this likely lies in verb classes. As mentioned earlier, ēmāttu is in Verb Class III, and so are all of the -NB/-PP alternating verbs. On the other hand, ārambi falls under Verb Class VI, taking the –tt- past tense suffix (borrowed verbs all fall under Class VI). Hence, ēmāttu was susceptible to reanalysis, but not ārambi.

In addition to all of this, another form of the verb is known: ēmāndēn ‘I was cheated’. This one, in fact, is even more interesting. In the earlier verb, ēmānduṭṭēn, ēmāndu- is the verb stem in its conjunctive form, and –uṭṭēn is the verb conjugation for the completive aspect, past tense, first person singular. The completive aspect suffix must be added to the verb in its conjunctive form, for diachronic reasons. This conjunctive form, as mentioned, can be explained as being backformed from the conjunctive form of the transitive verb, ēmāttu.

However, the verb ēmāndēn ‘I was cheated’ is not in the completive aspect. It is a verb in the simple past, and in the simple past, past tense suffix, –in– in this case, is directly added on to the bare verb stem. Provided that the verb stem is ēmāndu, one would then expect that the verb would be ēmāndinēn ‘I was cheated’. However, that is not what you see. You see instead, ēmāndēn ‘I was cheated’, where it seems that the first person singular marker –ēn is added directly to the verb stem, ēmāndu. This is utterly bizarre within Tamil verbal morphology. So what is happening here?

What has happened, is that the –ndu in ēmāndu was again reanalysed as a past tense suffix (cf. naḍandēn ‘I walked’, viẓundēn ‘I fell’, verbs of Arden’s Classes VII and II, respectively). So, ēmāndu, a verb stem unto itself, was reanalysed as ēmā-ndu, another verb stem with a past tense suffix added. The personal marker –ēn was therefore then added directly on ēmānd-, without an additional past marker –in– being added, as one would expect otherwise.
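To make the two reanalyses concrete, here is a short Python sketch that produces the backformed stem, the past tense form one would expect from regular Class III morphology, and the form that results once –nd(u) is taken to be a past tense suffix. The function names and the string-slicing shortcuts are my own assumptions for illustration; they only handle stems shaped like the ones in this post.

```python
def backform_affective(effective):
    """First reanalysis: read a final -ttu as the -PP half of an -NB/-PP pair."""
    if not effective.endswith("ttu"):
        raise ValueError("this sketch only handles stems ending in -ttu")
    return effective[:-3] + "ndu"

def regular_past_1sg(stem):
    """Regular Class III past: drop the final -u, add past -in- and 1sg -ēn."""
    return stem[:-1] + "in" + "ēn"

def reanalysed_past_1sg(stem):
    """Second reanalysis: -nd(u) is itself taken as the past suffix,
    so 1sg -ēn attaches directly after it (cf. naḍandēn, viẓundēn)."""
    return stem[:-1] + "ēn"

stem = backform_affective("ēmāttu")
print(stem)                          # ēmāndu
print(regular_past_1sg(stem))        # ēmāndinēn, expected but not what you hear
print(reanalysed_past_1sg(stem))     # ēmāndēn, the form that exists
print(regular_past_1sg("ēmāru"))     # ēmārinēn, the regular form of ēmāru
```

Seen this way, the only difference between the expected and the actual form is whether –in– is inserted, which is exactly what treating –nd– as a past marker makes unnecessary.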

I think that this second reanalysis has occurred because ēmāndu-, as the product of the first reanalysis & subsequent backformation, is an anomaly. Steever (1984) remarks that such backformations are rare, providing just one example of one. As a Tamil speaker, when I try to backform other verbs similarly, the backformed verbs are not grammatical; they’re merely theoretical constructs. ēmāndu, and specifically the –nd– in it, therefore, is an anomaly. To make sense of this anomalous –nd-, it was once again reanalysed as a past tense suffix.

Note that alongside the backformed and reanalysed ēmāndēn ‘I got cheated’, the usually expected verb ēmārinēn (ēmār-in-ēn) ‘I got cheated’ also exists and is very much grammatical, though it is less common in my speech than the former. But the very fact that ēmāndēn exists as a finite verb in the past tense is testament to how truly weird morphology can become.

And finally, this can probably be looked at from a theoretical morphology perspective. Through backformation and reanalysis, a suffix marking solely person-number-gender has come to be added to what is a bare stem, violating the rule that expressly forbids this. I don’t know enough theoretical linguistics to do that.


  1. Bhadriraju Krishnamurti. (2003). The Dravidian Languages. Cambridge: Cambridge University Press.
  2. Harold Schiffman. (1999). A reference grammar of spoken Tamil. Cambridge: Cambridge University Press.
  3. K. Paramasivam. (1979). Effectivity and causativity in Tamil. Trivandrum: Dravidian Linguistics Association.
  4. S. Agesthialingom. (1971). A Note on Tamil Verbs. Anthropological Linguistics, 13(4), 121-125.
  5. Sanford B. Steever. (1984). Review [Review of the book Effectivity and causativity in Tamil, by K. Paramasivam]. Language, 60(1), 195–196.

Is Kannada closer to Telugu, or Tamil?

Many people have the misconception that Kannada is linguistically closer to Telugu than to Tamil, primarily because both Kannada and Telugu have a significantly higher degree of Sanskritic influence (and hence shared Sanskritic lexicon) than Tamil. However, Telugu is part of the South-Central Dravidian subfamily of the Dravidian family of languages (to which all three belong, along with Malayalam, Tulu, Toda, Gondi, Kurux, Kolami, etc.), while Kannada and Tamil are both part of the South Dravidian subfamily. Telugu, by virtue of belonging to a different subfamily of Dravidian, has certain features that set it apart from the South Dravidian languages.

Firstly, Telugu’s gender system (rather, noun class system, but that’s not important for now) is very different from that of South Dravidian (that is, that of Kannada & Tamil). In South Dravidian, in the singular, there are three genders/classes: male human singular, female human singular, and non-human singular. In the plural there are two: human plural and non-human plural.

Noun class (gender) system in Kannada and Tamil.

In Telugu, instead, in the singular there are two: male human singular as one noun class, female human and non-human singular as the other. In the plural there are also two, but not symmetrically: one is human plural and the other is non-human plural.

Noun class (gender) system in Telugu. The *awaṉṯu here and *awan in the image above are related – I elaborate on this below.

It is likely that Proto-Dravidian’s noun class system was something closer to what Telugu has today, and that South Dravidian innovated the feminine human in the singular. For instance, Tamil avan and Kannada avanu ‘he’ are cognate to Telugu vāḍu ‘he’, and Tamil/Kannada adu ‘it’ is cognate to Telugu adi. Tamil/Kannada avaḷ(u) doesn’t exist in Telugu.

Likely noun class (gender) system in Proto-Dravidian.

Related to this is the male human marker in Telugu, which is –ḍu. This is cognate to the Tamil/Kannada marker –an(u). In Proto-Dravidian, this marker was *-aṉṯu. South Dravidian deleted the –ṯu in the suffix and was left with just *-an. In Telugu, *-ṉṯu became –ṇḍu, and then –ḍu through regular sound changes. So Tamil and Kannada avan(u) and Telugu vāḍu go back to Proto-Dravidian *avaṉṯu.

As an example, Telugu pillagāḍu (< pilla ‘child’ + –vāḍu ‘male human marker’) ‘boy, lad’ is a morpheme-to-morpheme cognate of Tamil piḷḷaiyan (< piḷḷai ‘child’ + –an ‘male human marker’) and Kannada piḷḷeyanu (< piḷḷe ‘child’ + –anu ‘male human marker’). Continuing the same case, Telugu pilladi (< pilla ‘child’ + –di ‘female human cum non-human marker’) ‘girl, lass’ would translate morpheme-to-morpheme in Tamil as piḷḷaiyaḷ (< piḷḷai ‘child’ + –aḷ ‘female human marker’) and in Kannada as piḷḷeyaḷu. Note that, unlike in the former case, Kannada and Tamil don’t use a cognate suffix in the latter case; they instead use –aḷ. This is an example of differing noun class systems in the two subfamilies. The same noun class covers both female human and non-human semantic objects in Telugu (pilladi is semantically human female, a girl), but in Tamil and Kannada semantically human female nouns must take the human female noun class and the appropriate suffix.[1]

As an aside, if anyone is wondering, Telugu went through a stretch of losing retroflex lateral approximants and retroflex nasals (ḷ and ṇ) in many words beginning about a millennium ago. That’s why Tamil and Kannada piḷḷai/piḷḷe/hiḷḷe correspond with Telugu pilla.

The second key difference that Telugu has by virtue of it being South-Central Dravidian is that it has gone through apical displacement, a phenomenon which is characteristic of the subfamily. The phenomenon is described thus:

  1. V₁RV₂ → RV₁/RV₁V₁
  2. C₁V₁RV₂ → C₁RV₁/C₁RV₁V₁

Here R is any non-nasal apical consonant. Through this phenomenon, formerly intervocalic apical consonants moved to word-initial position if no word-initial consonant preceded, and formed initial consonant clusters if a consonant did precede. In addition, if V1 and V2 were the same or if V2 was a low vowel (/a/), V1 would double following the apical consonant.
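As a rough illustration, the rule can be written as a function over a word that has already been cut into segments. This is only a sketch under my own simplifying assumptions: the caller does the segmentation, vowel doubling is written as a long vowel with a macron, and later developments (the final –u of Telugu, cluster simplification, cr- > tr-, shortening before clusters) are deliberately not modelled.

```python
# Short vowel -> long vowel, used to write a "doubled" V1.
LONG = {"a": "ā", "e": "ē", "i": "ī", "o": "ō", "u": "ū"}

def apical_displacement(c1, v1, r, v2, rest):
    """Apply (C1)V1-R-V2 -> (C1)R-V1 to a pre-segmented word.

    c1   : initial consonant, or "" if the word is vowel-initial
    v1   : first vowel (short)
    r    : the non-nasal apical consonant that gets displaced
    v2   : the vowel that followed R (it is lost)
    rest : everything after V2
    V1 is lengthened if V1 and V2 are the same or if V2 is /a/.
    """
    lengthen = (v1 == v2) or (v2 == "a")
    vowel = LONG[v1] if lengthen else v1
    return c1 + r + vowel + rest

print(apical_displacement("", "e", "r", "a", "ṇḍu"))   # rēṇḍu
print(apical_displacement("m", "a", "r", "a", "n"))     # mrān
print(apical_displacement("t", "i", "r", "u", "ppu"))   # trippu
```

The first call covers the vowel-initial case of rule 1, the other two the cluster-forming case of rule 2; the last one shows that V1 stays short when V2 is neither identical to it nor /a/.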

There are several examples.

  1. Proto-South Dravidian *eraṇḍu ‘two’ → Telugu reṇḍu ‘two’, which is perhaps from an older unattested rēṇḍu which lost the long vowel before the consonant cluster. Compare Kannada eraḍu and Old Tamil iraṇḍu; Tamil also seemingly underwent this phenomenon in this word, which is reṇḍu in Modern Tamil.
  2. PSDr *maran ‘tree, wood’ → Old Telugu mrānu ‘tree’ → Mn. Telugu mānu. Compare Tamil and Kannada mara(m).
  3. Proto-Dravidian *tir ‘to turn’ + derivational suffix → Old Telugu trippu ‘to cause to turn’ → Mn. Telugu tippu. Compare Tamil tiruppu and Kannada tirupu.
  4. Proto-Dravidian *uḻu ‘to plough’ → Old Telugu ḍu- ‘plough’ → Mn. Telugu dunnu ‘to plough’. Compare Tamil uḻu.
  5. PD *caracu ‘cobra, snake’ → Old Telugu trācu → Mn. Telugu tācu ‘snake’ (Initial cr– became tr-). Compare Tamil and Malayalam aravam, with loss of c- and of –c-, both very common in South Dravidian and especially Old Tamil for the latter.
  6. PD *koḻ-/kōḻ– ‘young, tender, fresh’ → Old Telugu krotta (< *koḻ-tt) ‘new’ → Mn. Telugu kotta ‘new’. Compare, from this same PD root, Tamil kuḻandai ‘child’ & koḻundu ‘tenderness, tender twig, tendril’, Kannada koḍa ‘tenderness’ & koṇasu ‘young of wild beasts’, and Tulu korè ‘weak, small’. ḻ > ḍ is found often in Kannada, and ḻ > r is regular in Tulu in its non-Brahmin dialects.
  7. PD *varay ‘write, draw’ → Telugu vrāyu ‘write, draw’. Compare Tamil varai and Kannada bare.
  8. PD *vaHra– ‘come’ → PSDr *vara → Pre-Telugu(?) vrā- → Telugu rā-. This verb is one of only two verbs that have irregular stems in PD: *vā/*vaHra. The latter became rā ‘come!’ in Telugu, while the former remains as va-, as in vaccæḍu ‘he came’. Compare Tamil vā/vara and Kannada bā/bara.

As a result of the apical displacement, Old Telugu had r-, ḍ- and ṟ- word-initially. Modern Telugu has merged ṟ- and r- word-initially (and intervocalically as well), and word-initial ḍ- has merged with d-. In addition Modern Telugu has simplified the initial consonant clusters that were created by the phenomenon.

This phenomenon of metathesis also seems to have occurred in a few scenarios when the second consonant wasn’t apical. The pronouns are major examples: *avaṉṯu ‘that man’ becoming vāḍu, and *ivaṉṯu ‘this man’ becoming vīḍu.

Finally, Telugu also has certain lexical differences as compared to South Dravidian. Krishnamurti (2003) suggests that Proto-South-Dravidian borrowed Indo-Aryan vocabulary while it was still undivided, but after it had already separated from Proto-South-Central Dravidian. So Telugu, being SC Dravidian, does not share some of the earliest Indo-Aryan borrowed lexicon that South Dravidian languages do (like the word for the numeral 1000) and has borrowed from Indo-Aryan independently.

Beyond all of these, there is of course the fact that as a Tamil speaker I can understand some Spoken Kannada, and a fair bit of written Kannada (with my knowledge of Tamil-Kannada phonetic & grammatical correspondences and my knowledge of Sanskrit), while my intelligibility of Telugu, spoken and written, is much lower. If one disregards the Sanskritic lexicon, Kannada and Tamil share much more in common than Kannada and Telugu.


  1. Bhadriraju Krishnamurti. (2003). The Dravidian Languages. Cambridge Language Surveys. Cambridge: Cambridge University Press.
  2. A Telugu-English Dictionary.