Modern Persian belongs to the Western Iranian branch of the Indo-Iranian group of the Indo-European language family. It is a descendant of Middle Persian, the official language of the Sasanian Empire (third-seventh century CE), and of Old Persian, the language of the Achaemenid Empire (sixth-fourth century BCE). Early Modern Persian adopted both its writing system and much of its vocabulary from Arabic. Persian has several varieties; the one spoken in Iran is called Farsi. It is the native language of the majority of Iran's population of 90 million and has the status of an official or state language there. It is also widely used by non-native speakers as a language of communication, education, and commerce.
Farsi uses both flagging (case or adpositional marking) and indexing (agreement, i.e. bound person marking associated with the verb) to distinguish relations in the clause. It is a nominative-accusative language. Arguments and adjuncts are almost exclusively coded by adpositions, which are used to express case relations. Farsi is a pro-drop language in which not only the subject but also a direct object pronoun may be dropped. Besides the full pronouns, there is a series of pronominal clitics. The pronominal clitics may be suffixed to verbs, prepositions, and nouns, and serve the same grammatical functions as the full pronouns, with the exception of sentence subject. Farsi nouns distinguish number and definiteness, but there is no morphological case system, nor a distinction of animacy or gender. The bare noun is generally neutral with respect to definiteness and number; e.g. ketāb may mean ‘book, a book, the book, books’. One of the specific features of Farsi syntax is the ezāfe construction. Ezāfe, from Arabic iḍāfa ‘addition, adjunction’, designates an enclitic realized as =(y)e, which occurs within the noun phrase and links the head noun to its modifiers and to the possessor NP.
Indexing is also used to distinguish relations in the clause. Verbs agree with the subject in number and person. They are marked for tense and aspect and are either simple or compound. Certain simple verbs (e.g. zadan ‘strike’) can combine with nouns (e.g. ḥarf ‘letter, speech’), adjectives, prepositions, or preverbs to form “light verb constructions” or “complex predicates”, which are very widespread in this language and correspond to simple verbs in many other languages (ḥarf zadan ‘speak’, lit. ‘to strike a word/speech’). The light verb takes all the inflectional and derivational affixes, while the non-verbal element (NVE) usually precedes it. There is a large and increasing number of verbal constructions in colloquial Farsi formed with a third person singular form of the verb and an experiencer, often expressed by a personal clitic. These constructions express physical and psychological processes.
The canonical word order is subject-object-verb (SOV). Within the NP, the surface word order pattern is strongly head-initial. Farsi has a large set of prepositions, although the object marker =rā (colloquial =ro, =o), which predominantly marks definite direct objects, is a postposition or clitic. Farsi also has post-nominal adjectives, genitives, and relative clauses, features generally associated with head-initial languages. Yet the verb occurs in the final position of the clause, particularly in written and formal registers.
Data were gathered in 2024-2025 for the 80 core meanings in PaVeDa. Counterparts were selected on the basis of frequency. The examples in the database are taken from contemporary written sources gathered in the Leipzig Corpora Collection. The Leipzig corpora for Farsi are available as plain text files and contain selected sentences in sizes ranging from 10,000 to 1 million sentences. The sources are either newspaper texts or texts randomly collected from the web during 2010-2020.
Raheleh Izadifar collected the data. She holds a PhD in General Linguistics from BuAli Sina University, Iran, and is a lecturer at the same university. Prof. Omid Tabibzadeh, who supervised the data and the paper, is a professor of General Linguistics at the Institute for Humanities and Cultural Studies, Iran.