Declaration of intent: minimal grammar

      To learn a new language, one must acquire new words. This is unavoidable and requires dull learning by rote and some semantic effort: there is no French word for 'shallow', but 'know' splits into 'connaître' and 'savoir', etc. In addition, one must learn 'new grammar'; e.g. French nouns have gender, which need not be related to sex, and which must be considered in adjective concord[1]. My purpose is to reduce this new grammar as much as possible, so that only a minimal amount of work is needed beyond word memorization. As a slogan:

      If you know what the word means, you can use it.

This, of course, contains a huge fallacy: the meaning of a word is precisely the set of all the occasions when it may be used. But what I mean[2] is that if you knew what 'be' means, you could deal with it, without having to know about 'is', 'was', 'been', etc.

      Another slogan for minimal grammar:

      If a natural language makes do without it, so can I.

This says that some features may be redundant. For instance, I won't worry about translating 'the', because Russian, Chinese and even Latin manage without it.

      Ideally, a word is a word is a word; I don't want it classified as 'noun', 'verb', 'preposition' etc. I don't even want a role for the word in the sentence: the grammar must manage without the terms 'subject', 'object', etc.[3]. Then, all that is left is the subordination relation: a word may modify another. To define 'modify' I will use a naive grade school attitude: a modifier answers a question about the head.

      that book

            that what? that book    : 'book' modifies 'that'

            which book? that book   : 'that' modifies 'book'

      In this example, the relation is symmetrical, which is just fine. Actually, in grade school we are not allowed to ask 'that what?' -- we must have 'that' subordinate to 'book' and not vice versa, which makes parsing diagrams directed graphs. However, my grammar does not need such precision. Of course, there is a big difference in meaning: 'book' is more complete than 'that', which needs finger pointing; this is the reason why 'that what?' is a no-no. But I would like to ignore semantics as much as possible -- alas, it won't let itself be ignored.

Here is yet another slogan, this time about vocabulary:

      The Greeks had a word for it, and so do I.

Philologists have gone on for centuries about the richness of Greek, or Sanskrit or whatnot. It seems to me that they liked the ability of such languages to create new words on the fly; even more they loved discussing the possible meanings of these words. So, in order to ease the process of word acquisition, I will put in some rules for creating compound words, and leave the meaning hazy, to the utmost gratification of philo/sophers/logists. I give up any pretense of precisely defining word sense -- that would involve such notions as verbs of motion, transient occurrences, animate/inanimate, affect -- all the 1000 Roget categories and more. So I will depend on common sense, and all I can say is that my compound words are vague, possibly ambiguous, and are offered as suggestions only.


      Nor will I try to produce a minimal set of basic words, from which the others may be composed. Consider the family:

two, double, twice, both, dual, dyadic, pair, even, twin, twine, twist, duplicate, second, the one... the other[4]

All of them are semantically related to 'two', but it would be hopeless to define precisely the relations, in the hope of generalizing.

      Among natural languages, Chinese and Malay seem nearest to my type of language -- there is fluidity between parts of speech, and little inflection[5] to memorize. However, in both languages the classes noun/verb are quite evident, and there are syntactic paradigms to learn, i.e. there are such classes as subject, object, complement etc.

      Among artificial languages I know of, the nearest to mine seems to be Allnoun. But here comes my pet peeve: I cannot stand recursion and long-to-infinite series of parentheses. These are OK for computers, but my memory, being neither magnetic nor siliceous, boggles. Still, I like Allnoun in its intransigence, and I adopted as the first rule of translation from English 'replace every word by a noun'.

      Finally, the vocabulary is mostly English[6] -- just as Volapük is mostly English -- and the language is called, obviously, LAN.


Some more slogans

      Etymology is semantics.


Of course it ain't: 'nice' does not mean 'ignorant', and 'artery' does not carry air. One must ignore usage, diachronic etymology -- the fun part -- and especially shun such compounds as 'understand', which has nothing to do with 'under' or 'stand', but clearly proceeds from both. Self-explaining compounds such as 'lukewarm' = 'body' + 'warm' = 'body temperature' may be rare.

Then maybe derivation is semantics?

Although poetry is lost, one can gain much flexibility from LAN compounds like:

      fat+disapprove = obese

      fat+approve = portly, embonpoint

Since there are only autosemantemes in LAN, morphological derivation is word-compounding (no slogan here) and LAN etymology is basically decomposing compounds -- the reverse of word derivation.


      English word = LAN word.

That would really be great, especially for automatic translation. Unfortunately, it won't even work in the form 'English word = English word'; consider 'the human race' vs. 'the rat-race' and some rarer 'race' meaning 'pluck'. Even extremely technical words carry this ambiguity: the biceps of the anatomist is not the biceps of the prosodist, although both terms are as precise as they get. So I will just take the easy way out, providing a simple-minded automatic translator, and lazily avoiding race1, race2 and race3; to ease my conscience somewhat, I also translate LAN formally via triplets, q.v.

      Sense beats grammar any time.

Or, if you prefer, poetic license. Although word order in LAN prescribes strict dependency, if that does not make sense, think of a different ordering. This, of course, may be more than exponentially ambiguous -- that's why we have human brains.

Yet another slogan

      All languages are foreign languages.


A very good approximation -- after all, few of us are native speakers of even 0.1% of the various existing languages (about 6,000 different ones, at the last count). And it seems to me that this shifts the practical import of linguistics from a detailed description of a given language to a coarse model of all languages, or to the invariants of translation. It is not particularly interesting to flag 'ungrammatical utterances', because most speakers (who are mostly foreign speakers) will keep uttering them; rather, one should find out how it is possible that they get understood, or to what degree they get understood.

      What I have in mind is a linguistics of meaning, and I think that meaning is defined by translation (Huh?) Or at least we would have a very different idea of what meaning is if everybody spoke the same language. Like the Port-Royal gentlemen, we would find it obvious that pure reason dictates the agreement of past participles in compound tenses. Realizing the differences when the same thing gets said in different languages[7], one is led to the idea that under the variety of forms there is an invariant -- the meaning.

A famous German philosopher[8] once said, “Some concepts cannot be expressed simply and some concepts cannot be expressed in French”. I couldn't agree less -- if the idea cannot be translated, then it's a very poor idea, IMHO with little meaning. On the other hand, it is true that the translation/explanation may involve clumsy paraphrases, boring examples, whatever; it may certainly fall short of aphoristic brilliance or any semblance of style. Still, I feel that the philosopher owes us this kind of explanation, even if he'd rather say 'Dummkopf!' and go on with his parerga and paralipomena. The penalty for not supplying the explanation is being ignored.

This has taken us rather far from the original discussion of meaning, but what better connection than a philosopher?

One more slogan:


Lexical functions are the enemy!


In English one says: “dead center”, “deep silence”, “deep hatred”, and the expressions mean: “precisely at the center”, “much silence”, “much hatred” -- nothing to do with death or depth!

These are all examples of lexical functions, i.e. the qualifier is a function of the head noun. In other words, the noun requires that particular qualifier for the generic meaning of “intense”, in this case.

Such expressions mark one as a fluent speaker, but they contribute little to the meaning (“deep silence”), or are idiomatic, therefore untranslatable. So let's dump them! Replace by very broad generics: “much silence”, or by explanatory forms: “privative + sound” means precisely no sound at all, while “silence” may be just “too_little + sound”.

Some examples of a language codifying lexical functions are, of course, Esperanto “-id” (progeny) and “-ar” (group_of):

kid = progeny + goat,

lamb = progeny + sheep,

calf = progeny + bovine  … etc.

an exaltation of larks = group_of + lark,

a pride of lions = group_of + lion,

a school of fish = group_of + fish  … etc.

In natural languages they might appear as affixes, e.g. English “un-” and “-ly” (and the Romance “-mente”, very similar to “-ly”). But they are not always applicable and do not always carry the same meaning: “hard work” is not “hardly working”!




Phonology I: primary and secondary phonemes


      In tokens (words of LAN origin, which cannot be decomposed; 'atomic' words) only the following sounds/letters appear:


1. 5 vowels: A, E, I, O, U (like the same letters in Spanish).


2. 15 consonants: B, D, F, G, H, K, L, M, N, P, R, S, T, V, Z. These are pronounced more or less like in English, but:


·         G is always hard (egg, give)

·         R is trilled (un-English), not too long (the Spanish 'ere')

·         S is always as in English 'bliss'

·         There is no phonemic difference between aspirated and unaspirated consonants (English 'pin/spin'), or alveolar/dental versions of /d/, /t/, etc...


These are the primary sounds/letters of LAN, which may also be called token-letters.


There are several secondary sounds/letters which appear in compound words and foreign words:



6 : an, French le (schwa)

4 : French tu, German für

9 : French peu, oeuf, German schön


Y : yet, day ( German j )

W : wet, German Au


X : she

3 : measure

C : chip

Q : thing

J : jet

7 : Italian pizza

8 : thin

2 : this



Morphology I. Word classes


Like Gaul, LAN vocabulary is subdivided into three parts:


a. tokens – a closed class of words, following strict phonetic rules, not analyzable into smaller morphemes or semantemes; all will be found in the dictionary. E.g.:

      PI = person, FE = female, UDU = two


b. foreign words – an open class of words, not analyzable into smaller morphemes or semantemes; such words may violate some of the token phonetic rules, and are bracketed by []. These are basically technical terms, proper names, etc. All must appear in the dictionary. E.g.:

      [PARI] = Paris, [KROMOSOMI] = chromosome, [JAI ALAI] = jai alai


c. compounds – an open class of words, which need not appear in the dictionary. Their phonetics allows each one to be decomposed into a set of tokens or foreign words, and their meaning is suggested by these components. Examples:


FE+PI = female person = woman, she

PI+UDU = two persons = couple

[PARI]+FE = Paris female = la Parisienne


      The token class is closed because a token must consist of at most 5 primary letters, with at least 1 and at most 2 vowels, such that vowels and consonants alternate (see again the examples above). Here are all the possible forms of a token:


      A, AB, BA, ABA, BAB, ABAB, BABA, BABAB


where A stands for an arbitrary primary vowel and B for an arbitrary primary consonant. Some examples for each type:


      type ‘A’:         A = but

      type ‘AB’:        AL = all

      type ‘BA’:        PI = person, FE = female, BO = beauty

      type ‘ABA’:       UDU = two

      type ‘BAB’:       FID = find

      type ‘ABAB’:      AGEN = again

      type ‘BABA’:      NARO = narrow

      type ‘BABAB’:     BUSIN = business
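Since the token shapes follow from three mechanical constraints (at most 5 primary letters, 1 or 2 vowels, strict alternation), they can be checked by program. Below is a minimal Python sketch, not part of LAN itself: the function names are hypothetical, and only the primary letters listed under Phonology I count as letters. It derives the shapes from the constraints, tests a word against them, and counts how many tokens the closed class can hold at most.

```python
# Sketch: derive LAN token shapes from the stated rules and size the closed class.
VOWELS = "AEIOU"                # primary vowels
CONSONANTS = "BDFGHKLMNPRSTVZ"  # primary consonants


def token_shapes(max_len=5, min_vowels=1, max_vowels=2):
    """All alternating vowel (A) / consonant (B) patterns allowed for tokens."""
    shapes = []
    for length in range(1, max_len + 1):
        for first, other in (("A", "B"), ("B", "A")):
            shape = "".join(first if i % 2 == 0 else other for i in range(length))
            if min_vowels <= shape.count("A") <= max_vowels:
                shapes.append(shape)
    return shapes


SHAPES = token_shapes()


def shape_of(word):
    """Map each letter to A (primary vowel), B (primary consonant) or ? (other)."""
    return "".join(
        "A" if c in VOWELS else "B" if c in CONSONANTS else "?" for c in word
    )


def is_token(word):
    return shape_of(word) in SHAPES


def token_count():
    """Upper bound on the token class: product of letter choices per shape."""
    total = 0
    for shape in SHAPES:
        n = 1
        for c in shape:
            n *= len(VOWELS) if c == "A" else len(CONSONANTS)
        total += n
    return total


print(SHAPES)         # ['A', 'AB', 'BA', 'ABA', 'BAB', 'ABAB', 'BABA', 'BABAB']
print(token_count())  # 97280
```

The count, 97,280 possible tokens, is what makes the class closed. The same test rejects realized compounds such as FYEPI or PIUDU, whose boundaries show up as exactly the letter patterns that tokens forbid.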



In foreign words, such as [PARI], [PRESIDE] or [JAI ALAI], the brackets [] are part of the language, realizable as sound: [ is the token LA and ] is the secondary consonant X. The brackets signal that the word, although not a token, is not compound, and should not be further analyzed. International words are modified to contain, if possible, only token phonemes/letters, and to end in a vowel. Thus:


      [PARI] = la Parix

      [PRESIDE] = la presidex

      [JAI ALAI] = la jai alaix


Non-token phonetics and phonotactics (jai alai, presidex) set such words apart from tokens; the X‑ending clearly shows the end of the foreign word, which may contain breaks (la jai alaix).


Tokens and foreign words may be combined in compounds, e.g.


      TI+PE = this, place = here

      UN+BO = opposite, beauty = ugly, ugliness

      SI+NOGU = the one who, know = expert

      BO+BUSIN = beauty, business = art

      PI+BO*BUSIN = person, beauty, business = artist

      [JAI ALAI]+LAGI+PE = jai alai, play, place = jai alai court


The + and * signs (called the loose bond and the tight bond, respectively) are also part of the language, and are realized in various ways, depending on the tokens they join. The result is a pronounceable combination which still keeps clear the boundaries between the components. The meaning of a compound may be vague, and sometimes is independent of the ordering of the components, e.g.


      TI+PE = PE+TI = here


The sign * shows a stronger bond than + (just as in arithmetic * has higher precedence):


      PI+BO*BUSIN = person, (beauty, business) = person of art = artist


That may be compared to:


      PI*BO+BUSIN = (person, beauty), business = feeding and care of beautiful people (?!)


It goes without saying that * should be avoided whenever possible.


Here are some examples of bound realization:


fyepi = FE+PI = female person  = woman, she

piudu = PI+UDU = two persons = couple

la Parixfe = [PARI]+FE = Paris female = la Parisienne


The boundaries appear as


  • two consonants in contact (la-Parixfe),
  • two vowels in contact (piudu),
  • a semivowel (fyepi),


all of which are not allowed in tokens.


      These three categories -- tokens, foreign words and compounds -- are the untagged words of the language, as opposed to words tagged with the syntactical prefixes and suffixes that we describe below. 'Word', unqualified, will mean untagged word. After describing word morphology, one may summarize word semantics: tokens cannot be analyzed, foreign words are specially marked so one does not try to analyze them, and compounds must be analyzed to be understood.







      The words are invariable: they are not modified to show such categories as plurality, verb tenses, noun cases or degree of comparison in the adjective.

There are no parts of speech: GU is 'good' (adjective), but also 'well' (adverb) and 'goodness' (noun); SO means 'with' (preposition), 'together' (adverb) and 'join' (verb).

      The whole grammar is actually syntax. The only relation shown is the very vague one of subordination between head and modifier, e.g. in the English phrase 'two white kittens', the word 'kittens' is the head and 'two' and 'white' are modifiers, subordinate to 'kittens'.


Grammar Rule #1: if word2 immediately follows word1, then word2 is a modifier of word1.


By obvious analogy, the modifier may be called ‘tail’, as it follows its head according to this rule. So we may start to talk about kittens in LAN:


Kitten = KAT+ID (= cat, young one)

White = HITE

Two = UDU


And so:

      Two kittens = KAT+ID UDU

      White kitten = KAT+ID HITE


      In the statement of Rule #1, order is essential (precedes, follows). In practice, it does not always matter. UDU KAT+ID may be translated as 'a pair of kittens', with precisely the same meaning as KAT+ID UDU[7]. On the other hand, HITE KAT+ID may mean something a little different, 'the whiteness of kittens', a nice poetical concept not quite the same as 'white kitten'. Again, there is no way to separate the meanings 'white' and 'whiteness' of the token HITE. If you must make the distinction, use compounds or modifiers; and always consider: is that detail truly meaningful?


Now we proceed to 'two white kittens'. It may be:


      UDU KAT+ID HITE = a pair of kittens white


In this order, KAT+ID modifies UDU, and HITE modifies KAT+ID; quite logical (and very un-English: the modifier 'white' follows, instead of preceding). Other orderings:



      KAT+ID HITE UDU = kitten, double whiteness?


UDU now modifies HITE. Not particularly meaningful, might be considered nonsense (but grammatically correct: colorless green ideas sleep furiously). It is easy to give examples where several orderings are meaningful. Introduce some new words:


      LIL = little

      LAGI = play



      LAGI KAT+ID LIL = the play of little kittens

      KAT+ID LAGI LIL = the kitten plays a little

      KAT+ID LIL LAGI = a kitten of (little play) = a not very playful kitten


The last translation emphasizes the fact that LAGI is a modifier of LIL, not of KAT+ID. Plain sequencing of words one after another represents this logical connection, in which each word has at most one modifier (following it) and at most one head (preceding).
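Rule #1 alone turns a sentence into a plain chain of (head, modifier) links. A minimal Python sketch, with a hypothetical function name, that deliberately ignores prefixes, tags and cuts:

```python
def links_rule1(sentence):
    """Grammar Rule #1: each word modifies the word immediately before it.

    Returns (head, modifier) pairs; prefixes, tags and cuts are ignored.
    """
    words = sentence.split()
    return list(zip(words, words[1:]))


for phrase in ["LAGI KAT+ID LIL", "KAT+ID LAGI LIL", "KAT+ID LIL LAGI"]:
    print(phrase, "->", links_rule1(phrase))
```

For KAT+ID LIL LAGI this gives the links (KAT+ID, LIL) and (LIL, LAGI), so LAGI attaches to LIL and not to KAT+ID. In such a chain each word has at most one head and at most one modifier, which is why more complex structures need further rules.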


To deal with more complex situations, use


Grammar Rule #2a: to show that word1 is the head of word2, attach the initial open syllable of token1 to word2 as a top prefix.


Grammar Rule #2b: to show that word1 is the modifier of word2, attach the initial open syllable of token1 to word2 as a bottom prefix.


      Notice the shift from word1 to token1; it means that, if word1 is itself prefixed, the syllable following the prefix, which is actually the first in a token, will be used. Schematically:


Rule #2a:

      A__ word = A__ ... A/word = A/word ... A__

      BA__ word = BA__ ... BA/word = BA/word ... BA__

                        (top prefixes A/, BA/)


Rule #2b:

      word A__ = A__ ... A\word = A\word ... A__

      word BA__ = BA__ ... BA\word = BA\word ... BA__

                        (bottom prefixes A\, BA\)



      The underscores denote the rest of the word; for instance, A__ is any word starting with A. The ellipses denote several intervening words; if the prefix is used, the modifier and head may appear in any order. The notation above explains the strange names ‘top prefix’ and ‘bottom prefix’: in BA\ the syllable is under the sign, thus ‘bottom’, and in BA/ it is over the sign, thus ‘top’. The signs are, of course, realized as sound.


      The two parts of Rule #2 are very similar, but #2a is used much more frequently than #2b; so by the unqualified word 'prefix' we mean a top prefix, an instance of Rule #2a.
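A minimal Python sketch of Rule #2, assuming that the 'initial open syllable' is the first vowel (shape A) or the first consonant plus vowel (shape BA) of the first token, and that an already attached prefix, or the opening [ of a foreign word, is skipped first. The function names are hypothetical; RONU (round) modifying HEDA (head) is used only as an illustration of #2b.

```python
import re

VOWELS = "AEIOU"


def initial_open_syllable(word):
    """First open syllable of the first token: shape A or BA.

    A prefix already attached to the word is skipped (Rule #2 speaks of
    token1, not word1), as is the opening [ of a foreign word.
    """
    core = re.split(r"[/\\]", word)[-1].lstrip("[")
    return core[0] if core[0] in VOWELS else core[:2]


def top_prefix(modifier, head):
    """Rule #2a: mark `modifier` with the syllable of its `head`."""
    return initial_open_syllable(head) + "/" + modifier


def bottom_prefix(head, modifier):
    """Rule #2b: mark `head` with the syllable of its `modifier`."""
    return initial_open_syllable(modifier) + "\\" + head


print(top_prefix("HITE", "KAT+ID"))      # KA/HITE
print(top_prefix("UDU", "KAT+ID"))       # KA/UDU
print(initial_open_syllable("KA/HITE"))  # HI
print(bottom_prefix("HEDA", "RONU"))     # RO\HEDA
```

Note that different heads sharing an initial syllable yield the same prefix, so a prefix narrows the choice of head without always fixing it.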


Using Rule #2, we can express 'two white kittens' in any of the forms:





4. KA/HITE KA/UDU KAT+ID, etc...


The prefix makes clear that in (3) UDU modifies KAT+ID, not HITE.

On the other hand, (4) is really too KA-KA-phonic! What if we had to translate: 'two male white kittens' (male = MA). Even more KA- syllables!




      KA/UDU KAT+ID MA KA/HITE, etc...


Notice that there is no particular order of the modifiers (unlike English, where 'white male two kittens' would be ungrammatical, and one must say 'two white male kittens').


To avoid the repetition of prefixes, use


Grammar Rule #3: to show that word1, word2, ..., wordN have the same head, use the chain

      word1< word2< ... wordN


The words in the chain, except the last, are tagged with the tag <; they must be consecutive. The tag is realized as -NS after vowels and as -AY after consonants. Using this rule, one may translate 'two white male kittens' as:


1. KAT+ID HITE< UDU< MA

2. KA/HITE< UDU< MA KAT+ID


Form (1) is usually preferred, as it is the shortest (pronounced: kaytid hitens uduns ma; 7 syllables) versus (2) with 8 syllables (ka’hitens uduns ma kaytid)[8]. Notice that in (2) MA is untagged and followed by KAT+ID; still MA is not the head of KAT+ID, but its modifier, as shown by <.

Similar to rule #3 is


Grammar Rule #4: to show that word1, word2, ..., wordN are the heads of the same modifier, use the chain

      word1> word2> ... wordN


The consecutive words in the chain, except the last, are tagged with the tag >. This is realized as -NK after vowels and -UY after consonants. Now one may translate 'white kittens and cats' as:



2. KA/HITE KAT+ID< KAT, etc...
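The chain tags of Rules #3 and #4 are realized as endings. Here is a minimal Python sketch that pronounces a phrase, assuming only tokens, loose bonds (+) and single < or > tags; prefixes, tight bonds, foreign words and the combined tag <> are left out, and the names are hypothetical. The loose bond follows the rule given under marker realization: unmarked between two vowels or two consonants, otherwise Y before the last letter of the first word.

```python
VOWELS = set("aeiou")

# Chain tag endings from Rules #3 and #4: (after a vowel, after a consonant)
TAG_ENDINGS = {"<": ("ns", "ay"), ">": ("nk", "uy")}


def loose_bond(a, b):
    """Realize a+b: unmarked between two vowels or two consonants,
    otherwise y before the last letter of the first part."""
    if (a[-1] in VOWELS) == (b[0] in VOWELS):
        return a + b
    return a[:-1] + "y" + a[-1] + b


def realize_word(word):
    """Pronounce one written word: tokens joined by + and one optional tag."""
    if word.endswith("<>"):
        raise ValueError("realization of <> is not covered by this sketch")
    tag = word[-1] if word[-1] in TAG_ENDINGS else ""
    parts = word.rstrip("<>").lower().split("+")
    spoken = parts[0]
    for part in parts[1:]:
        spoken = loose_bond(spoken, part)
    if tag:
        after_vowel, after_consonant = TAG_ENDINGS[tag]
        spoken += after_vowel if spoken[-1] in VOWELS else after_consonant
    return spoken


def realize(phrase):
    return " ".join(realize_word(w) for w in phrase.split())


print(realize("KAT+ID HITE< UDU< MA"))  # kaytid hitens uduns ma
print(realize("ID+KAT HITE< UDU< MA"))  # idkat hitens uduns ma
print(realize("HEDA> TALI KAT"))        # hedank tali kat
```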


However, rule #4 is not always sufficient, and this is when one must use rule #2b:


   (#4)     the cat's head and tail = HEDA> TALI KAT


  (#2b)     the cat's round head and long tail =



(using the tokens: HEDA=head, RONU=round, TALI=tail, LON=long)


It is possible to add to a word both tags < and >; e.g.


'I see white cats and kittens'     

      I = MI, see = SEGE, several = SE

            MI SEGE KAT<> KAT+ID SE< HITE.


<> shows that both KAT and KAT+ID are modifiers of SEGE (by Rule #1) and heads of SE (again by Rule #1); while SE< shows that SE and HITE modify the same head(s). By the way, here we have a complete sentence, showing verb conjugation 'I see' and plural 'SE'; or, better put, showing how such English grammatical concepts are translated.
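The links implied by Rules #1, #3 and #4 can be computed in one left-to-right pass. Here is a minimal Python sketch under stated assumptions: no prefixes, cuts or asides; and a word following a word tagged > alone is neither its modifier nor assumed to share its head (the text adds < explicitly when both are meant, as with <> above). The function names are hypothetical.

```python
def split_tags(raw):
    """Separate a written word from its trailing chain tags (<, > or <>)."""
    word = raw.rstrip("<>")
    return word, raw[len(word):]


def parse_links(sentence):
    """Return (head, modifier) links for Rules #1, #3 and #4.

    Rule #1: a word modifies the previous word, unless that word is tagged.
    Rule #3: a word after a '<'-tagged word shares that word's head(s).
    Rule #4: '>'-tagged words share the modifier of the untagged word
             that closes their chain.
    Prefixes, cuts inside the sentence and asides are not handled.
    """
    words = [split_tags(w) for w in sentence.rstrip(".").split()]
    links = []
    waiting = []     # '>'-tagged heads still waiting for their shared modifier
    prev_heads = []  # heads of the previous word
    for i, (word, tags) in enumerate(words):
        heads = []
        if i > 0:
            prev, prev_tags = words[i - 1]
            if "<" in prev_tags:
                heads = list(prev_heads)     # Rule #3: same head(s)
            elif ">" not in prev_tags:
                heads = waiting + [prev]     # Rule #1, extended by Rule #4
                waiting = []
        links.extend((head, word) for head in heads)
        if ">" in tags:
            waiting.append(word)
        prev_heads = heads
    return links


for s in ["KAT+ID HITE< UDU< MA", "HEDA> TALI KAT", "MI SEGE KAT<> KAT+ID SE< HITE."]:
    print(s, "->", parse_links(s))
```

Applied to the sentence above, the sketch yields the links described: KAT and KAT+ID both modify SEGE, and both are heads of SE and of HITE.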


      The sentence ends in a period, which is realized as falling tone followed by a rest. The period is also a cut, meaning it interrupts the dependency chains preceding it: a word following the period is not a modifier of a preceding word. Another such cut is the question mark, realized as rising tone followed by a rest, and, less obviously, any rest before starting to speak. Then there are the five vowels:


·         E, the copula: e.g.


1. [TOBERMORI] E KAT. = Tobermory is a cat.

2. [TOBERMORI] TORI [SAKI] E KAT KA PEKA. = Tobermory in Saki's story is a cat that can speak.


It may join words (as in 1) or phrases (as in 2); think of it as an equal sign. A phrase is just a group of subordination chains, delimited by cuts.


The following correspond to English conjunctions, and join phrases:


·         A, but


·         O, or


·         U, therefore


·         I, and; this does not mean 'also', as in 'bread and butter', but is the vaguest cut between phrases:


I went to the mall, and I saw some red shoes, but they were too tight, and then I met Lucy, and she told me about her daughter, and I said I was sorry I could not have lunch with her.


There is an additional set of compound cuts, as explained below under Rule #6. We may formulate the following:


Grammar Rule #5: Cuts (a closed class consisting of single vowels and punctuation marks) are used to interrupt subordination chains, and to show the relations between phrases in a sentence.


Grammar Rule #6: Asides are tagged sentences, used like cuts.


Asides, like cuts, appear between phrases, and interrupt subordination chains. They clarify the connection between phrases, by showing attitude ('but, unfortunately'), modifying information ('is, to the best of my knowledge'), etc. Asides are sentences, enclosed in the tags {}. The tags are realized as follows:

      { a token, YE

      } an ending: -EY after a consonant, -W after a vowel


Using some new words: DOBU = doubt, PA = past, PI = person, TU = thou:


{[MARI]} TU TI+PE? = Mary! Are you there?

      pronounced: Ye la Marixey tu tyipe?


[MARI] {E PA} [DIPLOMA]+PI. = Mary was a graduate.

pronounced:  La Marix ye e paw la diplomaxpi.


[MARI] {E DOBU} [DIPLOMA]+PI. = Mary is supposed to be a graduate.

pronounced:  La Marix ye e dobuw la diplomaxpi.
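A minimal Python sketch of how the aside tags and the foreign-word brackets might be spoken, reproducing the three pronunciations above (up to capitalization). It assumes the aside contains only tokens and foreign words, with no compounds or prefixes; the function names are hypothetical.

```python
VOWELS = set("aeiou")


def realize_foreign(text):
    """'[' is realized as the token LA and ']' as the secondary consonant X."""
    return text.replace("[", "la ").replace("]", "x")


def realize_aside(aside):
    """'{' is realized as YE; '}' as -EY after a consonant, -W after a vowel.

    Only tokens and foreign words are handled; compounds inside the aside
    are not realized by this sketch.
    """
    inner = realize_foreign(aside.strip("{}").lower())
    ending = "w" if inner[-1] in VOWELS else "ey"
    return "ye " + inner + ending


print(realize_aside("{[MARI]}"))  # ye la marixey
print(realize_aside("{E PA}"))    # ye e paw
print(realize_aside("{E DOBU}"))  # ye e dobuw
```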



What is special about asides is:


1. cuts get modifiers: E PA = was, as opposed to E = is.

2. the tags {} do not nest, so an aside cannot contain another aside.


Asides should be used as little as possible! They seem necessary for the past and future of the copula, and essential for vocatives and interjections. Vocatives and interjections, according to school grammar, have 'no function in a sentence', i.e. are not (part of) subject, object, predicate or complement; no function = no syntactic role = no place in LAN grammar, which is wholly syntax. Still, they are used freely in many languages, so I found a place for them in mine. In addition, many languages must show the speaker's attitude, deference, etc., which fit well into asides.



Examples of avoiding asides:



Listen to me, Mary; you there?

      (TO = in order to; HERA = hear)


The expression is much flatter than a neat vocative; but at one time I thought flatness (lack of emotion, emphasis or involvement) was a feature, not a bug.



there is doubt Mary modified by graduate.


The words in italics must be supplied to make the sentence somewhat English, but the meaning is quite clear, and this would be the preferred form. In the same way, one would tolerate the somewhat muddied meaning, and say:



      past modified by Mary modified by graduate.



And with that, the grammar ends! The whole purpose of LAN was to build a language with minimal grammar, taking as 'grammar' the stuff you have to memorize - besides vocabulary - when studying a foreign language:


·         amo, amas, amat, ...

·         Milch is feminine, Wein masculine, Bier neuter

·         the future of 'can' is 'will be able'


So we have our minimal grammar; does it work? The answer is provided by a few translations, with commentary. I have included in my sample some poems, to get a feeling for style as opposed to mere sense conveyance - as they say, poetry is what is lost in translation, so sense must be the conserved quantity. And I made my life easy, translating Housman and Heine; I would not dare translate Rilke or Dylan Thomas, but then, who would?


Phonology III: Marker Realization.


Markers are all the grammatical signs used in LAN text, besides letters and punctuation. They are divided into three categories:

  • tags, further subdivided into prefix tags / \ and suffix tags < > <>.
  • bonds, + and *
  • limits: these are bracketing markers, [ ] for foreign words and { } for asides.

One may also classify markers by their position in the complete word:

  • purely initial: {
  • purely medial: the bonds + *
  • purely final: < > <> }
  • prefix (i.e. only after the first open syllable): / \
  • free position: [ ] (since foreign words may participate in compounds)


All the markers are realized by a combination of

  • stress
  • vowel modification, i.e. joining to the vowel one of the sounds Y, W
  • double consonant endings


In a prefixed word, the stress falls on the second syllable; in an unprefixed word, one may freely stress any but the second syllable. Thus, stress on the second syllable completely identifies prefix markers. To distinguish between / and \, the vowel is modified for \, the bottom prefix marker: Y or W is added before or after the vowel.


The loose bond between two vowels or two consonants is left unmarked – the so-called null loose bond. The loose bond between a consonant and a vowel is realized as Y before the last letter of the first word.


The tight bond between a consonant and a vowel is realized as W before the last letter of the first word. The tight bond between two vowels may be optionally realized as Y or W between them.
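A minimal Python sketch of the two bond rules for a pair of tokens. For a tight bond between two vowels, the y/w choice uses the preference stated in this section (avoid iy, yi, uw, wu); the optional schwa between consonants is never inserted, and compounds of more than two parts are not handled. The names are hypothetical.

```python
VOWELS = set("aeiou")
UGLY = {"iy", "yi", "uw", "wu"}  # combinations to avoid when y/w is a free choice


def pick_glide(left, right):
    """Pick y or w between two vowels, avoiding iy, yi, uw, wu if possible."""
    for glide in "yw":
        if left + glide not in UGLY and glide + right not in UGLY:
            return glide
    return "y"  # unavoidable (e.g. i and u): written anyway


def bond(first, second, tight=False):
    """Realize first+second (loose bond) or first*second (tight bond)."""
    a, b = first.lower(), second.lower()
    a_vowel, b_vowel = a[-1] in VOWELS, b[0] in VOWELS
    if not tight:
        if a_vowel == b_vowel:
            return a + b                           # null loose bond
        return a[:-1] + "y" + a[-1] + b            # y before the last letter
    if a_vowel and b_vowel:
        return a + pick_glide(a[-1], b[0]) + b     # intervocalic y or w
    return a[:-1] + "w" + a[-1] + b                # w before the last letter


for first, second, tight in [("FE", "PI", False), ("PI", "UDU", False),
                             ("KAT", "ID", False), ("BO", "BUSIN", True)]:
    print(first, "*" if tight else "+", second, "=", bond(first, second, tight))
```

For a loose bond the Y is fixed, so TI+PE still gives tyipe; per the text, such a combination is written as is and pronounced with a schwa.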


The other realizations are summarized in the table below. Notice, in particular, that ] must appear as AX in the interior of a compound word, but may appear as yA in a stand-alone foreign word:



[HISTORI]+VE = LA-HISTORIXVE = historically

          (VE = manner, way of doing)


When there is a choice between y and w, it is used to avoid the ugly combinations iy, yi, uw, wu. If these cannot be avoided, they are pronounced (but not written) 6y, y6, 6w, w6 – i.e. the semivowel remains, but the vowel is pronounced as schwa.


Marker Realization Table


In this table A, E are primary vowels, B, K are primary consonants, lower-case letters stand for themselves, and XXX is some word.


      Marker                 Form          Realization

      /  (top prefix)        A/XXX         stressed 2nd syllable
      \  (bottom prefix)     A\XXX         y or w, preceding or following A, and stressed 2nd syllable
      +  (loose bond)        A+E           null loose bond
                             A+B           y precedes last letter
                             B+A           y precedes last letter
                             B+K           null loose bond; schwa may be inserted
      *  (tight bond)        A*E           w precedes last letter, or intervocalic y or w
                             A*B           w precedes last letter
                             B*A           w precedes last letter
                             B*K           w precedes last letter; schwa may be inserted
      <                      XXX<          -ns after a vowel, -ay after a consonant
      >                      XXX>          -nk after a vowel, -uy after a consonant
      {                      {XXX          ye
      }                      XXX}          -ey after a consonant, -w after a vowel
      [                      [XXX          la
      ]                      XXX]          ends in vowel-x, or in y-vowel in final position




Morphology IV.


Numerals have a special formation and special phonology. The open, literally infinite class of numbers is built by pronouncing each character in their written form:


      1 2 3 4 5 6 7 8 9 0 = UNU UDU UTU UKU UFU ULU URU UGU UVU UZU

      decimal point = IPE

      fraction slash = IVE

      minus = MINU

      E (10 to the power) = PEV


10 = UNU+UZU






They are somewhat exceptional in being compounds whose order is invariable, and they are pronounced differently from other compounds: instead of a hiatus, the U of the digits is elided.


123 = UNU+UDU+UTU = unudutu

12.3 = UNU+UDU+IPE+UTU = unudipetu

23/45 = UDU+UTU+IVE+UKU+UFU = udutivekufu

-1.23E-4 =

      MINU+UNU+IPE+UDU+UTU+PEV+MINU+UKU = minunipedutupevminuku


The numbers are usually preceded by NU (number) or RO (ordinal) and are strongly stressed on the last syllable, to show where the end is. There are also a few abbreviations:

zero zero = AHA

zero zero zero = ASA,

e.g.: 2500 = UDU+UFU+AHA = udufaha, 20003 = UDU+ASA+UTU = udasatu


The combinations ASA+AHA, AHA+AHA, etc... simplify AA to A:


three hundred thousand = UTU+AHA+ASA, UTU+ASA+AHA = utahasa, utasaha

a million = UNU+ASA+ASA = unasasa


      This is the Lojban way to treat numbers, and the only reasonable one. Written numbers are understood by anyone, so let us copy the written form in spoken numbers, and be done with quatre-vingt-dix-sept, baq, pik, kalab and gross dozens.


A few more details:

·         mixed fractions are expressed as sums, using the token AD = add:

          2 2/3 = UDU+AD+UDU+IVE+UTU = udadudivetu;


·         percent and per mille are IVE+UNU+AHA = ivenaha and IVE+UNU+ASA = ivenasa, respectively.


      In pronunciation, IPE, IVE, MINU, PEV and AD are surrounded by short rests, which may be marked by dashes: udut-ive-kufu, unud-ipe-tu. Of course, mathematical stuff is much cleaner written down than spoken aloud.
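The numeral rules are regular enough to sketch in Python. Two assumptions here are mine, not stated above: runs of zeros are abbreviated greedily (ASA first, then AHA, then UZU for a single left-over zero), and at a vowel-vowel junction a U is dropped (the following one first), or one of two A's, otherwise the hiatus is kept. The short rests around IPE, IVE, MINU, PEV and AD are not shown, and the function names are hypothetical.

```python
VOWELS = set("aeiou")

SYMBOLS = {
    "1": "UNU", "2": "UDU", "3": "UTU", "4": "UKU", "5": "UFU",
    "6": "ULU", "7": "URU", "8": "UGU", "9": "UVU", "0": "UZU",
    ".": "IPE", "/": "IVE", "-": "MINU", "E": "PEV",
}


def numeral_tokens(numeral):
    """Spell a written number character by character.

    Runs of zeros are abbreviated greedily: ASA for each 000, then AHA
    for a remaining 00, then UZU for a single 0 (an assumed policy).
    """
    tokens, i = [], 0
    while i < len(numeral):
        if numeral[i] == "0":
            j = i
            while j < len(numeral) and numeral[j] == "0":
                j += 1
            run = j - i
            tokens += ["ASA"] * (run // 3)
            tokens += {0: [], 1: ["UZU"], 2: ["AHA"]}[run % 3]
            i = j
        else:
            tokens.append(SYMBOLS[numeral[i]])
            i += 1
    return tokens


def pronounce(tokens):
    """Join tokens, eliding a U (or one of two A's) instead of a hiatus."""
    out = ""
    for tok in tokens:
        tok = tok.lower()
        if out and out[-1] in VOWELS and tok[0] in VOWELS:
            if tok[0] == "u":
                tok = tok[1:]
            elif out[-1] == "u":
                out = out[:-1]
            elif out[-1] == "a" and tok[0] == "a":
                tok = tok[1:]
        out += tok
    return out


def speak(numeral):
    return pronounce(numeral_tokens(numeral))


print(speak("123"))                                   # unudutu
print(speak("-1.23E-4"))                              # minunipedutupevminuku
print(pronounce(["UDU", "AD", "UDU", "IVE", "UTU"]))  # udadudivetu
```

Under these assumptions the sketch reproduces the pronunciations given in this section, including the abbreviated forms udufaha, udasatu and unasasa.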


Minimal grammar – revisited.


Consider again the translation of a short sentence from ‘Loreley’:


At the end the    waves       swallow     boatman and       boat.

IFO               VAVA SE     VALO        MA+BOTA           BOTA



The grammar – i.e. syntax – is best represented by a grade school parsing diagram. Each arrow is a subordination pair, or link.






































Notice that ‘waves’ is actually a link: ‘VAVA SE’ = ‘waves many’, or ‘SE VAVA’ = ‘a multitude of waves’. It really does not matter if ‘SE’ is the head-word or the modifier, but it is needed to show the plural.

On the other hand ‘IFO’ is a fancy adverb meaning ‘ending in the future’.


Then we may summarize the LAN diagram in a table:


      Link                                             ‘Better’ English

      wave modified by many
      wave modified by swallowing                      wave swallows
      swallowing modified by ending in the future      at the end will swallow
      swallowing involving a direct object             swallow ...
      the direct object modified by boatman            ... boatman
      the direct object modified by boat               ... boat




One could simply speak out the links:




This can be even translated using the last column:


There is more than one wave, and the waves swallow, and swallowing will end in the future, and swallowing affects something, affects[9] the boatman, and affects the boat.


The addition of the cut ‘I’ is needed to prevent every word from modifying the preceding word[10]. The links could also be rearranged in any order; that would not change the meaning, but some orderings may be easier to understand than others:







One could replace the head-words by their first syllable, as a high tone prefix:




The cuts are no longer needed, but there is some ambiguity: ‘VA/SU’ could be ‘VAVA SU’ as well as ‘VALO SU’. One could rearrange this form, too:






Or one could use the markers < and > :




      Now, although (8) is the clearest and least repetitive form, all the other forms are grammatical, and, one might say, mean the same thing. (1-4) are suitable only for computers -- although the lists fully represent the dependency structure, human short-term memory cannot hold them. (5-7) are for real people, since some interpretation of ambiguity is needed. As for (8), I would assert it has some style.
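The link-list forms are essentially a small database with a single relation, 'modifies' (cf. footnote [11]). Here is a minimal Python sketch, assuming the six links of the Loreley table are (VAVA, SE), (VAVA, VALO), (VALO, IFO), (VALO, SU), (SU, MA+BOTA) and (SU, BOTA); these pairs are my reading of the table rows and of the VA/SU remark, not a listing from the original. It speaks the links out separated by the cut I, roughly in the style of forms (1-4), rewrites them with top prefixes in the style of forms (5-7), and shows the VA/SU ambiguity. The names are hypothetical.

```python
import re

VOWELS = "AEIOU"

# Assumed (head, modifier) links of the Loreley sentence; SU is the direct object.
LINKS = [
    ("VAVA", "SE"), ("VAVA", "VALO"), ("VALO", "IFO"),
    ("VALO", "SU"), ("SU", "MA+BOTA"), ("SU", "BOTA"),
]
WORDS = ["IFO", "VAVA", "SE", "VALO", "SU", "MA+BOTA", "BOTA"]


def initial_open_syllable(word):
    """First vowel (shape A) or consonant plus vowel (shape BA) of the first token."""
    core = re.split(r"[/\\]", word)[-1]
    return core[0] if core[0] in VOWELS else core[:2]


def spoken_links(links):
    """Each link spoken as 'HEAD MODIFIER', links separated by the cut I."""
    return " I ".join(f"{head} {mod}" for head, mod in links)


def prefixed_links(links):
    """Each head reduced to a top prefix on its modifier."""
    return " ".join(f"{initial_open_syllable(head)}/{mod}" for head, mod in links)


def candidate_heads(prefixed, words):
    """All words whose initial syllable matches the prefix of `prefixed`."""
    syllable = prefixed.split("/")[0]
    return [w for w in words if initial_open_syllable(w) == syllable]


print(spoken_links(LINKS))
print(prefixed_links(LINKS))
print(candidate_heads("VA/SU", WORDS))  # ['VAVA', 'VALO']
```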

      Here we also see the structure of LAN:

Words may be coupled in dependency pairs (‘VAVA VALO’) or may follow each other in dependency chains (‘VAVA VALO MA+BOTA’); dependency chains may be joined by various means into trees (parsing diagrams), e.g.:

VAVA SE< VALO MA+BOTA: a junction of the chains:

                              VAVA SE, VAVA VALO MA+BOTA





























VAVA VALO MA+BOTA VA/BOTA: a junction of the chains:

                              VAVA VALO MA+BOTA, VALO BOTA





























Finally, cuts are used between such trees to separate word pairs which do not form a subordination link. Sentences are groups of trees separated by cuts. The hierarchy:


      word < subordination pair < chain < tree < sentence


is the LAN equivalent of the usual English scheme:


      word < phrase < clause < sentence.

[1] Contrast with English grammatical gender, which manifests itself only in the choice among 'he', 'she' or 'it'.

[2] what I think meaning means

[3] when I use 'noun', 'verb', 'object', etc. the terms refer to the English translation

[4] in some languages, e.g. Swedish, 'the second' is etymologically 'the other'

[5] There is definitely some; both languages have measure words, worse than der, die, das.

[6] If you think that's unfair, interchange every r and t and every u and e -- guaranteed to make it unrecognizable to everybody.

[7] Chomsky, whom I mistrust because of his politics, saw that as the I-language (the rules generating everything one may say) and the E-language (everything ever said).

[8]  I think it's Hegel, or maybe Schopenhauer… can't find the reference on the net.



[7] so UDU means ‘two’ and ‘a pair’. What else? Any or all of: double, twice, second, dual, dyadic, even, twin, twine, etc…


[8] and the following is even better: idkat hitens uduns ma; see ‘Light and Heavy Words’.

[9] ‘affects’ vs. ‘involves’: the translation of ‘SU’

[10] in this case, every third word modifies its precedent.

[11]  this looks like a PROLOG database, with one relation only – ‘modifies’ – and some pairs in random order.
