# Proposal: Extended Roman Numerals

This is a proposal for how to express Roman numerals in Lojban.

## Introduction

Modern Roman numerals rely on a multi-pivot, positionally balanced additive system. Strings are read from left to right. There is an alphabet of symbols which represent the elements of $A = \{1, 5, 10, 50, 100, 500, 1000\}$ (in decimal notation) and, in a minor extension, of $1000*A$. "I" represents 1 (one); "V" represents 5 (five); "X" represents 10 (ten); "L" represents 50 (fifty); "C" represents 100 (one hundred); "D" represents 500 (five hundred); "M" represents 1000 (one thousand). The underlined version of each symbol represents one thousand times the value of the corresponding non-underlined symbol; the underlined "I" (which would represent one thousand) is not used.

A string of symbols is broken into a concatenation of maximal (longest) substrings, here called "canonical substrings", such that the value of each symbol is not greater than that of the symbol before it (to its left) in the same canonical substring. Id est: the symbols in a canonical substring are weakly monotonically decreasing in value. The value represented by a canonical substring is the sum of the values represented by the symbols composing it; a canonical substring expressed prior to (left of) another has its value subtracted from the value represented by the later-expressed (right) canonical substring.

Typically, in conventional modern notation, no symbol is repeated more than thrice consecutively. Preference is given first to reducing the number of canonical substrings in a given representation of a number, then to reducing the length of each canonical substring (particularly the longest of them), and then to using symbols of smaller value (so, for example, three hundred is represented by "CCC", not "CCD"). Moreover, in this convention, a subtractive canonical substring can contain at most one digit, and that digit should represent $1*10^n$; thus, 298 is represented as "CCXCVIII" rather than "XCCCVIII" or "IICCC". Consequently, any symbol representing a value of $5*(10^n)$, for some $n \in \mathbb{N}$, may not be used more than once consecutively.
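As an illustration of the splitting rule only, here is a minimal Python sketch that breaks a numeral into its canonical substrings and sums each one. The names `VALUES`, `canonical_substrings`, and `run_sums` are hypothetical and introduced just for this example; the sketch covers the basic digits only (no millials) and does not check the stylistic conventions described above.

```python
# Values of the basic digits (the set A in the text); millials are left out.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def canonical_substrings(numeral):
    """Split a numeral into maximal runs whose symbol values never increase."""
    runs = []
    for symbol in numeral:
        # Extend the current run while the value does not increase;
        # otherwise this symbol starts a new canonical substring.
        if runs and VALUES[symbol] <= VALUES[runs[-1][-1]]:
            runs[-1] += symbol
        else:
            runs.append(symbol)
    return runs


def run_sums(numeral):
    """Return the summed value of each canonical substring, left to right."""
    return [sum(VALUES[s] for s in run) for run in canonical_substrings(numeral)]
```

For the two-substring examples in the text, `run_sums("CCD")` gives `[200, 500]` and `run_sums("IICCC")` gives `[2, 300]`; subtracting the left sum from the right one yields 300 and 298 respectively.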

We will call $A$ the set of basic Roman numbers (or "Basics" for short); it will be notated as such throughout this article. The set of symbols (digits) which represent the elements of $A$ bijectively is called the set of basic digits and will be notated, perhaps somewhat ironically, $\Lambda$. The set of digits which represent the elements of $1000*A$ will be called the set of millial digits (or "millials" for short); it will be notated as $\underline{\Lambda}$.

$\Lambda$ and $\underline{\Lambda}$ are called "numeric alphabets".

It is sometimes the case that the number zero (0) is represented by a digit "N". When it arises, I will follow this convention. However, this digit does not belong to any of the previously mentioned numeric alphabets.

## Goal

• Preserve the feel and system of modern Roman numerals.
• Be unambiguous.
• Extend the system so that it can express any integer.
• Possibly: Allow for multiple ways to express the same number.
• Possibly: Be able to represent fractions in a 'Roman' way.

## Macrodigits

Each Roman digit will be represented in/mapped to Lojban as a macrodigit, which in turn will be composed of a string of microdigits (which belong to PA); macrodigits are necessarily separated from one another by "pi'e"; the number as a whole is represented by a string of macrodigits and may optionally be terminated by "boi" or other means.

A string of macrodigits will be read/interpreted from left to right (early to late, in speech). So, "IX" means nine, not eleven; it will be expressed in Lojban as "I" and then "X". This scanning/reading/interpretation order/direction can be overridden by activating the Roman mode somewhat differently.

Since the base/mode has shifted, macrodigits in Lojban will be interpreted according to Roman rules.

### Representing the Numeric Alphabet in Lojban

All positive integers are represented in the Roman numeral mode of Lojban such that they may begin with "no pi'e"; this string may be omitted, but it is helpful and does lead to some elegance. This introductory string, if present, is followed by at least one macrodigit; all Roman numbers are represented by at least one macrodigit. Each macrodigit consists of at least one microdigit.

In $\Lambda$, only "I", "V", possibly "X", and possibly "M" can be represented by single microdigits when the base is decimal (or less than or equal to forty-nine). Note that in most bases, according to some proposed conventions for Lojban (with which I happen to agree), "ki'o" does not mean "one thousand" (nor, usually, is it an integer power of ten); it often is the base exponentiated by some integer (usually three or four in the proposals which I have seen for binary, octal, decimal, dozenal, and hexadecimal).

All other basic digits require strings made of multiple microdigits in order to form their appropriate macrodigit. This is done in any case by expressing the PA* string which represents the number represented by the Roman basic digit according to the base governing the macrodigit. For example, if the macrodigit is in decimal, then "L" is represented by the microdigit string "mu no". Note that this is not interpreted as "(5,0)" (which would be "VN") because there is no "pi'e" present between "mu" and "no" here; additionally, the presence of the introductory " no pi'e" string disambiguates it as a distinct but single macrodigit.

Only those digits which are written in Roman notation need to have macrodigits explicitly expressed - no place structure needs to be tracked (unlike with decimal and 'standard' Lojban notation, especially with "ki'o"). The macrodigits are expressed exactly in the way that they appear (modulo collapse - see below); they maintain order and neither introduce nor omit anything (except "pi'e" which distinguishes macrodigits from one another).
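The mapping described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: every macrodigit is taken to be in decimal, only the basic digits are handled, microdigits are separated by spaces as in "mu no", and the names `DIGIT_WORDS`, `macrodigit`, and `to_lojban` are hypothetical.

```python
# Lojban decimal microdigits, indexed by the digit value 0..9.
DIGIT_WORDS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]

# Values of the basic digits; millials are not handled in this sketch.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def macrodigit(symbol):
    """Spell the decimal value of one Roman digit as a string of microdigits."""
    return " ".join(DIGIT_WORDS[int(d)] for d in str(VALUES[symbol]))


def to_lojban(numeral, prefix=True):
    """Encode a Roman numeral digit by digit, separating macrodigits with pi'e.

    If prefix is True, the optional introductory "no pi'e" is included.
    """
    macrodigits = [macrodigit(symbol) for symbol in numeral]
    if prefix:
        macrodigits.insert(0, "no")
    return " pi'e ".join(macrodigits)
```

With these assumptions, `to_lojban("L")` yields `no pi'e mu no`, and `to_lojban("IX", prefix=False)` yields `pa pi'e pa no`: the macrodigits appear in exactly the order of the written Roman digits.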

Just like in 'standard' (non-Latin) Lojban, $n*10^3$ (decimal) may be represented by "ny ki'o" (as microdigits) if the macrodigit is in decimal; $n=1$ means that "ny" may be omitted. This is especially similar to and helpful for millials; notice that the omissibility of "pa" for "ny" is quite like the existence of "M" but not of an underlined "I".

It may be possible for strings of $m$ consecutive mutually identical digits to be 'collapseäble' into a single macrodigit consisting of "my" followed by the appropriate number of "no"s ("0"s), if the macrodigit is in decimal. For example, "CCC" could be encoded as "panono pi'e panono pi'e panono", but it might be possible to collapse it down to "cinono". This is messy for millials (which will have weird gaps in collapseäbility) and it may actually break the system; I have not thoroughly vetted it. It certainly fails the spirit of exactly recording Roman notation in Lojbanic speech, but it does make it more manageäble.
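A rough Python sketch of this tentative collapse rule is given below. It assumes decimal macrodigits, handles only the basic digits, collapses only runs of digits whose value is a power of ten, and writes microdigits without spaces as in "cinono"; the names `words` and `collapse` are hypothetical. Like the rule itself, it has not been vetted against every case.

```python
from itertools import groupby

# Lojban decimal microdigits, indexed by the digit value 0..9.
DIGIT_WORDS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]

# Values of the basic digits; millials are not handled in this sketch.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def words(decimal_digits):
    """Spell a string of decimal digits as concatenated microdigits."""
    return "".join(DIGIT_WORDS[int(d)] for d in decimal_digits)


def collapse(numeral):
    """Encode a numeral, merging each run of identical power-of-ten digits."""
    macrodigits = []
    for symbol, group in groupby(numeral):
        count = len(list(group))
        digits = str(VALUES[symbol])
        if 1 < count <= 9 and digits[0] == "1":
            # m copies of 10^k become the single macrodigit "m" + k zeros.
            macrodigits.append(words(str(count) + digits[1:]))
        else:
            # Single digits, and digits worth 5 * 10^k, are left uncollapsed.
            macrodigits.extend([words(digits)] * count)
    return " pi'e ".join(macrodigits)
```

Here `collapse("CCC")` gives `cinono`, while `collapse("CCCVIII")` gives `cinono pi'e mu pi'e ci`.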

The bases of macrodigits are independent of one another by default. The base of a macrodigit is explicitly set by use of "ju'u'i". Decimal is the cultural default in general (and probably should be the default for this mode upon its activation) and it is probably preferable.

### Use of "pi'e"

Macrodigits must be explicitly separated from each other. If this were not the case, then "M" and "IM" would be identical if both "pa" and then "ki'o" are used, for example.
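The ambiguity can be made concrete with a small Python sketch. The lists below are hand-written encodings chosen for illustration (assuming "M" is spelled "pa ki'o" in one case and plain "ki'o" in the other), and `join_macrodigits` is a hypothetical helper, not part of the proposal.

```python
def join_macrodigits(macrodigits, separator):
    """Join a list of macrodigits into one spoken string."""
    return separator.join(macrodigits)


# "M" as a single macrodigit, with the optional "pa" expressed.
M = ["pa ki'o"]
# "IM" as two macrodigits: "I" is "pa", and "M" is "ki'o" with "pa" omitted.
IM = ["pa", "ki'o"]

# With pi'e as the separator the two encodings stay distinct.
with_separator = (join_macrodigits(M, " pi'e "), join_macrodigits(IM, " pi'e "))

# With only a space between macrodigits they become the same string.
without_separator = (join_macrodigits(M, " "), join_macrodigits(IM, " "))
```

In this sketch `with_separator` holds two different strings (`pa ki'o` and `pa pi'e ki'o`), whereas both entries of `without_separator` are `pa ki'o`.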