The Octavia Programming Language

Octavia is a fairly unassuming and unexciting scripting language. It does not support operator precedence, it operates directly on an M-expression syntax tree, and its only parsing steps are a tokenizer and a bracket-matcher. This makes it simple to work with and to develop new features for, all while retaining an elegant and compact internal representation. In its current form, Octavia is designed and optimized for text processing on the web and is integrally coupled to the Cadre CMS. Its implementation is in PHP, making it a very high-level language.

A personal note: the name Octavia comes from the simple observation that this is the second language I've written an interpreter in (all previous efforts were in BASIC); the fourth language family I've created; the third major development in the series of interpreters (from line-based to string-based to token-based), so this part gets a two-cubed expression; and the eighth interpreter I've written overall. Previous projects include the LSI assembly parsers (LSI and SIC), the piki parsers (piki for MS-DOS, piki for RM6, piki for Telerin, and the Leaflet scripting interface for Windows), and the pass3 interpreter (normally a command-line tool for Windows, but also integrated into Sappho). While I previously planned on showcasing some of these projects on this site as examples of what classic Basic and Visual Basic are capable of in the right hands, times have changed somewhat.

So what exactly does Octavia do that makes it worth knowing about? Well, a lot of that is most easily understood from an example:

[show modes: view]
[begin mode view]
[
    if(x == y) {
        This is great!
    } else {
        That was a terrible idea, [uid2name(_viewer.id)].
    };
]
[end mode view]

Notice that this has some sharp deviations from a standard curly bracket language. Semicolons are statement separators and hence optional at the end of passages of code (but must follow closing braces otherwise); many functions will simply echo/print by default if just wrapped in square brackets; and, unlike in most languages, square brackets are used to indicate code. Variable names can optionally be prefixed with either $ or @, although this is not shown.

Most significantly, however, opening a brace-delimited block reverts to text mode, so a new pair of square brackets must be opened in order to get back into code. The engineering justification for this is that a language intended for display presentation will have a great deal more use for optionally printing messages than for optionally conducting logic.
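To make the mode switching concrete, here is a minimal sketch in Python (not Octavia's actual PHP implementation) of a scanner that splits source into alternating text and code segments. The function name split_modes and the exact rules are assumptions inferred from the example above: the top level is text, a square bracket in text opens code, and a curly brace in code opens text.

```python
def split_modes(source):
    """Split source into (mode, chunk) pairs, where mode is "text" or "code".

    Assumed rules, guessed from the example rather than taken from Octavia:
      - the top level is text
      - "[" in text mode opens code, which "]" closes
      - "{" in code mode opens text, which "}" closes
    """
    stack = ["text"]   # nesting of modes, innermost last
    segments = []
    buffer = []

    def flush():
        # Emit whatever has accumulated in the current mode.
        if buffer:
            segments.append((stack[-1], "".join(buffer)))
            buffer.clear()

    for ch in source:
        mode = stack[-1]
        if mode == "text" and ch == "[":
            flush()
            stack.append("code")
        elif mode == "code" and ch == "{":
            flush()
            stack.append("text")
        elif mode == "code" and ch == "]":
            flush()
            stack.pop()
        elif mode == "text" and ch == "}" and len(stack) > 1:
            flush()
            stack.pop()
        else:
            buffer.append(ch)

    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    flush()
    return segments


if __name__ == "__main__":
    for mode, chunk in split_modes("Hi [if(a) {yes [name]!} else {no}] bye"):
        print(mode, repr(chunk))
```

Under these assumed rules, a closing square bracket inside a brace block is just text, which is why a stray bracket there leaves the outer code section unclosed and is reported as unbalanced.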

Standard style conventions are still undecided; Octavia permits some extremely terse phrasing for ad hoc code, as semicolons are not necessary at the end of a block.
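The difference between a separator and a terminator can be sketched as follows. This is an illustrative Python fragment, not Octavia's real parser; the name split_statements is hypothetical, and the sketch assumes that only square brackets and curly braces shield a semicolon from splitting.

```python
def split_statements(code):
    """Split code on semicolons that sit outside any [] or {} pair.

    Because the semicolon is treated as a separator, a trailing one is
    optional: "a; b" and "a; b;" give the same result.
    """
    openers = {"[": "]", "{": "}"}
    closers = set(openers.values())
    expected = []      # closing brackets we are still waiting for
    statements = []
    current = []

    for ch in code:
        if ch in openers:
            expected.append(openers[ch])
        elif ch in closers:
            if not expected or expected.pop() != ch:
                raise ValueError(f"unexpected {ch!r}")
        if ch == ";" and not expected:
            # Top-level separator: close off the current statement.
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    if expected:
        raise ValueError("unclosed bracket")
    statements.append("".join(current).strip())
    # Drop empty pieces, such as the one after a trailing semicolon.
    return [s for s in statements if s]


if __name__ == "__main__":
    print(split_statements("a = 1; if(a) {yes; no}; b"))
```

Note how the semicolon inside the braces is left alone, while the one after the closing brace is what separates the if from the statement that follows it.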

Why bother? It was necessary to implement a language inside of Cadre in order to allow extensibility without violating the CMS's security framework. At present, Octavia's implementation is quite lax about security, and there are many, many holes in both the code and the user data parser (some of which have been fixed on projects like Anthologica, which implements aggressive input scrubbing), but it at least offers a context in which such extensibility can be made secure.