Dataspace 4: The Term-inator

Prelude:
Dataspace 0: Those Memex Dreams Again
Dataspace 1: In Search of a Data Model
Dataspace 2: Revenge of the Data Model
Dataspace 3: It Came From The S-Expressions
Dataspace 4: The Term-inator

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

Before we can build such a system, we have to settle on a data model. The logic programming language Prolog gives us a potentially useful universal data model – term structure – but we don’t get the most out of it unless we express it in Lisp-style S-expressions, which reveal hidden semantics that even the formal logic and logic programming communities didn’t catch up with until 1989. But S-expressions also introduce ambiguity that wasn’t there in the original Prolog term structure.

There is a very simple way forward from here, and it’s one that I haven’t seen described before. I think it has potential as a fundamental data semantics for building very large or very small distributed systems.

This is an approach to representing term structure in S-expressions, so I’m calling it term expressions. If we wanted to be cheeky, we could call it T-expressions (because it’s one more than S…)

So S-expressions are a way of representing lists as pairs plus a final nil. But they don’t have to have a final nil. Instead, they can end in a dotted pair or dotted list – with a symbol after the dot indicating ‘the rest of the list’. This comes in really useful when matching against variables. E.g.:

(a b c . D)

is a dotted list. That D might perhaps be a variable, and if we were matching, it might stand for ‘everything after the c’. Which could be a list itself.
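
To make the matching idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: pairs are modelled as 2-tuples, nil as None, and any symbol starting with an uppercase letter is treated as a variable (the Prolog convention). The helper names cons_list, is_var and match are hypothetical.

```python
NIL = None  # assumption: nil is modelled as Python's None


def cons_list(items, tail=NIL):
    """Build a chain of (head, rest) pairs that ends in `tail`."""
    result = tail
    for item in reversed(items):
        result = (item, result)
    return result


def is_var(x):
    # Assumption: uppercase-initial symbols are variables, as in Prolog.
    return isinstance(x, str) and x[:1].isupper()


def match(pattern, data, bindings=None):
    """Return a dict of variable bindings, or None if the match fails."""
    bindings = {} if bindings is None else bindings
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == data else None
        bindings[pattern] = data
        return bindings
    if isinstance(pattern, tuple) and isinstance(data, tuple):
        # Match the heads, then walk down the rest of both chains.
        if match(pattern[0], data[0], bindings) is None:
            return None
        return match(pattern[1], data[1], bindings)
    return bindings if pattern == data else None


pattern = cons_list(["a", "b", "c"], "D")        # (a b c . D)
data = cons_list(["a", "b", "c", "d", "e"])      # (a b c d e)
print(match(pattern, data))  # {'D': ('d', ('e', None))}
```

Here D ends up bound to the pair chain for (d e): everything after the c, which is itself a list.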

Here’s the core idea:

What if, instead of just a nil, or a dot and then a symbol, we allow an S-expression list to end with – or simply be – a third option: an arbitrary logical term?

So perhaps we had a list looking like this:

(a b c / d e f)

That ‘/’ is the magic character that replaces the dot. Anything after a / is a logical term. It can have arbitrary structure.

Given this, we can represent a Prolog term and a Prolog list differently:

(/ likes john mary) -> likes(john, mary)
(likes john mary) -> [likes, john, mary]
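
As a rough sketch of how a reader might tell the two apart, here is a toy parser in Python. Everything in it is an assumption for illustration: the names Term, Tailed, tokenize, parse and to_prolog are hypothetical, the first item after the / is taken as the functor, and a list with a term tail is rendered using Prolog's [a, b | Tail] notation, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Term:
    functor: object
    args: tuple


@dataclass
class Tailed:
    items: list   # the ordinary list elements before the /
    tail: Term    # the term that follows the /


def tokenize(text):
    # Assumption: atoms and the / marker are separated by whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok                                # a bare atom
    items, tail = [], None
    while tokens[0] != ")":
        if tokens[0] == "/":
            tokens.pop(0)
            body = []
            while tokens[0] != ")":
                body.append(parse(tokens))
            if not body:
                raise ValueError("nothing after the term marker")
            tail = Term(body[0], tuple(body[1:]))
            break
        items.append(parse(tokens))
    tokens.pop(0)                                 # drop the closing paren
    if tail is None:
        return items                              # a plain list
    return tail if not items else Tailed(items, tail)


def to_prolog(x):
    if isinstance(x, Term):
        if not x.args:
            return to_prolog(x.functor)
        return f"{to_prolog(x.functor)}({', '.join(map(to_prolog, x.args))})"
    if isinstance(x, Tailed):
        return f"[{', '.join(map(to_prolog, x.items))} | {to_prolog(x.tail)}]"
    if isinstance(x, list):
        return "[" + ", ".join(map(to_prolog, x)) + "]"
    return x


print(to_prolog(parse(tokenize("(/ likes john mary)"))))  # likes(john, mary)
print(to_prolog(parse(tokenize("(likes john mary)"))))    # [likes, john, mary]
print(to_prolog(parse(tokenize("(a b c / d e f)"))))      # [a, b, c | d(e, f)]
```

The only thing the parser has to track is whether it has seen the / yet; the list and the term share the same brackets and the same atoms.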

That’s it. The rest of this series is unpacking what this very simple, dumb syntax-level idea allows us to do. But it’s not just syntax – like moving from Prolog terms to S-expressions, it opens a rather large new semantic level doorway. It lets us express things that we can’t quite fit into either S-expressions, or Prolog terms (or SQL relations, or JSON objects, or OOP objects, or filesystems – or sets, or Lisp reader macros, or arbitrary data types), and also simplifies dealing with and representing all of these.

At least that’s my hope. And I think there are reasons to be hopeful.

In short:

The low-level data model is linked list pair structure, as in Lisp. It could be run in a more compressed form, perhaps – like straight sequences of machine words. But we’ll start with normal Lisp pairs as our basic layer. We can look at PicoLisp for inspiration for how to build a full language (now even an OS) on top of this.
The serialisation syntax is S-expressions but without the dot. We have parentheses () to open and close lists.
(We can also keep the dot if we want, for embedding within standard Lisps. But we have the option to completely lose the dot for extra simplicity. Sometimes getting rid of a single reserved character can help a lot. Dots are not the best character to reserve, frankly, since they exist inside English text and numbers.)
Instead of the dot, we reserve one character to be a ‘term marker’. Initially I was favouring the vertical bar |, but because that’s a reserved character in many Lisps, I’m now going with the forward slash / .
We probably also need a quoting strategy, and I’m leaning more toward C-style backslash \ for ‘quote the next character’ rather than double-quotes “” (because quotes get mangled, and more importantly, quotes don’t nest well at all, and we want recursive nestability).
Though, you know? Looking at the standard keyboard, if I were writing a character-level parser from scratch, and not caring about embedding… I’d probably go with [] for brackets, / for term, and ' for quote. Because they’re all on four keys right next to each other, next to Enter, visually distinct, and unshifted.
At a low-level data pointer level, we can probably get rid of the / as a symbol and put it as a tag bit in the pointer. But that’s an optimisation to think about later.
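
To show how little the character-level layer needs, here is a sketch of a tokenizer in Python using the reserved characters discussed above: ( and ) for lists, / as the term marker, and backslash to quote the next character. It is an assumption-laden illustration, not a settled design: the function name tokenize and the token kinds "open", "close", "term" and "atom" are hypothetical, and I'm assuming an unquoted / is always the term marker, even in the middle of a word.

```python
def tokenize(text):
    """Split text into (kind, value) tokens.

    A backslash makes the next character a literal part of the current atom,
    so reserved characters and spaces can appear inside atoms.
    """
    tokens, current = [], []
    in_atom = False
    kinds = {"(": "open", ")": "close", "/": "term"}

    def flush():
        nonlocal in_atom
        if in_atom:
            tokens.append(("atom", "".join(current)))
            current.clear()
            in_atom = False

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                raise ValueError("dangling backslash at end of input")
            current.append(text[i + 1])   # take the next character literally
            in_atom = True
            i += 2
            continue
        if ch in kinds:
            flush()
            tokens.append((kinds[ch], ch))
        elif ch.isspace():
            flush()
        else:
            current.append(ch)
            in_atom = True
        i += 1
    flush()
    return tokens


for token in tokenize(r"(a\ b / c\/d)"):
    print(token)
```

Because quoting works one character at a time, a quoted atom can itself contain quoted characters without any closing delimiter to balance, which is the nesting property double-quotes lack.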
