Xanadex


The Xanadex

CVID: 78e0536d174e57a665b4890cfee6b5b0efdf017d44708933352ce53435482bf3
CID: 18402c9947f1c10371c978df93aacf814bee7037dd63ee7f531d59a709116929
CCID: 6c92e97201983e333429ff57bab78d3e411eda7901b2ccd57bc45088aa69dd77
Date Created: 06 Apr 2026 02:02:52 GMT
Date Edited: 07 Apr 2026 20:30:23 GMT

ABSTRACT

Xanadex is a project inspired by Project Xanadu that works on fundamental internet technologies, focused at first on document sharing. it grew out of a previous startup, Netaris.

PRELIMINARIES

one important reason previous attempts to implement technologies like these never succeeded is that they lacked sufficient linguistic technology, which left everyone involved confused about what’s what. as such, we will be introducing quite a few new terms here. the terminology and the reasons for it should make immediate sense on reading, and trust me when I say that there is no other way for this to work.

we make frequent use throughout of CAN (Content Addressable Network). we introduce NAAN for NAme Addressable Network (avoiding collision with “NaN”).

this network is intended to be friendly to both human and bot users. as such, we will not impose any constraints on the network that exist simply for the sake of being human-friendly. the constraints that are imposed exist for the sake of efficient batch data processing (e.g., preventing infinite loops). we have to distinguish between “arbitrary” constraints and “necessary” constraints: human-friendly ones are arbitrary, machine-friendly ones are necessary. one prominent arbitrary constraint is a cardinality ceiling on field or action count. Xanadex is intended to be crawler-friendly.

HOW IT WORKS

Xanadex is based around its document format, Xanadoc (“doc”). docs are referenced in a CAN and are based on DTOB. a doc has five parts:

  1. unique identifiers (required)
  2. non-essential metadata (optional)
  3. references (paleolinks and autolinks) (required)
  4. content (required)
  5. neolinks (conditional)

we will explain these out of order.

bidirectional links, a key part of Ted Nelson’s vision for Project Xanadu, work as follows. first, acknowledge that any document that is cited by another document necessarily has to temporally precede the citing document. with this, we can make a naming scheme. in a two-way citation, the document that temporally precedes is called the paleodocument, and the document that temporally succeeds is called the neodocument. when a document is posted to Xanadex, the references section is read and the server (right now Xanadex is starting centralized) will propagate links to the referenced documents. these references follow the same naming scheme: references that a document carries at posting time, pointing to paleodocuments, are called paleolinks, and references pointing to documents that were posted later and that cite the document are called neolinks. the advantage of the “paleo-” and “neo-” naming scheme is that it is the clearest way to separate what is doing what. other terms like “forwards link” and “backwards link” or “incoming link” and “outgoing link” don’t actually tell you anything about which is which, but a temporal scheme can only be understood one way. a special type of bidirectional link is a transclusion, which is when content in one document is exactly the same as content in another document.
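the propagation step described above can be sketched roughly as follows. this is a minimal in-memory sketch, not the actual server: the `Doc` and `Server` names, the string ids, and the rule that a post citing an unknown doc is rejected are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    paleolinks: list = field(default_factory=list)  # ids of earlier docs this doc cites
    neolinks: list = field(default_factory=list)    # ids of later docs that cite this doc


class Server:
    """Hypothetical centralized server that keeps every posted doc in memory."""

    def __init__(self):
        self.docs = {}

    def post(self, doc_id, cited_ids):
        if doc_id in self.docs:
            raise ValueError(f"doc {doc_id!r} was already posted")
        # a cited doc has to temporally precede the citing doc, so it must already exist
        missing = [c for c in cited_ids if c not in self.docs]
        if missing:
            raise KeyError(f"cannot cite docs that were never posted: {missing}")
        doc = Doc(doc_id, paleolinks=list(cited_ids))
        self.docs[doc_id] = doc
        # propagate: every paleodocument gains a neolink pointing at the new doc
        for cited in cited_ids:
            self.docs[cited].neolinks.append(doc_id)
        return doc


server = Server()
server.post("paleo", [])
server.post("neo", ["paleo"])
print(server.docs["neo"].paleolinks)   # links the neodocument was posted with
print(server.docs["paleo"].neolinks)   # links the server added afterwards
```

note that the paleodocument’s content never changes here; only its neolinks list grows as later docs cite it.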

the references are stored in a separate field from the content, each with its slices given as start and end index pairs. in the context of citations, we refer to the exact place being cited/referenced as the defslice, and the place doing the citing/referencing as the refslice. each slice is a start and stop pair (the paleo- and neo- scheme cannot extend here because later we are also introducing autolinks).
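a slice pair, and the transclusion check it makes possible, might look like the sketch below. the `Slice` class, the half-open byte offsets, and the `is_transclusion` helper are assumptions for illustration; the text above only says that each slice is a start and stop pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    start: int  # inclusive byte offset (assumed)
    stop: int   # exclusive byte offset (assumed)

    def __post_init__(self):
        if not 0 <= self.start < self.stop:
            raise ValueError(f"invalid slice: {self.start}..{self.stop}")

    def cut(self, content: bytes) -> bytes:
        if self.stop > len(content):
            raise ValueError("slice runs past the end of the content")
        return content[self.start:self.stop]


def is_transclusion(cited: bytes, defslice: Slice, citing: bytes, refslice: Slice) -> bool:
    # a transclusion holds when the cited bytes and the citing bytes are identical
    return defslice.cut(cited) == refslice.cut(citing)


paleo = b"a doc is a Xanadoc."
neo = b"recall: " + paleo + b" moving on."

defslice = Slice(0, len(paleo))                # the place being cited
offset = neo.find(paleo)
refslice = Slice(offset, offset + len(paleo))  # the place doing the citing

print(is_transclusion(paleo, defslice, neo, refslice))
```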

the references are stored separately from the content for a few reasons. first: building a graph of all the references across the network is much easier if the references are all in the same place; otherwise, crawlers have to scan the entire length of each document to find inline references, which is far slower and can lead to parsing issues. second: if someone ends up with the text content without the citations and wants to narrow down where it came from (which could happen when batch processing docs), they can look up the hash of the content and get a list of matches. third: proof of transclusion requires the proofs to contain the whole content hash, and putting the content hash inside the content would change the hash of the content, so the two have to be stored separately. doing cumulative := references + prefix + inclusion + suffix will not work because to compute a reference, you have to compute the cumulative, which is a circular dependency.
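the second reason can be illustrated with a small lookup index. this is only a sketch: SHA-256 stands in for whatever hash the network uses, and the dict layout of a doc and the `build_index` helper are hypothetical. the point is that the hash is taken over the content alone, so the same text is found no matter which references travel with it.

```python
import hashlib
from collections import defaultdict


def content_hash(content: bytes) -> str:
    # stand-in hash (SHA-256); the references are deliberately not part of the input
    return hashlib.sha256(content).hexdigest()


def build_index(docs: dict) -> dict:
    """Map each content hash to the ids of every doc with exactly that content."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        index[content_hash(doc["content"])].append(doc_id)
    return index


docs = {
    "doc-1": {"content": b"hello world", "references": []},
    "doc-2": {"content": b"hello world", "references": [{"target": "doc-3"}]},
    "doc-3": {"content": b"something else", "references": []},
}
index = build_index(docs)

# someone holds only the bare text, with no citations attached
stray_text = b"hello world"
print(index.get(content_hash(stray_text), []))
```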

naturally, propagating spam links could clog up the network. this is a key reason why Xanadex is starting centralized. a long-term goal is to figure out some type of user reputation system similar to the one that email uses.

this separation means Xanadex can accommodate a couple of link actions better than HTML. one of these: you can point your link to a specific part of a paleodocument instead of the whole document. in essence: “this piece of information we are talking about came from exactly here”. this is done by simply including a range slice for the source document. we call these microlinks. if a document points to another document without slice ranges, we call this a macrolink. the neo- and paleo- nomenclatures are applied as prefixes.

there are a couple constraints on links:

  1. each reference must have a refslice
  2. no two refslices can intersect

without these requirements, the network could get clogged up by links that only vaguely link documents. if two documents are vaguely related, write it down somewhere in your neodoc. point 2 gets away from messy balls of mud, which should be obvious. the distinction between micro- and macro- is the presence of defslices.
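the two link constraints, and the micro-/macro- distinction, could be checked along these lines. this is a sketch under assumptions: the `Reference` shape, the `Span` encoding as half-open (start, stop) offsets, and the `link_violations` helper are hypothetical names, not part of the Xanadoc spec.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional, Tuple

Span = Tuple[int, int]  # half-open (start, stop) offsets, an assumed encoding


@dataclass(frozen=True)
class Reference:
    target: str                      # hypothetical id of the paleodoc
    refslice: Optional[Span]         # where in this doc the citing happens
    defslice: Optional[Span] = None  # where in the paleodoc the cited material sits

    @property
    def kind(self) -> str:
        # the presence of a defslice is what separates micro- from macro-
        return "microlink" if self.defslice is not None else "macrolink"


def intersects(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def link_violations(refs) -> list:
    errors = []
    # constraint 1: each reference must have a refslice
    for i, ref in enumerate(refs):
        if ref.refslice is None:
            errors.append(f"reference {i} has no refslice")
    # constraint 2: no two refslices can intersect
    located = [(i, r) for i, r in enumerate(refs) if r.refslice is not None]
    for (i, a), (j, b) in combinations(located, 2):
        if intersects(a.refslice, b.refslice):
            errors.append(f"refslices of references {i} and {j} intersect")
    return errors


good = [Reference("doc-a", (0, 5)), Reference("doc-b", (5, 9), defslice=(10, 20))]
bad = [Reference("doc-a", (0, 5)), Reference("doc-b", (4, 9)), Reference("doc-c", None)]

print([r.kind for r in good], link_violations(good))
print(link_violations(bad))
```

with half-open offsets, two refslices that merely touch (one stops at 5, the next starts at 5) do not count as intersecting.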

another link action Xanadex allows for is autolinks. HTML kind of does this, though not by that name: you link to other parts of your own document with the href target as an #id. for docs, you simply specify your defslice and refslice. this also allows for autotransclusions (your readers can be assured that you are using the exact same definition as before), which is not part of HTML.

there are a couple constraints on autolinks that extend the previous constraints:

  1. every autolink must have a defslice and a refslice (no “automacrolinks”)
  2. no autolink’s defslice can intersect another autolink’s refslice (this prevents recursion hell. additionally, it is a general quality of speech that arguments have to be constructed linearly, and this link constraint keeps in accordance with that)

this schema does allow for an autolink to point to a paleolink elsewhere in the document, meaning that it recurses with a depth of 1. also, note that this only forbids autolink defslices from intersecting with other autolinks’ refslices, but not from intersecting in general.
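the autolink-specific rule can be sketched as below. the `Autolink` class, the half-open `Span` encoding, and the `autolink_violations` helper are assumptions for illustration. constraint 1 is covered by making both slices mandatory fields; the earlier rule that no two refslices intersect still applies but is left out here to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # half-open (start, stop) offsets, an assumed encoding


@dataclass(frozen=True)
class Autolink:
    defslice: Span  # the earlier passage being pointed at (required)
    refslice: Span  # the passage doing the pointing (required)


def intersects(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def autolink_violations(links) -> list:
    errors = []
    for i, a in enumerate(links):
        for j, b in enumerate(links):
            # only a defslice meeting ANOTHER autolink's refslice is forbidden
            if i != j and intersects(a.defslice, b.refslice):
                errors.append(
                    f"defslice of autolink {i} intersects refslice of autolink {j}"
                )
    return errors


# two later passages reuse the same definition: defslices may intersect each other
shared_definition = [Autolink((0, 40), (100, 140)), Autolink((0, 40), (200, 240))]

# the second autolink's defslice sits on top of the first autolink's refslice
chained = [Autolink((0, 40), (100, 140)), Autolink((120, 160), (200, 240))]

print(autolink_violations(shared_definition))
print(autolink_violations(chained))
```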

non-essential metadata

a title is a metadata field separate from the body field. typically it’s text and is different from the filename, since it’s used for identifiers that aren’t filename-friendly (e.g. ones containing spaces). for the first time, DTOB allows for a clean separation between metadata and body. but for a practical document system, a few constraints have to be imposed that are essentially “arbitrary” (see preliminaries). mainly: where do you cap title length (in bytes)? if your length cap is long enough, then you practically have two documents in the same “document”, which is not ideal. DTOB does allow for unbounded arbitrary binary as a field, which gets around “which character encoding schemas are valid”; but I’m probably not going to allow nesting for modifiers like bold and italics. this version of Xanadex will set an arbitrary limit, where titles are capped at 140 bytes.
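since the cap is in bytes rather than characters, a check might look like this minimal sketch. the `check_title` helper and `MAX_TITLE_BYTES` name are hypothetical; only the 140-byte limit comes from the text above.

```python
MAX_TITLE_BYTES = 140  # the cap stated above


def check_title(title: bytes) -> bytes:
    # the title is treated as opaque bytes, so no character encoding is assumed here
    if len(title) > MAX_TITLE_BYTES:
        raise ValueError(f"title is {len(title)} bytes, cap is {MAX_TITLE_BYTES}")
    return title


ascii_title = ("x" * 140).encode("utf-8")     # 140 characters, 140 bytes
accented_title = ("é" * 100).encode("utf-8")  # 100 characters, but 200 bytes in UTF-8

check_title(ascii_title)
try:
    check_title(accented_title)
except ValueError as err:
    print(err)
```

a title that looks short on screen can still exceed the cap once encoded, which is why the limit is counted in bytes.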

the next few pertinent pieces of metadata that come up for a document system are timestamp, authorship, and viewership permissions. each of these is a hell of a challenge on its own, with no satisfactory implementation yet (although a couple come close), so the Xanadex network will use temporary solutions for now. since these are temporary solutions, they will not be included in any identifying hashes for now, and possibly never will be.

resolving timestamps between computers that are very far away from each other, and between computers that do not trust each other, is a completely different type of problem that is still unsolved. our drop-in working solution is the OpenTimestamps project, which is based on the Bitcoin blockchain.

the ideal for authorship would be some universal decentralized digital ID system, but for our purposes here no satisfactory one exists (a couple come close). again, this is a completely disparate type of problem that Xanadex is punting on. for now, Xanadex will just use identity verification with email and a NAAN layer on top.

document viewing permissions will be granted from one email-verified account to another if the owner decides they want a “whitelist” approach.

doc versioning

Xanadex introduces a pretty cool versioning system called “dif” detailed here.

in the non-essential metadata goes a 32-byte field of random data, the Cross-Version Unique Identifier (XVUID).
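generating such a field is straightforward; a minimal sketch follows. the `new_xvuid` helper and the metadata dict layout are hypothetical, and using the operating system’s secure random source is an assumption, since the text only asks for 32 bytes of random data.

```python
import secrets

XVUID_BYTES = 32


def new_xvuid() -> bytes:
    # 32 bytes from the OS's cryptographically secure random source (an assumed choice)
    return secrets.token_bytes(XVUID_BYTES)


metadata = {"xvuid": new_xvuid()}  # hypothetical non-essential metadata record
print(metadata["xvuid"].hex())
```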

doc unique identifiers

there are three at first:

  1. the Content Identity Hash (CID) as the hash of the content via the homomorphic hashing algorithm
  2. the Cumulative Content Identity Hash (CCID) as the hash of the references and the content via the homomorphic hashing algorithm
  3. the hash of the content via SHA-256 as a security redundancy

point 3 is included because the Bromberg hashing algorithm is pretty untested for collisions and there could be vulnerabilities there. standardization and mass adoption of a homomorphic hashing algorithm is something that will take a lot of time. so the network may actually have to swap hashing algorithms along the way, until the security of one has been determined.

point 2 is included to mitigate an unusual but fairly low-consequence attack vector: someone could post two versions of the same content with different citation data, and the CID alone would not tell them apart.
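the three identifiers, and the case point 2 guards against, can be sketched as follows. this is a sketch with loud assumptions: BLAKE2b is only a placeholder for the homomorphic hash (it is not homomorphic), the references are treated as already-serialized bytes, and the length-prefixed framing used for the CCID input is invented here for illustration.

```python
import hashlib


def homomorphic_hash_stand_in(data: bytes) -> str:
    # PLACEHOLDER: BLAKE2b is NOT homomorphic; it only marks where the real algorithm goes
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def frame(part: bytes) -> bytes:
    # length prefix so the boundary between references and content is unambiguous (assumed)
    return len(part).to_bytes(8, "big") + part


def identifiers(references: bytes, content: bytes) -> dict:
    return {
        "cid": homomorphic_hash_stand_in(content),                              # content only
        "ccid": homomorphic_hash_stand_in(frame(references) + frame(content)),  # references + content
        "sha256": hashlib.sha256(content).hexdigest(),                          # security redundancy
    }


# same content posted twice with different citation data
first = identifiers(b"cites doc-1", b"same text")
second = identifiers(b"cites doc-2", b"same text")

print(first["cid"] == second["cid"])    # the content hashes agree
print(first["ccid"] == second["ccid"])  # the cumulative hashes do not
```

because the references never feed into the CID, they can carry content hashes of other docs without creating the circular dependency described earlier.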

hashing over the whole DTOB for the doc is not an option, because every implementation of the non-essential metadata described above can (and likely will) be swapped out, and because “non-essential” literally means non-essential: the network has to work without it. a single-user document repository that gets sent out into space on a golden disk doesn’t need versioning or authorship attribution.

leaving some data as non-essential is also chosen over two alternatives. one is using a version number in the doc schema as a conditional identifier, which I reject because I do not trust that there is such a thing as a safe parser that conditionally interprets things based off of magic numbers/version numbers. the other is retroactively going back and updating everything once a satisfactory homomorphic hashing algorithm or user-id system has been found, which could be expensive and might be impossible once the network becomes distributed and eventually decentralized.

SPECIFICATIONS

the Xanadoc spec is here