Consider a stream of information in binary encoding (“radix 2”) going from sender to recipient, or writer to reader, whose length n in bits is unknown and whose value is unknown.

Now let’s say two or more parts of the information are separate, that is, to be interpreted in different ways by the recipient, but the recipient does not know where one part ends and the next begins. If i is the position where the end of the first part of information lies, this stream of information S becomes:

S = AB, where A = S[1..i] and B = S[i+1..n]

Subsequences A and B are both bitstreams of unknown length. In fact, by Cantor, every sub-bitstream in S is a bitstream of unknown length. Here we start with investigating: if we want to mark two or more pieces of information in an unknown unbounded bitstream as separate, how do we do that?
We begin by exhaustively trying approaches, seeing what axes this uncovers, and seeing if we can determine a direction once those arise. Our first approach is a common one in “software engineering”: length prefixing, for encoding the position of the boundary between two pieces of information. If piece of information A is |A| bits long, then its prefix will require

ceil(log2(|A| + 1))

bits to encode its position. But if |A| is unknown, then this quantity is unknown. This leaves us with three unknown bitstreams of unknown length, which increases our unknowns rather than decreasing them, and so cannot be treated as a solution.
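As a quick sketch of the quantity above (the helper name is mine), the prefix width needed to encode a boundary position grows with the length it must describe, so it cannot be fixed without first bounding that length:

```python
from math import ceil, log2

def prefix_bits(length):
    """Bits needed in a fixed-width prefix to encode any boundary
    position from 0 up to `length` inclusive."""
    return ceil(log2(length + 1))

# The prefix width itself depends on the length it describes:
# prefix_bits(255) == 8, but prefix_bits(65535) == 16.
```

The circularity is visible here: to size the prefix you must already know |A|, which is exactly the unknown the prefix was meant to communicate.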
But let’s assert that the prefix length is known, and let’s assign 8 to it. If our recipient understands that it was to scan 8 bits per 2^8 − 1 bits (minus one because the nil value 00000000 would mean “no separation in the next understood amount of bits”) for a value to mark separation, and if there is none then to look again after the next 255-bit “chunk” (we will call this technique “chunking”), and once it finds one it can stop scanning and interpret the rest of the bits as literals, this works. And this can quickly be expanded to include “an unknown amount of separated pieces of information”. If a chunk is found to have a separation marker, the recipient will disregard all the bits of information after the separation marker until the start of the next chunk, and the next piece of information will start at the next chunk (this is because of the case where |sequence A| + |sequence B| < |chunk|). If the amount of pieces of information being marked as separated is known by the recipient, then the sender can skip chunk encoding for the last piece of information, saving up to 8 bits per 263 bits of data after the second-to-last piece of information. If the amount of pieces of information being marked as separated is unknown, then chunk encoding continues until the end of the message. This shared
understanding between sender and recipient, that the sender knows the
recipient will interpret the data a certain way, and the recipient knows
the sender will send it a certain way, is called protocol. The
two known pieces of information being communicated are the epistemics of
the cardinality of pieces of information to be marked as separated (is
the cardinality known or unknown), and how many bits are to be reserved
as chunk prefixes. The fact of dropping the chunk prefixes for the last
piece of information if the cardinality is known can easily be inferred
by any intelligent actor. This is actually an important point of
protocol design: if you could communicate information to another party,
what would they be expected to do with the data? What are the obvious
things they could be expected to do, such as the energetically efficient
thing of dropping unnecessary prefix encodings? This actually neatly
solves our issue here, which is one of the worst issues that has plagued
all of computer science for the entirety of its over half-century
history.
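The chunking scheme above can be sketched concretely. This is a minimal sketch under the stated assumptions (8-bit markers where value 0 means “no separation in this chunk” and a nonzero value k means “this piece ends after k payload bits”); the function names are mine, and bits are modeled as '0'/'1' strings for clarity:

```python
MARKER_BITS = 8
PAYLOAD_BITS = (1 << MARKER_BITS) - 1   # 255 payload bits per chunk

def encode(pieces):
    """Encode a list of non-empty bitstrings into one chunked bitstream."""
    out = []
    for piece in pieces:
        pos = 0
        while pos < len(piece):
            rest = len(piece) - pos
            if rest > PAYLOAD_BITS:
                # Full chunk; marker 0 means "no separation here".
                out.append(format(0, "08b"))
                out.append(piece[pos:pos + PAYLOAD_BITS])
                pos += PAYLOAD_BITS
            else:
                # Final chunk of this piece: marker says where it ends;
                # the rest of the chunk is padding, to be disregarded.
                out.append(format(rest, "08b"))
                out.append(piece[pos:] + "0" * (PAYLOAD_BITS - rest))
                pos = len(piece)
    return "".join(out)

def decode(stream):
    """Split a chunked bitstream back into the separated pieces."""
    pieces, current, i = [], [], 0
    while i + MARKER_BITS <= len(stream):
        marker = int(stream[i:i + MARKER_BITS], 2)
        payload = stream[i + MARKER_BITS:i + MARKER_BITS + PAYLOAD_BITS]
        i += MARKER_BITS + PAYLOAD_BITS
        if marker == 0:
            current.append(payload)
        else:
            current.append(payload[:marker])
            pieces.append("".join(current))
            current = []
    if current:
        pieces.append("".join(current))
    return pieces
```

Each 263-bit chunk carries 255 payload bits, so at most 8 of every 263 transmitted bits are marker overhead.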
So protocol has to be communicated between sender and recipient for this to work. Imagine the sender walking around with two pieces of information stuck to their forehead: a number (e.g. 8) and a cardinality epistemic status (e.g. “3 (known)” or “unknown”). Or imagine a radio signal coming from a star that has a piece of information attached to its sender: the light frequency it emits relative to other stars (high or low?). This brings our question to a further question: what is the minimum protocol that is required for a sender to send information and assume that the recipient will be able to do the expected, energetically efficient things with it?
Let’s go to the extreme and say “no protocol at all. The sender is
not able to convey any additional information at all to the recipient
besides the message itself”. If the receiver sees something that looks
like it has a pattern in it, that it is not random noise (e.g. 1, the value
00000000 mostly appearing every 255 bits; e.g. 2, all 0s
appearing after bit v + 8 every 263rd bit, where v
is the value of the first 8 bits), and they were to try to interpret it,
they might be able to see that it is for separation of data. But trying
to find patterns like that computationally over otherwise unknown data
would require a scan on the order of 2^J bits, where J is the candidate
prefix length. This goes up infinitely; J is unbounded. These scans do not have
to be for the chunking protocol as described above, there are probably
other minimal protocols, and probably a provable finite number of them
(for future work), but that is beyond the scope of this paper. We will
refer to these now-generalized unbounded scans as scanning for integer
N. So, to try to reasonably interpret otherwise noisy data when looking
for protocol-structured data, what is the minimum scanning number we can
look for? Let’s continue with our binary scheme and try 2.
Value one is for “separate”, value zero is for “not separate”. Every other symbol is an actual encoding, and every other symbol is “separate” or “not separate”: one separation symbol per informational symbol. Dividing the informational symbols by the total amount of symbols gives us 1/2, or 50% informational efficiency, which is my term for the ratio of informational symbols to total symbols. This is far worse than the >96% informational efficiency (255 of every 263 bits) we had previously, so a predictable protocol reader would not choose it.
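A sketch of this two-symbol scheme (the names and the exact flag convention are mine), pairing every informational bit with a flag bit, where flag 1 marks the start of a new piece:

```python
def encode_interleaved(pieces):
    """Before each data bit, emit a flag bit: 1 marks a separation."""
    out = []
    for k, piece in enumerate(pieces):
        for j, bit in enumerate(piece):
            flag = "1" if (j == 0 and k > 0) else "0"
            out.append(flag + bit)
    return "".join(out)

def decode_interleaved(stream):
    """Read (flag, bit) pairs; a set flag starts a new piece."""
    pieces, current = [], []
    for i in range(0, len(stream), 2):
        flag, bit = stream[i], stream[i + 1]
        if flag == "1" and current:
            pieces.append("".join(current))
            current = []
        current.append(bit)
    if current:
        pieces.append("".join(current))
    return pieces
```

Exactly half of every transmitted pair is flag overhead, which is the 50% informational efficiency computed above.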
If we were to try another approach to get the number to be 2, and we reserved one symbol out of two to be the separator, that leaves us one symbol to encode information. But you cannot encode information with one symbol, because its entropy is 0: it cannot collapse into one of multiple states when observed. And adding a time channel breaks our question of protocol and pure information representation. But if we kept two symbols to encode information and added a symbol, so we had one symbol to separate and two to encode, that gives us ternary. This is the minimum viable protocol; the number of scans to go over it is 3. This is also the minimum number of symbols needed to clearly separate information.
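The ternary scheme can be sketched directly, reserving a third symbol (here written “2”, a representational choice of mine) purely as the separator, with “0” and “1” left to carry information:

```python
SEP = "2"  # the reserved third symbol; "0" and "1" carry information

def encode_ternary(pieces):
    """Join binary-valued pieces with the reserved separator symbol."""
    return SEP.join(pieces)

def decode_ternary(stream):
    """Split on the separator; no positional scanning is needed."""
    return stream.split(SEP)
```

Because the separator can never collide with an informational symbol, the recipient needs no chunk geometry or shared counts: a single pass over the three possible symbols recovers the boundaries.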