Initial: 2003-09-12
Modified: 2003-09-14

MRPT 0.2.1

MRPT 0.1 was not satisfactory, so I will define MRPT 0.2. MRPT will not have a fixed type system; the type system may be extended as needed. The type system will try to be much simpler than those of other MRP-type protocols.
MRPT 0.2.1: Changing general encoding a bit.

MRPT Base Encoding

This encoding will borrow some ideas from ASN.1.
Both types and semantic objects will be included in the type system.
ASN.1 may be considered as a possible encoding.

VLI Numbers

VLI Numbers are variable-length integers. The goal of a VLI Num is to give a compact representation of small numbers while still representing larger numbers reasonably compactly (about 1/8 overhead: each byte carries 7 value bits and 1 flag bit).
Each byte of a VLI Num holds 7 bits of the value, most significant bits first. If the MSB of a byte is 1, a subsequent byte follows holding lower bits; otherwise the byte is terminal (it holds the lowest 7 bits). An encoder is allowed to insert leading pad bytes (0x80) if needed; these contribute no value bits.
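A minimal sketch of a VLI Num encoder and decoder in Python, following the rules above. The function names and the `pad` parameter (number of leading 0x80 pad bytes) are my own illustration, not part of the spec. For example, 300 = 2*128 + 44 encodes as 0x82 0x2C.

```python
def vli_encode(value: int, pad: int = 0) -> bytes:
    """Encode a non-negative integer as a VLI Num (7 bits per byte, high bits first)."""
    if value < 0:
        raise ValueError("VLI Nums are unsigned")
    groups = [value & 0x7F]          # terminal group: the lowest 7 bits
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                  # most significant group goes first
    # Non-terminal bytes get the MSB set; optional leading 0x80 pad bytes add nothing.
    return bytes([0x80] * pad + [g | 0x80 for g in groups[:-1]] + [groups[-1]])


def vli_decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a VLI Num starting at buf[pos]; return (value, position after it)."""
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VLI Num")
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:           # MSB clear: this was the terminal byte
            return value, pos
```

Since pad bytes only shift in zero bits ahead of the value, a decoder needs no special case for them.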

Objects

Each object will begin with a tag, which is 2 bytes long except in the extended form described below:

The top 2 bits of the first byte will be a class:
0: base types;
1: implementation types;
2: context specific types;
3: dynamic types.

The next bit will specify whether it is a primitive or compound type:
0: primitive;
1: compound.

Next will be a bit flagging the presence of an "Aux Head":
0: no aux-head present;
1: aux-head present.

The next 2 bits are reserved and will be set to 0.
The lower 2 bits and the next byte together will form a 10-bit tag (the next byte holding the lower 8 bits). As an exception, if the lower 2 bits are both 1, the first byte is instead followed by a VLI Num defining the tag (support for this form is optional; without it, all tags >= 768 are invalid).
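A minimal Python sketch of packing and unpacking the tag header, assuming the bit layout above (class in bits 7-6, compound flag in bit 5, aux-head flag in bit 4, reserved bits 3-2, high tag bits 1-0). The helper names and the `allow_extended` switch (standing in for the optional VLI tag form) are hypothetical.

```python
BASE, IMPLEMENTATION, CONTEXT, DYNAMIC = 0, 1, 2, 3   # class values (top 2 bits)


def _vli_encode(value):
    """VLI Num: 7 bits per byte, high bits first, MSB set on all but the last byte."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def _vli_decode(buf, pos):
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_tag(cls, compound, aux_head, tag, allow_extended=False):
    if not 0 <= cls <= 3 or tag < 0:
        raise ValueError("bad class or tag")
    # Bits 3-2 are reserved and left at 0.
    first = (cls << 6) | (int(compound) << 5) | (int(aux_head) << 4)
    if tag < 768:
        return bytes([first | (tag >> 8), tag & 0xFF])   # 2 high bits + low byte
    if not allow_extended:
        raise ValueError("tags >= 768 need the optional extended form")
    return bytes([first | 0x03]) + _vli_encode(tag)      # low bits 11: VLI tag follows


def decode_tag(buf, pos=0):
    """Return (class, compound, aux_head, tag, position after the tag)."""
    first = buf[pos]
    if first & 0x0C:
        raise ValueError("reserved tag bits are set")
    cls, compound, aux_head = first >> 6, bool(first & 0x20), bool(first & 0x10)
    high = first & 0x03
    if high == 0x03:                                     # extended form
        tag, pos = _vli_decode(buf, pos + 1)
    else:
        tag, pos = (high << 8) | buf[pos + 1], pos + 2
    return cls, compound, aux_head, tag, pos
```

For example, a base-class primitive with tag 66 (Narrow String) encodes as 0x00 0x42, and a base-class compound with tag 96 (Array) as 0x20 0x60.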

The tag may be followed by an aux head, depending on the Aux Head bit.
The interpretation of an aux head depends on the type. The contents of an aux head are only allowed to affect the semantics of the type, and may be ignored if unsupported (a possible exception is a type that has no defined encoding, i.e. raw data). If the aux head would carry any data needed to properly decode the contents, such data should, where possible, be placed in the contents instead, leaving the aux head for semantic information.
An aux head consists of a VLI Num giving a length, followed by that many bytes of data.

Next come the contents.
The contents consist of a VLI Num followed by data.
This number gives the length in bytes of the following data; the meaning of the data depends on the type given in the tag.

For primitive objects the contents will be raw data, whose meaning is defined only by the type.
For compound objects the contents will be an array of objects.

There is to be no padding between objects.
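Putting the tag, aux head and contents together, here is a minimal Python sketch of encoding and decoding whole objects. It assumes tags below 768 (the 2-byte tag form only) and represents a decoded object as a plain dict; `encode_object`, `decode_object` and the dict keys are hypothetical names for illustration.

```python
def _vli(value):
    """VLI Num: 7 bits per byte, high bits first, MSB set on all but the last byte."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def _read_vli(buf, pos):
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_object(cls, tag, contents, compound=False, aux=None):
    """Encode one object. For a compound, `contents` is a list of encoded children."""
    if not 0 <= tag < 768:
        raise ValueError("this sketch only emits the 2-byte tag form")
    if compound:
        contents = b"".join(contents)            # children back to back, no padding
    first = (cls << 6) | (compound << 5) | ((aux is not None) << 4) | (tag >> 8)
    out = bytes([first, tag & 0xFF])
    if aux is not None:
        out += _vli(len(aux)) + aux              # aux head: length, then data
    return out + _vli(len(contents)) + contents  # contents: length, then data


def decode_object(buf, pos=0):
    """Decode the object at buf[pos]; return (dict, position after the object)."""
    first = buf[pos]
    if first & 0x0C:
        raise ValueError("reserved tag bits are set")
    if (first & 0x03) == 0x03:
        raise ValueError("extended (VLI) tags are not handled in this sketch")
    obj = {"class": first >> 6, "tag": ((first & 0x03) << 8) | buf[pos + 1], "aux": None}
    pos += 2
    if first & 0x10:                             # aux-head bit
        n, pos = _read_vli(buf, pos)
        obj["aux"], pos = buf[pos:pos + n], pos + n
    n, pos = _read_vli(buf, pos)
    end = pos + n
    if end > len(buf):
        raise ValueError("truncated object")
    if first & 0x20:                             # compound: contents are objects
        obj["children"] = []
        while pos < end:
            child, pos = decode_object(buf, pos)
            obj["children"].append(child)
        if pos != end:
            raise ValueError("child object overruns its parent")
    else:
        obj["data"] = bytes(buf[pos:end])
    return obj, end
```

Note how `encode_object` must have the children fully encoded before it can write the parent's length; this is the buffering cost mentioned in the rationale below.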

Rationale:
Length fields make it easier for the decoder to determine if an entire structure is present and to allocate temporary storage.
The cost of using lengths is that an encoder will need to buffer its output (or predict the size of the data in advance).
Aux heads were added after some consideration. I judged them potentially more useful in many cases than putting such a header in the contents, and for compound structures it would not really be possible to place a header in the contents (short of including a context-specific tag in the contents, or similar).

Thoughts:
I could use ASN.1 BER if needed. The technical differences are fairly minor (ASN.1 lacks aux heads, but this is not really significant).
A very simple approach would map MRPT types into the application type space, but this would not make competent use of ASN.1...
I may just implement it and see. PDLIB should be a good base to test this.
Considering dropping aux-heads...

MRPT Base Types

These types will generally be required for MRPT conformance. Implementation extension types are allowed as well; however, an implementation is required to verify that the other end knows and sends the same types as it does (to avoid interpreting invalid data or garbage when dealing with an unknown implementation).
Implementation types are allowed to be fixed on both the sender and the receiver, whereas dynamic types are allowed to vary (though they will likely be fixed from the sender's point of view) and are required to be treated as variable by the receiver.

Unless stated otherwise, all multibyte integers are to be interpreted as 2's complement big-endian ("Network Byte Order"). All floating point values are to follow the IEEE 754 encoding and also be in big-endian order.

Tag Range	Description
0	Invalid.
1-31	Reserved for primitive protocol types.
32-63	Reserved for compound protocol types.
64-95	Reserved for MRPT primitive values.
96-127	Reserved for MRPT compound values.
128-767	Reserved.
768-1023	Currently unusable.

Primitive Protocol Types

Tag Value	Description
1	Negotiation String.

MRPT Primitive Values

Tag Value	Description
64	Integer, whose size is required to be a power of 2 bytes (the implementation may raise an error if the size is invalid or the value is out of range).
65	Special, single byte {0=false,1=true, 2=null, 3-255 reserved}.
66	Narrow String, UTF-8/ASCII.
67	Wide String, UTF-16.
68	Floating point, 4/8 bytes.
69	Raw Data. The aux head consists of a short, possibly followed by some data related to the type. The LSB of the short indicates that the short is followed by text defining the content type (MIME).
70	Date String, 'YYYY-MM-DD', 'YYYY-MM-DD hh:mm:ss', 'YYYY-MM-DD hh:mm:ss.fff'.
71	Symbolic String, UTF-8. Where supported is interpreted as a symbol, otherwise it is interpreted as a string.
72	Character, same encoding as Integer. Interpreted as a character literal where supported, otherwise to be interpreted as an integer.
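A minimal Python sketch of the contents for the Integer (64), Special (65) and Floating point (68) types, using big-endian 2's complement and big-endian IEEE 754 as stated above. Choosing the smallest power-of-2 size that fits is my assumption about what an encoder would sensibly do; the spec only requires a power-of-2 size. The function names are hypothetical.

```python
import struct


def encode_integer(n):
    """Contents for tag 64: smallest power-of-2 size holding n as big-endian 2's complement."""
    size = 1
    while not -(1 << (8 * size - 1)) <= n < (1 << (8 * size - 1)):
        size *= 2
    return n.to_bytes(size, "big", signed=True)


def decode_integer(data):
    if not data or len(data) & (len(data) - 1):   # zero or not a power of 2
        raise ValueError("integer size must be a power of 2")
    return int.from_bytes(data, "big", signed=True)


SPECIALS = {0: False, 1: True, 2: None}           # tag 65; 3-255 reserved


def decode_special(data):
    if len(data) != 1 or data[0] not in SPECIALS:
        raise ValueError("invalid or reserved special value")
    return SPECIALS[data[0]]


def encode_float(x, size=8):
    """Contents for tag 68: 4- or 8-byte IEEE 754, big-endian."""
    if size not in (4, 8):
        raise ValueError("float must be 4 or 8 bytes")
    return struct.pack(">f" if size == 4 else ">d", x)


def decode_float(data):
    if len(data) == 4:
        return struct.unpack(">f", data)[0]
    if len(data) == 8:
        return struct.unpack(">d", data)[0]
    raise ValueError("float must be 4 or 8 bytes")
```

For example, 128 does not fit in a signed byte, so it is sent as the 2 bytes 0x00 0x80, while -1 is the single byte 0xFF.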

Compound Protocol Types

Tag Value	Description
32	Call, Contents are: a Method-ID (String or Implementation Dependent Integer); Arguments (Array); and a Cont-Id (Integer).
33	Return, Contents are a return value, and the Cont-Id from the call.
34	Error, Contents are the Cont-Id and an Error-String.
35	Pass, Contents are: a Method-ID (String or Implementation Dependent Integer); Arguments (Array). A pass is a call that does not return a value.
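To make the layout concrete, here is a minimal Python sketch that builds the bytes of a Call and a Pass. It assumes a string Method-ID (tag 66), Integer arguments, and the Array layout given for tag 96 (a Context:Compound:1 object, i.e. class 2, compound, tag 1, wrapping the values). Aux heads and extended tags are left out, and all helper names are hypothetical.

```python
def _vli(value):
    """VLI Num: 7 bits per byte, high bits first, MSB set on all but the last byte."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def obj(cls, tag, contents, compound=False):
    """Encode an object with a 2-byte tag and no aux head."""
    if compound:
        contents = b"".join(contents)
    head = bytes([(cls << 6) | (compound << 5) | (tag >> 8), tag & 0xFF])
    return head + _vli(len(contents)) + contents


def integer(n):                       # tag 64, smallest power-of-2 size
    size = 1
    while not -(1 << (8 * size - 1)) <= n < (1 << (8 * size - 1)):
        size *= 2
    return obj(0, 64, n.to_bytes(size, "big", signed=True))


def string(s):                        # tag 66, UTF-8
    return obj(0, 66, s.encode("utf-8"))


def array(values):                    # tag 96 wrapping Context:Compound:1
    return obj(0, 96, [obj(2, 1, values, compound=True)], compound=True)


def call(method, args, cont_id):      # tag 32
    return obj(0, 32, [string(method), array(args), integer(cont_id)], compound=True)


def pass_(method, args):              # tag 35, no Cont-Id since nothing returns
    return obj(0, 35, [string(method), array(args)], compound=True)
```

A call such as `call("add", [integer(1), integer(2)], 7)` produces 27 bytes starting with 0x20 0x20 0x18 (base class, compound, tag 32, 24 bytes of contents). The caller would then match the Return (33) or Error (34) carrying Cont-Id 7 back to this call; how Cont-Ids are allocated is left to the implementation.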

Error-String

Error-String has the Syntax:
'<error-key> <error-specifics...> <error-description>'.
I will define the following error keys:

Error Key	Error Specifics	Description
unknown-type	<tag-value>	Sent if an unknown type value was received. May not be sent for implementation or context-specific types.
invalid-encoding	<tag-value>	Sent if a known type was received, but the encoding is invalid.
type-check	<cont-id>/'-'	Sent if the arguments for a called function do not match what the function accepts.
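A minimal Python sketch of building and splitting Error-Strings. It assumes each field is separated by a single space and that every key in the table above takes exactly one specific; for an unknown key the number of specifics cannot be known, so this sketch treats everything after the key as the description. `SPECIFIC_COUNTS`, `format_error` and `parse_error` are hypothetical names.

```python
# Number of <error-specifics> fields for each defined error key.
SPECIFIC_COUNTS = {"unknown-type": 1, "invalid-encoding": 1, "type-check": 1}


def format_error(key, specifics, description):
    return " ".join([key, *specifics, description])


def parse_error(text):
    """Split an Error-String into (key, specifics, description)."""
    key, _, rest = text.partition(" ")
    count = SPECIFIC_COUNTS.get(key)
    if count is None:                     # unknown key: arity unknown
        return key, [], rest
    parts = rest.split(" ", count)        # the description may contain spaces
    if len(parts) < count or not all(parts[:count]):
        raise ValueError(f"{key} needs {count} specific(s)")
    return key, parts[:count], " ".join(parts[count:])
```

For example, `parse_error("type-check - bad arguments")` yields the key `type-check`, the specific `-` (no Cont-Id) and the description `bad arguments`.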

MRPT Compound Values

Tag Value	Description
96	Array of values. Contents are: a Context:Compound:1 object, which wraps the values; and optionally a Context:Compound:2 object, which holds a terminal value (to be ignored in languages that don't support them), i.e. the value usually found in the tail of a list, as in '(x y . z)'. The default interpretation is that of a list (in languages that support them).
97	Structure, Contents are an array of members. Each member is of the type Context:Compound:1 and contains: a name string and a value object.
98	Link, Contents are: Node-URL (String), Object-ID (String or Integer), Mode (Integer). The mode is even for bi-directional passes and odd for uni-directional ones. The next 7 bits define what the object is; for now: 0=unused, 1=generic uni-directional, 2=generic bi-directional, 3=process, 4=function.
99	Vector Array, same encoding as Array. Interpreted as a vector in capable languages, otherwise to be interpreted as a normal array.
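A minimal Python sketch of the Link Mode field, assuming bit 0 is the direction flag (odd = uni-directional) and bits 1-7 hold the object kind from the list above. Unlisted kinds are reported as "reserved", which is my own choice; the function names are hypothetical.

```python
LINK_KINDS = {0: "unused", 1: "generic uni-directional", 2: "generic bi-directional",
              3: "process", 4: "function"}
KIND_CODES = {name: code for code, name in LINK_KINDS.items()}


def decode_link_mode(mode):
    """Return (unidirectional, kind) for a Link Mode value."""
    unidirectional = bool(mode & 1)              # odd = uni-directional
    kind = LINK_KINDS.get((mode >> 1) & 0x7F, "reserved")
    return unidirectional, kind


def encode_link_mode(kind, unidirectional):
    if kind not in KIND_CODES:
        raise ValueError(f"unknown link kind: {kind}")
    return (KIND_CODES[kind] << 1) | int(unidirectional)
```

So a bi-directional link to a function has Mode 8, and Mode 9 is the same object linked uni-directionally.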

Establishing Connections

MRPT will by default use port 7937.
On connection the client is to write the raw string "MRPTID" followed by a version (0x0021, BE), and the server is to respond with "MRPTOK" followed by a version (0x0021, BE). The connection is to be closed if any other values are received, or if a response is not received within "a reasonable amount of time".
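A minimal Python sketch of the client side of this handshake over TCP. The 10-second timeout is an assumed stand-in for "a reasonable amount of time", and the helper names are hypothetical; each hello is the 6 ASCII bytes followed by the 2-byte big-endian version.

```python
import socket
import struct

MRPT_PORT = 7937
VERSION = 0x0021
CLIENT_HELLO = struct.pack(">6sH", b"MRPTID", VERSION)   # b"MRPTID\x00\x21"
SERVER_HELLO = struct.pack(">6sH", b"MRPTOK", VERSION)   # b"MRPTOK\x00\x21"


def recv_exact(sock, n):
    """Read exactly n bytes, or fail if the peer closes first."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        data += chunk
    return data


def client_handshake(host, port=MRPT_PORT, timeout=10.0):
    """Connect, send the client hello, and check the server's reply."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(CLIENT_HELLO)
        if recv_exact(sock, len(SERVER_HELLO)) != SERVER_HELLO:
            raise ConnectionError("unexpected handshake reply")
    except Exception:
        sock.close()                    # close on bad values or timeout
        raise
    sock.settimeout(None)               # handshake done; later reads may block
    return sock


def server_check_hello(data):
    """Server side: accept only the exact client hello."""
    return data == CLIENT_HELLO
```

Timeouts surface as an OSError from `recv`, so the same `except` path closes the socket for both a wrong reply and a silent server.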

Negotiation

Negotiation strings can be sent to request or declare information to the other end of the connection; each will have the form 'C...', where C is a special key character. '<' and '>' are used to indicate variables embedded in the form; they do not appear in the transmitted string.
Any unknown negotiation strings are to be ignored. The connection is to be closed if acceptable values can't be reached.

Form	Description
?<var>	Request the value of the variable on the other end.
=<var> <val>	Response to a request for a value/assignment, or to declare some property.
!<var> <val>	Try to assign a value on the other end.
E<ext>	Try to enable an extension.
S<ext>	Extension Enabled.
F<ext> <reason...>	Extension Failed; reason is an implementation-dependent string.
K<tag> <name>	<tag> is the tag value and <name> is the name of the tag. This is a possible response to enabling an extension.
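A minimal Python sketch of parsing negotiation strings and answering the two request forms. It assumes the string has already been extracted from a Negotiation String object (tag 1) and decoded as text, that fields are separated by a single space, and that a '?' for an unknown variable gets no reply; none of this is fixed by the spec. The variable and extension names in the usage note are hypothetical.

```python
def parse_negotiation(text):
    """Split a negotiation string into (key, fields); None for unknown forms."""
    if not text:
        return None
    key, body = text[0], text[1:]
    if key in "?ES":                      # one field: <var> or <ext>
        return key, [body]
    if key in "=!KF":                     # two fields; the second may contain spaces
        name, _, value = body.partition(" ")
        return key, [name, value]
    return None                           # unknown strings are ignored


def respond(text, variables, extensions):
    """Answer '?' and 'E' requests; other forms produce no reply in this sketch."""
    msg = parse_negotiation(text)
    if msg is None:
        return None
    key, fields = msg
    if key == "?" and fields[0] in variables:
        return f"={fields[0]} {variables[fields[0]]}"
    if key == "E":
        ext = fields[0]
        # Only enable extensions this end actually supports.
        return f"S{ext}" if ext in extensions else f"F{ext} not supported"
    return None
```

For example, with a hypothetical variable `max-depth` set to 16, `respond("?max-depth", {"max-depth": 16}, set())` returns `=max-depth 16`, and a request to enable an unsupported extension gets an `F...` reply rather than being silently dropped.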

Variable/extension names will be viewed as implementation-dependent.
Implementation-dependent extensions should have a name of the form:
'<implementation>-<extension>' or '<creator>-<extension>'. '<creator>-<implementation>-<extension>' may be used if names may collide within projects of the same creator.

An implementation may not enable/negotiate an extension it does not support.