[This file has been superseded...]


PAK like file format supporting compression and read/write operation (as zipfiles are not that well suited to read/write).
The format will assume no fragmentation of files, rather a file is to be moved if it expands beyond the space available to it.

Header
{
FOURCC magic;	//'ZPAK'
u32 ents;		//size of directory
u64 offs;		//offset of directory
}

DirEntry
{
char name[32];	//file name, 0 padded
u32 chain;		//next dir entry
u32 date_time;	//date modified
u16 flags;		//file flags (1&=fragmented, 2&=dir)
u16 method;		//method number, 0=store, 8=deflate, 9=deflate(ext)
u32 crc32;		//CRC32 of data (or first entry if dir)
u32 usize;		//uncompressed size of data
u32 csize;		//compressed size of data
u64 offset;		//offset to start of data
}

The name will contain the filename if below 32 characters, otherwise, the name field will begin with 0, and the name will precede the compressed contents.

Changed: A name beginning with 0 is not allowed, it will signify a free entry. As such, the name limit is 32 chars.

Consider: Make 32 chars the implementation limit.

Vs. the date/time being in dos format:
	date, low 5 bits=day, next 4 bits=month, high 7=year rel 1980;
	time, low 5=seconds/2, next 6 bits=minute, high 5=hour
The date_time field multiplies and adds the values:
	60s, 60m, 24h, 31d, 12m, 133.63y (fails in 2113 vs 2107).
	so:
	s=date_time%60;
	m=(date_time/60)%60;
	h=(date_time/3600)%24;
	...

Considered:
1&=the file is fragmented. Offset/csize refers to a table of spans, each giving the offset and size of each fragment. This could be useful for larger files, where possibly relocating on each write may not be all that efficient. Default chunking will be 64kB.

2&=directory. The CRC field will refer to the first directory entry. Usize, csize, and offset are reserved. Method will be 0.


Method 9 will be deflate with a 64kB window.
The CRC algo will be the same algo used in ZIP and PNG for example.


So, very little extra is stored in the file itself.

The thought for read/write access will be based on building a list of spans. All the empty space will be given to free spans, and all used space given to used spans. Idea here: First, all used spans are accumulated and sorted, and free spans are inferred from any breaks between the spans.

Files will be read in and decompressed on open, and, if modified, recompressed on close. If a file does not fit within a given span, it will be relocated to a span that fits, and, failing that, to the end of the image.

Fragmented Files:
Fragmented files will have the contents refer to an array of spans.

FragmentSpan {
u32 _resv0;		//reserved, 0
u32 _resv1;		//reserved, 0
u16 flags;		//fragment flags
u16 method;		//method number, 0=store, 8=deflate, 9=deflate(ext)
u32 crc32;		//CRC32 of data
u32 usize;		//uncompressed size of data
u32 csize;		//compressed size of data
u64 offset;		//offset to start of data
}

Special Files:
All files with names beginning with '$' are special and will not be visible (or accessible) as directory contents.

$spans
Management of free spans.

SpansHeader {
u32 free;		//first free entry
u32 ent_16;		//entries 16-32 bytes
u32 ent_32;		//32..63 bytes
u32 ent_64;		//64..127
u32 ent_128;	//128..255
u32 ent_256;	//256..511
u32 ent_512;	//512..1023
u32 ent_1k;		//...
u32 ent_2k;		//
u32 ent_4k;		//
u32 ent_8k;		//
u32 ent_16k;	//
u32 ent_32k;	//
u32 ent_64k;	//
u32 pad;
u32 pad;
}

SpansEntry {
u32 next;		//next in list
u32 size;		//size of span data
u64 offset;		//offset to span data
}

Most likely, coalescing and sorting will be done at runtime (images should be small enough that the overhead should be minor).

Compression:
Could either default to some algo, or try possible algos to determine which is best. Could detect write frequency, and for sufficiently oftenly modified data could default to store.


Experimental algo idea:
Modified LZW+Huffman.

LZW with 4k..256k entries (MRU sorted).
0..127 will encode strings using the prefix+extra scheme.

Prefix	Range			Extra
0..15		0..15			0
16..23	16..31		1
24..31	32..63		2
32..39	64..128		3
40..47	128..255		4
48..55	256..511		5
56..63	512..1023		6
64..71	1024..2047		7
72..79	2048..4095		8
80..87	4096..8191		9
88..95	8192..16383		10
96..103	16384..32765	11
104..111	32768..65535	12
112..119	65536..131071	13
120..127	131072..262143	14

128: EOB


Consider JPEG style huffman tree layout.

Header {
byte fl;		//low 3: dictionary size, 7=EOS
byte cnts[16];	//symbol counts between 1 and 16 bits
byte syms[];	//lists of symbols for each count
}

Dictionary size: 0..7
0=resv, 1=4k, 2=8k, 3=16k, 4=32k, 5=64k, 6=128k, 7=256k

Bits will be coded starting from the LSB of each byte, and huffman codes will be written starting with the MSB. As a result, huffman codes will be effectively transposed.

The EOS bit being set means that this is the last block in the stream.

Consider testing algo for both performance and compression.

Misc:
k=(c-(2*b-a));
f=(0.125*(1.0-4*(x-0.5)*(x-0.5)));
g=b*(1-x) + c*x - k*f;

1 8 27 64 125 216 343 512
1 27 ? 125 216
L 53 RL 34  LE 72 RLE -7
LI 76 E 12
