mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-06-23 15:30:45 -04:00
The zTXT Format
---------------

The zTXT format is relatively straightforward. The simplest zTXT contains a
Palm database header, followed by zTXT record #0, followed by the compressed
data. The compressed data can be in one of two formats: one long data stream,
or split into chunks for random access. If there are any bookmarks, they occupy
the record immediately after the compressed data. If there are any annotations,
the annotation index occupies the record immediately after the bookmarks, and
the text of each annotation in the index occupies its own record after the
annotation index. Here are diagrams of a simple zTXT and a full-featured zTXT:

         DB Header
      0  Record 0
      1
      2
      3
    ...  Compressed Data
     36
     37
     38

         DB Header
      0  Record 0
      1
      2
      3
    ...  Compressed Data
     36
     37
     38
     39  Bookmarks
     40  Annotation Index
     41  Annotation 1
     42  Annotation 2
     43  Annotation 3

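The record ordering described above can be sketched as a small C helper. This
is only an illustration: the names ztxt_layout and ztxt_expected_layout are
hypothetical and are not part of makeztxt or libztxt. It assumes data records
start at index 1 and that the optional records follow in the order given above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: expected record indices for a zTXT database. */
typedef struct {
    uint16_t first_data;        /* always 1; record 0 is the zTXT header */
    uint16_t last_data;         /* index of the last compressed data record */
    uint16_t bookmark_record;   /* 0 when there is no bookmark record */
    uint16_t annotation_record; /* 0 when there is no annotation index */
    uint16_t total_records;     /* record 0 + data + optional records */
} ztxt_layout;

/* Compute where each kind of record is expected to live. */
static ztxt_layout ztxt_expected_layout(uint16_t num_data_records,
                                        int has_bookmarks,
                                        uint16_t num_annotations)
{
    ztxt_layout l;
    uint16_t next;

    assert(num_data_records > 0);
    next = (uint16_t)(num_data_records + 1); /* first index after the data */

    l.first_data = 1;
    l.last_data = num_data_records;
    l.bookmark_record = 0;
    l.annotation_record = 0;

    if (has_bookmarks)
        l.bookmark_record = next++;
    if (num_annotations > 0) {
        l.annotation_record = next++;
        /* one text record per annotation follows the index */
        next = (uint16_t)(next + num_annotations);
    }
    l.total_records = next;
    return l;
}
```

For the full-featured diagram (38 data records, bookmarks, 3 annotations) this
sketch yields a bookmark record of 39 and an annotation index of 40.
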
Compression Modes
-----------------

zTXT versions 1.40 and later support two modes of compression. Mode 1 is a
random access mode, and mode 2 consists of one long data stream. Both modes
work on 8K (the default record size) blocks of text.

Please note, however, that as of Weasel Reader version 1.60 the old style
(mode 2) zTXT format is no longer supported. makeztxt and libztxt can still
create these documents for backwards compatibility, but you should avoid
mode 2 where possible.

Mode 1
------

In mode 1, 8K blocks of text are compressed into an equal number of blocks of
compressed data. Using the Z_FULL_FLUSH flush mode with zLib allows for random
access among the blocks of data. For this to work, the first block must be
decompressed first; after that, any block in the file may be decompressed in
any order. In mode 1, the blocks of compressed data will likely not all have
the same size.

Mode 2
------

In zTXT versions before 1.40, this was the only method of compression. This
mode involves compressing the entire input buffer into a single output buffer
and then splitting the resulting buffer into 8K segments. This mode requires
that all of the compressed data be decompressed in one pass. Since there are no
real 'blocks' of data, the resulting output can be of any block size, though
the default of 8K should typically be fine. The advantage of mode 2 is that it
gives about 10% to 15% better compression.

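The splitting step in mode 2 is plain arithmetic on the length of the
compressed buffer. The following is a minimal sketch of that arithmetic only
(it does not call zLib); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ZTXT_DEFAULT_RECORD_SIZE 8192u

/* Number of segments needed to hold compressed_len bytes. */
static size_t ztxt_mode2_segment_count(size_t compressed_len,
                                       size_t segment_size)
{
    assert(segment_size > 0);
    return (compressed_len + segment_size - 1) / segment_size;
}

/* Length of segment `index`; only the last segment may be short.
 * Returns 0 when index is past the end of the buffer. */
static size_t ztxt_mode2_segment_len(size_t compressed_len,
                                     size_t segment_size,
                                     size_t index)
{
    size_t start;
    size_t remaining;

    assert(segment_size > 0);
    start = index * segment_size;
    if (start >= compressed_len)
        return 0;
    remaining = compressed_len - start;
    return remaining < segment_size ? remaining : segment_size;
}
```

For example, a 20000 byte compressed stream split at the default size gives
two full 8192 byte segments and a final segment of 3616 bytes.
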
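zTXT Record #0 Definition (version 1.44)
----------------------------------------

Record 0 provides all of the information about the zTXT contents. Be sure it is
correct, lest fiery death rain down upon your program.

typedef struct zTXT_record0Type {
    UInt16 version;             /* zTXT version, e.g. 0x012A for 1.42 */
    UInt16 numRecords;          /* number of compressed data records */
    UInt32 size;                /* size of the uncompressed text in bytes */
    UInt16 recordSize;          /* size of one text record (default 8192) */
    UInt16 numBookmarks;        /* number of bookmarks */
    UInt16 bookmarkRecord;      /* record index of the bookmarks, or 0 */
    UInt16 numAnnotations;      /* number of annotations */
    UInt16 annotationRecord;    /* record index of the annotation index, or 0 */
    UInt8  flags;               /* ZTXT_* feature bits */
    UInt8  reserved;
    UInt32 crc32;               /* CRC32 of the text data records */
    UInt8  padding[0x20 - 24];  /* pads record 0 to 32 bytes */
} zTXT_record0;
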
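As an illustration of the layout above, here is a minimal sketch that reads
record 0 from a raw byte buffer on a non-Palm host. It assumes the fields are
stored big-endian (as Palm databases are) and packed without compiler padding,
so the 24 bytes of fields sit at offsets 0 through 23. The ztxt_record0 type
and ztxt_parse_record0 function are hypothetical names, not libztxt API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZTXT_RECORD0_SIZE 0x20

/* Host-side copy of record 0 (hypothetical type for this sketch). */
typedef struct {
    uint16_t version;
    uint16_t num_records;
    uint32_t size;
    uint16_t record_size;
    uint16_t num_bookmarks;
    uint16_t bookmark_record;
    uint16_t num_annotations;
    uint16_t annotation_record;
    uint8_t  flags;
    uint32_t crc32;
} ztxt_record0;

/* Read big-endian integers one byte at a time (assumed byte order). */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short for record 0. */
static int ztxt_parse_record0(const uint8_t *buf, size_t len,
                              ztxt_record0 *out)
{
    assert(out != NULL);
    if (buf == NULL || len < ZTXT_RECORD0_SIZE)
        return -1;

    out->version           = be16(buf + 0);
    out->num_records       = be16(buf + 2);
    out->size              = be32(buf + 4);
    out->record_size       = be16(buf + 8);
    out->num_bookmarks     = be16(buf + 10);
    out->bookmark_record   = be16(buf + 12);
    out->num_annotations   = be16(buf + 14);
    out->annotation_record = be16(buf + 16);
    out->flags             = buf[18];
    /* buf[19] is the reserved byte; buf[24..31] is padding */
    out->crc32             = be32(buf + 20);
    return 0;
}
```

Reading byte by byte avoids depending on how the host compiler aligns or
orders the fields, which is why the sketch does not cast the buffer to a
struct pointer.
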
Structure Elements
------------------

UInt16 version;

This is mostly just informational. Your program can figure out what features
might be available from the version. However, the remaining parts of the
structure are designed such that their value will be 0 if that particular
feature is not present, so testing those fields is the correct way to detect a
feature. The version is stored as two 8 bit integers, major then minor. For
example, version 1.42 is 0x012A.

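The two-byte version encoding can be unpacked as follows. This is a small
sketch with hypothetical function names; it relies only on the encoding
described above (major number in the high byte, minor number in the low byte).

```c
#include <assert.h>
#include <stdint.h>

/* High byte holds the major version, e.g. 0x01 in 0x012A. */
static uint8_t ztxt_version_major(uint16_t version)
{
    return (uint8_t)(version >> 8);
}

/* Low byte holds the minor version, e.g. 0x2A (42) in 0x012A. */
static uint8_t ztxt_version_minor(uint16_t version)
{
    return (uint8_t)(version & 0xFF);
}

/* Pack a major/minor pair back into the on-disk form. */
static uint16_t ztxt_make_version(uint8_t major, uint8_t minor)
{
    return (uint16_t)(((uint16_t)major << 8) | minor);
}

/* Because the major number is in the high byte, versions compare
 * correctly as plain integers. */
static int ztxt_version_at_least(uint16_t version,
                                 uint8_t major, uint8_t minor)
{
    return version >= ztxt_make_version(major, minor);
}
```
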
UInt16 numRecords;

This is the number of DATA records only and does not include record 0,
bookmarks, or annotations. With compression mode 1, this is also the number of
uncompressed text records. With mode 2, you must decompress the file to figure
out how many text records there will be.

UInt32 size;

The size in bytes of the uncompressed data in the zTXT. Compare this value with
the amount of free storage memory on the Palm to make sure there's enough room
to decompress the data in full or in part.

UInt16 recordSize;

recordSize is the size in bytes of a text record. This field is important, as
the size of text and decompression buffers is based on this value. Weasel uses
it to navigate through the text by mapping absolute offsets to record numbers.
The default is 8192. With compression mode 1, this is the amount of text
inside each compressed record (except maybe the last one), but the actual
compressed records will likely have varying sizes. In mode 2, both compressed
records and the resulting text records are all of this size (except, again, the
last record).

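The mapping from an absolute text offset to a record number can be sketched
like this. It assumes uniform text records (ZTXT_NONUNIFORM not set) and that
data records begin at database record index 1, right after record 0. The
ztxt_position type and ztxt_locate function are hypothetical names used for
illustration; this is not Weasel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: where an absolute text offset lives. */
typedef struct {
    uint16_t record;           /* database record index (1-based data) */
    uint16_t offset_in_record; /* byte offset inside the decompressed record */
} ztxt_position;

/* Returns 0 on success, -1 if the offset is outside the text or the
 * record size is invalid. */
static int ztxt_locate(uint32_t abs_offset, uint32_t total_size,
                       uint16_t record_size, ztxt_position *pos)
{
    assert(pos != NULL);
    if (record_size == 0 || abs_offset >= total_size)
        return -1;

    /* +1 skips record 0, which holds the zTXT header */
    pos->record = (uint16_t)(abs_offset / record_size + 1);
    pos->offset_in_record = (uint16_t)(abs_offset % record_size);
    return 0;
}
```

With the default record size of 8192, offset 20000 falls in record 3 at byte
3616 of that record's text.
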
UInt16 numBookmarks;

The definitive count of how many bookmarks are stored in the bookmark index
record. See the section on bookmarks below.

UInt16 bookmarkRecord;

If there are any bookmarks, this is set to the record index number that
contains the bookmark listing, otherwise it is 0.

UInt16 numAnnotations;

Like the bookmark count, this is the definitive count of how many annotations
are in the annotation index and how many annotation records follow it. See the
section on annotations below.

UInt16 annotationRecord;

If there are any annotations, this is set to the record index number that
contains the annotation index, otherwise it is 0.

UInt8 flags;

These flags indicate various features of the zTXT database. flags is a bitmask
and at present the only two defined bits are:

  ZTXT_RANDOMACCESS (0x01)
    If the zTXT was compressed according to the method in mode 1, then it
    supports random access and this should be set.
  ZTXT_NONUNIFORM (0x02)
    Setting this bit indicates that the text records within the zTXT database
    are not of uniform length. That is, when the blocks of text are
    decompressed they will not have identical block sizes. If this is not set,
    the compressed blocks are assumed to all have the same size when
    decompressed (typically 8K) except for the last block which can be smaller.

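Testing the flag bits is ordinary bitmask work. A short sketch follows; the
two macro values are taken from the list above, while the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ZTXT_RANDOMACCESS 0x01
#define ZTXT_NONUNIFORM   0x02

/* Nonzero when the data was compressed in mode 1. */
static int ztxt_supports_random_access(uint8_t flags)
{
    return (flags & ZTXT_RANDOMACCESS) != 0;
}

/* Nonzero when every text record except the last has recordSize bytes. */
static int ztxt_has_uniform_records(uint8_t flags)
{
    return (flags & ZTXT_NONUNIFORM) == 0;
}

/* Nonzero when bits outside the two defined flags are set. */
static int ztxt_has_unknown_flags(uint8_t flags)
{
    return (flags & (uint8_t)~(ZTXT_RANDOMACCESS | ZTXT_NONUNIFORM)) != 0;
}
```
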
UInt32 crc32;

A CRC32 value for checking data integrity. This value is computed over all text
data records only and does not include record 0 or any bookmark/annotation
records. The current implementation in makeztxt/Weasel computes this value
using the crc32 function in zLib, which should be the standard CRC32
definition.

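For readers without zLib at hand, the standard CRC32 can be sketched bit by
bit. The function below is a hypothetical stand-in written for this document,
not zLib's implementation; it is meant to follow the same calling convention
(start from 0 and feed each record in turn). The text above does not say
whether the records are fed in compressed or decompressed form, so that choice
is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320.
 * Pass 0 as crc for the first chunk, then the previous result. */
static uint32_t ztxt_crc32_update(uint32_t crc, const uint8_t *data,
                                  size_t len)
{
    size_t i;
    int bit;

    assert(data != NULL || len == 0);
    crc = ~crc; /* undo the final inversion of the previous call */
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

Feeding the records one after another gives the same result as hashing them
as a single buffer, so the caller does not need to concatenate them first.
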
UInt8 padding[0x20 - 24];

zTXT record zero is 32 bytes in length, so the unused portion is padded.

zTXT Bookmarks
--------------

zTXT bookmarks are stored in a simple array in a record at the end of a zTXT.
The format is as follows:

#define MAX_BMRK_LENGTH 20

typedef struct GPlmMarkType {
    UInt32 offset;
    Char   title[MAX_BMRK_LENGTH];
} GPlmMark;

In the structure, offset is counted as an absolute offset into the text. The
bookmarks must be sorted in ascending order.

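Reading the bookmark array on a non-Palm host might look like the sketch
below. It assumes each entry is 24 bytes (a big-endian 4 byte offset followed
by a 20 byte title) with no padding between entries, and it does not rely on
the title being NUL-terminated. The ztxt_bookmark type and both functions are
hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BMRK_LENGTH 20
#define ZTXT_BOOKMARK_SIZE (4 + MAX_BMRK_LENGTH) /* assumed packed size */

/* Host-side copy of one entry, with room for a terminating NUL. */
typedef struct {
    uint32_t offset;
    char title[MAX_BMRK_LENGTH + 1];
} ztxt_bookmark;

/* Copy entry `index` out of the bookmark record. Returns 0 on success,
 * -1 if the entry would run past the end of the record. */
static int ztxt_read_bookmark(const uint8_t *rec, size_t rec_len,
                              uint16_t index, ztxt_bookmark *out)
{
    size_t start = (size_t)index * ZTXT_BOOKMARK_SIZE;
    const uint8_t *p;

    assert(rec != NULL && out != NULL);
    if (start + ZTXT_BOOKMARK_SIZE > rec_len)
        return -1;

    p = rec + start;
    out->offset = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                  ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    memcpy(out->title, p + 4, MAX_BMRK_LENGTH);
    out->title[MAX_BMRK_LENGTH] = '\0'; /* force termination */
    return 0;
}

/* Returns 1 if the first `count` offsets never decrease, 0 otherwise. */
static int ztxt_bookmarks_sorted(const uint8_t *rec, size_t rec_len,
                                 uint16_t count)
{
    ztxt_bookmark cur;
    uint32_t prev = 0;
    uint16_t i;

    for (i = 0; i < count; i++) {
        if (ztxt_read_bookmark(rec, rec_len, i, &cur) != 0)
            return 0;
        if (i > 0 && cur.offset < prev)
            return 0;
        prev = cur.offset;
    }
    return 1;
}
```

The sorted check treats two bookmarks at the same offset as valid; whether
Weasel allows duplicates is not stated above.
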
If there are no bookmarks, then the bookmark index does not exist. When the
user creates the first bookmark, the record containing the index will then be
created. If there are annotations, when the bookmark record is created it must
go before the annotation index. This will require incrementing annotationRecord
in record 0 to point to the new record index.

Similarly, when all bookmarks are deleted the bookmark index record is also
deleted. If there are annotations, annotationRecord in record 0 must be
decremented to point to the new index.

zTXT Annotations
----------------

zTXT annotations have a format almost identical to that of the bookmark index:

typedef struct GPlmAnnotationType {
    UInt32 offset;
    Char   title[MAX_BMRK_LENGTH];
} GPlmAnnotation;

Like the bookmarks, offset is an absolute offset into the text. The annotation
index is organized just as the bookmarks are, as a single array in a record.
Note that this structure does NOT store the actual annotation text.

The text of each annotation is stored in its own record immediately following
the index. So, the first annotation in the index will occupy the first record
following the index, and the second annotation will be in the second record
following the index, and so on. The text of each annotation is limited to
4096 bytes.

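Finding the record that holds the text of a given annotation follows directly
from the ordering above. This is a minimal sketch with hypothetical names.
Whether the 4096 byte limit includes a terminating NUL is not stated here, so
the length check simply compares against 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZTXT_MAX_ANNOTATION_LEN 4096

/* Record index holding the text of annotation `i` (0-based position in
 * the annotation index). Returns -1 if there is no such annotation. */
static int ztxt_annotation_text_record(uint16_t annotation_record,
                                       uint16_t num_annotations,
                                       uint16_t i)
{
    if (annotation_record == 0 || i >= num_annotations)
        return -1;
    /* the texts follow the index record, in index order */
    return (int)annotation_record + 1 + (int)i;
}

/* Nonzero when an annotation text of `len` bytes fits the limit. */
static int ztxt_annotation_len_ok(size_t len)
{
    return len <= ZTXT_MAX_ANNOTATION_LEN;
}
```

With the annotation index in record 40, the three annotation texts are found
in records 41, 42 and 43, matching the full-featured layout.
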