mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-07-09 03:04:10 -04:00
Merge from trunk
commit
90ef9949ca
54
format_docs/compression/palmdoc.txt
Normal file
@ -0,0 +1,54 @@
About
-----

PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
text. The format does not allow for any text formatting. This keeps files
small, in keeping with the Palm philosophy. However, extensions to the format
can use tags, such as HTML or PML, to include formatting within text. These
extensions to PalmDoc are not interchangeable and are the basis for most eBook
Reader formats on Palm devices.

LZ77 algorithms achieve compression by replacing portions of the data with
references to matching data that has already passed through both encoder and
decoder. A match is encoded by a pair of numbers called a length-distance pair,
which is equivalent to the statement "each of the next length characters is
equal to the character exactly distance characters behind it in the
uncompressed stream." (The "distance" is sometimes called the "offset" instead.)

In the PalmDoc format, a length-distance pair is always encoded by a two-byte
sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
the distance, 3 go to encoding the length, and the remaining two are used to
make sure the decoder can identify the first byte as the beginning of such a
two-byte sequence.
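
As an illustration of this bit layout, the following Python sketch splits a
two-byte pair into its parts. The function name is made up for this example.
The bit order (distance in the upper 11 bits, length in the lower 3) and the
"stored length plus 3" rule are taken from the decoding rules in this file.

```python
def unpack_length_distance(first: int, second: int) -> tuple[int, int]:
    """Split a two-byte PalmDoc pair into (length, distance)."""
    # The two marker bits '10' mean the first byte lies in 0x80..0xBF.
    if not 0x80 <= first <= 0xBF:
        raise ValueError("first byte does not start a length-distance pair")
    value = ((first << 8) | second) & 0x3FFF  # drop the two marker bits
    distance = value >> 3                     # upper 11 bits
    length = (value & 0x07) + 3               # lower 3 bits store length - 3
    return length, distance
```

For example, the bytes 0x80 0x0B describe a copy of 6 bytes starting 1 byte
back in the uncompressed output.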

PalmDoc combines LZ77 with a simple kind of byte pair compression.


PalmDoc files are decoded as follows:
-------------------------------------

Read a byte from the compressed stream. If the byte is

0x00: "1 literal": copy that byte unmodified to the decompressed stream.

0x09 to 0x7f: "1 literal": copy that byte unmodified to the decompressed stream.

0x01 to 0x08: "literals": the byte is interpreted as a count from 1 to 8, and
that many literals are copied unmodified from the compressed stream to the
decompressed stream.

0x80 to 0xbf: "length, distance" pair: the 2 leftmost bits of this byte ('10')
are discarded, and the following 6 bits are combined with the 8 bits of the
next byte to make a 14 bit "distance, length" item. Those 14 bits are broken
into 11 bits of distance backwards from the current location in the
uncompressed text, and 3 bits of length to copy from that point
(copying n+3 bytes, 3 to 10 bytes).

0xc0 to 0xff: "byte pair": this byte is decoded into 2 characters: a space
character, and a letter formed from this byte XORed with 0x80.

Repeat from the beginning until there are no more bytes in the compressed file.

PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
upon independently.
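
The decoding rules above can be sketched in Python as follows. This is a
minimal illustration rather than a reference implementation: the function name
is an assumption, the input is taken to be one compressed block, and malformed
input is simply rejected with ValueError.

```python
def decompress_palmdoc(data: bytes) -> bytes:
    """Decode one PalmDoc-compressed block following the rules above."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 0x01 <= c <= 0x08:
            # "literals": copy the next c bytes unmodified
            out += data[i:i + c]
            i += c
        elif c <= 0x7F:
            # 0x00 and 0x09..0x7f: a single literal byte
            out.append(c)
        elif c >= 0xC0:
            # "byte pair": a space followed by the byte XORed with 0x80
            out.append(0x20)
            out.append(c ^ 0x80)
        else:
            # 0x80..0xbf: two-byte length-distance pair
            if i >= len(data):
                raise ValueError("truncated length-distance pair")
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance = pair >> 3
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("distance points outside the output")
            # Copy byte by byte so that overlapping matches work.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

Copying one byte at a time matters when the distance is shorter than the
length: the bytes 'a' 0x80 0x0d expand to nine 'a' characters.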
3217
format_docs/compression/zip.txt
Normal file
File diff suppressed because it is too large
309
format_docs/pdb/ereader.txt
Normal file
@ -0,0 +1,309 @@
About
-----

The eReader format has evolved and changed over time. Consequently, there are
multiple versions of the eReader format. There are also two different tools
that can create eReader files. The official tools are Makebook and Dropbook.
Dropbook is the newer official tool that has replaced Makebook. However,
Makebook is still in wide use because it supports a wider range of platforms
than Dropbook. Dropbook is a GUI application that only runs on Windows and
Apple’s OS X.


PDB Identity
------------

PNRdPPrs


202 and 132 headers
-------------------

Older files have a record 0 size of 202 and occasionally 116. Newer files have
a record 0 size of 132. As of this writing the 202 files only support text and
images. The image format in the 202 files is the same as the 132 files. The 132
files support a number of additional features.


Record 0, eReader header (202)
------------------------------

Note all values are in 2 byte increments. Like values are condensed into a
range. The range can be broken into 2 byte sections which represent the actual
stored values.

bytes    content          comments

0-2      Version          Non-DRM books 2 and 4.
2-8      Garbage
8-10     Non-Text Offset  Start of the non-text area (images), which runs
                          to the end of the section list.
10-14    Unknown
14-24    Garbage
24-28    Unknown
28-98    Garbage
98-100   Unknown
100-110  Garbage
110-114  Unknown
114-116  Garbage
116-202  Unknown

* Garbage: Intentionally random values.


Text Records (202)
------------------

Text starts with section 1 and continues until the section indicated by the
Non-Text Offset. All text records are PalmDoc compressed.

Each character in the compressed data is xored with 0xA5.

A decompression example in pseudo Python (non_text_offset is the Non-Text
Offset from the header; section_data and decompress_palmdoc are assumed
helpers):

    text = ''
    for num in range(1, non_text_offset):
        data = bytes(b ^ 0xA5 for b in section_data(num))
        text += decompress_palmdoc(data).decode('cp1252', 'replace')


Dropbook 132 files
------------------

The following sections apply to the newer Dropbook created files.


Record 0, eReader header (132)
------------------------------

This is only for 132 byte header files created by Dropbook.

bytes   content                     comments

0-2     compression                 Specifies compression and drm. 2 = palmdoc,
                                    10 = zlib. 260 and 272 = DRM
2-6     unknown                     Value of 0 is used
6-8     encoding                    Always 25152 (0x6240). All text must be
                                    encoded as Latin-1 cp1252
8-10    Number of small pages       The number of small font pages. If page
                                    index is not built in then 0.
10-12   Number of large pages       The number of large font pages. If page
                                    index is not built in then 0.
12-14   Non-Text record start       The location of the first non text record.
                                    Records 1 through this value minus 1 are
                                    all text records.
14-16   Number of chapters          The number of chapter index records
                                    contained in the file
16-18   Number of small index       The number of small font page index records
                                    contained in the file
18-20   Number of large index       The number of large font page index records
                                    contained in the file
20-22   Number of images            The number of images contained in the file
22-24   Number of links             The number of links contained in the file
24-26   Metadata available          Is there a metadata record in the file?
                                    0 = None, 1 = There is a metadata record
26-28   Unknown                     Value of 0 is used
28-30   Number of Footnotes         The number of footnote records in the file
30-32   Number of Sidebars          The number of sidebar records in the file
32-34   Chapter index record start  The location of chapter index records. If
                                    there are no chapters use the value for the
                                    Last data record.
34-36   2560                        Magic value that must be set to 2560
36-38   Small page index start      The location of small font page index
                                    records. If page table is not built in use
                                    the value for the Last data record.
38-40   Large page index start      The location of large font page index
                                    records. If page table is not built in use
                                    the value for the Last data record.
40-42   Image data record start     The location of the first image record. If
                                    there are no images use the value for the
                                    Last data record.
42-44   Links record start          The location of the first link index
                                    record. If there are no links use the value
                                    for the Last data record.
44-46   Metadata record start       The location of the metadata record. If
                                    there is no metadata use the value for the
                                    Last data record.
46-48   Unknown                     Value of 0 is used
48-50   Footnote record start       The location of the first footnote record.
                                    If there are no footnotes use the value for
                                    the Last data record.
50-52   Sidebar record start        The location of the first sidebar record.
                                    If there are no sidebars use the value for
                                    the Last data record.
52-54   Last data record            The location of the last data record
54-132  Unknown                     Value of 0 is used

Note: All values are in 2 byte increments. All bytes in the table that have a
range larger than 2 can be broken into 2 byte segments and have different
values set for each grouping.
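
As a sketch of how the table above maps onto code, the first 54 bytes can be
read as 27 unsigned 2 byte values. The field names below are illustrative
labels for the table rows, not official names, and big-endian byte order is an
assumption (it is the usual order in Palm database files, but this document
does not state it).

```python
import struct

# Illustrative labels for the 2 byte values at offsets 0, 2, 4, ... 52.
HEADER_132_FIELDS = (
    "compression", "unknown_2", "unknown_4", "encoding",
    "small_pages", "large_pages", "non_text_start", "chapters",
    "small_index_records", "large_index_records", "images", "links",
    "has_metadata", "unknown_26", "footnotes", "sidebars",
    "chapter_index_start", "magic_2560", "small_index_start",
    "large_index_start", "image_start", "links_start", "metadata_start",
    "unknown_46", "footnote_start", "sidebar_start", "last_data_record",
)


def parse_header_132(record0: bytes) -> dict[str, int]:
    """Map the first 54 bytes of a 132 byte record 0 to named values."""
    if len(record0) != 132:
        raise ValueError("expected a 132 byte record 0")
    # Assumption: values are big-endian unsigned 16 bit integers.
    values = struct.unpack(">27H", record0[:54])
    return dict(zip(HEADER_132_FIELDS, values))
```

With such a dictionary, the text records are records 1 up to, but not
including, the "non_text_start" value.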


Records Order
-------------

Though the order of these sections is described in the eReader header,
DropBook writes them in the following order:

1. eReader Header
2. Compressed text
3. Small font page index
4. Large font page index
5. Chapter index
6. Links index
7. Images
8. (Extrapolation: there should be one more record type here though it has
   not yet been uncovered what it might be).
9. Metadata
10. Sidebar records
11. Footnote records
12. Text block size record
13. "MeTaInFo\x00" word record


Text Records
------------

All text records use cp1252 encoding (although eReader documents talk about
UTF-8 as well). Their maximum compressed size is unknown; however, anything
below 3560 Bytes is known to work. The text will be either zlib or palmdoc
compressed. Use the compression value from the eReader header to determine
which. All text utilizes the Palm Markup Language (PML) for formatting.

Starting with DropBook 1.6.0, text is divided into 8KB (8192 byte) blocks,
each trimmed at the end to the closest space character and then compressed.
The earlier DropBook 1.5.2 tries to behave the same way, though it sometimes
trims the block in an unexpected place.
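
A rough Python sketch of this kind of block splitting is shown below. It is
only an approximation of the described behaviour: the function name is made
up, and keeping the space at the end of the trimmed block (rather than at the
start of the next one) is an assumption.

```python
def split_text_blocks(text: bytes, block_size: int = 8192) -> list[bytes]:
    """Split text into blocks of at most block_size bytes, cut at spaces."""
    blocks = []
    start = 0
    while start < len(text):
        end = min(start + block_size, len(text))
        if end < len(text):
            # Trim the block back to the closest preceding space, if any.
            space = text.rfind(b" ", start, end)
            if space > start:
                end = space + 1  # assumption: the space stays in this block
        blocks.append(text[start:end])
        start = end
    return blocks
```

Each block would then be compressed separately with the method named in the
eReader header.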


Chapter Index Records
---------------------

Each chapter record corresponds to 1 chapter and points at a place in the
book. A chapter record takes the form 'offset name\x00'. The first 4 bytes are
the offset in the original pml file that the chapter index points to (the
offset of the \x|\X?|\C? tags). The name of the chapter as shown in the chapter
index follows immediately, without a separating space. It should contain only
text; all formatting tags should be removed. \U and \a tags are not permitted
in a chapter name. To represent sub-chapters, 4*n spaces (\x20) are added to
the beginning of the name, where "n" is the level of the chapter: 0 for the \x
tag and N for the \CN="" and \XN tags. The record ends with a \x00 byte.
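
The record layout described above could be built and read back as in the
following Python sketch. The function names are made up for this example, and
treating the 4 byte offset as a big-endian unsigned integer is an assumption
that this document does not confirm.

```python
import struct


def build_chapter_record(offset: int, level: int, name: str) -> bytes:
    """Build 'offset name\\x00' with 4 spaces of indent per chapter level."""
    indented = " " * (4 * level) + name
    # Assumption: the offset is a big-endian unsigned 32 bit value.
    return struct.pack(">I", offset) + indented.encode("cp1252") + b"\x00"


def parse_chapter_record(record: bytes) -> tuple[int, int, str]:
    """Return (offset, level, name) from a chapter index record."""
    (offset,) = struct.unpack(">I", record[:4])
    raw_name = record[4:].split(b"\x00", 1)[0]
    name = raw_name.decode("cp1252", "replace")
    stripped = name.lstrip(" ")
    level = (len(name) - len(stripped)) // 4
    return offset, level, stripped
```

The level is recovered purely from the leading spaces, so a name that itself
starts with spaces would be misread by this sketch.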


Image Records
-------------

Image records must be smaller than 65505 Bytes. They must also be 8bit PNG
images.

An image record takes the form 'PNG name\x00... image_data'

bytes   content         comments

0-4     PNG             There must be a space after PNG.
4-36    image name      The image name must be exactly 32 Bytes long. Pad
                        the right side of the name with \x00 characters for
                        names shorter than 32 characters.
36-58   Unknown
58-60   width           Width of an image
60-62   height          Height of an image
62-?    image data      The raw image data in 8 bit PNG format

Note: DropBook seems to change something in the raw png data, perhaps by
re-encoding it, but plain insertion of a png image still works.
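
The table above can be turned into a small reader, sketched here in Python.
The function name is illustrative, and reading width and height as big-endian
unsigned 16 bit values is an assumption.

```python
import struct


def parse_image_record(record: bytes) -> tuple[str, int, int, bytes]:
    """Return (name, width, height, png_data) from an image record."""
    if record[:4] != b"PNG ":
        raise ValueError("image record must start with 'PNG '")
    if len(record) < 62:
        raise ValueError("image record is shorter than its 62 byte header")
    # The name field is 32 bytes, padded on the right with \x00.
    name = record[4:36].rstrip(b"\x00").decode("cp1252", "replace")
    # Assumption: width and height are big-endian 16 bit values.
    width, height = struct.unpack(">HH", record[58:62])
    return name, width, height, record[62:]
```

Bytes 36 to 58 are skipped because their meaning is unknown.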


Links Records
-------------

Links records are constructed the same way as chapter ones. Each link anchor
record corresponds to 1 link anchor and points at a place in the book. A link
record takes the form 'offset name\x00'. The first 4 bytes are the offset in
the original pml file that the link anchor points to (the offset of the \Q
tag). The name of the link anchor follows immediately, without a separating
space. It should contain only text; all formatting tags should be removed.
\U and \a tags are not permitted in a link anchor name. The record ends with a
\x00 byte.


Footnote Records
----------------

The first footnote record is a \x00 separated list of footnote ids. All
subsequent footnote records are the footnote text corresponding to the id's
position in the list. Footnote text is compressed in the same manner as normal
text records.

E.G.

footnote section 1 = 'notice1\x00notice2\x00notice3\x00'
footnote section 2 = 'Text for notice 1'
footnote section 3 = 'Text for notice 2'
footnote section 4 = 'Text for notice 3'

Starting with Dropbook 1.5.2 the first record looks a bit different. It is a
sequence of entries, each made of \x00\x01, then 1 byte holding the footnote
id length, then the footnote id, then \x00.

E.G.

footnote section 1 = '\x00\x01\x07notice1\x00\x00\x01\x0Afootnote10\x00'
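
Both forms of the id list can be read with one small routine, sketched below
in Python. The function name is made up, and decoding the ids as cp1252 is an
assumption carried over from the text records. The same layout is described
for sidebar records.

```python
def parse_note_ids(record: bytes) -> list[str]:
    """Read the id list from the first footnote (or sidebar) record."""
    ids = []
    if record.startswith(b"\x00\x01"):
        # Dropbook 1.5.2 and later: \x00\x01, length byte, id, \x00
        pos = 0
        while record[pos:pos + 2] == b"\x00\x01" and pos + 2 < len(record):
            length = record[pos + 2]
            start = pos + 3
            ids.append(record[start:start + length].decode("cp1252", "replace"))
            pos = start + length + 1  # skip the id and its trailing \x00
        return ids
    # Older form: plain \x00 separated list
    for part in record.split(b"\x00"):
        if part:
            ids.append(part.decode("cp1252", "replace"))
    return ids
```

The position of an id in the returned list selects the following record that
holds its text: the first id belongs to the record right after the id list.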


Sidebar Records
---------------

The first sidebar record is a \x00 separated list of sidebar ids. All
subsequent sidebar records are the sidebar text corresponding to the id's
position in the list. Sidebar text is compressed in the same manner as normal
text records.

E.G.

sidebar section 1 = 'notice1\x00notice2\x00notice3\x00'
sidebar section 2 = 'Text for notice 1'
sidebar section 3 = 'Text for notice 2'
sidebar section 4 = 'Text for notice 3'

Starting with Dropbook 1.5.2 the first record looks a bit different. It is a
sequence of entries, each made of \x00\x01, then 1 byte holding the sidebar
id length, then the sidebar id, then \x00.

E.G.

sidebar section 1 = '\x00\x01\x07notice1\x00\x00\x01\x09sidebar10\x00'


Metadata Record
---------------

A \x00 separated list of strings.

Metadata takes the form:

title\x00
author\x00
copyright\x00
publisher\x00
isbn\x00

E.G.

Gibraltar Earth\x00Michael McCollum\x001999\x00Sci Fi Arizona\x001929381255\x00

The metadata record is always followed by a record which contains 'MeTaInFo\x00'

Note: Starting with DropBook 1.5.2 'MeTaInFo\x00' no longer directly follows
the Metadata record. It is a separate record that ends the file, and there are
some more records between the Metadata record and the 'MeTaInFo\x00' record.
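
Reading the metadata record amounts to splitting on \x00, as in this Python
sketch. The function and key names are illustrative, and cp1252 decoding is an
assumption (the encoding of the metadata record is not stated here).

```python
METADATA_FIELDS = ("title", "author", "copyright", "publisher", "isbn")


def parse_metadata(record: bytes) -> dict[str, str]:
    """Split a metadata record into its five \\x00 terminated strings."""
    parts = record.split(b"\x00")[:len(METADATA_FIELDS)]
    values = [part.decode("cp1252", "replace") for part in parts]
    # Missing trailing fields are returned as empty strings.
    values += [""] * (len(METADATA_FIELDS) - len(values))
    return dict(zip(METADATA_FIELDS, values))
```

Applied to the example record above, this yields the title "Gibraltar Earth"
and the isbn "1929381255".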


Text Sizes Record
-----------------

There is a special record that contains the initial size of all text blocks
before compression. It is just a sequence of 2-byte values containing the
sizes.

E.G.

\x1F\xFB\x20\x00\x20\x00\x1F\xFE\x1F\xFD\x09\x46

Note: From this we can infer that the theoretical maximum initial block size
is 65535 bytes.
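
The example record can be decoded with the Python sketch below. The function
name is made up. Big-endian byte order is inferred from the example itself,
since only that reading gives sizes close to the 8192 byte block size.

```python
import struct


def parse_text_sizes(record: bytes) -> list[int]:
    """Return the uncompressed size of each text block."""
    count = len(record) // 2
    # Each size is read as a big-endian unsigned 16 bit value.
    return list(struct.unpack(f">{count}H", record[:count * 2]))
```

For the example above this gives 8187, 8192, 8192, 8190, 8189 and 2374.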
414
format_docs/pdb/mbp.txt
Normal file
@ -0,0 +1,414 @@
// BEGINNING OF FILE
// NOTES:
// 1* Numeric data stored as big endian, 32 bits.
// 2* Data padded to 16 bit boundaries. (Sometimes to 32 bit boundaries?)
// 3* Text stored seems to be an 8 bit encoding padded to 16 bits
//    (may be "ISO-8859-1"?, or may be just a local machine character set?)
// 4* I initially used the term "MARK" where I should have used "HIGHLIGHT",
//    bear that in mind (it was a poor choice of name when I started reversing)

<0x 31 bytes = book_title_PAR + 0x00 PAD if (book_title_PAR < 31) >
<0x 00>
<0x 00 00 00 00>
...4
...4
<0x 00 00 00 00>
<0x 00 00 00 00>
<0x 00 00 00 00>
<0x 00 00 00 00>
BPAR
MOBI
<0x 4 bytes = Next free pointer identifier>
// Note: pointer identifiers aren't always consecutive,
// so this number is usually bigger than the # of index entries
<0x 00 00>
<0x 4 bytes = Number of index entries>
<0x 4 bytes = Position of BPAR>
<0x 00 00 00 00> // BPAR pointer identifier = 0x0


// INDEXES:
// Order of Indexes: from the beginning of this MBP file,
// forward to the end of the file.
// Nevertheless, see these comments for order relative to:
// "BEGINNING OF USER DATA": order of Data marks.
// "FINAL GROUP OF MARKS": order of final marks.
[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
...}
|| "last DATA"]
// Note: Pointer identifiers to DATA's are assigned so that the number
// shrinks as the table grows down.
[if NOTE || CORRECTION]
<0x 4 bytes = Position of DATA....EBVS>
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
[fi NOTE || CORRECTION]
<0x 4 bytes = Position of DATA>
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
[if NOTE || CORRECTION]
<0x 4 bytes = Position of DATA>
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
[fi NOTE || CORRECTION]
[if MARK || DRAWING || BOOKMARK]
<0x 4 bytes = Position of DATA....EBVS>
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
[fi MARK || DRAWING || BOOKMARK]
[if AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
<0x 4 bytes = Position of [AUTH || TITL || CATE || GENR || ABST || COVE || PUBL] >
<0x 4 bytes = Pointer identifier>
[fi AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
[if last DATA] // there's always a last piece of DATA (not user data?)
<0x 4 bytes = Position of last DATA>
<0x 4 bytes = Pointer identifier> // usually <0x 00 00 00 01>
[fi last DATA]
[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
...}
|| "last DATA"]


[for each {NOTE,MARK,CORRECTION,DRAWING}]
<0x 4 bytes = Position of BKMK>
<0x 4 bytes = Pointer identifier>
// Note: pointer identifiers for BKMK's are usually the smallest
// of all the identifiers associated to an annotation. All
// other DATA references in INDEXES table associated to this
// BKMK, have bigger pointer identifiers.
// Note: Pointer identifiers to BKMK's are assigned so that the number
// grows as the table grows down.
[next {NOTE,MARK,CORRECTION,DRAWING}]


<0x 2 bytes random PAD>
BPAR
<0x 4 bytes = size of BPAR block>
<0x FF FF FF FF>
...4 <-- 'position of last read' related
...4 <-- 'position of last read' related
...4
<0x FF FF FF FF>
...4
...4
...4 <-- 'position of last read' related
...(rest of size of BPAR block, if bigger than 0x20)
[if (size of BPAR block) mod 32 != 0]
<0x FF FF FF FF>
[fi]

// BEGINNING OF USER DATA:
// Order of {NOTE,MARK,CORRECTION,DRAWING} :
// starts with user data at the end of the file,
// going backwards to the beginning of the file:
//--------------------------------------------------------------------
[for each {NOTE,MARK,CORRECTION,DRAWING}]
//-------------------------------
[if NOTE]
DATA
<0x 4 bytes = size of DATA block>
[if EBAR] // this block can appear, or not... ???
EBAR
...various {4 x byte} ???
[fi EBAR]
EBVS
<0x 00 00 00 03> ???
<0x 4 bytes = IDENTIFIER> ???
[<0x 00 00 00 01>, or nothing at all] ???
<0x 00 00 00 08>
<0x FF FF FF FF>
<0x 00 00 00 00>
<0x 00 00 00 10>
...(rest of size of DATA block)
<0x FD EA = PAD? (ýê)>
DATA
<0x 4 bytes = size of <marked text (see 3rd note)> >
<marked text (see 3rd note)>
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
[fi]
DATA
<0x 4 bytes = size of <note text (see 3rd note)> >
<note text (see 3rd note)>
[if (size of <note text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
[fi]
[fi NOTE]
//-------------------------------
[if MARK || BOOKMARK]
DATA
<0x 4 bytes = size of <marked text (see 3rd note)> >
<marked text (see 3rd note)>
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
[fi]
DATA
<0x 4 bytes = size of DATA block>
[if EBAR] // this block can appear, or not... ???
EBAR
...various {4 x byte} ???
[fi EBAR]
EBVS
<0x 00 00 00 03> ???
<0x 4 bytes = IDENTIFIER> ???
[<0x 00 00 00 01>, or nothing at all] ???
<0x 00 00 00 08>
<0x FF FF FF FF>
<0x 00 00 00 00>
<0x 00 00 00 10>
...(rest of size of DATA block)
<0x FD EA = PAD? (ýê)>
[fi MARK || BOOKMARK]
//-------------------------------
[if CORRECTION]
DATA
<0x 4 bytes = size of DATA block>
[if EBAR] // this block can appear, or not... ???
EBAR
...various {4 x byte} ???
[fi EBAR]
EBVS
<0x 00 00 00 03> ???
<0x 4 bytes = IDENTIFIER> ???
[<0x 00 00 00 01>, or nothing at all] ???
<0x 00 00 00 08>
<0x FF FF FF FF>
<0x 00 00 00 00>
<0x 00 00 00 10>
...(rest of size of DATA block)
<0x FD EA = PAD? (ýê)>
DATA
<0x 4 bytes = size of <marked text (see 3rd note)> >
<marked text (see 3rd note)>
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
[fi]
DATA
<0x 4 bytes = size of <note text (see 3rd note)> >
<note text (see 3rd note)>
[if (size of <note text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
[fi]
[fi CORRECTION]
//-------------------------------
[if DRAWING]
DATA
<0x 4 bytes = size of raw data>
ADQM
// NOTE: background color is stored in corresponding BKMK.
[begin DRAWING format]
...4 = <0x 00 00 00 01> ???
<0x 4 bytes = X POSITION OF UPPER LEFT CORNER??? >
<0x 4 bytes = Y POSITION OF UPPER LEFT CORNER??? >
<0x 4 bytes = X SIZE in pixels >
<0x 4 bytes = Y SIZE in pixels >
...4 = <0x 00 00 00 00> ???
<0x 4 bytes = number of STROKES>
[if "number of STROKES" == 0]
<0x 00 00 00 00>
[end DRAWING format]
[fi]
[for each STROKE]
<0x 00 00 00 01> ???
<0x 4 bytes> =
    Stroke's beginning position in list of coordinates.
<0x 4 bytes> =
    Stroke's ending position in list of coordinates.
<0x 00 RR GG BB> = RRGGBB color of stroke.
[next STROKE]
<0x 4 bytes> = number of coordinate pairs in array of coordinates.
// NOTE: each stroke is formed out of at least three
// coordinate pairs: begin, {next point}(1-n), end point.
[for each COORDINATE]
<0x 4 bytes> = X coordinate
<0x 4 bytes> = Y coordinate
[next COORDINATE]
[end DRAWING format]
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
[fi]
DATA
<0x 4 bytes = size of <marked text (see 3rd note)> >
<marked text (see 3rd note)>
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
[fi]
DATA
<0x 4 bytes = size of DATA block>
[if EBAR] // this block can appear, or not... ???
EBAR
...various {4 x byte} ???
[fi EBAR]
EBVS
<0x 00 00 00 03>
<0x 4 bytes = IDENTIFIER>
[<0x 00 00 00 01>, or nothing at all] ???
<0x 00 00 00 08>
<0x FF FF FF FF>
<0x 00 00 00 00>
<0x 00 00 00 10>
...(size of DATA block - 30)
<0x FD EA = PAD? (ýê)>
[fi DRAWING]
//-------------------------------
[next {NOTE,MARK,CORRECTION,DRAWING}]

// AUTHOR (if any)
//--------------------------------------------------------------------
[if AUTHOR]
AUTH
<0x 4 bytes = size of AUTHOR block>
<text (see 3rd note)>
[fi AUTHOR]
//--------------------------------------------------------------------
// TITLE (if any)
//--------------------------------------------------------------------
[if TITLE]
TITL
<0x 4 bytes = size of TITLE block>
<text (see 3rd note)>
[fi TITLE]
//--------------------------------------------------------------------
// GENRE (if any)
//--------------------------------------------------------------------
[if GENRE]
GENR
<0x 4 bytes = size of GENRE block>
<text (see 3rd note)>
[fi GENRE]
//--------------------------------------------------------------------
// ABSTRACT (if any)
//--------------------------------------------------------------------
[if ABSTRACT]
ABST
<0x 4 bytes = size of ABSTRACT block>
<text (see 3rd note)>
[fi ABSTRACT]
//--------------------------------------------------------------------

// FINAL DATA
// Note: 'FINAL DATA' can occur anytime between these marks:
// AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,...
//--------------------------------------------------------------------
DATA
<0x 4 bytes = size of EBVS block>
[if EBAR] // this block can appear, or not... ???
EBAR
...various {4 x byte} ???
[fi EBAR]
EBVS
<0x 00 00 00 03> || <0x 00 00 00 04>
<0x 4 bytes || 8 bytes = IDENTIFIER>
<0x 00 00 00 08>
<0x FF FF FF FF>
<0x 00 00 00 00>
<0x 00 00 00 10>
...(size of EBVS block - 30) :
...4 <-- 'position of last read' related
...various {4 x byte} ???
...4 <-- 'position of last read' related
...4
...4
...4
<0x FD EA = PAD? (ýê)>
//--------------------------------------------------------------------

// CATEGORY (if any)
//--------------------------------------------------------------------
[if CATEGORY]
CATE
<0x 4 bytes = size of CATEGORY block>
<text (see 3rd note)>
[fi CATEGORY]
//--------------------------------------------------------------------
// COVER (if any)
//--------------------------------------------------------------------
[if COVER]
COVE
<0x 4 bytes = size of COVER block>
<text (see 3rd note)>
[fi COVER]
//--------------------------------------------------------------------
// PUBLISHER (if any)
//--------------------------------------------------------------------
[if PUBLISHER]
PUBL
<0x 4 bytes = size of PUBLISHER block>
<text (see 3rd note)>
[fi PUBLISHER]
//--------------------------------------------------------------------


// FINAL GROUP OF MARKS
// Order of {NOTE,MARK,CORRECTION} :
// starts with user data at the beginning of the file,
// going forwards to the end:
//--------------------------------------------------------------------
[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
BKMK
<0x 4 bytes = size of BKMK>
<0x 4 bytes = TEXT position of the beginning of {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}>
//-------------------------------
[if DRAWING]
<0x FF FF FF FF>
[else]
<0x 4 bytes = TEXT position of the end of {NOTE,MARK,CORRECTION,BOOKMARK}>
[fi DRAWING]
...4
...4
//-------------------------------
[if NOTE]
<0x xx xx xx (20)?>, xxxxxx=>RRGGBB color ???
<0x 00 00 00 02>
[fi NOTE]
[if MARK]
<0x xx xx xx (0F/00)??>, xxxxxx=>RRGGBB color ???
<0x 00 00 00 04>
[fi MARK]
[if CORRECTION]
<0x xx xx xx (6F)?>, xxxxxx=>RRGGBB color ???
<0x 00 00 00 02>
[fi CORRECTION]
[if DRAWING]
<0x xx xx xx (0F)?>, xxxxxx=>RRGGBB DRAWING's background color.
<0x 00 00 00 08>
[fi DRAWING]
[if BOOKMARK]
<0x xx xx xx 00>
<0x 00 00 00 01>
[fi BOOKMARK]
// this one is a strange type of mark, whose use is not yet identified:
[if UNKNOWN_TYPE_YET_1]
<0x xx xx xx 00>
<0x 00 00 40 00>
[fi UNKNOWN_TYPE_YET_1]

//-------------------------------
[if BOOKMARK || (NOTE "without stored marked text")]
<0x FF FF FF FF>
[else]
<0x 4 bytes = DATA pointer in INDEXES>
[fi BOOKMARK]
[if DRAWING || MARK]
<0x FF FF FF FF>
[else]
<0x 4 bytes = DATA pointer in INDEXES>
[fi]
<0x 4 bytes = DATA pointer in INDEXES>
[if DRAWING]
<0x 4 bytes = DATA pointer in INDEXES>
[else]
<0x FF FF FF FF>
[fi]
//-------------------------------
<0x FF FF FF FF>
<0x FF FF FF FF>
[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
//--------------------------------------------------------------------

[if length % 32 bit != 0] ???
<0x FF FF FF FF>
[fi]

// END OF FILE

// by idleloop@yahoo.com, v0.2.e, 12/2009
// http://www.angelfire.com/ego2/idleloop
341
format_docs/pdb/mobi.txt
Normal file
@ -0,0 +1,341 @@
|
||||
from (http://wiki.mobileread.com/wiki/MOBI)
|
||||
|
||||
About
|
||||
-----
|
||||
|
||||
MOBI is the format used by the the MobiPocket Reader. It may have a .mobi
|
||||
extension or it may have a .prc extension. The extension can be changed by the
|
||||
user to either of the accepted forms. In either case it may be DRM protected or
|
||||
non-DRM. The .prc extension is used because the PalmOS doesn't support any file
|
||||
extensions except .prc or .pdb. Note that Mobipocket prohibits their DRM format
|
||||
to be used on dedicated eBook readers that support other DRM formats.
|
||||
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
MOBI format was originally an extension of the PalmDOC format by adding
|
||||
certain HTML like tags to the data. Many MOBI formatted documents still use
|
||||
this form. However there is also a high compression version of this file format
|
||||
that compresses data to a larger degree in a proprietary manner. There are some
|
||||
third party programs that can read the eBooks in the original MOBI format but
|
||||
there are only a few third party program that can read the eBooks in the new
|
||||
compressed form. The higher compression mode is using a huffman coding scheme
|
||||
that has been called the Huff/cdic algorithm.
|
||||
|
||||
From time to time features have been added to the format so new files may have
|
||||
problems if you try to read them with a down-level reader. Currently the
|
||||
source files follow the guidelines in the Open eBook format.
|
||||
|
||||
Note that AZW for the Amazon Kindle is the same format as MOBI except that it
|
||||
uses a slightly different DRM scheme.
|
||||
|
||||
|
||||
Format
|
||||
------
|
||||
|
||||
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database
|
||||
Format file. The header of that format includes the name of the database
|
||||
(usually the book title and sometimes a portion of the author's name) which is
|
||||
up to 31 bytes of data. The files are identified by a Creator ID of MOBI and a
|
||||
Type of BOOK.
|
||||
|
||||
|
||||
PalmDOC Header
|
||||
--------------
|
||||
|
||||
The first record in the Palm Database Format gives more information about the
|
||||
Mobipocket file. The first 16 bytes are almost identical to the first sixteen
|
||||
bytes of a PalmDOC format file.
|
||||
|
||||
bytes content comments
|
||||
2 Compression 1 = no compression, 2 = PalmDOC compression,
|
||||
17480 = HUFF/CDIC compression.
|
||||
2 Unused Always zero
|
||||
4 text length Uncompressed length of the entire text of the book
|
||||
2 record count Number of PDB records used for the text of the book.
|
||||
2 record size Maximum size of each record containing text, always
|
||||
4096.
|
||||
4 Current Position Current reading position, as an offset into the
|
||||
uncompressed text
|
||||
|
||||
There are two differences from a Palm DOC file. There's an additional
|
||||
compression type (17480), and the Current Position bytes are used for a
|
||||
different purpose:
|
||||
|
||||
bytes content comments
|
||||
2 Encryption Type 0 = no encryption, 1 = Old Mobipocket Encryption,
|
||||
2 = Mobipocket Encryption.
|
||||
2 Unknown Usually zero
|
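The two tables above can be read together as one 16-byte structure. The
following is a minimal Python sketch, not a reference implementation. The
function name and the dictionary keys are illustrative choices; the field
order, sizes and big-endian byte order come from the tables and from the PDB
format notes.

```python
import struct

COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}
ENCRYPTION_NAMES = {0: "none", 1: "old Mobipocket", 2: "Mobipocket"}


def parse_record0_prefix(record0):
    """Unpack the first 16 bytes of record 0 of a Mobipocket file."""
    # Layout: 2 + 2 + 4 + 2 + 2 + 2 + 2 = 16 bytes, all big-endian.
    (compression, unused, text_length, record_count,
     record_size, encryption, unknown) = struct.unpack(">HHIHHHH", record0[:16])
    return {
        # Unrecognised codes are passed through as plain numbers.
        "compression": COMPRESSION_NAMES.get(compression, compression),
        "text_length": text_length,
        "record_count": record_count,
        "record_size": record_size,
        "encryption": ENCRYPTION_NAMES.get(encryption, encryption),
    }
```

For a plain PalmDOC file the last four bytes would instead hold the current
reading position, so this sketch only applies to Mobipocket files.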
||||
|
||||
The old Mobipocket Encryption scheme only allows the file to be registered
|
||||
with one PID, unlike the current encryption scheme that allows multiple PIDs to
|
||||
be used in a single file. Unless specifically mentioned, all the encryption
|
||||
information on this page refers to the current scheme.
|
||||
|
||||
|
||||
MOBI Header
|
||||
-----------
|
||||
|
||||
Most Mobipocket files also have a MOBI header in record 0 that follows these
|
||||
16 bytes, and newer formats also have an EXTH header following the MOBI header,
|
||||
again all in record 0 of the PDB file format.
|
||||
|
||||
The MOBI header is of variable length and is not documented. Some fields have
|
||||
been tentatively identified as follows:
|
||||
|
||||
offset bytes content comments
|
||||
16 4 identifier The characters M O B I
|
||||
20 4 header length The length of the MOBI header, including
|
||||
the previous 4 bytes
|
||||
24 4 Mobi type The kind of Mobipocket file this is
|
||||
2 Mobipocket Book
|
||||
3 PalmDoc Book
|
||||
4 Audio
|
||||
257 News
|
||||
258 News_Feed
|
||||
259 News_Magazine
|
||||
513 PICS
|
||||
514 WORD
|
||||
515 XLS
|
||||
516 PPT
|
||||
517 TEXT
|
||||
518 HTML
|
||||
28 4 text Encoding 1252 = CP1252 (WinLatin1); 65001 = UTF-8
|
||||
32 4 Unique-ID Some kind of unique ID number (random?)
|
||||
36 4 Generator version Potentially the version of the
|
||||
Mobipocket-generation tool. Always >=
|
||||
the value of the "format version" field
|
||||
and <= the version of mobigen used to
|
||||
produce the file.
|
||||
40 40 Reserved All 0xFF. In case of a dictionary, or
|
||||
some newer file formats, a few bytes are
|
||||
used from this range of 40 0xFFs
|
||||
80 4 First Non-book index? First record number (starting with 0)
|
||||
that's not the book's text
|
||||
84 4 Full Name Offset Offset in record 0 (not from start of
|
||||
file) of the full name of the book
|
||||
88 4 Full Name Length Length in bytes of the full name of the
|
||||
book
|
||||
92 4 Language Book language code. Low byte is main
|
||||
language 09= English, next byte is
|
||||
dialect, 08 = British, 04 = US
|
||||
96 4 Input Language Input language for a dictionary
|
||||
100 4 Output Language Output language for a dictionary
|
||||
104 4 Format version Potentially the version of the
|
||||
Mobipocket format used in this file.
|
||||
Always >= 1 and <= the value of the
|
||||
"generator version" field.
|
||||
108 4 First Image record First record number (starting with 0)
|
||||
that contains an image. Image records
|
||||
should be sequential. If there are
|
||||
no images this will be 0xffffffff.
|
||||
112 4 HUFF record Record containing Huff information
|
||||
used in HUFF/CDIC decompression.
|
||||
116 4 HUFF count Number of Huff records.
|
||||
120 4 DATP record Unknown. Record starts with DATP.
|
||||
124 4 DATP count Number of DATP records.
|
||||
128 4 EXTH flags Bitfield. if bit 6, 0x40 is set, then
|
||||
there's an EXTH record
|
||||
The following fields are only present if the MOBI header is long enough.
|
||||
132 36 ? 36 unknown bytes, if the MOBI header is long enough
|
||||
168 4 DRM Offset Offset to DRM key info in DRMed files.
|
||||
0xFFFFFFFF if no DRM
|
||||
172 4 DRM Count Number of entries in DRM info.
|
||||
174 4 DRM Size Number of bytes in DRM info.
|
||||
176 4 DRM Flags Some flags concerning the DRM info.
|
||||
180 6 ?
|
||||
186 2 Last Image record Possibly the number of the last image
|
||||
record. If there are no images in the
|
||||
book this will be 0xffff.
|
||||
188 4 ?
|
||||
192 4 FCIS record Unknown. Record starts with FCIS.
|
||||
196 4 ?
|
||||
200 4 FLIS record Unknown. Record starts with FLIS.
|
||||
204 ? ? Bytes to the end of the MOBI header,
|
||||
including the following if the header
|
||||
length >= 228. ( 244 from start of
|
||||
record)
|
||||
242 2 Extra Data Flags A set of binary flags, some of which
|
||||
indicate extra data at the end of each
|
||||
text block. This only seems to be valid
|
||||
for Mobipocket format version 5 and 6
|
||||
(and higher?), when the header length
|
||||
is 228 (0xE4) or 232 (0xE8).
|
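As an illustration of the table above, here is a hedged Python sketch that
reads a handful of the better understood fields. The function name and the
returned keys are made up for this example. Falling back to CP1252 for an
unrecognised encoding value is an assumption, not something the format
guarantees.

```python
import struct


def parse_mobi_header(record0):
    """Read selected MOBI header fields from record 0 (offsets per the table)."""
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI identifier at offset 16")
    (header_length, mobi_type, encoding, unique_id,
     generator_version) = struct.unpack_from(">5I", record0, 20)
    (first_non_book, name_offset, name_length,
     language) = struct.unpack_from(">4I", record0, 80)
    format_version, first_image = struct.unpack_from(">2I", record0, 104)
    (exth_flags,) = struct.unpack_from(">I", record0, 128)

    # Assumption: treat any encoding other than 65001 as CP1252.
    codec = "utf-8" if encoding == 65001 else "cp1252"
    raw_name = bytes(record0[name_offset:name_offset + name_length])
    return {
        "header_length": header_length,
        "mobi_type": mobi_type,
        "codec": codec,
        "format_version": format_version,
        "full_name": raw_name.decode(codec, "replace"),
        "main_language": language & 0xFF,        # e.g. 0x09 = English
        "dialect": (language >> 8) & 0xFF,       # e.g. 0x04 = US
        "first_image": None if first_image == 0xFFFFFFFF else first_image,
        "has_exth": bool(exth_flags & 0x40),
    }
```

Fields past offset 132 are left out on purpose, since they only exist when the
header is long enough and their meaning is less certain.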
||||
|
||||
|
||||
EXTH Header
|
||||
-----------
|
||||
|
||||
If the MOBI header indicates that there's an EXTH header, it follows immediately
|
||||
after the MOBI header. Since the MOBI header is of variable length, this isn't
|
||||
at any fixed offset in record 0. Note that some readers will ignore any EXTH
|
||||
header info if the mobipocket version number specified in the MOBI header is 2
|
||||
or less (perhaps 3 or less).
|
||||
|
||||
The EXTH header is also undocumented, so some of this is guesswork.
|
||||
|
||||
bytes content comments
|
||||
4 identifier the characters E X T H
|
||||
4 header length the length of the EXTH header, including the previous 4 bytes
|
||||
4 record Count The number of records in the EXTH header. The rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length.
|
||||
EXTH record start Repeat until done.
|
||||
4 record type Exth Record type. Just a number identifying what's stored in the record
|
||||
4 record length length of EXTH record = L , including the 8 bytes in the type and length fields
|
||||
L-8 record data Data.
|
||||
EXTH record end Repeat until done.
|
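The record layout above lends itself to a simple loop. The sketch below is
only an illustration: the function name is invented, and the caller is assumed
to supply the position of the EXTH identifier (going by the description above,
that would be offset 16 plus the MOBI header length).

```python
import struct


def parse_exth(record0, start):
    """Return a list of (record type, raw data) pairs from an EXTH header.

    `start` is the offset of the 'EXTH' identifier inside record 0.
    """
    if record0[start:start + 4] != b"EXTH":
        raise ValueError("no EXTH identifier at the given offset")
    header_length, record_count = struct.unpack_from(">II", record0, start + 4)
    records = []
    pos = start + 12                      # first EXTH record
    for _ in range(record_count):
        rec_type, rec_length = struct.unpack_from(">II", record0, pos)
        if rec_length < 8:
            raise ValueError("EXTH record shorter than its own 8-byte prefix")
        # The length includes the 8 bytes of type and length fields.
        records.append((rec_type, bytes(record0[pos + 8:pos + rec_length])))
        pos += rec_length
    return records
```

The data is returned as raw bytes because its interpretation depends on the
record type and on the text encoding given in the MOBI header.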
||||
|
||||
There are lots of different EXTH record types. Ones found so far in Mobipocket
|
||||
files are listed here, with possible meanings. Hopefully the table will be
|
||||
filled in as more information comes to light.
|
||||
|
||||
record type usual length name comments
|
||||
1 drm_server_id
|
||||
2 drm_commerce_id
|
||||
3 drm_ebookbase_book_id
|
||||
100 author
|
||||
101 publisher
|
||||
102 imprint
|
||||
103 description
|
||||
104 isbn
|
||||
105 subject
|
||||
106 publishingdate
|
||||
107 review
|
||||
108 contributor
|
||||
109 rights
|
||||
110 subjectcode
|
||||
111 type
|
||||
112 source
|
||||
113 asin
|
||||
114 versionnumber
|
||||
115 sample
|
||||
116 startreading
|
||||
118 retail price (as text)
|
||||
119 retail price currency (as text)
|
||||
201 coveroffset
|
||||
202 thumboffset
|
||||
203 hasfakecover
|
||||
204 204 Unknown
|
||||
205 205 Unknown
|
||||
206 206 Unknown
|
||||
207 207 Unknown
|
||||
208 208 Unknown
|
||||
300 300 Unknown
|
||||
401 clippinglimit
|
||||
402 publisherlimit
|
||||
403 403 Unknown
|
||||
404 404 ttsflag
|
||||
501 4 cdetype PDOC - Personal Doc;
|
||||
EBOK - ebook;
|
||||
502 lastupdatetime
|
||||
503 updatedtitle
|
||||
|
||||
And now, at the end of Record 0 of the PDB file format, we usually get the full
|
||||
file name, the offset of which is given in the MOBI header.
|
||||
|
||||
|
||||
Variable-width integers
|
||||
-----------------------
|
||||
|
||||
Some parts of the Mobipocket format encode data as variable-width integers.
|
||||
These integers are represented big-endian with 7 bits per byte in bits 1-7. They
|
||||
may be either forward-encoded, in which case only the LSB has bit 8 set, or
|
||||
backward-encoded, in which case only the MSB has bit 8 set. For example, the
|
||||
number 0x11111 would be represented forward-encoded as:
|
||||
|
||||
0x04 0x22 0x91
|
||||
|
||||
And backward-encoded as:
|
||||
|
||||
0x84 0x22 0x11
|
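The two encodings can be sketched in Python as follows. This is an
illustration written from the description above, not code taken from any
reader; all function names are invented for the example.

```python
def encode_varint(value, forward=True):
    """Encode a non-negative integer as a Mobipocket variable-width integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                      # big-endian: most significant first
    if forward:
        groups[-1] |= 0x80                # flag on the least significant byte
    else:
        groups[0] |= 0x80                 # flag on the most significant byte
    return bytes(groups)


def decode_forward(data, pos=0):
    """Read left to right from `pos`; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, pos


def decode_backward(data, end=None):
    """Read right to left from `end`; return (value, bytes consumed)."""
    if end is None:
        end = len(data)
    value = shift = 0
    pos = end
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            break
    return value, end - pos
```

A backward-encoded integer is meant to be read starting from its last byte,
which is why it suits size fields stored at the end of a record.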
||||
|
||||
|
||||
Trailing entries
|
||||
----------------
|
||||
|
||||
The Extra Data Flags field of the MOBI header indicates which, if any, trailing
|
||||
entries are appended to the end of each text record. Each set bit in the field
|
||||
indicates a trailing entry. The entries appear to occur in bit-order; e.g.,
|
||||
trailing entry 1 immediately follows the text content and entry 16 occurs at
|
||||
the very end of the record. The effect and exact details of most of these
|
||||
entries are unknown. The trailing entries indicated by bits 2-16 appear to
|
||||
follow a common format. That format is:
|
||||
|
||||
<data><size>
|
||||
|
||||
Where <size> is the size of the entire trailing entry (including the size of
|
||||
<size>) as a backward-encoded Mobipocket variable-width integer.
|
||||
|
||||
Only a few bits have been identified
|
||||
|
||||
bit Data at end of records
|
||||
0x0001 Multi-byte character overlaps
|
||||
0x0002 Some data to help with indexing
|
||||
0x0004 Some data about uncrossable breaks
|
||||
|
||||
|
||||
Multibyte character overlap
|
||||
---------------------------
|
||||
|
||||
When bit 1 of the Extra Data Flags field is set, each record is followed by a
|
||||
trailing entry containing any extra bytes necessary to complete a multibyte
|
||||
character which crosses the record boundary. The bytes do not participate in
|
||||
compression regardless of which compression scheme is used for the file. However,
|
||||
unlike the trailing data bytes, the multibytes (including the count byte) do
|
||||
get included in any encryption. The overlapping bytes then re-appear as normal
|
||||
content at the beginning of the following record. The trailing entry ends with
|
||||
a byte containing a count of the overlapping bytes plus additional flags.
|
||||
|
||||
offset bytes content comments
|
||||
0 0-3 N terminal bytes
|
||||
of a multibyte
|
||||
character
|
||||
N 1 Size & flags bits 1-2 encode N, use of bits 3-8 is unknown
|
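Putting the two trailing-entry layouts together, a reader has to work out how
many bytes at the end of a text record are not text. The following Python
sketch shows one way to do that under the assumptions stated above (entries in
bit order, multibyte overlap closest to the text). The function names are
illustrative only.

```python
def _backward_varint(data, end):
    """Read a backward-encoded variable-width integer ending at `end`."""
    value = shift = 0
    pos = end
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            break
    return value


def trailing_size(record, extra_flags):
    """Return how many bytes at the end of `record` are trailing entries."""
    end = len(record)
    # Bits 2-16: every set bit means one <data><size> entry at the end.
    flags = extra_flags >> 1
    while flags:
        if flags & 1:
            size = _backward_varint(record, end)
            if not 1 <= size <= end:
                raise ValueError("implausible trailing entry size")
            end -= size
        flags >>= 1
    # Bit 1: the multibyte overlap entry sits right after the text.
    if extra_flags & 1:
        end -= (record[end - 1] & 0x03) + 1   # N bytes plus the count byte
    return len(record) - end
```

The text content of a record is then `record[:len(record) - trailing_size(record, flags)]`,
which still has to be decrypted and decompressed as appropriate.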
||||
|
||||
|
||||
PalmDOC Compression
|
||||
-------------------
|
||||
|
||||
PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
|
||||
text. The format does not allow for any text formatting. This keeps files small,
|
||||
in keeping with the Palm philosophy. However, extensions to the format can use
|
||||
tags, such as HTML or PML, to include formatting within text. These extensions
|
||||
to PalmDoc are not interchangeable and are the basis for most eBook Reader
|
||||
formats on Palm devices.
|
||||
|
||||
LZ77 algorithms achieve compression by replacing portions of the data with
|
||||
references to matching data that has already passed through both encoder and
|
||||
decoder. A match is encoded by a pair of numbers called a length-distance pair,
|
||||
which is equivalent to the statement "each of the next length characters is
|
||||
equal to the character exactly distance characters behind it in the uncompressed
|
||||
stream." (The "distance" is sometimes called the "offset" instead.)
|
||||
|
||||
In the PalmDoc format, a length-distance pair is always encoded by a two-byte
|
||||
sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
|
||||
the distance, 3 go to encoding the length, and the remaining two are used to
|
||||
make sure the decoder can identify the first byte as the beginning of such a
|
||||
two-byte sequence. The exact algorithm needed to decode the compressed text can
|
||||
be found on the PalmDOC page.
|
||||
|
||||
PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
|
||||
upon independently.
|
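A minimal Python sketch of a decoder for one compressed block follows. Only
the bit split of the two-byte pair comes from the text above. The remaining
byte ranges (literals, raw runs, space plus character) and the bias of 3 added
to the stored length follow the usual PalmDOC description and should be
treated as assumptions to check against the PalmDOC page.

```python
def palmdoc_decompress(block):
    """Decompress a single PalmDOC-compressed block."""
    out = bytearray()
    i = 0
    while i < len(block):
        c = block[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            out.append(c)                       # one literal byte
        elif 0x01 <= c <= 0x08:
            out += block[i:i + c]               # copy the next c bytes as-is
            i += c
        elif c >= 0xC0:
            out.append(0x20)                    # a space ...
            out.append(c ^ 0x80)                # ... followed by one character
        else:                                   # 0x80-0xBF: length-distance pair
            pair = (c << 8) | block[i]
            i += 1
            distance = (pair >> 3) & 0x7FF      # 11 bits
            length = (pair & 0x07) + 3          # 3 bits, biased by 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid distance in length-distance pair")
            for _ in range(length):
                out.append(out[-distance])      # byte-wise, so runs may overlap
    return bytes(out)
```

Copying one byte at a time matters: a pair whose length exceeds its distance
repeats the bytes it has just written, which is how short runs are encoded.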
||||
|
||||
PalmDOC does have support for bookmarks. These pointers are named and refer to
|
||||
an offset location in a file. If the file is edited these locations may no
|
||||
longer refer to the correct locations. Some reading programs allow the user to
|
||||
enter or edit these bookmarks while others treat them as a TOC. Some reading
|
||||
programs may ignore them entirely. They are stored at the end of the file itself
|
||||
so the full file needs to be scanned when loaded to find them.
|
||||
|
||||
|
||||
MBP
|
||||
---
|
||||
|
||||
This is the extension used on a side file (auxiliary) for MOBI formatted eBooks.
|
||||
It is used to store metadata used by the library software and also to store
|
||||
user entered data like bookmarks, annotations, last read position. This file is
|
||||
created automatically by the reader program when the eBook is first opened and
|
||||
has a .mbp extension. The Library management software in MobiPocket uses this
|
||||
file to get information displayed in the library window such as title and author
|
||||
so that it won't have to open the larger eBook file.
|
||||
|
25
format_docs/pdb/palmdoc.txt
Normal file
@ -0,0 +1,25 @@
|
||||
PalmDoc Format
|
||||
--------------
|
||||
|
||||
The format is that of a standard Palm Database Format file. The header of that
|
||||
format includes the name of the database (usually the book title and sometimes
|
||||
a portion of the author's name) which is up to 31 bytes of data. This string of
|
||||
characters is terminated with a 0 in the C style. The files are identified by a
|
||||
Creator ID of REAd and a Type of TEXt.
|
||||
|
||||
|
||||
Record 0
|
||||
--------
|
||||
|
||||
The first record in the Palm Database Format gives more information about the
|
||||
PalmDOC file, and contains 16 bytes.
|
||||
|
||||
bytes content comments
|
||||
|
||||
2 Compression 1 = no compression, 2 = PalmDOC compression (see below)
|
||||
2 Unused Always zero
|
||||
4 text length Uncompressed length of the entire text of the book
|
||||
2 record count Number of PDB records used for the text of the book.
|
||||
2 record size Maximum size of each record containing text, always 4096
|
||||
4 Current Position Current reading position, as an offset into the uncompressed text
|
||||
|
104
format_docs/pdb/pdb_format.txt
Normal file
@ -0,0 +1,104 @@
|
||||
Format
|
||||
------
|
||||
|
||||
A PDB file can be broken into multiple parts: the header, record 0 and data.
|
||||
Values stored within the various parts are in big-endian byte order. The data
|
||||
part is broken down into multiple sections. The section count and offsets
|
||||
are referenced in the PDB header. Sections can be no more than 65505 bytes in
|
||||
length.
|
||||
|
||||
|
||||
Layout
|
||||
------
|
||||
|
||||
PDB files take the format: DB header, followed by record 0, which contains
|
||||
format-specific information, followed by data.
|
||||
|
||||
DB Header
|
||||
0 Record 0
|
||||
.
|
||||
. Data (broken down into sections)
|
||||
.
|
||||
|
||||
|
||||
Palm Database Header Format
|
||||
|
||||
bytes content comments
|
||||
|
||||
32 name database name. This name is 0 terminated in the
|
||||
field and will be used as the file name on a
|
||||
computer. For eBooks this usually contains the
|
||||
title and may have the author depending on the
|
||||
length available.
|
||||
|
||||
2 attributes bit field.
|
||||
0x0002 Read-Only
|
||||
0x0004 Dirty AppInfoArea
|
||||
0x0008 Backup this database (i.e. no conduit exists)
|
||||
0x0010 (16 decimal) Okay to install newer over
|
||||
existing copy, if present on PalmPilot
|
||||
0x0020 (32 decimal) Force the PalmPilot to reset
|
||||
after this database is installed
|
||||
0x0040 (64 decimal) Don't allow copy of file to be
|
||||
beamed to other Pilot.
|
||||
|
||||
2 version file version
|
||||
|
||||
4 creation date No. of seconds since start of January 1, 1904.
|
||||
|
||||
4 modification date No. of seconds since start of January 1, 1904.
|
||||
|
||||
4 last backup date No. of seconds since start of January 1, 1904.
|
||||
|
||||
4 modificationNumber
|
||||
|
||||
4 appInfoID offset to start of Application Info (if present)
|
||||
or null
|
||||
|
||||
4 sortInfoID offset to start of Sort Info (if present) or null
|
||||
|
||||
4 type See above table. (For Applications this data will
|
||||
be 'appl')
|
||||
|
||||
4 creator See above table. This program will be launched if
|
||||
the file is tapped
|
||||
|
||||
4 uniqueIDseed used internally to identify record
|
||||
|
||||
4 nextRecordListID Only used when in-memory on Palm OS. Always set to
|
||||
zero in stored files.
|
||||
|
||||
2 number of Records number of records in the file - N
|
||||
|
||||
8N record Info List
|
||||
|
||||
start of record
|
||||
info entry Repeat N times to end of record info entry
|
||||
|
||||
4 record Data Offset the offset from the start of the PDB of this record
|
||||
|
||||
1 record Attributes bit field. The least significant four bits are used
|
||||
to represent the category values. These are the
|
||||
categories used to split the databases for viewing
|
||||
on the screen. A few of the 16 categories are
|
||||
pre-defined but the user can add their own. There
|
||||
is an undefined category for use if the user or
|
||||
programmer hasn't set this.
|
||||
0x10 (16 decimal) Secret record bit.
|
||||
0x20 (32 decimal) Record in use (busy bit).
|
||||
0x40 (64 decimal) Dirty record bit.
|
||||
0x80 (128, unsigned decimal) Delete record on
|
||||
next HotSync.
|
||||
|
||||
3 UniqueID The unique ID for this record. Often just a
|
||||
sequential count from 0
|
||||
|
||||
end of record
|
||||
info entry
|
||||
|
||||
2? Gap to data traditionally 2 zero bytes to Info or raw data
|
||||
|
||||
? Records The actual data in the file. AppInfoArea (if
|
||||
present), SortInfoArea (if present) and then
|
||||
records sequentially
|
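The header table above adds up to 78 fixed bytes followed by 8 bytes per
record. As a rough illustration, the Python sketch below unpacks both parts.
The function name and dictionary keys are invented for the example, and the
dates are converted on the assumption that they really do count seconds from
January 1, 1904.

```python
import struct
from datetime import datetime, timedelta

PALM_EPOCH = datetime(1904, 1, 1)
# name, attributes, version, three dates, modificationNumber, appInfoID,
# sortInfoID, type, creator, uniqueIDseed, nextRecordListID, record count
HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")   # 78 bytes, big-endian


def parse_pdb_header(data):
    """Unpack the Palm Database header and its record info list."""
    (name, attributes, version, created, modified, backed_up,
     modification_number, app_info, sort_info, type_, creator,
     unique_id_seed, next_record_list, num_records) = HEADER.unpack_from(data, 0)
    records = []
    for i in range(num_records):
        # 4-byte offset, then 1 attribute byte and a 3-byte unique ID.
        offset, attrs_and_id = struct.unpack_from(">II", data, HEADER.size + 8 * i)
        records.append({
            "offset": offset,
            "attributes": attrs_and_id >> 24,
            "unique_id": attrs_and_id & 0xFFFFFF,
        })
    return {
        "name": name.split(b"\0", 1)[0],          # name is 0 terminated
        "attributes": attributes,
        "version": version,
        "created": PALM_EPOCH + timedelta(seconds=created),
        "modified": PALM_EPOCH + timedelta(seconds=modified),
        "type": type_,
        "creator": creator,
        "records": records,
    }
```

The length of a record is not stored directly. It can be derived from the
offset of the next record, or from the end of the file for the last one.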
||||
|
34
format_docs/pdb/pdb_types.txt
Normal file
@ -0,0 +1,34 @@
|
||||
Palm Database File Code
|
||||
-----------------------
|
||||
|
||||
Reader Type Code
|
||||
|
||||
Adobe Reader .pdfADBE
|
||||
PalmDOC TEXtREAd
|
||||
BDicty BVokBDIC
|
||||
DB (Database program) DB99DBOS
|
||||
eReader PNRdPPrs
|
||||
eReader DataPPrs
|
||||
FireViewer (ImageViewer) vIMGView
|
||||
HanDBase PmDBPmDB
|
||||
InfoView InfoINDB
|
||||
iSilo ToGoToGo
|
||||
iSilo 3 SDocSilX
|
||||
JFile JbDbJBas
|
||||
JFile Pro JfDbJFil
|
||||
LIST DATALSdb
|
||||
MobileDB Mdb1Mdb1
|
||||
MobiPocket BOOKMOBI
|
||||
Plucker DataPlkr
|
||||
QuickSheet DataSprd
|
||||
SuperMemo SM01SMem
|
||||
TealDoc TEXtTlDc
|
||||
TealInfo InfoTlIf
|
||||
TealMeal DataTlMl
|
||||
TealPaint DataTlPt
|
||||
ThinkDB dataTDBP
|
||||
Tides TdatTide
|
||||
TomeRaider ToRaTRPW
|
||||
Weasel zTXTGPlm
|
||||
WordSmith BDOCWrdS
|
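Each code in the table is the 4-byte type followed by the 4-byte creator, so a
file can be identified from 8 bytes of its header. The sketch below assumes
the type field starts at byte 60 of the file, which is what the field sizes in
the Palm Database header add up to. The function name and the small lookup
table are illustrative; only a few rows of the table are included.

```python
KNOWN_IDENTITIES = {
    b"TEXtREAd": "PalmDOC",
    b"BOOKMOBI": "MobiPocket",
    b"PNRdPPrs": "eReader",
    b"DataPlkr": "Plucker",
    b"zTXTGPlm": "Weasel",
}


def identify_pdb(data):
    """Map the type and creator fields of a PDB file to a reader name."""
    ident = bytes(data[60:68])            # type (4 bytes) + creator (4 bytes)
    return KNOWN_IDENTITIES.get(ident, "unknown")
```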
||||
|
2122
format_docs/pdb/plucker.html
Normal file
File diff suppressed because it is too large
936
format_docs/pdb/pml.txt
Normal file
@ -0,0 +1,936 @@
|
||||
Palm Markup Language
|
||||
--------------------
|
||||
|
||||
This page explains how to use the Palm Markup Language (PML) to specify
|
||||
formatting and other information in a text file for later reading using the
|
||||
eReader.
|
||||
|
||||
PML commands start with a backslash, "\", and usually consist of a single
|
||||
character after that. Some PML commands are paired, such as those that specify
|
||||
italicized text. Other commands are directives, such as the "\p", which
|
||||
specifies a page break. PML is not meant to be an industrial-strength markup
|
||||
language, but it is easy to understand, easy to parse, and creates high-quality
|
||||
electronic books.
|
||||
|
||||
Since PML and Palm DropBook are not without flaws, there is a page of Tips and
|
||||
Pitfalls.
|
||||
|
||||
|
||||
Let's Dive Right In
|
||||
-------------------
|
||||
|
||||
palmsample.txt contains examples of formatting text, specifying chapters, etc.
|
||||
Use it to start from, or just as an example when making your own books.
|
||||
|
||||
The following table specifies the Palm Markup Language commands, and what
|
||||
they do.
|
||||
|
||||
\p New page
|
||||
\x New chapter; also causes a new page break.
|
||||
Enclose chapter title (and any style codes)
|
||||
with \x and \x
|
||||
\Xn New chapter, indented n levels (n between 0 and
|
||||
4 inclusive) in the Chapter dialog; doesn't
|
||||
cause a page break. Enclose chapter title (and
|
||||
any style codes) with \Xn and \Xn
|
||||
\Cn="Chapter title" Insert "Chapter title" into the chapter
|
||||
listing, with level n (like \Xn). The text is
|
||||
not shown on the page and does not force a page
|
||||
break. This can sometimes be useful to insert a
|
||||
chapter mark at the beginning of an
|
||||
introduction to the chapter, for example.
|
||||
\c Center this block of text; close with \c on
|
||||
beginning of line
|
||||
\r Right justify text block; close with \r on
|
||||
beginning of line
|
||||
\i Italicize block; close with \i
|
||||
\u Underline block; close with \u
|
||||
\o Overstrike block; close with \o
|
||||
\v Invisible text; close with \v (can be used for
|
||||
comments)
|
||||
\t Indent block. Start at beginning of a line,
|
||||
close with \t at end of a line
|
||||
\T="50%" Indents the specified percentage of the screen
|
||||
width, 50% in this case. If the current drawing
|
||||
position is already past the specified screen
|
||||
location, this tag is ignored.
|
||||
\w="50%" Embed a horizontal rule of a given percentage
|
||||
width of the screen, in this case 50%. This tag
|
||||
causes a line break before and after it. The
|
||||
rule is centered. The percent sign is mandatory.
|
||||
\n Switch to the "normal" font, which is specified
|
||||
by the user
|
||||
\s Switch to stdFont; close with \s to revert to
|
||||
normal font
|
||||
\b Switch to boldFont; close with \b to revert to
|
||||
normal font (deprecated; use \B instead)
|
||||
\l Switch to largeFont; close with \l to revert to
|
||||
normal font
|
||||
\B Mark text as bold. Unlike the \b tag, \B
|
||||
doesn't change the font, so you can have large
|
||||
bold text. You cannot mix \b and \B in the same
|
||||
PML file.
|
||||
\Sp Mark text as superscript. Should not be mixed
|
||||
with other styles such as bold, italic, etc.
|
||||
Enclose superscripted text with \Sp.
|
||||
\Sb Mark text as subscript. Should not be mixed
|
||||
with other styles such as bold, italic, etc.
|
||||
Enclose subscripted text with \Sb.
|
||||
\k Make enclosed text into small-caps; close with
|
||||
\k. Any characters enclosed in \k tags
|
||||
(including those with accents) are made
|
||||
uppercase and are rendered at a smaller point
|
||||
size than a regular uppercase character.
|
||||
\\ Represents a single backslash
|
||||
\aXXX Insert non-ASCII character whose Windows 1252
|
||||
code is decimal XXX. See the PML character
|
||||
table for details.
|
||||
\UXXXX Insert non-ASCII character whose Unicode code
|
||||
is hexadecimal XXXX. See the Extended PML
|
||||
character table for details.
|
||||
\m="imagename.png" Insert the named image. See the section on
|
||||
Images below.
|
||||
\q="#linkanchor"Some text\q Reference a link anchor which is at another
|
||||
spot in the document. The string after the
|
||||
anchor specification and before the trailing \q
|
||||
is underlined or otherwise shown to be a link
|
||||
when viewing the document.
|
||||
\Q="linkanchor" Specify a link anchor in the document.
|
||||
\- Insert a soft hyphen. A soft hyphen shows up
|
||||
only if it is necessary to break a word across
|
||||
a line.
|
||||
\Fn="footnote1"1\Fn Link the "1" to a footnote whose name is
|
||||
footnote1, tagged at the end of the PML
|
||||
document. See the section on Footnotes and
|
||||
Sidebars below.
|
||||
\Sd="sidebar1"Sidebar\Sd Link the "Sidebar" text to a sidebar whose name
|
||||
is sidebar1, tagged at the end of the PML
|
||||
document. See the section on Footnotes and
|
||||
Sidebars below.
|
||||
\I Mark as a reference index item. Enclose index
|
||||
item (and any style codes) with \I and \I. See
|
||||
Creating Dictionaries for more information.
|
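The three character-level commands in the table (\\, \aXXX and \UXXXX) can be
expanded with a single pass over the text. The Python sketch below is an
illustration only and is not how DropBook itself works. The function name is
invented, and leaving codes above 255 untouched is a choice made for the
example.

```python
import re

# One pattern, scanned left to right, so "\\a147" is an escaped backslash
# followed by the literal text "a147".
_ESCAPE = re.compile(r"\\(?:a(\d{3})|U([0-9A-Fa-f]{4})|(\\))")


def expand_pml_escapes(text):
    """Replace \\aXXX, \\UXXXX and \\\\ with the characters they stand for."""
    def replace(match):
        if match.group(1) is not None:
            code = int(match.group(1))            # decimal Windows 1252 code
            if code > 255:
                return match.group(0)             # not a single byte: keep as-is
            return bytes([code]).decode("cp1252", "replace")
        if match.group(2) is not None:
            return chr(int(match.group(2), 16))   # hexadecimal Unicode code
        return "\\"
    return _ESCAPE.sub(replace, text)
```

All other commands are left in place, so this step can run before or after
whatever handles the styling tags.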
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
\pThis is a new page
|
||||
|
||||
\xChapter III\x
|
||||
|
||||
\X1Chapter III, part A\X1
|
||||
|
||||
\p\C="Introduction"The following story is one of my favorites...
|
||||
|
||||
\cProperty of
|
||||
Gateway Senior High School
|
||||
\c
|
||||
|
||||
\rJustify my love
|
||||
\r
|
||||
|
||||
This stuff is \ireally\i cool.
|
||||
|
||||
I just read \uMoby Dick.\u
|
||||
|
||||
This is a \obig\o mistake.
|
||||
|
||||
Copyright 1917\v Date of magazine serialization \v
|
||||
|
||||
\tOnce upon a time
|
||||
there was a wicked queen
|
||||
called Esmerelda.\t
|
||||
|
||||
Mammals:\T="40%"Lions
|
||||
\T="40%"Tigers
|
||||
\T="40%"Bears
|
||||
|
||||
He walked away.
|
||||
\w="80%"
|
||||
Later that day, he ran into an old friend.
|
||||
|
||||
\nIn the normal ways...
|
||||
|
||||
The \stitle page\s should be formatted...
|
||||
|
||||
I just \bcan't\b believe that you...
|
||||
|
||||
This \lREALLY\l is a large tiger...
|
||||
|
||||
This \Bbold\B text can be either \l\Blarge bold\B\l or \s\Bsmall bold\B\s.
|
||||
|
||||
e\Spx + 2\Sp = 9
|
||||
|
||||
C\Sb2\SbH\Sb3\SbO\Sb2\Sb should be used in moderation.
|
||||
|
||||
See also \kanteater\k.
|
||||
|
||||
The DOS prompt said "C:\\windows\\"
|
||||
|
||||
The man said \a147Yeah.\a148
|
||||
|
||||
Arrows can point \U2190 left or right \U2192.
|
||||
|
||||
A Yield sign looks like this: \m="yieldsign.png".
|
||||
|
||||
See the \q="#detailedinstructions"Detailed Instructions\q for how to install your eBook.
|
||||
|
||||
\Q="detailedinstructions"\bDetailed Instructions\b - This section
|
||||
describes how to install an eBook to your handheld device.
|
||||
|
||||
Very long words like anti\-dis\-establish\-ment\-arian\-ism may benefit from
|
||||
the use of soft hyphens.
|
||||
|
||||
The Emerson case\Fn="emerson"[1]\Fn will be very important...
|
||||
|
||||
For more information, see the \Sd="moreinfo"sidebar\Sd.
|
||||
|
||||
\I\Baardvark\B\I \in.\i a large burrowing nocturnal mammal that feeds especially on termites and ants
|
||||
|
||||
|
||||
Footnotes and Sidebars
|
||||
----------------------
|
||||
|
||||
Footnotes and Sidebars are specified with an XML-like syntax at the end of the
|
||||
PML document. For example,
|
||||
|
||||
<sidebar id="sidebar1">
|
||||
Here's some \itext\i for a sidebar.
|
||||
</sidebar>
|
||||
|
||||
would specify the sidebar to be displayed when the user taps on a sidebar link
|
||||
in the text that was specified using the \Sd tag.
|
||||
|
||||
Any text or PML placed after the first footnote or sidebar is ignored as part
|
||||
of the book text.
|
||||
|
||||
Sidebars and footnotes can include most PML features, but there are some PML
|
||||
tags that cannot be used inside of a sidebar or footnote.
|
||||
|
||||
These include
|
||||
Chapters \x, \X, \C
|
||||
Links \q, \Q
|
||||
Footnotes \Fn
|
||||
Sidebars \Sd
|
||||
|
||||
See the palmsample.txt file for examples of how to use many of the PML tags.
|
||||
|
||||
|
||||
Images
|
||||
------
|
||||
|
||||
The following rules are intended to guarantee that images in your eBook will be
|
||||
viewable on all platforms that eReader runs on.
|
||||
|
||||
On low-resolution Palm OS handhelds, an image wider than 158 pixels or taller
|
||||
than 148 pixels will be represented in the text by a thumbnail that the user
|
||||
can tap to view the entire image. Images smaller than 158 x 148 will be
|
||||
presented in-line with the text.
|
||||
|
||||
On high-resolution Palm OS handhelds (those having screens of 320x320 pixels or
|
||||
more), images smaller than 158 by 148 pixels will be pixel-doubled. Images
|
||||
larger than 158x148 may be shown in-line with the text, if they will fit on
|
||||
the screen.
|
||||
|
||||
On non-Palm OS platforms, small images will be scaled up appropriately. Large
|
||||
images will be scaled down to fit on the page; in this case the user can tap on
|
||||
the image to view the entire image and zoom in or out.
|
||||
|
||||
For DropBook to find the image, it must be present in a directory whose name
|
||||
matches that of the PML text file. For example, if "pmlsample.txt" contains a
|
||||
reference to an image called "intro.png", then there must be a directory called
|
||||
"pmlsample_img" that contains intro.png. The directory's name is the name of
|
||||
the PML file (without the .txt extension) with "_img" appended.
|
||||
|
||||
Images must be in PNG format and cannot be filtered or interlaced. Image depth
|
||||
must be 8 bits or less. Any color table may be used for color images.
|
||||
|
||||
Image files must be less than or equal to 65505 bytes in size, since they are
|
||||
embedded into the .pdb format of the book; Palm database records are limited to
|
||||
65505 bytes in length. Since images are compressed, the actual image displayed
|
||||
by the reader may be much larger than 64K.
|
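Two of the rules above are easy to check before running DropBook: the name of
the image directory and the size limit. The Python sketch below is a small
illustration; the function names are invented and nothing here touches the
file system.

```python
from pathlib import Path

MAX_RECORD_BYTES = 65505      # Palm database record limit quoted above


def image_dir_for(pml_path):
    """Return the directory expected to hold the images of a PML file."""
    pml_path = Path(pml_path)
    # PML file name without its extension, with "_img" appended.
    return pml_path.with_name(pml_path.stem + "_img")


def fits_in_record(size_in_bytes):
    """True if an image file of this size can be embedded in the book."""
    return size_in_bytes <= MAX_RECORD_BYTES
```

For example, `image_dir_for("pmlsample.txt")` gives `pmlsample_img`, matching
the example in the text.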
||||
|
||||
Any or all of these restrictions may eventually be removed.
|
||||
|
||||
|
||||
Adding a Title, Cover Art, and Other Meta-information to Your eBook
|
||||
-------------------------------------------------------------------
|
||||
|
||||
DropBook normally presents a dialog in which the title and other information
|
||||
for the eBook may be specified. This information may be embedded in the PML
|
||||
file instead.
|
||||
|
||||
To specify the eBook title as it will appear in the Open dialog on the
|
||||
handheld, place a block of invisible comment text at the beginning of the file
|
||||
using \v tags. Inside this comment block, put the string TITLE="My eBook",
|
||||
where "My eBook" is replaced with the name of your eBook. It should look
|
||||
something like this:
|
||||
|
||||
\vTITLE="Palm Sample Document"\v
|
||||
|
||||
You can also specify the author using the AUTHOR meta-tag, the publisher with
|
||||
PUBLISHER, copyright information with COPYRIGHT, and the eBook ISBN with EISBN.
|
||||
A fully-specified set of meta-information might appear in PML as:
|
||||
|
||||
\vTITLE="Palm Sample Document" AUTHOR="Sam Morgenstern" PUBLISHER="eReader.com"
|
||||
EISBN="X-XXXX-XXXX" COPYRIGHT="Copyright \a169 2004 by Sam Morgenstern"\v
|
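Reading this meta-information back out of a PML file is straightforward. The
Python sketch below assumes the comment block is the first thing in the file
(leading whitespace is tolerated, which is an assumption) and that values do
not contain double quotes. The function name is illustrative.

```python
import re

_COMMENT = re.compile(r"\s*\\v(.*?)\\v", re.DOTALL)
_PAIR = re.compile(r'([A-Z]+)="([^"]*)"')


def read_pml_metadata(pml_text):
    """Return the KEY="value" pairs from a leading \\v ... \\v comment block."""
    match = _COMMENT.match(pml_text)
    if match is None:
        return {}
    return dict(_PAIR.findall(match.group(1)))
```

Values are returned as written, so a code such as \a169 inside COPYRIGHT
still needs to be expanded separately.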
||||
|
||||
Cover art: If an image named "cover.png" is present in the eBook, it is assumed
|
||||
to be the cover art for the eBook. See the rules for images for sizing and
|
||||
other information.
|
||||
|
||||
Some or all of this information may appear in the book information dialog in
|
||||
eReader, and may be used for other purposes in future products.
|
||||
|
||||
|
||||
Creating Dictionaries
|
||||
---------------------
|
||||
|
||||
The \I PML tag is used to delimit an index item. Example: \Iaardvark\I
|
||||
|
||||
Each entry must start in the normal font. If DropBook shows an error beginning
|
||||
with "No styles permitted before...", there is probably a missing end style tag
|
||||
before the text shown in the error message.
|
||||
|
||||
Links, chapters and other PML structures are not permitted in dictionaries.
|
||||
Images, however, are.
|
||||
|
||||
A special dictionary entry, "(Front matter)" is shown before other entries in
|
||||
the list of entries, and should be used to include pronunciation symbols and
|
||||
other front matter.
|
||||
|
||||
Note that use of dictionaries requires eReader Pro.
|
||||
|
||||
|
||||
Tips and Pitfalls
|
||||
-----------------
|
||||
|
||||
This page explains some common mistakes, some bugs in DropBook and/or the
|
||||
eReader, and some techniques that will allow you to create quality electronic
|
||||
books for the eReader.
|
||||
|
||||
* Check out the Converting to Palm eBooks page for some pointers on
|
||||
converting text from various formats into the Palm Markup Language.
|
||||
* Use a return at the end of each paragraph, not each line.
|
||||
* An extra return between paragraphs is easier to read than paragraph
indentation.
|
||||
* The eReader doesn't display empty lines at the top of a page. If you need
|
||||
to have some "empty" lines at the top of a page, put a space on each line.
|
||||
* Don't use tables if you can possibly avoid it.
|
||||
|
||||
None of the fonts that the eReader supports are monospaced, so tables can
|
||||
be difficult to represent. Break out the information in another way, or
|
||||
use the \T tag, but beware of tables that look great on a Palm OS
|
||||
handheld but not on a Pocket PC or vice versa.
|
||||
|
||||
* The eReader breaks lines on spaces, dashes or underscores. This has
|
||||
several implications.
|
||||
|
||||
1. Don't fill more than a line with spaces, dashes or underscores.
|
||||
There's a bug (which will be fixed in a future release) which
|
||||
causes MakeBook to hang on such a line. Note that in the large
|
||||
font, the number of spaces, dashes or underscores will be much
|
||||
smaller than in the small font.
|
||||
2. A string such as He shouted "Wait!--" may place the last quote on
|
||||
the beginning of a line, since the line would break after the
|
||||
second dash. Prevent this by using the PML string: He shouted
|
||||
"Wait!\a150\a150". The non-breaking dash, code 150, will not break
|
||||
a line. Use \a160 for a non-breaking space. Even better: use \a151,
|
||||
a long dash, instead of two short dashes.
|
||||
|
||||
* The justification codes \c and \r (center and right justification) must
|
||||
have closing codes at the beginning of the line following the justified
|
||||
text.
|
||||
* The indentation tag \t must have a closing tag at the end of a line of
|
||||
the indented text.
|
||||
* Use \s (small font) in the title page(s) of books to force the page(s) to
|
||||
format nicely. Other than that, \n, \s and \l should rarely be necessary;
|
||||
the font size used for most text display should be chosen by the user.
|
||||
|
||||
|
||||
Converting Uncommon Characters to PML
|
||||
-------------------------------------
|
||||
|
||||
Use this chart to convert uncommon characters to their Palm Markup Language
|
||||
(PML) equivalent. Most characters are simply represented as themselves in PML
|
||||
and don't require this chart. But some uncommon characters can only be
|
||||
represented in PML by their "\aXXX" syntax. Use this chart to look up that
|
||||
"\aXXX" syntax.
|
||||
|
||||
For example, if you wanted to write the following phrase in PML:
|
||||
|
||||
Copyright © 1999 by Samuel Morgenstern
|
||||
|
||||
In PML, you would write it as:
|
||||
|
||||
Copyright \a169 1999 by Samuel Morgenstern
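The same lookup can be automated. The sketch below, in Python, assumes that
the \aXXX numbers follow the Windows-1252 code page, which matches the entries
in the chart (169 for the copyright sign, 150 and 151 for the dashes) but is
not stated in these notes. The function name to_pml_text is hypothetical.
Characters outside that code page raise UnicodeEncodeError, and the sketch
will also emit codes for a few Windows-1252 characters that the chart does not
list.

```python
def to_pml_text(text):
    """Escape text for PML using \\aXXX codes for non-ASCII characters."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")      # a literal backslash is written as \\
        elif ord(ch) < 128:
            out.append(ch)          # plain ASCII represents itself
        else:
            # Assumption: the chart uses Windows-1252 numbering.
            code = ch.encode("cp1252")[0]
            out.append("\\a%03d" % code)
    return "".join(out)

print(to_pml_text("Copyright \u00a9 1999 by Samuel Morgenstern"))
```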
|
||||
|
||||
Char   HTML # Code   HTML Char Code   PML Char Code   Description
|
||||
|
||||
  - Normal space
|
||||
! ! - ! Exclamation
|
||||
" " " " Double quote
|
||||
# # - # Hash
|
||||
$ $ - $ Dollar
|
||||
% % - % Percent
|
||||
& & & & Ampersand
|
||||
' ' - ' Apostrophe
|
||||
( ( - ( Open bracket
|
||||
) ) - ) Close bracket
|
||||
* * - * Asterisk
|
||||
+ + - + Plus sign
|
||||
, , - , Comma
|
||||
- - - - Minus sign
|
||||
. . - . Period
|
||||
/ / - / Forward slash
|
||||
0 0 - 0 Digit 0
|
||||
1 1 - 1 Digit 1
|
||||
2 2 - 2 Digit 2
|
||||
3 3 - 3 Digit 3
|
||||
4 4 - 4 Digit 4
|
||||
5 5 - 5 Digit 5
|
||||
6 6 - 6 Digit 6
|
||||
7 7 - 7 Digit 7
|
||||
8 8 - 8 Digit 8
|
||||
9 9 - 9 Digit 9
|
||||
: : - : Colon
|
||||
; ; - ; Semicolon
|
||||
< < < Less than
|
||||
= = - = Equals
|
||||
> > > Greater than
|
||||
? ? - ? Question mark
|
||||
@ @ - @ At sign
|
||||
A A - A A
|
||||
B B - B B
|
||||
C C - C C
|
||||
D D - D D
|
||||
E E - E E
|
||||
F F - F F
|
||||
G G - G G
|
||||
H H - H H
|
||||
I I - I I
|
||||
J J - J J
|
||||
K K - K K
|
||||
L L - L L
|
||||
M M - M M
|
||||
N N - N N
|
||||
O O - O O
|
||||
P P - P P
|
||||
Q Q - Q Q
|
||||
R R - R R
|
||||
S S - S S
|
||||
T T - T T
|
||||
U U - U U
|
||||
V V - V V
|
||||
W W - W W
|
||||
X X - X X
|
||||
Y Y - Y Y
|
||||
Z Z - Z Z
|
||||
[ [ - [ Open square bracket
|
||||
\ \ - \\ Backslash
|
||||
] ] - ] Close square bracket
|
||||
^ ^ - ^ Caret
|
||||
_ _ - _ Underscore
|
||||
` ` - ` Grave accent
|
||||
a a - a a
|
||||
b b - b b
|
||||
c c - c c
|
||||
d d - d d
|
||||
e e - e e
|
||||
f f - f f
|
||||
g g - g g
|
||||
h h - h h
|
||||
i i - i i
|
||||
j j - j j
|
||||
k k - k k
|
||||
l l - l l
|
||||
m m - m m
|
||||
n n - n n
|
||||
o o - o o
|
||||
p p - p p
|
||||
q q - q q
|
||||
r r - r r
|
||||
s s - s s
|
||||
t t - t t
|
||||
u u - u u
|
||||
v v - v v
|
||||
w w - w w
|
||||
x x - x x
|
||||
y y - y y
|
||||
z z - z z
|
||||
{ { - { Left brace
|
||||
| | - | Vertical bar
|
||||
} } - } Right brace
|
||||
~ ~ - ~ Tilde
|
||||
|
||||
  \a160 Non-breaking space
|
||||
¡ ¡ \a161 Inverted exclamation
|
||||
¢ ¢ \a162 Cent sign
|
||||
£ £ \a163 Pound sign
|
||||
¤ ¤ \a164 Currency sign
|
||||
¥ ¥ \a165 Yen sign
|
||||
¦ ¦ \a166 Broken bar
|
||||
§ § \a167 Section sign
|
||||
¨ ¨ \a168 Umlaut or diaeresis
|
||||
© © \a169 Copyright sign
|
||||
ª ª \a170 Feminine ordinal
|
||||
« « \a171 Left angle quotes
|
||||
¬ ¬ \a172 Logical not sign
|
||||
­ ­ \a173 Soft hyphen
|
||||
® ® \a174 Registered trademark
|
||||
¯ ¯ \a175 Spacing macron
|
||||
° ° \a176 Degree sign
|
||||
± ± \a177 Plus-minus sign
|
||||
² ² \a178 Superscript 2
|
||||
³ ³ \a179 Superscript 3
|
||||
´ ´ \a180 Spacing acute
|
||||
µ µ \a181 Micro sign
|
||||
¶ ¶ \a182 Paragraph sign
|
||||
· · \a183 Middle dot
|
||||
¸ ¸ \a184 Spacing cedilla
|
||||
¹ ¹ \a185 Superscript 1
|
||||
º º \a186 Masculine ordinal
|
||||
» » \a187 Right angle quotes
|
||||
¼ ¼ \a188 One quarter
|
||||
½ ½ \a189 One half
|
||||
¾ ¾ \a190 Three quarters
|
||||
¿ ¿ \a191 Inverted question mark
|
||||
À À \a192 A grave
|
||||
Á Á \a193 A acute
|
||||
  \a194 A circumflex
|
||||
à à \a195 A tilde
|
||||
Ä Ä \a196 A diaeresis
|
||||
Å Å \a197 A ring
|
||||
Æ &AElig; \a198 AE ligature
|
||||
Ç Ç \a199 C cedilla
|
||||
È È \a200 E grave
|
||||
É É \a201 E acute
|
||||
Ê Ê \a202 E circumflex
|
||||
Ë Ë \a203 E diaeresis
|
||||
Ì Ì \a204 I grave
|
||||
Í Í \a205 I acute
|
||||
Î Î \a206 I circumflex
|
||||
Ï Ï \a207 I diaeresis
|
||||
Ð Ð \a208 Eth
|
||||
Ñ Ñ \a209 N tilde
|
||||
Ò Ò \a210 O grave
|
||||
Ó Ó \a211 O acute
|
||||
Ô Ô \a212 O circumflex
|
||||
Õ Õ \a213 O tilde
|
||||
Ö Ö \a214 O diaeresis
|
||||
× × \a215 Multiplication sign
|
||||
Ø Ø \a216 O slash
|
||||
Ù Ù \a217 U grave
|
||||
Ú Ú \a218 U acute
|
||||
Û Û \a219 U circumflex
|
||||
Ü Ü \a220 U diaeresis
|
||||
Ý Ý \a221 Y acute
|
||||
Þ Þ \a222 THORN
|
||||
ß ß \a223 sharp s
|
||||
à à \a224 a grave
|
||||
á á \a225 a acute
|
||||
â â \a226 a circumflex
|
||||
ã ã \a227 a tilde
|
||||
ä ä \a228 a diaeresis
|
||||
å å \a229 a ring
|
||||
æ æ \a230 ae ligature
|
||||
ç ç \a231 c cedilla
|
||||
è è \a232 e grave
|
||||
é é \a233 e acute
|
||||
ê ê \a234 e circumflex
|
||||
ë ë \a235 e diaeresis
|
||||
ì ì \a236 i grave
|
||||
í í \a237 i acute
|
||||
î î \a238 i circumflex
|
||||
ï ï \a239 i diaeresis
|
||||
ð ð \a240 eth
|
||||
ñ ñ \a241 n tilde
|
||||
ò ò \a242 o grave
|
||||
ó ó \a243 o acute
|
||||
ô ô \a244 o circumflex
|
||||
õ õ \a245 o tilde
|
||||
ö ö \a246 o diaeresis
|
||||
÷ ÷ \a247 division sign
|
||||
ø ø \a248 o slash
|
||||
ù ù \a249 u grave
|
||||
ú ú \a250 u acute
|
||||
û û \a251 u circumflex
|
||||
ü ü \a252 u diaeresis
|
||||
ý ý \a253 y acute
|
||||
þ þ \a254 thorn
|
||||
ÿ ÿ \a255 y diaeresis
|
||||
, ‚ ‚ \a130 single low quote
|
||||
ƒ ƒ \a131 Scripted f
|
||||
„ „ \a132 low quote
|
||||
… … \a133 Ellipsis
|
||||
† † \a134 Dagger
|
||||
‡ &Dagger; \a135 Double dagger
|
||||
Š Š \a138 Large S w/inverted caret
|
||||
< ‹ ‹ \a139 single left angle quote
|
||||
Œ Œ \a140 Large combined oe
|
||||
‘ ‘ \a145 Open single smart quote
|
||||
’ ’ \a146 Close single smart quote
|
||||
“ “ \a147 Open double smart quote
|
||||
” ” \a148 Close double smart quote
|
||||
• • \a149 Bullet
|
||||
– – \a150 Small dash (en dash)
|
||||
— — \a151 Large dash (em dash)
|
||||
™ ™ \a153 Trademark
|
||||
š š \a154 Small S w/inverted caret
|
||||
> › › \a155 single right angle quote
|
||||
œ œ \a156 Small combined oe
|
||||
Ÿ Ÿ \a159 Large Y with diaeresis
|
||||
|
||||
|
||||
Extended Character Set
|
||||
----------------------
|
||||
|
||||
In addition to the special characters supported by earlier versions of eReader
|
||||
(which can be accessed using the \a### tag), all versions of eReader Pro and
|
||||
eReader version 2.4 and later include support for additional special characters
|
||||
and symbols. These symbols can be accessed using the \U#### tag, where #### are
|
||||
four hexadecimal digits giving the Unicode encoding of the special character.
|
||||
|
||||
Only the limited subset of Unicode characters given in the table below is
|
||||
supported. In addition, some of the characters that are included in the table
|
||||
are not present in eReader Pro versions prior to 2.4. To ensure that the
|
||||
characters are displayed correctly, books using these tags should be read using
|
||||
eReader or eReader Pro version 2.4 or later.
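Producing the tag for a given character is mechanical. A minimal Python
sketch, with the hypothetical name pml_unicode_tag, is shown here. It only
formats the code point as four hexadecimal digits. It does not check whether
the character belongs to the supported subset, so that check remains the
author's responsibility.

```python
def pml_unicode_tag(ch):
    """Return the \\U#### tag for one character with a four-digit code point."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        # The tag holds exactly four hexadecimal digits.
        raise ValueError("code point does not fit in four hexadecimal digits")
    return "\\U%04X" % code_point

print(pml_unicode_tag("\u03b1"))  # GREEK SMALL LETTER ALPHA
```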
|
||||
|
||||
On Palm OS handhelds these special symbols are only available in one size,
|
||||
matching the "Small" font. For best results on Palm OS handhelds the \U tag
|
||||
should only be used inside blocks set to the "Small" font by way of \s tags.
|
||||
On Palm OS handhelds these special characters are not affected by the font tags
|
||||
(\s, \l, \b and \n), the bold style tag (\B), or the small caps style tag (\k).
|
||||
|
||||
If the \U characters are not showing up correctly using eReader on your Windows
|
||||
desktop or laptop this problem is a result of the fonts for eReader not being
|
||||
installed properly. The solution is to go to the directory C:\Windows\Fonts\
|
||||
and "double click" on each font that starts with "Maynard". This will open each
|
||||
font and allow the system to register it. Close the windows that were opened as a
|
||||
result of the mouse clicks and the problem should be resolved.
|
||||
|
||||
Char   HTML Code   PML Code   Description
|
||||
|
||||
Latin Extended-A
|
||||
Ā Ā \U0100 LATIN CAPITAL LETTER A WITH MACRON
|
||||
ā ā \U0101 LATIN SMALL LETTER A WITH MACRON
|
||||
Ă Ă \U0102 LATIN CAPITAL LETTER A WITH BREVE
|
||||
ă ă \U0103 LATIN SMALL LETTER A WITH BREVE
|
||||
ą ą \U0105 LATIN SMALL LETTER A WITH OGONEK
|
||||
ć ć \U0107 LATIN SMALL LETTER C WITH ACUTE
|
||||
Č Č \U010C LATIN CAPITAL LETTER C WITH CARON
|
||||
č č \U010D LATIN SMALL LETTER C WITH CARON
|
||||
Ē Ē \U0112 LATIN CAPITAL LETTER E WITH MACRON
|
||||
ē ē \U0113 LATIN SMALL LETTER E WITH MACRON
|
||||
ĕ ĕ \U0115 LATIN SMALL LETTER E WITH BREVE
|
||||
ė ė \U0117 LATIN SMALL LETTER E WITH DOT ABOVE
|
||||
ę ę \U0119 LATIN SMALL LETTER E WITH OGONEK
|
||||
ě ě \U011B LATIN SMALL LETTER E WITH CARON
|
||||
ĝ ĝ \U011D LATIN SMALL LETTER G WITH CIRCUMFLEX
|
||||
ğ ğ \U011F LATIN SMALL LETTER G WITH BREVE
|
||||
Ī Ī \U012A LATIN CAPITAL LETTER I WITH MACRON
|
||||
ī ī \U012B LATIN SMALL LETTER I WITH MACRON
|
||||
ĭ ĭ \U012D LATIN SMALL LETTER I WITH BREVE
|
||||
į į \U012F LATIN SMALL LETTER I WITH OGONEK
|
||||
ı ı \U0131 LATIN SMALL LETTER DOTLESS I
|
||||
Ł Ł \U0141 LATIN CAPITAL LETTER L WITH STROKE
|
||||
ł ł \U0142 LATIN SMALL LETTER L WITH STROKE
|
||||
ń ń \U0144 LATIN SMALL LETTER N WITH ACUTE
|
||||
ň ň \U0148 LATIN SMALL LETTER N WITH CARON
|
||||
ŋ ŋ \U014B LATIN SMALL LETTER ENG
|
||||
Ō Ō \U014C LATIN CAPITAL LETTER O WITH MACRON
|
||||
ō ō \U014D LATIN SMALL LETTER O WITH MACRON
|
||||
ŏ ŏ \U014F LATIN SMALL LETTER O WITH BREVE
|
||||
ő ő \U0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
|
||||
ŕ ŕ \U0155 LATIN SMALL LETTER R WITH ACUTE
|
||||
ř ř \U0159 LATIN SMALL LETTER R WITH CARON
|
||||
Ś Ś \U015A LATIN CAPITAL LETTER S WITH ACUTE
|
||||
ś ś \U015B LATIN SMALL LETTER S WITH ACUTE
|
||||
ş ş \U015F LATIN SMALL LETTER S WITH CEDILLA
|
||||
ţ ţ \U0163 LATIN SMALL LETTER T WITH CEDILLA
|
||||
ũ ũ \U0169 LATIN SMALL LETTER U WITH TILDE
|
||||
ū ū \U016B LATIN SMALL LETTER U WITH MACRON
|
||||
ŭ ŭ \U016D LATIN SMALL LETTER U WITH BREVE
|
||||
ŷ ŷ \U0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX
|
||||
ź ź \U017A LATIN SMALL LETTER Z WITH ACUTE
|
||||
Ž Ž \U017D LATIN CAPITAL LETTER Z WITH CARON
|
||||
ž ž \U017E LATIN SMALL LETTER Z WITH CARON
|
||||
Latin Extended-B
|
||||
ƿ \U01BF LATIN LETTER WYNN
|
||||
ǎ \U01CE LATIN SMALL LETTER A WITH CARON
|
||||
ǐ \U01D0 LATIN SMALL LETTER I WITH CARON
|
||||
ǒ \U01D2 LATIN SMALL LETTER O WITH CARON
|
||||
ǔ \U01D4 LATIN SMALL LETTER U WITH CARON
|
||||
ǡ \U01E1 LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON
|
||||
ǣ \U01E3 LATIN SMALL LETTER AE WITH MACRON
|
||||
ǧ \U01E7 LATIN SMALL LETTER G WITH CARON
|
||||
ǫ \U01EB LATIN SMALL LETTER O WITH OGONEK
|
||||
ǰ \U01F0 LATIN SMALL LETTER J WITH CARON
|
||||
ȇ \U0207 LATIN SMALL LETTER E WITH INVERTED BREVE
|
||||
ȝ \U021D LATIN SMALL LETTER YOGH
|
||||
ȧ \U0227 LATIN SMALL LETTER A WITH DOT ABOVE
|
||||
ȯ \U022F LATIN SMALL LETTER O WITH DOT ABOVE
|
||||
ȳ \U0233 LATIN SMALL LETTER Y WITH MACRON
|
||||
IPA Extensions
|
||||
ɑ \U0251 LATIN SMALL LETTER SCRIPT A
|
||||
ɒ \U0252 LATIN SMALL LETTER TURNED SCRIPT A
|
||||
ɔ \U0254 LATIN SMALL LETTER OPEN O
|
||||
ə \U0259 LATIN SMALL LETTER SCHWA
|
||||
ɜ \U025C LATIN SMALL LETTER REVERSED OPEN E
|
||||
ɥ \U0265 LATIN SMALL LETTER TURNED H
|
||||
ɪ \U026A LATIN LETTER SMALL CAPITAL I
|
||||
ɲ \U0272 LATIN SMALL LETTER N WITH LEFT HOOK
|
||||
ʃ \U0283 LATIN SMALL LETTER ESH
|
||||
ʉ \U0289 LATIN SMALL LETTER U BAR
|
||||
ʊ \U028A LATIN SMALL LETTER UPSILON
|
||||
ʌ \U028C LATIN SMALL LETTER TURNED V
|
||||
ʏ \U028F LATIN LETTER SMALL CAPITAL Y
|
||||
ʒ \U0292 LATIN SMALL LETTER EZH
|
||||
ʔ \U0294 LATIN LETTER GLOTTAL STOP
|
||||
ʜ \U029C LATIN LETTER SMALL CAPITAL H
|
||||
Spacing Modifier Letters
|
||||
ʾ \U02BE MODIFIER LETTER RIGHT HALF RING
|
||||
ʿ \U02BF MODIFIER LETTER LEFT HALF RING
|
||||
ˇ ˇ \U02C7 CARON
|
||||
ˈ \U02C8 MODIFIER LETTER VERTICAL LINE
|
||||
ˌ \U02CC MODIFIER LETTER LOW VERTICAL LINE
|
||||
ː \U02D0 MODIFIER LETTER TRIANGULAR COLON
|
||||
˘ ˘ \U02D8 BREVE
|
||||
˙ ˙ \U02D9 DOT ABOVE
|
||||
Greek and Coptic
|
||||
Α Α \U0391 GREEK CAPITAL LETTER ALPHA
Β Β \U0392 GREEK CAPITAL LETTER BETA
Γ Γ \U0393 GREEK CAPITAL LETTER GAMMA
Δ Δ \U0394 GREEK CAPITAL LETTER DELTA
Ε Ε \U0395 GREEK CAPITAL LETTER EPSILON
Ζ Ζ \U0396 GREEK CAPITAL LETTER ZETA
Η Η \U0397 GREEK CAPITAL LETTER ETA
Θ Θ \U0398 GREEK CAPITAL LETTER THETA
Ι Ι \U0399 GREEK CAPITAL LETTER IOTA
Κ Κ \U039A GREEK CAPITAL LETTER KAPPA
Λ Λ \U039B GREEK CAPITAL LETTER LAMBDA
Μ Μ \U039C GREEK CAPITAL LETTER MU
Ν Ν \U039D GREEK CAPITAL LETTER NU
Ξ Ξ \U039E GREEK CAPITAL LETTER XI
Ο Ο \U039F GREEK CAPITAL LETTER OMICRON
Π Π \U03A0 GREEK CAPITAL LETTER PI
Ρ Ρ \U03A1 GREEK CAPITAL LETTER RHO
Σ Σ \U03A3 GREEK CAPITAL LETTER SIGMA
Τ Τ \U03A4 GREEK CAPITAL LETTER TAU
Υ Υ \U03A5 GREEK CAPITAL LETTER UPSILON
Φ Φ \U03A6 GREEK CAPITAL LETTER PHI
Χ Χ \U03A7 GREEK CAPITAL LETTER CHI
Ψ Ψ \U03A8 GREEK CAPITAL LETTER PSI
Ω Ω \U03A9 GREEK CAPITAL LETTER OMEGA
|
||||
α α \U03B1 GREEK SMALL LETTER ALPHA
|
||||
β β \U03B2 GREEK SMALL LETTER BETA
|
||||
γ γ \U03B3 GREEK SMALL LETTER GAMMA
|
||||
δ δ \U03B4 GREEK SMALL LETTER DELTA
|
||||
ε ε \U03B5 GREEK SMALL LETTER EPSILON
|
||||
ζ ζ \U03B6 GREEK SMALL LETTER ZETA
|
||||
η η \U03B7 GREEK SMALL LETTER ETA
|
||||
θ θ \U03B8 GREEK SMALL LETTER THETA
|
||||
ι ι \U03B9 GREEK SMALL LETTER IOTA
|
||||
κ κ \U03BA GREEK SMALL LETTER KAPPA
|
||||
λ λ \U03BB GREEK SMALL LETTER LAMBDA
|
||||
μ μ \U03BC GREEK SMALL LETTER MU
|
||||
ν ν \U03BD GREEK SMALL LETTER NU
|
||||
ξ ξ \U03BE GREEK SMALL LETTER XI
|
||||
ο ο \U03BF GREEK SMALL LETTER OMICRON
|
||||
π π \U03C0 GREEK SMALL LETTER PI
|
||||
ρ ρ \U03C1 GREEK SMALL LETTER RHO
|
||||
ς ς \U03C2 GREEK SMALL LETTER FINAL SIGMA
|
||||
σ σ \U03C3 GREEK SMALL LETTER SIGMA
|
||||
τ τ \U03C4 GREEK SMALL LETTER TAU
|
||||
υ υ \U03C5 GREEK SMALL LETTER UPSILON
|
||||
φ φ \U03C6 GREEK SMALL LETTER PHI
|
||||
χ χ \U03C7 GREEK SMALL LETTER CHI
|
||||
ψ ψ \U03C8 GREEK SMALL LETTER PSI
|
||||
ω ω \U03C9 GREEK SMALL LETTER OMEGA
|
||||
ϑ \U03D1 GREEK THETA SYMBOL
|
||||
ϝ \U03DD GREEK SMALL LETTER DIGAMMA
|
||||
Hebrew
|
||||
א א \U05D0 HEBREW LETTER ALEPH
|
||||
ב ב \U05D1 HEBREW LETTER BET
|
||||
ג ג \U05D2 HEBREW LETTER GIMEL
|
||||
ד ד \U05D3 HEBREW LETTER DALET
|
||||
ה ה \U05D4 HEBREW LETTER HE
|
||||
ו ו \U05D5 HEBREW LETTER VAV
|
||||
ז ז \U05D6 HEBREW LETTER ZAYIN
|
||||
ח ח \U05D7 HEBREW LETTER HET
|
||||
ט ט \U05D8 HEBREW LETTER TET
|
||||
י י \U05D9 HEBREW LETTER YOD
|
||||
ך ך \U05DA HEBREW LETTER FINAL KAF
|
||||
כ כ \U05DB HEBREW LETTER KAF
|
||||
ל ל \U05DC HEBREW LETTER LAMED
|
||||
ם ם \U05DD HEBREW LETTER FINAL MEM
|
||||
מ מ \U05DE HEBREW LETTER MEM
|
||||
ן ן \U05DF HEBREW LETTER FINAL NUN
|
||||
נ נ \U05E0 HEBREW LETTER NUN
|
||||
ס ס \U05E1 HEBREW LETTER SAMEKH
|
||||
ע ע \U05E2 HEBREW LETTER AYIN
|
||||
ף ף \U05E3 HEBREW LETTER FINAL PE
|
||||
פ פ \U05E4 HEBREW LETTER PE
|
||||
ץ ץ \U05E5 HEBREW LETTER FINAL TSADI
|
||||
צ צ \U05E6 HEBREW LETTER TSADI
|
||||
ק ק \U05E7 HEBREW LETTER QOF
|
||||
ר ר \U05E8 HEBREW LETTER RESH
|
||||
ת ת \U05EA HEBREW LETTER TAV
|
||||
Latin Extended Additional
|
||||
ḋ \U1E0B LATIN SMALL LETTER D WITH DOT ABOVE
|
||||
ḍ \U1E0D LATIN SMALL LETTER D WITH DOT BELOW
|
||||
ḗ \U1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
|
||||
Ḣ \U1E22 LATIN CAPITAL LETTER H WITH DOT ABOVE
|
||||
Ḥ \U1E24 LATIN CAPITAL LETTER H WITH DOT BELOW
|
||||
ḥ \U1E25 LATIN SMALL LETTER H WITH DOT BELOW
|
||||
ḫ \U1E2B LATIN SMALL LETTER H WITH BREVE BELOW
|
||||
ḳ \U1E33 LATIN SMALL LETTER K WITH DOT BELOW
|
||||
ḷ \U1E37 LATIN SMALL LETTER L WITH DOT BELOW
|
||||
ṁ \U1E41 LATIN SMALL LETTER M WITH DOT ABOVE
|
||||
ṃ \U1E43 LATIN SMALL LETTER M WITH DOT BELOW
|
||||
ṅ \U1E45 LATIN SMALL LETTER N WITH DOT ABOVE
|
||||
ṇ \U1E47 LATIN SMALL LETTER N WITH DOT BELOW
|
||||
ṓ \U1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
|
||||
ṙ \U1E59 LATIN SMALL LETTER R WITH DOT ABOVE
|
||||
Ṛ \U1E5A LATIN CAPITAL LETTER R WITH DOT BELOW
|
||||
ṛ \U1E5B LATIN SMALL LETTER R WITH DOT BELOW
|
||||
ṡ \U1E61 LATIN SMALL LETTER S WITH DOT ABOVE
|
||||
ṣ \U1E63 LATIN SMALL LETTER S WITH DOT BELOW
|
||||
ṫ \U1E6B LATIN SMALL LETTER T WITH DOT ABOVE
|
||||
ṭ \U1E6D LATIN SMALL LETTER T WITH DOT BELOW
|
||||
ṯ \U1E6F LATIN SMALL LETTER T WITH LINE BELOW
|
||||
ẑ \U1E91 LATIN SMALL LETTER Z WITH CIRCUMFLEX
|
||||
ẓ \U1E93 LATIN SMALL LETTER Z WITH DOT BELOW
|
||||
ẖ \U1E96 LATIN SMALL LETTER H WITH LINE BELOW
|
||||
ạ \U1EA1 LATIN SMALL LETTER A WITH DOT BELOW
|
||||
ọ \U1ECD LATIN SMALL LETTER O WITH DOT BELOW
|
||||
ỹ \U1EF9 LATIN SMALL LETTER Y WITH TILDE
|
||||
General Punctuation
|
||||
- ‑ \U2011 NON-BREAKING HYPHEN
|
||||
‸ \U2038 CARET
|
||||
‽ \U203D INTERROBANG
|
||||
⁂ \U2042 ASTERISM
|
||||
Arrows
|
||||
← ← \U2190 LEFTWARDS ARROW
|
||||
→ → \U2192 RIGHTWARDS ARROW
|
||||
Mathematical Operators
|
||||
∂ ∂ \U2202 PARTIAL DIFFERENTIAL
|
||||
√ √ \U221A SQUARE ROOT
|
||||
∞ ∞ \U221E INFINITY
|
||||
∥ ∥ \U2225 PARALLEL TO
|
||||
∫ ∫ \U222B INTEGRAL
|
||||
≠ ≠ \U2260 NOT EQUAL TO
|
||||
⊔ \U2294 SQUARE CUP
|
||||
⊕ \U2295 CIRCLED PLUS
|
||||
⋮ \U22EE VERTICAL ELLIPSIS
|
||||
Enclosed Alphanumerics
|
||||
Ⓤ \U24CA CIRCLED LATIN CAPITAL LETTER U
|
||||
Miscellaneous Symbols
|
||||
☜ ☜ \U261C WHITE LEFT POINTING INDEX
|
||||
☞ ☞ \U261E WHITE RIGHT POINTING INDEX
|
||||
☿ \U263F MERCURY
|
||||
♀ \U2640 FEMALE SIGN
|
||||
♂ \U2642 MALE SIGN
|
||||
♃ \U2643 JUPITER
|
||||
♄ \U2644 SATURN
|
||||
♅ \U2645 URANUS
|
||||
♆ \U2646 NEPTUNE
|
||||
♇ \U2647 PLUTO
|
||||
♠ \U2660 BLACK SPADE SUIT
|
||||
♡ \U2661 WHITE HEART SUIT
|
||||
♢ \U2662 WHITE DIAMOND SUIT
|
||||
♣ \U2663 BLACK CLUB SUIT
|
||||
♭ \U266D MUSIC FLAT SIGN
|
||||
♮ \U266E MUSIC NATURAL SIGN
|
||||
♯ \U266F MUSIC SHARP SIGN
|
||||
Dingbats
|
||||
✓ \U2713 CHECK MARK
|
||||
✠ \U2720 MALTESE CROSS
|
||||
Private Use Area
|
||||
- \UE000 LATIN SMALL LETTER A WITH MACRON AND ACUTE
|
||||
- \UE001 LATIN SMALL LETTER A WITH MACRON AND TILDE
|
||||
- \UE002 LATIN SMALL LETTER A WITH VERTICAL LINE ABOVE
|
||||
- \UE003 LATIN CAPITAL LETTER C WITH MACRON
|
||||
- \UE004 LATIN SMALL LETTER C WITH MACRON
|
||||
- \UE005 LATIN SMALL LETTER C WITH BREVE
|
||||
- \UE006 LATIN SMALL LETTER C WITH DOT BELOW
|
||||
- \UE007 LATIN SMALL LIGATURE CH
|
||||
- \UE008 LATIN CAPITAL LETTER D WITH MACRON
|
||||
- \UE009 LATIN SMALL LETTER E WITH BAR BELOW
|
||||
- \UE00A LATIN SMALL LETTER E WITH TILDE
|
||||
- \UE00B LATIN SMALL LETTER E WITH MACRON AND BREVE
|
||||
- \UE00C LATIN SMALL LETTER E WITH TILDE AND DOT ABOVE
|
||||
- \UE00D LATIN SMALL LETTER E WITH HOOK RIGHT BELOW
|
||||
- \UE00E LATIN SMALL LETTER G WITH INVERTED BREVE
|
||||
- \UE00F LATIN SMALL LETTER I WITH INVERTED BREVE BELOW
|
||||
- \UE010 LATIN SMALL LETTER I WITH MACRON AND ACUTE
|
||||
- \UE011 LATIN SMALL LETTER K WITH CIRCUMFLEX
|
||||
- \UE012 LATIN SMALL LETTER K WITH BREVE
|
||||
- \UE013 LATIN SMALL LETTER K WITH INVERTED BREVE
|
||||
- \UE014 LATIN SMALL LIGATURE KH
|
||||
- \UE015 LATIN CAPITAL LETTER L WITH MACRON
|
||||
- \UE016 LATIN SMALL LETTER L WITH TILDE
|
||||
- \UE017 LATIN SMALL LETTER L WITH INVERTED BREVE
|
||||
- \UE018 LATIN CAPITAL LETTER M WITH MACRON
|
||||
- \UE019 LATIN SMALL LETTER M WITH MACRON
|
||||
- \UE01A LATIN SMALL LETTER M WITH TILDE
|
||||
- \UE01B LATIN SMALL LETTER O WITH CEDILLA
|
||||
- \UE01C LATIN SMALL LETTER O WITH MACRON AND CIRCUMFLEX
|
||||
- \UE01E LATIN SMALL LIGATURE OI
|
||||
- \UE01F LATIN SMALL LIGATURE OO
|
||||
- \UE020 LATIN SMALL LIGATURE OO WITH MACRON
|
||||
- \UE021 LATIN SMALL LIGATURE OU
|
||||
- \UE022 LATIN SMALL LETTER OPEN O WITH ACUTE
|
||||
- \UE023 LATIN SMALL LETTER R WITH DIAERESIS
|
||||
- \UE024 LATIN SMALL LETTER R WITH CIRCUMFLEX
|
||||
- \UE025 LATIN SMALL LETTER R WITH RING BELOW
|
||||
- \UE026 LATIN SMALL LETTER S WITH VERTICAL LINE ABOVE
|
||||
- \UE027 LATIN SMALL LETTER S WITH OGONEK
|
||||
- \UE028 LATIN SMALL LETTER S WITH COMMA
|
||||
- \UE02A LATIN SMALL LETTER S WITH BREVE
|
||||
- \UE02B LATIN SMALL LIGATURE SH
|
||||
- \UE02C LATIN SMALL LIGATURE TH
|
||||
- \UE02D LATIN SMALL LETTER U WITH MACRON AND ACUTE
|
||||
- \UE02E LATIN CAPITAL LETTER V WITH MACRON
|
||||
- \UE02F LATIN CAPITAL LETTER X WITH MACRON
|
||||
- \UE030 LATIN SMALL LETTER X WITH CIRCUMFLEX
|
||||
- \UE031 LATIN SMALL LETTER Y WITH BREVE
|
||||
- \UE032 LATIN SMALL LIGATURE ZH
|
||||
- \UE033 LATIN SMALL LETTER TURNED E WITH ACUTE
|
||||
- \UE034 LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX
|
||||
- \UE035 GREEK SMALL LETTER ALPHA WITH GRAVE
|
||||
- \UE036 MUSICAL SYMBOL SEGNO
|
||||
- \UE037 MUSICAL SYMBOL FERMATA
|
||||
- \UE038 MUSICAL SYMBOL CRESCENDO
|
||||
- \UE039 MUSICAL SYMBOL DECRESCENDO
|
||||
- \UE03A MUSICAL SYMBOL DOUBLE SHARP
|
||||
- \UE03B MUSICAL SYMBOL BREVE
|
||||
- \UE03C MUSICAL SYMBOL DOWN BOW
|
||||
- \UE03D MUSICAL SYMBOL UP BOW
|
||||
- \UE03E MUSICAL SYMBOL BREVE ALTERNATE
|
||||
- \UE03F PRINTING SYMBOL DELE
|
||||
- \UE040 PRINTING SYMBOL FRACTIONAL EM
|
||||
- \UE041 INVERTED ASTERISM
|
||||
- \UE042 LATIN SMALL LETTER SCHWA SUPERSCRIPT
|
||||
- \UE043 LATIN SMALL LETTER TURNED Y
|
||||
- \UE044 LATIN SMALL LIGATURE OE WITH MACRON
|
||||
- \UE045 SQUARE ROOT WITH BAR
|
||||
- \UE046 LATIN SMALL LETTER U WITH DOT ABOVE
|
||||
- \UE047 LATIN SMALL LIGATURE UE
|
||||
- \UE048 LATIN SMALL LIGATURE UE WITH MACRON
|
||||
- \UE049 LATIN SMALL LETTER OPEN O WITH TILDE
|
||||
- \UE04A LATIN SMALL LETTER T WITH CARON BELOW
|
||||
- \UE04B LATIN SMALL LETTER SCRIPT A WITH TILDE
|
||||
- \UE04C GREEK SMALL LETTER EPSILON WITH TILDE
|
||||
- \UE04D LATIN SMALL LIGATURE OE WITH TILDE
|
||||
- \UE04E MODIFIER LETTER DOUBLE VERTICAL LINE
|
||||
- \UE04F DOUBLE HYPHEN
|
||||
- \UE050 LATIN SMALL LETTER SCHWA WITH DOT ABOVE
|
||||
- \UE051 LATIN SMALL LETTER SCHWA WITH MACRON
|
||||
Alphabetic Presentation Forms
|
||||
fl fl \UFB02 LATIN SMALL LIGATURE FL
|
||||
שׁ שׁ \UFB2A HEBREW LETTER SHIN WITH SHIN DOT
שׂ שׂ \UFB2B HEBREW LETTER SHIN WITH SIN DOT
|
||||
|
226
format_docs/pdb/ztxt.txt
Normal file
@ -0,0 +1,226 @@
|
||||
The zTXT Format
|
||||
---------------
|
||||
|
||||
The zTXT format is relatively straightforward. The simplest zTXT contains a
|
||||
Palm database header, followed by zTXT record #0, followed by the compressed
|
||||
data. The compressed data can be in one of two formats: one long data stream,
|
||||
or split into chunks for random access. If there are any bookmarks, they occupy
|
||||
the record immediately after the compressed data. If there are any annotations,
|
||||
the annotation index occupies the record immediately after the bookmarks with
|
||||
each annotation in the index having a record immediately after the annotation
|
||||
index. Here are diagrams of a simple zTXT and a full-featured zTXT:
|
||||
|
||||
     DB Header
0    Record 0
1
2
3
...  Compressed Data
36
37
38
|
||||
|
||||
     DB Header
0    Record 0
1
2
3
...  Compressed Data
36
37
38
39   Bookmarks
40   Annotation Index
41   Annotation 1
42   Annotation 2
43   Annotation 3
|
||||
|
||||
|
||||
Compression Modes
|
||||
-----------------
|
||||
|
||||
zTXT version 1.40 and later supports two modes of compression. Mode 1 is a
|
||||
random access mode, and mode 2 consists of one long data stream. Both modes
|
||||
work on 8K (the default record size) blocks of text.
|
||||
|
||||
Please note, however, that as of Weasel Reader version 1.60 the old style
|
||||
(mode 2) zTXT format is no longer supported. makeztxt and libztxt still support
|
||||
creating these documents for backwards compatibility, but you should avoid
mode 2 if possible.
|
||||
|
||||
|
||||
Mode 1
|
||||
------
|
||||
|
||||
In mode one, 8K blocks of text are compressed into an equal number of blocks of
|
||||
compressed data. Using the Z_FULL_FLUSH flush mode with zLib allows for random
|
||||
access among the blocks of data. In order for this to function, the first block
|
||||
must be decompressed first, and after that any block in the file may be
|
||||
decompressed in any order. In mode 1, the blocks of compressed data will likely
|
||||
not all have the same size.
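The block-per-record scheme can be tried out with Python's zlib module. This
is a sketch under stated assumptions, not the makeztxt implementation: the
names compress_mode1 and read_block are hypothetical, and every block,
including the last, is ended with Z_FULL_FLUSH so that no stream trailer is
written. How a real zTXT writer finishes the final block is not described
here.

```python
import zlib

RECORD_SIZE = 8192  # default text record size

def compress_mode1(text, record_size=RECORD_SIZE):
    """Compress text into one compressed block per record_size bytes of input."""
    compressor = zlib.compressobj()
    blocks = []
    for start in range(0, len(text), record_size):
        chunk = text[start:start + record_size]
        # Z_FULL_FLUSH resets the compressor at each block boundary, so a
        # block never refers back to data in an earlier block.
        blocks.append(compressor.compress(chunk)
                      + compressor.flush(zlib.Z_FULL_FLUSH))
    return blocks

def read_block(blocks, index):
    """Decompress one block after priming the decompressor with block 0."""
    decompressor = zlib.decompressobj()
    first = decompressor.decompress(blocks[0])  # block 0 carries the zlib header
    if index == 0:
        return first
    return decompressor.decompress(blocks[index])

text = b"The quick brown fox jumps over the lazy dog. " * 1000
blocks = compress_mode1(text)
print(read_block(blocks, 3) == text[3 * RECORD_SIZE:4 * RECORD_SIZE])
```

Priming a fresh decompressor with block 0 on every call keeps the sketch
short. A reader would more likely decompress block 0 once and keep the
decompressor around.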
|
||||
|
||||
|
||||
Mode 2
|
||||
------
|
||||
|
||||
In zTXT versions before 1.40, this was the only method of compression. This
|
||||
mode involves compressing the entire input buffer into a single output buffer
|
||||
and then splitting the resulting buffer into 8K segments. This mode requires
|
||||
that all of the compressed data be decompressed in one pass. Since there are no
|
||||
real 'blocks' of data, the resulting output can be of any blocksize, though
|
||||
typically the default of 8K should be fine. The advantage to mode 2 is that it
|
||||
will give about 10% - 15% more compression.
|
||||
|
||||
|
||||
zTXT Record #0 Definition (version 1.44)
|
||||
----------------------------------------
|
||||
|
||||
Record 0 provides all of the information about the zTXT contents. Be sure it is
|
||||
correct, lest fiery death rain down upon your program.
|
||||
|
||||
typedef struct zTXT_record0Type {
|
||||
UInt16 version;
|
||||
UInt16 numRecords;
|
||||
UInt32 size;
|
||||
UInt16 recordSize;
|
||||
UInt16 numBookmarks;
|
||||
UInt16 bookmarkRecord;
|
||||
UInt16 numAnnotations;
|
||||
UInt16 annotationRecord;
|
||||
UInt8 flags;
|
||||
UInt8 reserved;
|
||||
UInt32 crc32;
|
||||
UInt8 padding[0x20 - 24];
|
||||
} zTXT_record0;
|
||||
|
||||
|
||||
Structure Elements
|
||||
------------------
|
||||
|
||||
UInt16 version;
|
||||
|
||||
This is mostly just informational. Your program can figure out what features
|
||||
might be available from the version. However, the remaining parts of the
|
||||
structure are designed such that their value will be 0 if that particular
|
||||
feature is not present, so that is the correct way to test. The version is
|
||||
stored as two 8 bit integers. For example, version 1.42 is 0x012A.
|
||||
|
||||
UInt16 numRecords;
|
||||
|
||||
This is the number of DATA records only and does not include record 0,
|
||||
bookmarks, or annotations. With compression mode 1, this is also the number of
|
||||
uncompressed text records. With mode 2, you must decompress the file to figure
|
||||
out how many text records there will be.
|
||||
|
||||
UInt32 size;
|
||||
|
||||
The size in bytes of the uncompressed data in the zTXT. Check this value against
|
||||
the amount of free storage memory on the Palm to make sure there's enough room
|
||||
to decompress the data in full or in part.
|
||||
|
||||
UInt16 recordSize;
|
||||
|
||||
recordSize is the size in bytes of a text record. This field is important, as
|
||||
the size of text and decompression buffers is based on this value. It is used
|
||||
by Weasel to navigate through the text so it can map absolute offsets to record
numbers. 8192 is the default. With compression mode 1, this is the amount of
|
||||
data inside each compressed record (except maybe the last one), but the actual
|
||||
compressed records will likely have varying sizes. In mode 2, both compressed
|
||||
records and the resulting text records are all of this size (except, again, the
|
||||
last record).
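For uniform records, mapping an absolute text offset to a record is a single
division. The Python sketch below uses the hypothetical name locate and
assumes that text records begin at database record 1, directly after record 0,
as in the layout diagrams. It does not apply when text records have varying
lengths.

```python
def locate(offset, record_size=8192):
    """Map an absolute text offset to (record index, offset inside the record)."""
    block, within = divmod(offset, record_size)
    # Assumption: text records start at database record 1, after record 0.
    return block + 1, within

print(locate(20000))  # (3, 3616)
```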
|
||||
|
||||
UInt16 numBookmarks;
|
||||
|
||||
The definitive count of how many bookmarks are stored in the bookmark index
|
||||
record. See the section on bookmarks below.
|
||||
|
||||
UInt16 bookmarkRecord;
|
||||
|
||||
If there are any bookmarks, this is set to the record index number that
|
||||
contains the bookmark listing, otherwise it is 0.
|
||||
|
||||
UInt16 numAnnotations;
|
||||
|
||||
Like the bookmark count, this is the definitive count of how many annotations
|
||||
are in the annotation index and how many annotation records follow it. See the
|
||||
section on annotations below.
|
||||
|
||||
UInt16 annotationRecord;
|
||||
|
||||
If there are any annotations, this is set to the record index number that
|
||||
contains the annotation index, otherwise it is 0.
|
||||
|
||||
UInt8 flags;
|
||||
|
||||
These flags indicate various features of the zTXT database. flags is a bitmask
|
||||
and at present the only two defined bits are:
|
||||
|
||||
ZTXT_RANDOMACCESS (0x01)
|
||||
If the zTXT was compressed according to the method in mode 1, then it
|
||||
supports random access and this should be set.
|
||||
ZTXT_NONUNIFORM (0x02)
|
||||
Setting this bit indicates that the text records within the zTXT database
|
||||
are not of uniform length. That is, when the blocks of text are
|
||||
decompressed they will not have identical block sizes. If this is not set,
|
||||
the compressed blocks are assumed to all have the same size when
|
||||
decompressed (typically 8K) except for the last block which can be smaller.
|
||||
|
||||
UInt32 crc32;
|
||||
|
||||
A CRC32 value for checking data integrity. This value is computer over all text
|
||||
data records only and does not include record 0 nor any bookmark/annotation
|
||||
records. The current implementation in makeztxt/Weasel computes this value
|
||||
using the crc32 function in zlib, which should be the standard CRC32 definition.
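A rough sketch of such a check, using Python's zlib binding. It assumes the checksum is a single running CRC fed with the bytes of each text record in order, exactly as they are stored in the database; the text above does not spell this out, so treat it as an assumption. The function name is illustrative.

```python
import zlib


def ztxt_crc32(text_records):
    """Running CRC32 over a sequence of text records (bytes objects).

    Assumption: records are fed in database order, as stored.
    Record 0 and any bookmark/annotation records must not be passed in.
    """
    crc = 0
    for record in text_records:
        crc = zlib.crc32(record, crc)
    # Mask so the result is an unsigned 32-bit value, matching UInt32 crc32.
    return crc & 0xFFFFFFFF
```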
|
||||
|
||||
UInt8 padding[0x20 - 24];
|
||||
|
||||
zTXT record zero is 32 bytes in length, so the unused portion is padded.
|
||||
|
||||
|
||||
zTXT Bookmarks
|
||||
--------------
|
||||
|
||||
zTXT bookmarks are stored in a simple array in a record at the end of a zTXT.
|
||||
The format is as follows:
|
||||
|
||||
#define MAX_BMRK_LENGTH 20
|
||||
|
||||
typedef struct GPlmMarkType {
|
||||
UInt32 offset;
|
||||
Char title[MAX_BMRK_LENGTH];
|
||||
} GPlmMark;
|
||||
|
||||
In the structure, offset is counted as an absolute offset into the text. The
|
||||
bookmarks must be sorted in ascending order.
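The bookmark array can be read with a fixed-size struct. The following is a sketch only: it assumes big-endian integers (the usual Palm OS convention, not stated in this section) and NUL-padded titles, and the function name is made up for illustration.

```python
import struct

MAX_BMRK_LENGTH = 20

# UInt32 offset followed by Char title[MAX_BMRK_LENGTH].
# Assumption: big-endian byte order, as is conventional on Palm OS.
MARK = struct.Struct('>I%ds' % MAX_BMRK_LENGTH)


def parse_bookmarks(record, num_bookmarks):
    """Return a list of (offset, title_bytes) from a bookmark index record."""
    marks = []
    for i in range(num_bookmarks):
        offset, raw_title = MARK.unpack_from(record, i * MARK.size)
        # Assumption: the title is NUL-terminated/padded within its 20 bytes.
        title = raw_title.split(b'\x00', 1)[0]
        marks.append((offset, title))
    return marks
```

The count comes from numBookmarks in record 0 rather than from the record length, since the text calls that field the definitive count.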
|
||||
|
||||
If there are no bookmarks, then the bookmark index does not exist. When the
|
||||
user creates the first bookmark, the record containing the index will then be
|
||||
created. If there are annotations, when the bookmark record is created it must
|
||||
go before the annotation index. This will require incrementing annotationRecord
|
||||
in record 0 to point to the new record index.
|
||||
|
||||
Similarly, when all bookmarks are deleted the bookmark index record is also
|
||||
deleted. If there are annotations, annotationRecord in record 0 must be
|
||||
decremented to point to the new index.
|
||||
|
||||
|
||||
zTXT Annotations
|
||||
----------------
|
||||
|
||||
zTXT annotations have a format almost identical to that of the bookmark index:
|
||||
|
||||
typedef struct GPlmAnnotationType {
|
||||
UInt32 offset;
|
||||
Char title[MAX_BMRK_LENGTH];
|
||||
} GPlmAnnotation;
|
||||
|
||||
Like the bookmarks, offset is an absolute offset into the text. The annotation
|
||||
index is organized just as the bookmarks are, as a single array in a record.
|
||||
Note that this structure does NOT store the actual annotation text.
|
||||
|
||||
The text of each annotation is stored in its own record immediately following
|
||||
the index. So, the first annotation in the index will occupy the first record
|
||||
following the index, and the second annotation will be in the second record
|
||||
following the index, and so on. The text of each annotation is limited to
|
||||
4096 bytes.
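The record layout described above implies a simple mapping from an annotation's position in the index to the record holding its text. A small sketch (the function name is hypothetical):

```python
def annotation_text_record(annotation_record, index):
    """Record number holding the text of the annotation at `index` (0-based).

    `annotation_record` is the annotationRecord field from record 0;
    0 means the database has no annotation index.
    """
    if annotation_record == 0:
        raise ValueError('database has no annotation index')
    # The first annotation's text is in the record right after the index.
    return annotation_record + 1 + index
```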
|
||||
|
303
format_docs/rb.txt
Normal file
@ -0,0 +1,303 @@
|
||||
Rocket eBook File Format
|
||||
------------------------
|
||||
|
||||
from http://rbmake.sourceforge.net/rb_format.html
|
||||
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
This document attempts to describe the format of a .rb file -- the book
|
||||
format that is downloaded into NuvoMedia's <http://www.nuvomedia.com>
|
||||
hand-held wonder, the Rocket eBook
|
||||
<http://www.rocket-ebook.com/enter.html>.
|
||||
|
||||
*Note:* All multi-byte integers are stored in Vax/Intel order (the
|
||||
opposite of network byte order). Most integers are 4 bytes (an int32),
|
||||
but there are some minor exceptions (as detailed below).
|
||||
|
||||
Also, the following document refers to the .rb file sections as "pages".
|
||||
|
||||
|
||||
Details
|
||||
-------
|
||||
|
||||
The first 4 bytes of the file seem to be a magic number (in hex): B0 0C
|
||||
B0 0C. I like to think of this as a hexadecimal pun on the word "book"
|
||||
(repeated). [Matt Greenwood has reported seeing a magic number of "B0 0C
|
||||
F0 0D" in another type of ReB-related file -- i.e. "book food".]
|
||||
|
||||
The next two bytes appear to be a version number, currently "02 00". I
|
||||
assume this means major version 2, minor version 0.
|
||||
|
||||
The next 4 bytes are the string "NUVO", followed by 4 bytes of 00h. (I
|
||||
have also seen an old title that had 0s in place of the "NUVO".)
|
||||
|
||||
This brings us up to offset 0Eh, at which point we have a 4-byte
|
||||
representation of the date the book was created (Matt Greenwood pointed
|
||||
this out to me -- thanks!). The year is encoded as an int16. An older
|
||||
version of the RocketLibrary was encoding the year's full value (e.g.
|
||||
1999 was "CF 07" and 2000 was "D0 07"), but a more recent version is now
|
||||
using the tm_year value verbatim -- i.e. it's storing 100 for the year
|
||||
2000 ("64 00"). The year is followed by an int8 for the 1-relative month
|
||||
number, and an int8 for the day of the month.
|
||||
|
||||
After that is 6 bytes of 00h. These may be reserved for setting the time
|
||||
of creation (at a guess).
|
||||
|
||||
Then, at offset 18h, we have an int32 that contains the absolute offset
|
||||
of the "Table of Contents" (the directory of the pages contained within
|
||||
this .rb file). In all of the .rb files I've seen, this remains
|
||||
constant with a value of 128h. However, I have tested an atypical .rb
|
||||
file where I placed the ToC at the end of the file (after all the file
|
||||
contents), and it worked fine. (I've chosen not to build any books in
|
||||
such a non-standard format, however.)
|
||||
|
||||
Immediately following this is an int32 with the length of the .rb file
|
||||
(so we can check if the file is complete or not).
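The first 20h bytes described so far fit a single little-endian struct. This is a sketch under the layout given above; the function name, the returned keys, and the rule used to tell the two year encodings apart (values below 1900 are taken as tm_year) are assumptions made for illustration.

```python
import struct

# magic(4) version(2) "NUVO"(4) pad(4) year(int16) month(int8) day(int8)
# pad(6) toc_offset(int32) file_length(int32) = 32 bytes (20h)
RB_HEADER = struct.Struct('<4s2s4s4xHBB6xII')
RB_MAGIC = b'\xb0\x0c\xb0\x0c'


def parse_rb_header(data):
    """Decode the fixed part of an .rb header into a dict."""
    (magic, version, vendor, year, month, day,
     toc_offset, file_length) = RB_HEADER.unpack_from(data, 0)
    if magic != RB_MAGIC:
        raise ValueError('not an .rb file')
    # Assumption: small values are tm_year (years since 1900),
    # larger ones are the full year written by older software.
    if year < 1900:
        year += 1900
    return {
        'version': version,
        'vendor': vendor,          # b'NUVO', or zeros in old titles
        'year': year,
        'month': month,            # 1-relative
        'day': day,
        'toc_offset': toc_offset,  # typically 128h
        'file_length': file_length,
    }
```

Comparing file_length with the actual size of the file gives the completeness check mentioned above.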
|
||||
|
||||
All the bytes from here (offset 20h) up to offset 128h appear to only be
|
||||
used by an encrypted title. In a non-encrypted title, they are always 0.
|
||||
|
||||
The table of contents typically comes next (at offset 128h). It starts
|
||||
with an int32 count of the number of "page" entries (.rb-file sections)
|
||||
in the ToC. Each entry consists of a name (zero-padded to 32 bytes),
|
||||
followed by 3 int32s: the length of this entry's data segment, the
|
||||
absolute offset of the data in the .rb file, and a flag. The known flag
|
||||
values are: 1 (encrypted), 2 (info page), and 8 (deflated). The names
|
||||
are tweaked as needed to ensure that they are all unique. The current
|
||||
RocketWriter software uses a unique 6-digit number, a dash, up to 8
|
||||
characters from the filename, and then the re-mapped suffix for the data
|
||||
(.html, .hidx, .png, .info, etc.). My rbmake library simply ensures that
|
||||
the names are no longer than 15 characters (not counting the suffix) and
|
||||
are all unique.
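A ToC reader following the description above might look like this. It is a sketch: the function name and the tuple layout are illustrative, and decoding names as ASCII is an assumption.

```python
import struct

# 32-byte zero-padded name, then length, absolute offset, flag (all int32).
TOC_ENTRY = struct.Struct('<32sIII')

# Known flag values from the text above.
FLAG_ENCRYPTED = 1
FLAG_INFO = 2
FLAG_DEFLATED = 8


def parse_toc(data, toc_offset):
    """Return a list of (name, length, offset, flag) for each page."""
    (count,) = struct.unpack_from('<I', data, toc_offset)
    entries = []
    pos = toc_offset + 4
    for _ in range(count):
        raw_name, length, offset, flag = TOC_ENTRY.unpack_from(data, pos)
        # Assumption: names are plain ASCII, padded with NUL bytes.
        name = raw_name.split(b'\x00', 1)[0].decode('ascii', 'replace')
        entries.append((name, length, offset, flag))
        pos += TOC_ENTRY.size
    return entries
```

Because each entry carries an absolute offset, the page data can be fetched with data[offset:offset + length] regardless of where the ToC itself sits in the file.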
|
||||
|
||||
Often the first item in the ToC is the info page, but it doesn't have to
|
||||
be. This page of information contains NAME=VALUE pairs that note the
|
||||
author, title, what the root-page's name is, etc. (See appendix A). This
|
||||
data is never encrypted nor compressed, so this entry's flag value is
|
||||
always "2".
|
||||
|
||||
An image page is always stored as a B&W image in PNG format. Since it
|
||||
has its own compression, it is stored without any additional attempt at
|
||||
deflation. I have also never seen an encrypted image, so its flag value
|
||||
is always 0.
|
||||
|
||||
An HTML page contains the tags and text that were re-written into a
|
||||
consistent syntax (this presumably makes the HTML renderer in the ReB
|
||||
itself simpler). HTML pages are typically compressed (See appendix B).
|
||||
Every HTML page appears to use the suffix .html no matter what the file
|
||||
name was on import (but I have seen older files with .htm used as the
|
||||
suffix, so the rocket appears to support both).
|
||||
|
||||
For every HTML page there is a corresponding .hidx page that contains a
|
||||
summary of the paragraph formatting and the position of the anchor names
|
||||
in the associated .html page (See appendix C). This page is sometimes
|
||||
compressed, depending on length (See appendix B).
|
||||
|
||||
There are also reference titles that have a .hkey page that contains a
|
||||
list of words that can be looked up in the associated .html page (See
|
||||
appendix D).
|
||||
|
||||
Immediately following the ToC is the data for each piece mentioned in
|
||||
the ToC, in the same order as it appeared in the ToC.
|
||||
|
||||
Finally, the end of the file appears to be padded with 20 bytes of 01h.
|
||||
|
||||
|
||||
Appendix A: Info Page Format
|
||||
----------------------------
|
||||
|
||||
The info page consists of a series of lines that contain "NAME=VALUE"
|
||||
strings. Each line is terminated by a single newline. Here are the
|
||||
values that the RocketWriter generates:
|
||||
|
||||
COMMENT=Info file for <title>
|
||||
TYPE=2
|
||||
TITLE=<title>
|
||||
AUTHOR=<author>
|
||||
URL=ebook:<long, unique string used for the file's name by the librarian>
|
||||
GENERATOR=<e.g. RocketLibrarian 1.3.216>
|
||||
PARSE=1
|
||||
OUTPUT=1
|
||||
BODY=<name of root HTML page (as it appears in the ToC)>
|
||||
MENUMARK=menumark.html
|
||||
SuggestedRetailPrice=<usually empty>
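Reading these NAME=VALUE lines is straightforward. The sketch below assumes the page is Latin-1 text (the encoding is not stated here) and splits only on the first "=" so that values may themselves contain "="; the function name is illustrative.

```python
def parse_info_page(data):
    """Turn the raw bytes of an info page into a dict of NAME -> VALUE."""
    info = {}
    # Assumption: Latin-1 is used so that any byte value decodes cleanly.
    for line in data.decode('latin-1').split('\n'):
        if '=' not in line:
            continue  # skip blank or malformed lines
        name, value = line.split('=', 1)
        info[name] = value
    return info
```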
|
||||
|
||||
Encrypted titles have a few more entries (in addition to those listed above):
|
||||
|
||||
ISBN=<ISBN number, including dashes>
|
||||
REVISION=<digits>
|
||||
TITLE_LANGUAGE=<en-us>
|
||||
PUB_NAME=<Publisher's name>
|
||||
PUBSERVER_ID=<digits>
|
||||
GENERATOR=<e.g. RocketPress 1.3.121>
|
||||
VERSION=<digits>
|
||||
USERNAME=<rocket-ID>
|
||||
COPY_ID=<digits>
|
||||
COPYRIGHT=<copyright>
|
||||
COPYTITLE=<another copyright?>
|
||||
|
||||
A reference title also has an indication that there is a .hkey page
|
||||
present, and may also have a GENRE of "Reference":
|
||||
|
||||
HKEY=1
|
||||
GENRE=Reference
|
||||
|
||||
|
||||
Appendix B: The format of compressed data
|
||||
-----------------------------------------
|
||||
|
||||
Compressed pages have a data section in the .rb file with the following
|
||||
format:
|
||||
|
||||
The first int32 is a count of the number of 4096-byte chunks of data we
|
||||
broke the uncompressed page into (the last chunk can be shorter than
|
||||
4096 bytes, of course).
|
||||
|
||||
This is immediately followed by an int32 with the length of the entire
|
||||
uncompressed data.
|
||||
|
||||
After this there are <count> int32s that indicate the size of each
|
||||
chunk's compressed data.
|
||||
|
||||
Following these length int32s is the output from a deflation (the
|
||||
algorithm used in gzip) for each 4096-byte chunk of the original data.
|
||||
It appears that you must use a window-bit size of 13 and a compression
|
||||
level of "best" to be compatible with the Rocket eBook's system software.
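The layout above can be unpacked chunk by chunk. This sketch assumes each chunk is a complete zlib stream (with the zlib header and checksum) rather than raw deflate data; the text does not say which, so treat that as an assumption to verify against real files. The function name is illustrative.

```python
import struct
import zlib


def inflate_page(data):
    """Rebuild the uncompressed page from a compressed data section."""
    count, total_length = struct.unpack_from('<II', data, 0)
    sizes = struct.unpack_from('<%dI' % count, data, 8)
    pos = 8 + 4 * count
    chunks = []
    for size in sizes:
        # Assumption: every chunk is an independent zlib stream.
        chunks.append(zlib.decompress(data[pos:pos + size]))
        pos += size
    page = b''.join(chunks)
    if len(page) != total_length:
        raise ValueError('uncompressed length does not match header')
    return page
```

Since every chunk is compressed on its own, a reader that only needs part of a page can skip ahead using the size table and inflate just the chunks it wants.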
|
||||
|
||||
|
||||
Appendix C: HTML-index Page Format
|
||||
----------------------------------
|
||||
|
||||
The .hidx page's purpose is to allow the renderer to quickly look up the
|
||||
format of each paragraph (useful for random access to the data), and the
|
||||
position of the anchor names.
|
||||
|
||||
The first section lists the various paragraph-producing tags. It is
|
||||
headed by a line of "[tags <count>]", where <count> is the number of
|
||||
tags that follow this header. The tags are listed one per line, and have
|
||||
an implied enumeration from 0 to N-1 (which the other tags and the
|
||||
upcoming paragraph sections reference).
|
||||
|
||||
The first tag is typically (always?) "<HTML> -1". The number trailing
|
||||
the tag indicates what other tag (or sequence of tags, one per line) in
|
||||
which we are nested. So, if we have a <BR> nested inside a <P
|
||||
ALIGN="center">, it would be listed separately from a <BR> that was
|
||||
nested inside a normal paragraph, and each one would have a different
|
||||
trailing index number.
|
||||
|
||||
Following the tag section is the paragraph section. The heading is
|
||||
"[paragraphs <count>]", and is followed by a line for each paragraph.
|
||||
These lines consist of a character offset into the .html page for the
|
||||
start of the paragraph followed by a 0-relative offset into the tag
|
||||
section (indicating what kind of formatting to use for the indicated
|
||||
paragraph).
|
||||
|
||||
The paragraph-section character offsets point to the first bit of text
|
||||
after the associated tag.
|
||||
|
||||
The last section details the anchor names. The heading is
|
||||
"[names <count>]", and each item that follows is a quoted string of the
|
||||
anchor name, followed by a character offset into the .html page where
|
||||
we'll find that name. If there are no names in the associated HTML
|
||||
section, the heading is included with a 0 count (i.e. "[names 0]").
|
||||
|
||||
The name-section character offsets point to the start of the anchor tag
|
||||
(not after the tag, like the offsets in the "paragraphs" section).
|
||||
|
||||
The lines are terminated by newlines (in standard unix fashion).
|
||||
|
||||
For example:
|
||||
|
||||
[tags 10]
|
||||
<HTML> -1
|
||||
<BODY> 0
|
||||
<P ALIGN="right"> 1
|
||||
<P ALIGN="left"> 1
|
||||
<P> 1
|
||||
<H3 ALIGN="center"> 1
|
||||
<P ALIGN="center"> 1
|
||||
<BR> 6
|
||||
<H2 ALIGN="center"> 1
|
||||
<BR> 1
|
||||
|
||||
[paragraphs 42]
|
||||
160 9
|
||||
164 9
|
||||
184 8
|
||||
220 8
|
||||
261 6
|
||||
316 5
|
||||
359 1
|
||||
379 6
|
||||
410 6
|
||||
460 7
|
||||
511 7
|
||||
564 7
|
||||
616 7
|
||||
668 7
|
||||
720 7
|
||||
773 7
|
||||
827 7
|
||||
880 7
|
||||
933 7
|
||||
988 7
|
||||
1043 7
|
||||
1100 7
|
||||
1157 7
|
||||
1214 7
|
||||
1270 7
|
||||
1328 7
|
||||
1385 7
|
||||
1442 7
|
||||
1497 7
|
||||
1556 7
|
||||
1561 7
|
||||
1635 1
|
||||
1656 5
|
||||
1690 6
|
||||
1737 7
|
||||
1773 5
|
||||
1798 4
|
||||
1826 3
|
||||
2663 1
|
||||
2668 4
|
||||
2689 2
|
||||
2730 8
|
||||
|
||||
[names 1]
|
||||
"ch1" 2689
|
||||
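A parser for the three sections shown in the example could be sketched as follows. It assumes each tag line ends in a single integer and that anchor names are wrapped in double quotes, as in the example; the function name and the shape of the returned dict are illustrative.

```python
import re

HEADING = re.compile(r'^\[(tags|paragraphs|names) (\d+)\]$')


def parse_hidx(text):
    """Parse a .hidx page into lists of tags, paragraphs and names."""
    index = {'tags': [], 'paragraphs': [], 'names': []}
    section = None
    for line in text.split('\n'):
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            section = match.group(1)
            continue
        if section is None:
            continue  # ignore anything before the first heading
        # The number is always the last space-separated field.
        head, _, tail = line.rpartition(' ')
        if section == 'tags':
            index['tags'].append((head, int(tail)))         # (tag, parent)
        elif section == 'paragraphs':
            index['paragraphs'].append((int(head), int(tail)))  # (offset, tag)
        else:
            index['names'].append((head.strip('"'), int(tail)))  # (name, offset)
    return index
```

Splitting on the last space keeps tags with attributes, such as <P ALIGN="right">, intact.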
|
||||
|
||||
Appendix D: HTML-key Page Format
|
||||
--------------------------------
|
||||
|
||||
The .hkey page contains a list of words, one per line, sorted in a
|
||||
strict ASCII sequence, each one followed by a tab and the offset in the
|
||||
.html page of the word's data. I presume that the .hkey page must share
|
||||
the same name prefix as its related .html page.
|
||||
|
||||
If the names contain high-bit characters, they are translated into
|
||||
regular ASCII in the .hkey file, since this allows the user to search
|
||||
for the words using unaccented characters.
|
||||
|
||||
The lines are terminated with a newline (in standard unix fashion).
|
||||
|
||||
An example:
|
||||
|
||||
a 5
|
||||
apple 38
|
||||
b 84
|
||||
book 104
|
||||
|
||||
Each of these offsets points to a paragraph tag in the associated .html
|
||||
page. I have only seen this sequence of tags used so far:
|
||||
|
||||
<P><BIG><B>word</B></BIG> other stuff</P>
|
||||
|
||||
I have seen multiple <B>...</B> tags in the middle of the single set of
|
||||
<BIG>...</BIG> tags, but this is the basic tag format.
|
||||
|
||||
The offset in the .hkey page points to the start of the <P> tag.
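Because the words are sorted in strict ASCII order, a lookup can use binary search. A minimal sketch, assuming one tab between word and offset as described above (the function names are illustrative):

```python
import bisect


def parse_hkey(text):
    """Return a list of (word, offset) pairs from a .hkey page."""
    entries = []
    for line in text.split('\n'):
        if not line:
            continue
        word, _, offset = line.rpartition('\t')
        entries.append((word, int(offset)))
    return entries


def find_offset(entries, word):
    """Offset of the <P> tag for `word`, or None if it is not listed."""
    words = [w for w, _ in entries]
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return entries[i][1]
    return None
```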
|
||||
|
56
format_docs/tcr.txt
Normal file
@ -0,0 +1,56 @@
|
||||
About
|
||||
-----
|
||||
|
||||
Text compression format that can be decompressed starting at any point.
|
||||
Little-endian byte ordering is used.
|
||||
|
||||
|
||||
Header
|
||||
------
|
||||
|
||||
TCR files always start with:
|
||||
|
||||
!!8-Bit!!
|
||||
|
||||
|
||||
Layout
|
||||
------
|
||||
|
||||
Header
|
||||
256 key dictionary
|
||||
compressed text
|
||||
|
||||
|
||||
Dictionary
|
||||
----------
|
||||
|
||||
A dictionary of keys and replacement strings. There are a total of 256 keys,
|
||||
0 - 255. Each string is preceded with one byte that represents the length of
|
||||
the string.
|
||||
|
||||
|
||||
Compressed text
|
||||
---------------
|
||||
|
||||
The compressed text is a series of values 0-255 which correspond to a key and
|
||||
thus a string. Reassembling is replacing each key in the compressed text with
|
||||
its corresponding string.
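Putting the header, dictionary and compressed text together, decoding can be sketched in a few lines. The function name is illustrative, and the sketch assumes dictionary strings are used verbatim (no nested expansion).

```python
TCR_HEADER = b'!!8-Bit!!'


def tcr_decompress(data):
    """Expand a TCR file (bytes) back into its original text (bytes)."""
    if not data.startswith(TCR_HEADER):
        raise ValueError('not a TCR file')
    pos = len(TCR_HEADER)
    dictionary = []
    for _ in range(256):
        length = data[pos]  # one length byte precedes each string
        dictionary.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    # Everything after the dictionary is a stream of one-byte keys.
    return b''.join(dictionary[code] for code in data[pos:])
```

Each key expands independently of its neighbours, which is what lets decompression start at any point in the compressed text.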
|
||||
|
||||
|
||||
Compressor
|
||||
----------
|
||||
|
||||
From Andrew Giddings TCR.c (http://www.cix.co.uk/~gidds/Software/TCR.html):
|
||||
|
||||
The TCR compression format is easy to describe: after the fixed header is a
|
||||
dictionary of 256 strings, each preceded by a length byte. The rest of the
|
||||
file is a list of codes from this dictionary.
|
||||
|
||||
The compressor works by starting with each code defined as itself. While
|
||||
there's an unused code, it finds the most common two-code combination, and
|
||||
creates a new code for it, replacing all occurrences in the text with the
|
||||
new code.
|
||||
|
||||
It also searches for codes that are always followed by another, which it can
|
||||
merge, possibly freeing up some.
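The pair-merging loop can be illustrated with a deliberately simplified sketch. This is not Andrew Giddings' TCR.c: it only covers the "most common two-code combination" step, treats byte values absent from the input as the unused codes, and leaves out the merging of codes that are always followed by another. The function name is illustrative.

```python
from collections import Counter

TCR_HEADER = b'!!8-Bit!!'


def tcr_compress(text):
    """Simplified TCR-style compressor; returns the full file as bytes."""
    strings = [bytes([i]) for i in range(256)]  # each code starts as itself
    codes = list(text)
    used = set(codes)
    # Assumption: byte values absent from the input serve as the unused codes.
    unused = [c for c in range(256) if c not in used]
    while unused and len(codes) > 1:
        (a, b), freq = Counter(zip(codes, codes[1:])).most_common(1)[0]
        # Stop when merging no longer helps or would overflow the length byte.
        if freq < 2 or len(strings[a]) + len(strings[b]) > 255:
            break
        new = unused.pop()
        strings[new] = strings[a] + strings[b]
        merged, i = [], 0
        while i < len(codes):
            if i + 1 < len(codes) and codes[i] == a and codes[i + 1] == b:
                merged.append(new)
                i += 2
            else:
                merged.append(codes[i])
                i += 1
        codes = merged
    dictionary = b''.join(bytes([len(s)]) + s for s in strings)
    return TCR_HEADER + dictionary + bytes(codes)
```

The dictionary always costs at least 512 bytes, so a scheme like this only pays off on inputs considerably larger than that.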
|
||||
|
@ -52,6 +52,17 @@ p.formats {
|
||||
text-indent: 0.0in;
|
||||
}
|
||||
|
||||
/*
|
||||
* Minimize widows and orphans by logically grouping chunks
|
||||
* Some reports of problems with Sony (ADE) ereaders
|
||||
* ADE: page-break-inside:avoid;
|
||||
* iBooks: display:inline-block;
|
||||
* width:100%;
|
||||
*/
|
||||
div.author_logical_group {
|
||||
page-break-inside:avoid;
|
||||
}
|
||||
|
||||
div.description > p:first-child {
|
||||
margin: 0 0 0 0;
|
||||
text-indent: 0em;
|
||||
@ -62,27 +73,19 @@ div.description {
|
||||
text-indent: 1em;
|
||||
}
|
||||
|
||||
/*
|
||||
* Attempt to minimize widows and orphans by logically grouping chunks
|
||||
* Recommend enabling for iPad
|
||||
* Some reports of problems with Sony ereaders, presumably ADE engines
|
||||
*/
|
||||
/*
|
||||
div.logical_group {
|
||||
display:inline-block;
|
||||
width:100%;
|
||||
div.initial_letter {
|
||||
page-break-before:always;
|
||||
}
|
||||
*/
|
||||
|
||||
p.date_index {
|
||||
p.author_title_letter_index {
|
||||
font-size:x-large;
|
||||
text-align:center;
|
||||
font-weight:bold;
|
||||
margin-top:1em;
|
||||
margin-top:0px;
|
||||
margin-bottom:0px;
|
||||
}
|
||||
|
||||
p.letter_index {
|
||||
p.date_index {
|
||||
font-size:x-large;
|
||||
text-align:center;
|
||||
font-weight:bold;
|
||||
@ -99,6 +102,14 @@ p.series {
|
||||
text-indent:-2em;
|
||||
}
|
||||
|
||||
p.series_letter_index {
|
||||
font-size:x-large;
|
||||
text-align:center;
|
||||
font-weight:bold;
|
||||
margin-top:1em;
|
||||
margin-bottom:0px;
|
||||
}
|
||||
|
||||
p.read_book {
|
||||
text-align:left;
|
||||
margin-top:0px;
|
||||
|
@ -13,15 +13,12 @@ class MSNSankeiNewsProduct(BasicNewsRecipe):
|
||||
description = 'Products release from Japan'
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
encoding = 'Shift_JIS'
|
||||
encoding = 'utf-8'
|
||||
language = 'ja'
|
||||
cover_url = 'http://sankei.jp.msn.com/images/common/sankeShinbunLogo.jpg'
|
||||
masthead_url = 'http://sankei.jp.msn.com/images/common/sankeiNewsLogo.gif'
|
||||
|
||||
feeds = [(u'\u65b0\u5546\u54c1', u'http://sankei.jp.msn.com/rss/news/release.xml')]
|
||||
|
||||
remove_tags_before = dict(id="__r_article_title__")
|
||||
remove_tags_after = dict(id="ajax_release_news")
|
||||
remove_tags = [{'class':"parent chromeCustom6G"},
|
||||
dict(id="RelatedImg")
|
||||
]
|
||||
remove_tags_before = dict(id="NewsTitle")
|
||||
remove_tags_after = dict(id="RelatedTitle")
|
||||
|
@ -1,7 +1,5 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
|
||||
__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>'
|
||||
|
||||
'''
|
||||
theonion.com
|
||||
@ -15,26 +13,39 @@ class TheOnion(BasicNewsRecipe):
|
||||
description = "America's finest news source"
|
||||
oldest_article = 2
|
||||
max_articles_per_feed = 100
|
||||
publisher = u'Onion, Inc.'
|
||||
category = u'humor, news, USA'
|
||||
language = 'en'
|
||||
|
||||
publisher = 'Onion, Inc.'
|
||||
category = 'humor, news, USA'
|
||||
language = 'en'
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
encoding = 'utf-8'
|
||||
remove_javascript = True
|
||||
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
|
||||
publication_type = 'newsportal'
|
||||
masthead_url = 'http://o.onionstatic.com/img/headers/onion_190.png'
|
||||
extra_css = """
|
||||
body{font-family: Helvetica,Arial,sans-serif}
|
||||
.section_title{color: gray; text-transform: uppercase}
|
||||
.title{font-family: Georgia,serif}
|
||||
.meta{color: gray; display: inline}
|
||||
.has_caption{display: block}
|
||||
.caption{font-size: x-small; color: gray; margin-bottom: 0.8em}
|
||||
"""
|
||||
|
||||
html2lrf_options = [
|
||||
'--comment' , description
|
||||
, '--category' , category
|
||||
, '--publisher' , publisher
|
||||
]
|
||||
|
||||
keep_only_tags = [dict(name='div', attrs={'id':'main'})]
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
, 'tags' : category
|
||||
, 'publisher': publisher
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h2', attrs={'class':['section_title','title']})
|
||||
,dict(attrs={'class':['main_image','meta','article_photo_lead','article_body']})
|
||||
,dict(attrs={'id':['entries']})
|
||||
]
|
||||
remove_attributes=['lang','rel']
|
||||
remove_tags_after = dict(attrs={'class':['article_body','feature_content']})
|
||||
remove_tags = [
|
||||
dict(name=['object','link','iframe','base'])
|
||||
dict(name=['object','link','iframe','base','meta'])
|
||||
,dict(name='div', attrs={'class':['toolbar_side','graphical_feature','toolbar_bottom']})
|
||||
,dict(name='div', attrs={'id':['recent_slider','sidebar','pagination','related_media']})
|
||||
]
|
||||
@ -44,3 +55,28 @@ class TheOnion(BasicNewsRecipe):
|
||||
(u'Daily' , u'http://feeds.theonion.com/theonion/daily' )
|
||||
,(u'Sports' , u'http://feeds.theonion.com/theonion/sports' )
|
||||
]
|
||||
|
||||
def get_article_url(self, article):
|
||||
artl = BasicNewsRecipe.get_article_url(self, article)
|
||||
if artl.startswith('http://www.theonion.com/audio/'):
|
||||
artl = None
|
||||
return artl
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
for item in soup.findAll(style=True):
|
||||
del item['style']
|
||||
for item in soup.findAll('a'):
|
||||
limg = item.find('img')
|
||||
if item.string is not None:
|
||||
str = item.string
|
||||
item.replaceWith(str)
|
||||
else:
|
||||
if limg:
|
||||
item.name = 'div'
|
||||
item.attrs = []
|
||||
if not limg.has_key('alt'):
|
||||
limg['alt'] = 'image'
|
||||
else:
|
||||
str = self.tag_to_string(item)
|
||||
item.replaceWith(str)
|
||||
return soup
|
||||
|
@ -89,21 +89,21 @@ class NOOK_COLOR(NOOK):
|
||||
BCD = [0x216]
|
||||
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'EBOOK_DISK'
|
||||
|
||||
EBOOK_DIR_MAIN = 'My Files/Books'
|
||||
EBOOK_DIR_MAIN = 'My Files'
|
||||
|
||||
'''
|
||||
def create_upload_path(self, path, mdata, fname, create_dirs=True):
|
||||
filepath = NOOK.create_upload_path(self, path, mdata, fname,
|
||||
create_dirs=create_dirs)
|
||||
edm = self.EBOOK_DIR_MAIN.replace('/', os.sep)
|
||||
npath = os.path.join(edm, _('News')) + os.sep
|
||||
if npath in filepath:
|
||||
filepath = filepath.replace(npath, os.sep.join('My Files',
|
||||
'Magazines')+os.sep)
|
||||
filedir = os.path.dirname(filepath)
|
||||
if create_dirs and not os.path.exists(filedir):
|
||||
os.makedirs(filedir)
|
||||
create_dirs=False)
|
||||
edm = self.EBOOK_DIR_MAIN
|
||||
subdir = 'Books'
|
||||
if mdata.tags:
|
||||
if _('News') in mdata.tags:
|
||||
subdir = 'Magazines'
|
||||
filepath = filepath.replace(os.sep+edm+os.sep,
|
||||
os.sep+edm+os.sep+subdir+os.sep)
|
||||
filedir = os.path.dirname(filepath)
|
||||
if create_dirs and not os.path.exists(filedir):
|
||||
os.makedirs(filedir)
|
||||
|
||||
return filepath
|
||||
'''
|
||||
|
||||
|
@ -71,19 +71,28 @@ class FB2MLizer(object):
|
||||
return u'<?xml version="1.0" encoding="UTF-8"?>' + output
|
||||
|
||||
def clean_text(self, text):
|
||||
# Condense empty paragraphs into a line break.
|
||||
text = re.sub(r'(?miu)(<p>\s*</p>\s*){3,}', '<p><empty-line /></p>', text)
|
||||
# Remove empty paragraphs.
|
||||
text = re.sub(r'(?miu)<p>\s*</p>', '', text)
|
||||
# Clean up paragraph endings.
|
||||
text = re.sub(r'(?miu)\s*</p>', '</p>', text)
|
||||
# Put paragraphs following a paragraph on a separate line.
|
||||
text = re.sub(r'(?miu)</p>\s*<p>', '</p>\n\n<p>', text)
|
||||
|
||||
# Remove empty title elements.
|
||||
text = re.sub(r'(?miu)<title>\s*</title>', '', text)
|
||||
text = re.sub(r'(?miu)\s+</title>', '</title>', text)
|
||||
|
||||
# Remove empty sections.
|
||||
text = re.sub(r'(?miu)<section>\s*</section>', '', text)
|
||||
# Clean up sections start and ends.
|
||||
text = re.sub(r'(?miu)\s*</section>', '\n</section>', text)
|
||||
text = re.sub(r'(?miu)</section>\s*', '</section>\n\n', text)
|
||||
text = re.sub(r'(?miu)\s*<section>', '\n<section>', text)
|
||||
text = re.sub(r'(?miu)<section>\s*', '<section>\n', text)
|
||||
text = re.sub(r'(?miu)</section><section>', '</section>\n\n<section>', text)
|
||||
# Put sections followed by sections on a separate line.
|
||||
text = re.sub(r'(?miu)</section>\s*<section>', '</section>\n\n<section>', text)
|
||||
|
||||
if self.opts.insert_blank_line:
|
||||
text = re.sub(r'(?miu)</p>', '</p><empty-line />', text)
|
||||
@ -338,6 +347,11 @@ class FB2MLizer(object):
|
||||
tags = []
|
||||
# First tag in tree
|
||||
tag = barename(elem_tree.tag)
|
||||
# Number of blank lines above tag
|
||||
try:
|
||||
ems = int(round((float(style.marginTop) / style.fontSize) - 1))
|
||||
except:
|
||||
ems = 0
|
||||
|
||||
# Convert TOC entries to <title>s and add <section>s
|
||||
if self.opts.sectionize == 'toc':
|
||||
@ -370,7 +384,9 @@ class FB2MLizer(object):
|
||||
fb2_out.append('<section>')
|
||||
self.section_level += 1
|
||||
|
||||
# Process the XHTML tag if it needs to be converted to an FB2 tag.
|
||||
# Process the XHTML tag and styles. Converted to an FB2 tag.
|
||||
# Use individual if statement not if else. There can be
|
||||
# only one XHTML tag but it can have multiple styles.
|
||||
if tag == 'img':
|
||||
if elem_tree.attrib.get('src', None):
|
||||
# Only write the image tag if it is in the manifest.
|
||||
@ -381,7 +397,11 @@ class FB2MLizer(object):
|
||||
fb2_out += p_txt
|
||||
tags += p_tag
|
||||
fb2_out.append('<image xlink:href="#%s" />' % self.image_hrefs[page.abshref(elem_tree.attrib['src'])])
|
||||
elif tag == 'br':
|
||||
if tag in ('br', 'hr') or ems:
|
||||
if not ems:
|
||||
multiplier = 1
|
||||
else:
|
||||
multiplier = ems
|
||||
if self.in_p:
|
||||
closed_tags = []
|
||||
open_tags = tag_stack+tags
|
||||
@ -391,52 +411,38 @@ class FB2MLizer(object):
|
||||
closed_tags.append(t)
|
||||
if t == 'p':
|
||||
break
|
||||
fb2_out.append('<empty-line />')
|
||||
fb2_out.append('<empty-line />' * multiplier)
|
||||
closed_tags.reverse()
|
||||
for t in closed_tags:
|
||||
fb2_out.append('<%s>' % t)
|
||||
else:
|
||||
fb2_out.append('<empty-line />')
|
||||
elif tag in ('div', 'li', 'p'):
|
||||
fb2_out.append('<empty-line />' * multiplier)
|
||||
if tag in ('div', 'li', 'p'):
|
||||
p_text, added_p = self.close_open_p(tag_stack+tags)
|
||||
fb2_out += p_text
|
||||
if added_p:
|
||||
tags.append('p')
|
||||
elif tag == 'b':
|
||||
if tag == 'b' or style['font-weight'] in ('bold', 'bolder'):
|
||||
s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif tag == 'i':
|
||||
if tag == 'i' or style['font-style'] == 'italic':
|
||||
s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif tag in ('del', 'strike'):
|
||||
if tag in ('del', 'strike') or style['text-decoration'] == 'line-through':
|
||||
s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif tag == 'sub':
|
||||
if tag == 'sub':
|
||||
s_out, s_tags = self.handle_simple_tag('sub', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif tag == 'sup':
|
||||
if tag == 'sup':
|
||||
s_out, s_tags = self.handle_simple_tag('sup', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
|
||||
# Processes style information.
|
||||
if style['font-style'] == 'italic':
|
||||
s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif style['font-weight'] in ('bold', 'bolder'):
|
||||
s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
elif style['text-decoration'] == 'line-through':
|
||||
s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
|
||||
fb2_out += s_out
|
||||
tags += s_tags
|
||||
|
||||
# Process element text.
|
||||
if hasattr(elem_tree, 'text') and elem_tree.text:
|
||||
if not self.in_p:
|
||||
|
@ -633,7 +633,7 @@ class Style(object):
|
||||
def lineHeight(self):
|
||||
if self._lineHeight is None:
|
||||
result = None
|
||||
parent = self._getparent()
|
||||
parent = self._get_parent()
|
||||
if 'line-height' in self._style:
|
||||
lineh = self._style['line-height']
|
||||
if lineh == 'normal':
|
||||
|
@ -67,10 +67,11 @@ class TXTMLizer(object):
|
||||
output.append(self.get_toc())
|
||||
for item in self.oeb_book.spine:
|
||||
self.log.debug('Converting %s to TXT...' % item.href)
|
||||
stylizer = Stylizer(item.data, item.href, self.oeb_book, self.opts, self.opts.output_profile)
|
||||
content = unicode(etree.tostring(item.data.find(XHTML('body')), encoding=unicode))
|
||||
content = unicode(etree.tostring(item.data, encoding=unicode))
|
||||
content = self.remove_newlines(content)
|
||||
output += self.dump_text(etree.fromstring(content), stylizer, item)
|
||||
content = etree.fromstring(content)
|
||||
stylizer = Stylizer(content, item.href, self.oeb_book, self.opts, self.opts.output_profile)
|
||||
output += self.dump_text(content.find(XHTML('body')), stylizer, item)
|
||||
output += '\n\n\n\n\n\n'
|
||||
output = u''.join(output)
|
||||
output = u'\n'.join(l.rstrip() for l in output.splitlines())
|
||||
@ -219,11 +220,16 @@ class TXTMLizer(object):
|
||||
if tag in SPACE_TAGS:
|
||||
text.append(u' ')
|
||||
|
||||
# Scene breaks.
|
||||
# Hard scene breaks.
|
||||
if tag == 'hr':
|
||||
text.append('\n\n* * *\n\n')
|
||||
elif style['margin-top']:
|
||||
text.append('\n\n' + '\n' * round(style['margin-top']))
|
||||
# Soft scene breaks.
|
||||
try:
|
||||
ems = int(round((float(style.marginTop) / style.fontSize) - 1))
|
||||
if ems:
|
||||
text.append('\n' * ems)
|
||||
except:
|
||||
pass
|
||||
|
||||
# Process tags that contain text.
|
||||
if hasattr(elem, 'text') and elem.text:
|
||||
|
@ -492,8 +492,7 @@ title and author are swapped before the title case is set</string>
|
||||
<item>
|
||||
<widget class="QCheckBox" name="update_title_sort">
|
||||
<property name="toolTip">
|
||||
<string>Recompute the title sort value and store it in title sort.
|
||||
This will happen after any title case changes</string>
|
||||
<string>Update title sort based on the current title. This will be applied only after other changes to title.</string>
|
||||
</property>
|
||||
<property name="text">
|
||||
<string>Update &title sort</string>
|
||||
|
@ -420,7 +420,8 @@ class ResultCache(SearchQueryParser): # {{{
|
||||
return candidates - res
|
||||
return res
|
||||
|
||||
def get_matches(self, location, query, allow_recursion=True, candidates=None):
|
||||
def get_matches(self, location, query, candidates=None,
|
||||
allow_recursion=True):
|
||||
matches = set([])
|
||||
if candidates is None:
|
||||
candidates = self.universal_set()
|
||||
@ -434,8 +435,8 @@ class ResultCache(SearchQueryParser): # {{{
|
||||
if isinstance(location, list):
|
||||
if allow_recursion:
|
||||
for loc in location:
|
||||
matches |= self.get_matches(loc, query, candidates,
|
||||
allow_recursion=False)
|
||||
matches |= self.get_matches(loc, query,
|
||||
candidates=candidates, allow_recursion=False)
|
||||
return matches
|
||||
raise ParseException(query, len(query), 'Recursive query group detected', self)
|
||||
|
||||
|
@@ -1841,8 +1841,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 body.insert(btc,pTag)
 btc += 1

-# <p class="letter_index">
-# <p class="book_title">
 divTag = Tag(soup, "div")
 dtc = 0
 current_letter = ""
@@ -1870,11 +1868,12 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 divTag.insert(dtc, divRunningTag)
 dtc += 1
 divRunningTag = Tag(soup, 'div')
-divRunningTag['class'] = "logical_group"
+if dtc > 0:
+divRunningTag['class'] = "initial_letter"
 drtc = 0
 current_letter = self.letter_or_symbol(book['title_sort'][0])
 pIndexTag = Tag(soup, "p")
-pIndexTag['class'] = "letter_index"
+pIndexTag['class'] = "author_title_letter_index"
 aTag = Tag(soup, "a")
 aTag['name'] = "%s" % self.letter_or_symbol(book['title_sort'][0])
 pIndexTag.insert(0,aTag)
@@ -1982,8 +1981,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 body.insert(btc, aTag)
 btc += 1

-# <p class="letter_index">
-# <p class="author_index">
 divTag = Tag(soup, "div")
 dtc = 0
 divOpeningTag = None
@@ -2017,10 +2014,11 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 current_letter = self.letter_or_symbol(book['author_sort'][0].upper())
 author_count = 0
 divOpeningTag = Tag(soup, 'div')
-divOpeningTag['class'] = "logical_group"
+if dtc > 0:
+divOpeningTag['class'] = "initial_letter"
 dotc = 0
 pIndexTag = Tag(soup, "p")
-pIndexTag['class'] = "letter_index"
+pIndexTag['class'] = "author_title_letter_index"
 aTag = Tag(soup, "a")
 aTag['name'] = "%sauthors" % self.letter_or_symbol(current_letter)
 pIndexTag.insert(0,aTag)
@@ -2032,16 +2030,21 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 # Start a new author
 current_author = book['author']
 author_count += 1
-if author_count == 2:
+if author_count >= 2:
 # Add divOpeningTag to divTag, kill divOpeningTag
-divTag.insert(dtc, divOpeningTag)
-dtc += 1
-divOpeningTag = None
-dotc = 0
+if divOpeningTag:
+divTag.insert(dtc, divOpeningTag)
+dtc += 1
+divOpeningTag = None
+dotc = 0

-# Create a divRunningTag for the next author
+if author_count > 2:
+divTag.insert(dtc, divRunningTag)
+dtc += 1
+
+# Create a divRunningTag for the rest of the authors in this letter
 divRunningTag = Tag(soup, 'div')
-divRunningTag['class'] = "logical_group"
+divRunningTag['class'] = "author_logical_group"
 drtc = 0

 non_series_books = 0
@@ -2373,8 +2376,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 body.insert(btc,pTag)
 btc += 1

-# <p class="letter_index">
-# <p class="author_index">
 divTag = Tag(soup, "div")
 dtc = 0
@@ -2558,8 +2559,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 body.insert(btc, aTag)
 btc += 1

-# <p class="letter_index">
-# <p class="author_index">
 divTag = Tag(soup, "div")
 dtc = 0
@@ -2661,8 +2660,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 body.insert(btc, aTag)
 btc += 1

-# <p class="letter_index">
-# <p class="author_index">
 divTag = Tag(soup, "div")
 dtc = 0
 current_letter = ""
@@ -2677,7 +2674,7 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
 # Start a new letter with Index letter
 current_letter = self.letter_or_symbol(sort_title[0].upper())
 pIndexTag = Tag(soup, "p")
-pIndexTag['class'] = "letter_index"
+pIndexTag['class'] = "series_letter_index"
 aTag = Tag(soup, "a")
 aTag['name'] = "%s_series" % self.letter_or_symbol(current_letter)
 pIndexTag.insert(0,aTag)
@@ -457,7 +457,7 @@ class CustomColumns(object):
 if num is not None:
 data = self.custom_column_num_map[num]
 if data['datatype'] == 'composite':
-return set()
+return set([])
 if not data['editable']:
 raise ValueError('Column %r is not editable'%data['label'])
 table, lt = self.custom_table_names(data['num'])
@@ -468,7 +468,7 @@ class CustomColumns(object):
 if data['datatype'] == 'series' and extra is None:
 (val, extra) = self._get_series_values(val)

-books_to_refresh = set()
+books_to_refresh = set([])
 if data['normalized']:
 if data['datatype'] == 'enumeration' and (
 val and val not in data['display']['enum_values']):
@@ -497,7 +497,7 @@ class CustomColumns(object):
 ex = existing[idx]
 xid = self.conn.get(
 'SELECT id FROM %s WHERE value=?'%table, (ex,), all=False)
-if ex != x:
+if allow_case_change and ex != x:
 case_change = True
 self.conn.execute(
 'UPDATE %s SET value=? WHERE id=?'%table, (x, xid))
@@ -1636,7 +1636,8 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 if not authors:
 authors = [_('Unknown')]
 self.conn.execute('DELETE FROM books_authors_link WHERE book=?',(id,))
-books_to_refresh = set()
+books_to_refresh = set([])
+final_authors = []
 for a in authors:
 case_change = False
 if not a:
@@ -1648,13 +1649,17 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 if aus:
 aid, name = aus[0]
 # Handle change of case
-if allow_case_change and name != a:
-self.conn.execute('''UPDATE authors
-SET name=? WHERE id=?''', (a, aid))
-case_change = True
+if name != a:
+if allow_case_change:
+self.conn.execute('''UPDATE authors
+SET name=? WHERE id=?''', (a, aid))
+case_change = True
+else:
+a = name
 else:
 aid = self.conn.execute('''INSERT INTO authors(name)
 VALUES (?)''', (a,)).lastrowid
+final_authors.append(a.replace('|', ','))
 try:
 self.conn.execute('''INSERT INTO books_authors_link(book, author)
 VALUES (?,?)''', (id, aid))
@@ -1668,7 +1673,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 self.conn.execute('UPDATE books SET author_sort=? WHERE id=?',
 (ss, id))
 self.data.set(id, self.FIELD_MAP['authors'],
-','.join([a.replace(',', '|') for a in authors]),
+','.join([a.replace(',', '|') for a in final_authors]),
 row_is_id=True)
 self.data.set(id, self.FIELD_MAP['author_sort'], ss, row_is_id=True)
 aum = self.authors_with_sort_strings(id, index_is_id=True)
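The hunk above joins the author list into one comma-separated string after swapping any comma inside a name for `|`. A minimal sketch of that encoding, assuming the reverse step simply undoes the swap; `pack_authors` and `unpack_authors` are hypothetical names, not calibre functions:

```python
def pack_authors(authors):
    # Mirrors the expression in the diff: commas inside a name become '|'
    # so that ',' can serve as the separator between names.
    return ','.join([a.replace(',', '|') for a in authors])


def unpack_authors(packed):
    # Assumed inverse: split on the separator, then restore the commas.
    return [a.replace('|', ',') for a in packed.split(',')]


names = ['Tolkien, J. R. R.', 'Christopher Tolkien']
packed = pack_authors(names)
restored = unpack_authors(packed)
```

The round trip only holds for names that contain no literal `|`, which is a limitation of this sketch.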
@@ -1716,6 +1721,10 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 title = title.decode(preferred_encoding, 'replace')
 self.conn.execute('UPDATE books SET title=? WHERE id=?', (title, id))
 self.data.set(id, self.FIELD_MAP['title'], title, row_is_id=True)
+ts = self.conn.get('SELECT sort FROM books WHERE id=?', (id,),
+all=False)
+if ts:
+self.data.set(id, self.FIELD_MAP['sort'], ts, row_is_id=True)
 return True

 def set_title(self, id, title, notify=True, commit=True):
@@ -1768,10 +1777,13 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 WHERE name=?''', (publisher,))
 if pubx:
 aid, cur_name = pubx[0]
-if allow_case_change and publisher != cur_name:
-self.conn.execute('''UPDATE publishers SET name=?
+if publisher != cur_name:
+if allow_case_change:
+self.conn.execute('''UPDATE publishers SET name=?
 WHERE id=?''', (publisher, aid))
-case_change = True
+case_change = True
+else:
+publisher = cur_name
 else:
 aid = self.conn.execute('''INSERT INTO publishers(name)
 VALUES (?)''', (publisher,)).lastrowid
@@ -2163,7 +2175,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 FROM books_tags_link WHERE tag=tags.id) < 1''')
 otags = self.get_tags(id)
 tags = self.cleanup_tags(tags)
-books_to_refresh = set()
+books_to_refresh = set([])
 for tag in (set(tags)-otags):
 case_changed = False
 tag = tag.strip()
@@ -2258,7 +2270,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 WHERE (SELECT COUNT(id) FROM books_series_link
 WHERE series=series.id) < 1''')
 (series, idx) = self._get_series_values(series)
-books_to_refresh = set()
+books_to_refresh = set([])
 if series:
 case_change = False
 if not isinstance(series, unicode):
@@ -2268,9 +2280,12 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
 sx = self.conn.get('SELECT id,name from series WHERE name=?', (series,))
 if sx:
 aid, cur_name = sx[0]
-if allow_case_change and cur_name != series:
-self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
-case_change = True
+if cur_name != series:
+if allow_case_change:
+self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
+case_change = True
+else:
+series = cur_name
 else:
 aid = self.conn.execute('INSERT INTO series(name) VALUES (?)', (series,)).lastrowid
 self.conn.execute('INSERT INTO books_series_link(book, series) VALUES (?,?)', (id, aid))
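The author, publisher, and series hunks above all apply the same rule: when a stored name differs from the requested one, either rename the stored entry (if `allow_case_change` is set) or fall back to the stored spelling. A minimal sketch of that rule, assuming the name lookup is case-insensitive (the diff implies this, since a lookup by name can return a differently spelled row); `resolve_name` and the dict standing in for the database table are hypothetical:

```python
def resolve_name(stored, requested, allow_case_change):
    """Return (name_to_use, case_change).

    `stored` maps a lower-cased key to the spelling currently saved,
    standing in for the authors/publishers/series tables.
    """
    key = requested.lower()
    current = stored.get(key)
    if current is None:
        # No existing entry: insert the requested spelling.
        stored[key] = requested
        return requested, False
    if current != requested:
        if allow_case_change:
            # Rename the stored entry to the requested spelling.
            stored[key] = requested
            return requested, True
        # Keep the stored spelling and use it from here on.
        return current, False
    return requested, False


publishers = {'tor books': 'Tor Books'}
kept = resolve_name(publishers, 'TOR BOOKS', allow_case_change=False)
renamed = resolve_name(publishers, 'TOR BOOKS', allow_case_change=True)
```

The `else` branch is what the diff adds: before the change, a disallowed case change left the caller holding a spelling that did not match the stored one.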