Merge from trunk

2025-07-09 03:04:10 -04:00 · 2011-02-03 21:16:48 +00:00 · 2011-02-03 21:16:48 +00:00 · 90ef9949ca
commit 90ef9949ca
parent 75b28092d3 8749611440
25 changed files with 8337 additions and 128 deletions
--- a/format_docs/compression/palmdoc.txt
+++ b/format_docs/compression/palmdoc.txt
@ -0,0 +1,54 @@
+About
+-----
+
+PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
+text. The format does not allow for any text formatting. This keeps files
+small, in keeping with the Palm philosophy. However, extensions to the format
+can use tags, such as HTML or PML, to include formatting within text. These
+extensions to PalmDoc are not interchangeable and are the basis for most eBook
+Reader formats on Palm devices.
+
+LZ77 algorithms achieve compression by replacing portions of the data with
+references to matching data that has already passed through both encoder and
+decoder. A match is encoded by a pair of numbers called a length-distance pair,
+which is equivalent to the statement "each of the next length characters is
+equal to the character exactly distance characters behind it in the
+uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
+
+In the PalmDoc format, a length-distance pair is always encoded by a two-byte
+sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
+the distance, 3 go to encoding the length, and the remaining two are used to
+make sure the decoder can identify the first byte as the beginning of such a
+two-byte sequence.
+
+PalmDoc combines LZ77 with a simple kind of byte pair compression.
+
+
+PalmDoc files are decoded as follows:
+-------------------------------------
+
+Read a byte from the compressed stream. If the byte is
+
+0x00: "1 literal" copy that byte unmodified to the decompressed stream.
+
+0x09 to 0x7f: "1 literal" copy that byte unmodified to the decompressed stream.
+
+0x01 to 0x08: "literals": the byte is interpreted as a count from 1 to 8, and
+that many literals are copied unmodified from the compressed stream to the
+decompressed stream.
+
+0x80 to 0xbf: "length, distance" pair: the 2 leftmost bits of this byte ('10')
+are discarded, and the following 6 bits are combined with the 8 bits of the
+next byte to make a 14 bit "distance, length" item. Those 14 bits are broken
+into 11 bits of distance backwards from the current location in the
+uncompressed text, and 3 bits of length to copy from that point
+(copying n+3 bytes, 3 to 10 bytes).
+
+0xc0 to 0xff: "byte pair": this byte is decoded into 2 characters: a space
+character, and a letter formed from this byte XORed with 0x80.
+
+Repeat from the beginning until there are no more bytes in the compressed file.
+
+PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
+upon independently. 
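
The decoding rules above can be sketched in Python as follows. This is a minimal illustration for a single block, not a reference implementation: the function name `decompress_palmdoc` and the choice to raise `ValueError` on an invalid distance are assumptions made for this example.

```python
def decompress_palmdoc(data: bytes) -> bytes:
    """Decode one PalmDoc-compressed block using the byte rules above."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            # Single literal byte, copied unchanged.
            out.append(c)
        elif 0x01 <= c <= 0x08:
            # The byte is a count of literal bytes that follow.
            out += data[i:i + c]
            i += c
        elif 0x80 <= c <= 0xBF:
            # Two-byte length-distance pair: drop the top 2 bits, then
            # split the remaining 14 bits into 11 (distance) + 3 (length).
            if i >= len(data):
                break  # truncated pair: stop rather than guess
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance = pair >> 3
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("distance points outside the decoded data")
            # Copy byte by byte because source and destination may overlap.
            for _ in range(length):
                out.append(out[-distance])
        else:
            # 0xC0-0xFF: a space followed by the byte XORed with 0x80.
            out.append(0x20)
            out.append(c ^ 0x80)
    return bytes(out)
```

For example, the input `b"abc\x80\x18"` holds three literals followed by a pair with distance 3 and length 3, so it decodes to `b"abcabc"`.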
+
--- a/format_docs/compression/zip.txt
+++ b/format_docs/compression/zip.txt
--- a/format_docs/pdb/ereader.txt
+++ b/format_docs/pdb/ereader.txt
@ -0,0 +1,309 @@
+About
+-----
+
+The eReader format has evolved and changed over time. Consequently, there are
+multiple versions of the eReader format. There are also two different tools
+that can create eReader files. The official tools are Makebook and Dropbook.
+Dropbook is the newer official tool that has replaced Makebook. However,
+Makebook is still in wide use because it supports a wider range of platforms
+than Dropbook. Dropbook is a GUI application that only runs on Windows and
+Apple’s OS X.
+
+
+PDB Identity
+------------
+
+PNRdPPrs
+
+
+202 and 132 headers
+-----------------------------------------
+
+Older files have a record 0 size of 202 and occasionally 116. Newer files have
+a record 0 size of 132. As of this writing the 202 files only support text and
+images. The image format in the 202 files is the same as the 132 files. The 132
+files support a number of additional features.
+
+
+Record 0, eReader header (202)
+------------------
+
+Note: all values are stored in 2-byte units. Like values are condensed into a
+range. The range can be broken into 2-byte sections which represent the actual
+stored values.
+
+bytes       content             comments
+
+0-2         Version             Non-DRM books use 2 and 4.
+2-8         Garbage
+8-10        Non-Text Offset     Start of Non text area (images) will run to the
+                                end of the section list.
+10-14       Unknown
+14-24       Garbage
+24-28       Unknown
+28-98       Garbage
+98-100      Unknown
+100-110     Garbage
+110-114     Unknown
+114-116     Garbage
+116-202     Unknown
+
+* Garbage: Intentionally random values.
+
+
+Text Records (202)
+------------------
+
+Text starts with section 1 and continues until the section indicated by the
+Non-Text Offset. All text records are PalmDoc compressed.
+
+Each byte in the compressed data is XORed with 0xA5.
+
+A decompression example in pseudo-Python:
+
+for num in range(1, non_text_offset):
+    # Undo the 0xA5 XOR on every byte, then PalmDoc-decompress the record.
+    data = bytes(b ^ 0xA5 for b in section_data(num))
+    text += decompress_palmdoc(data).decode('cp1252', 'replace')
+
+
+Dropbook 132 files
+------------------
+
+The following sections apply to the newer Dropbook created files.
+
+
+Record 0, eReader header (132)
+----------------------------
+
+This is only for 132 byte header files created by Dropbook.
+
+bytes   content                     comments
+
+0-2     compression                 Specifies compression and drm. 2 = palmdoc,
+                                    10 = zlib. 260 and 272 = DRM
+2-6     unknown                     Value of 0 is used
+6-8     encoding                    Always 25152 (0x6240). All text must be
+                                    encoded as Latin-1 cp1252
+8-10    Number of small pages       The number of small font pages. If the page
+                                    index is not built in then 0.
+10-12   Number of large pages       The number of large font pages. If the page
+                                    index is not built in then 0.
+12-14   Non-Text record start       The location of the first non-text record.
+                                    Records 1 to this value minus 1 are all
+                                    text records
+14-16   Number of chapters          The number of chapter index records
+                                    contained in the file
+16-18   Number of small index       The number of small font page index records
+                                    contained in the file
+18-20   Number of large index       The number of large font page index records
+                                    contained in the file
+20-22   Number of images            The number of images contained in the file
+22-24   Number of links             The number of links contained in the file
+24-26   Metadata available          Is there a metadata record in the file?
+                                    0 = None, 1 = There is a metadata record
+26-28   Unknown                     Value of 0 is used
+28-30   Number of Footnotes         The number of footnote records in the file
+30-32   Number of Sidebars          The number of sidebar records in the file
+32-34   Chapter index record start  The location of chapter index records. If
+                                    there are no chapters use the value for the
+                                    Last data record.
+34-36   2560                        Magic value that must be set to 2560
+36-38   Small page index start      The location of small font page index
+                                    records. If page table is not built in use
+                                    the value for the Last data record.
+38-40   Large page index start      The location of large font page index
+                                    records. If page table is not built in use
+                                    the value for the Last data record.
+40-42   Image data record start     The location of the first image record. If
+                                    there are no images use the value for the
+                                    Last data record.
+42-44   Links record start          The location of the first link index
+                                    record. If there are no links use the value
+                                    for the Last data record.
+44-46   Metadata record start       The location of the metadata record. If
+                                    there is no metadata use the value for the
+                                    Last data record.
+46-48   Unknown                     Value of 0 is used
+48-50   Footnote record start       The location of the first footnote record.
+                                    If there are no footnotes use the value for
+                                    the Last data record.
+50-52   Sidebar record start        The location of the first sidebar record.
+                                    If there are no sidebars use the value for
+                                    the Last data record.
+52-54   Last data record            The location of the last data record
+54-132  Unknown                     Value of 0 is used
+
+Note: All values are in 2 byte increments. All bytes in the table that have a
+range larger than 2 can be broken into 2 byte segments and have different
+values set for each grouping.
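
A short sketch of reading the table above with Python's `struct` module. The field names are illustrative labels for the table rows, and big-endian byte order is an assumption (it is the usual convention for Palm databases, but this document does not state it).

```python
import struct

# Byte offsets taken from the table above. The names are labels chosen
# for this example, not official field names.
HEADER_132_FIELDS = {
    "compression": 0,
    "encoding": 6,
    "small_pages": 8,
    "large_pages": 10,
    "non_text_start": 12,
    "chapter_count": 14,
    "image_count": 20,
    "link_count": 22,
    "has_metadata": 24,
    "footnote_count": 28,
    "sidebar_count": 30,
    "chapter_start": 32,
    "magic": 34,
    "image_start": 40,
    "links_start": 42,
    "metadata_start": 44,
    "footnote_start": 48,
    "sidebar_start": 50,
    "last_data_record": 52,
}


def parse_header_132(record0: bytes) -> dict:
    """Return the known 2-byte fields of a 132-byte record 0."""
    if len(record0) != 132:
        raise ValueError("expected a 132-byte record 0")
    # Assumption: every value is an unsigned 16-bit big-endian integer.
    return {
        name: struct.unpack_from(">H", record0, offset)[0]
        for name, offset in HEADER_132_FIELDS.items()
    }
```

A caller could then check `parse_header_132(record0)["magic"] == 2560` before trusting the remaining fields.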
+
+
+Records Order
+-------------
+
+Although the eReader header records where each section starts, DropBook
+writes the records in the following order:
+
+   1. eReader Header
+   2. Compressed text
+   3. Small font page index
+   4. Large font page index
+   5. Chapter index
+   6. Links index
+   7. Images
+   8. (Extrapolation: there should be one more record type here though it has
+       not yet been uncovered what it might be).
+   9. Metadata
+  10. Sidebar records
+  11. Footnote records
+  12. Text block size record
+  13. "MeTaInFo\x00" word record 
+
+
+Text Records
+------------
+
+All text records use cp1252 encoding (although eReader documents talk about
+UTF-8 as well). The maximum compressed size of a text record is unknown;
+however, anything below 3560 bytes is known to work. The text will be either
+zlib or PalmDoc compressed. Use the compression value from the eReader header
+to determine which. All text utilizes the Palm Markup Language (PML) for
+formatting.
+
+Starting with DropBook 1.6.0, text is divided into 8 KB (8192 byte) blocks.
+Each block is trimmed back to the closest space character and then compressed.
+The earlier DropBook 1.5.2 tries to behave the same way, though it sometimes
+trims the block in an unexpected place.
+
+
+Chapter Index Records
+---------------------
+
+Each chapter record corresponds to one chapter and points to its place in the
+book. A chapter record takes the form 'offset name\x00'. The first 4 bytes are
+the offset in the original PML file that the chapter index entry points to
+(offset of the \x|\X?|\C? tags). The chapter name follows immediately, without
+a separating space. It should contain only text; all formatting tags should be
+removed. \U and \a tags are not permitted in a chapter name. To represent
+sub-chapters, 4*n spaces (\x20) are added to the beginning of the name, where
+"n" is the level of the chapter: 0 for the \x tag and N for the \CN="" and \XN
+tags. The record ends with a \x00 byte.
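
The record layout described above could be read like this. It is a sketch only: `parse_chapter_record` is a hypothetical helper, and treating the 4-byte offset as an unsigned big-endian integer is an assumption that this document does not confirm.

```python
import struct


def parse_chapter_record(record: bytes) -> tuple:
    """Return (offset, level, name) for one chapter index record."""
    # Assumption: the offset is stored as an unsigned big-endian integer.
    (offset,) = struct.unpack_from(">I", record, 0)
    # The name runs from byte 4 up to the terminating \x00.
    raw_name = record[4:].split(b"\x00", 1)[0].decode("cp1252", "replace")
    name = raw_name.lstrip(" ")
    # Sub-chapters are marked by 4 leading spaces per level.
    level = (len(raw_name) - len(name)) // 4
    return offset, level, name
```

Link records share the same 'offset name\x00' layout, so the same helper would return level 0 for them.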
+
+
+Image Records
+-------------
+
+Image records must be smaller than 65505 Bytes. They must also be 8bit PNG
+images.
+
+An image record takes the form 'PNG name\x00... image_data'
+
+bytes   content         comments
+
+0-4     PNG             There must be a space after PNG.
+4-36    image name.     The image name must be exactly 32 bytes long. Pad
+                        the right side of the name with \x00 characters for
+                        names shorter than 32 characters.
+36-58   Unknown	
+58-60   width           Width of an image
+60-62   height          Height of an image
+62-?    The image data  raw image data in 8 bit PNG format
+
+Note: DropBook seems to alter the raw PNG data (possibly by re-encoding it),
+but inserting an unmodified PNG image still works.
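
As an illustration of the image record table, the sketch below splits a record into its name, dimensions and PNG payload. The helper name is hypothetical, and reading width and height as big-endian 16-bit values is an assumption.

```python
import struct


def parse_image_record(record: bytes) -> tuple:
    """Return (name, width, height, png_data) for one image record."""
    if record[0:4] != b"PNG ":
        raise ValueError("not an image record")
    # Bytes 4-36 hold the name, right-padded with \x00 to 32 bytes.
    name = record[4:36].rstrip(b"\x00").decode("cp1252", "replace")
    # Assumption: width and height are unsigned big-endian 16-bit values.
    width, height = struct.unpack_from(">HH", record, 58)
    # Everything from byte 62 onward is the PNG data itself.
    return name, width, height, record[62:]
```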
+
+
+Links Records
+-------------
+
+Link records are constructed the same way as chapter records. Each link anchor
+record corresponds to one link anchor and points to its place in the book. A
+link record takes the form 'offset name\x00'. The first 4 bytes are the offset
+in the original PML file that the link anchor points to (the offset of the \Q
+tag). The name of the link anchor follows immediately, without a separating
+space. It should contain only text; all formatting tags should be removed. \U
+and \a tags are not permitted in a link anchor name. The record ends with a
+\x00 byte.
+
+
+Footnote Records
+----------------
+
+The first footnote record is a \x00 separated list of footnote ids. All
+subsequent footnote records are the footnote text corresponding to the id's
+position in the list. Footnote text is compressed in the same manner as normal
+text records.
+
+E.G.
+
+footnote section 1 = 'notice1\x00notice2\x00notice3\x00'
+footnote section 2 = 'Text for notice 1'
+footnote section 3 = 'Text for notice 2'
+footnote section 4 = 'Text for notice 3'
+
+Starting with DropBook 1.5.2, the first record looks a bit different. It is a
+sequence of entries, each made of \x00\x01, then 1 byte giving the footnote id
+length, then the footnote id, then \x00.
+
+E.G.
+
+footnote section 1 = '\x00\x01\x07notice1\x00\x00\x01\x0Afootnote10\x00'
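
Both id-list layouts can be handled by one small routine. The following is a sketch under the assumption that a record beginning with \x00\x01 always uses the newer layout; `parse_id_list` is a name invented for this example.

```python
def parse_id_list(record: bytes) -> list:
    """Return the ids stored in the first footnote (or sidebar) record."""
    if record.startswith(b"\x00\x01"):
        # DropBook 1.5.2+ layout: \x00\x01, length byte, id, \x00.
        ids = []
        i = 0
        while record[i:i + 2] == b"\x00\x01" and i + 2 < len(record):
            length = record[i + 2]
            start = i + 3
            ids.append(record[start:start + length].decode("cp1252", "replace"))
            i = start + length + 1  # skip the id and its trailing \x00
        return ids
    # Older layout: ids separated (and terminated) by \x00.
    return [p.decode("cp1252", "replace") for p in record.split(b"\x00") if p]
```

The position of an id in the returned list gives the index of the following record that holds its text.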
+
+
+Sidebar Records
+---------------
+
+The first sidebar record is a \x00 separated list of sidebar ids. All
+subsequent sidebar records are the sidebar text corresponding to the id's
+position in the list. Sidebar text is compressed in the same manner as normal
+text records.
+
+E.G.
+
+sidebar section 1 = 'notice1\x00notice2\x00notice3\x00'
+sidebar section 2 = 'Text for notice 1'
+sidebar section 3 = 'Text for notice 2'
+sidebar section 4 = 'Text for notice 3'
+
+Starting with DropBook 1.5.2, the first record looks a bit different. It is a
+sequence of entries, each made of \x00\x01, then 1 byte giving the sidebar id
+length, then the sidebar id, then \x00.
+
+E.G.
+
+sidebar section 1 = '\x00\x01\x07notice1\x00\x00\x01\x09sidebar10\x00'
+
+
+Metadata Record
+---------------
+
+\x00 separated list of strings.
+
+Metadata takes the form:
+
+  title\x00
+  author\x00
+  copyright\x00
+  publisher\x00
+  isbn\x00
+
+E.G.
+
+Gibraltar Earth\x00Michael McCollum\x001999\x00Sci Fi Arizona\x001929381255\x00
+
+The metadata record is always followed by a record which contains 'MeTaInFo\x00'
+
+Note: Starting with DropBook 1.5.2, 'MeTaInFo\x00' no longer directly follows
+the Metadata record. It is a separate record that ends the file, and there are
+some more records between the Metadata record and the 'MeTaInFo\x00' record.
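
Splitting the metadata record into named fields might look like the sketch below. The field order comes from the list above; the helper name and the padding of missing fields with empty strings are choices made for this example.

```python
METADATA_FIELDS = ("title", "author", "copyright", "publisher", "isbn")


def parse_metadata(record: bytes) -> dict:
    """Map the \\x00 separated strings onto the five metadata fields."""
    parts = record.split(b"\x00")[:len(METADATA_FIELDS)]
    values = [p.decode("cp1252", "replace") for p in parts]
    # Pad with empty strings if the record holds fewer fields than expected.
    values += [""] * (len(METADATA_FIELDS) - len(values))
    return dict(zip(METADATA_FIELDS, values))
```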
+
+
+Text Sizes Record
+-----------------
+
+There is a special record that contains the initial size of all text blocks
+before compression. It is just a sequence of 2-byte values, one per text
+block.
+
+E.G.
+
+\x1F\xFB\x20\x00\x20\x00\x1F\xFE\x1F\xFD\x09\x46
+
+Note: Because each size is stored in 2 bytes, the theoretical maximum initial
+block size is 65535 bytes.
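
The sizes can be unpacked in one call. This sketch assumes big-endian byte order, which is not stated in this document but yields values close to the 8192 byte block size when applied to the example record.

```python
import struct


def parse_text_sizes(record: bytes) -> list:
    """Return the uncompressed size of each text block."""
    count = len(record) // 2
    # Assumption: each size is an unsigned big-endian 16-bit value.
    return list(struct.unpack(">%dH" % count, record[:count * 2]))
```

Applied to the example bytes, this gives 8187, 8192, 8192, 8190, 8189 and 2374.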
+
--- a/format_docs/pdb/mbp.txt
+++ b/format_docs/pdb/mbp.txt
@ -0,0 +1,414 @@
+// BEGINNING OF FILE
+//   NOTES:
+//   1* Numeric data stored as big endian, 32 bits.
+//   2* Data padded to 16-bit boundaries. (Sometimes to 32-bit boundaries?)
+//   3* Text stored seems to be an 8 bit encoding padded to 16 bits
+//    (may be "ISO-8859-1"?, or may be just a local machine character set?)
+//   4* I initially used the term "MARK" where I should have used "HIGHLIGHT",
+//     bear that in mind (it was a poor choice of name when I started reversing)
+
+<0x 31 bytes = book_title_PAR + 0x00 PAD if (book_title_PAR < 31) >
+<0x 00>
+<0x 00 00 00 00>
+...4
+...4
+<0x 00 00 00 00>
+<0x 00 00 00 00>
+<0x 00 00 00 00>
+<0x 00 00 00 00>
+BPAR
+MOBI
+<0x 4 bytes = Next free pointer identifier>
+	// Note: pointer identifiers aren't always consecutive,
+	// so this number is usually bigger than the # of index entries
+<0x 00 00>
+<0x 4 bytes = Number of index entries>
+<0x 4 bytes = Position of BPAR>
+<0x 00 00 00 00>	// BPAR pointer identifier = 0x0
+
+
+// INDEXES:
+// Order of Indexes: from the beginning of this MBP file, 
+// forward to the end of the file.
+// Nevertheless, see these comments for order relative to: 
+//   "BEGINNING OF USER DATA": order of Data marks.
+//   "FINAL GROUP OF MARKS": order of final marks.
+[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
+		AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
+		...} 
+	  || "last DATA"]
+// Note: Pointer identifiers to DATA's assigned so the number
+// shrinks as the table grows down.
+[if NOTE || CORRECTION]
+	<0x 4 bytes = Position of DATA....EBVS>
+	<0x 4 bytes = Pointer identifier, used by BKMK blocks>
+[fi NOTE || CORRECTION]
+<0x 4 bytes = Position of DATA>
+<0x 4 bytes = Pointer identifier, used by BKMK blocks>
+[if NOTE || CORRECTION]
+	<0x 4 bytes = Position of DATA>
+	<0x 4 bytes = Pointer identifier, used by BKMK blocks>
+[fi NOTE || CORRECTION]
+[if MARK || DRAWING || BOOKMARK]
+	<0x 4 bytes = Position of DATA....EBVS>
+	<0x 4 bytes = Pointer identifier, used by BKMK blocks>
+[fi MARK || DRAWING || BOOKMARK]
+[if AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
+	<0x 4 bytes = Position of [AUTH || TITL || CATE || GENR || ABST || COVE || PUBL] >
+	<0x 4 bytes = Pointer identifier>
+[fi AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
+[if last DATA] // there's always a last piece of DATA (not user data?)
+	<0x 4 bytes = Position of last DATA>
+	<0x 4 bytes = Pointer identifier>	// usually <0x 00 00 00 01>
+[fi last DATA]
+[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
+		AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
+		...} 
+      || "last DATA"]
+
+
+[for each {NOTE,MARK,CORRECTION,DRAWING}]
+<0x 4 bytes = Position of BKMK>
+<0x 4 bytes = Pointer identifier>
+	// Note: the pointer identifier for a BKMK is usually the smallest
+	// of all the identifiers associated with an annotation. All
+	// other DATA references in the INDEXES table associated with this
+	// BKMK have bigger pointer identifiers.
+	// Note: Pointer identifiers to BKMK's assigned so the number
+	// grows as the table grows down.
+[next {NOTE,MARK,CORRECTION,DRAWING}]
+
+
+<0x 2 bytes random PAD>
+BPAR
+<0x 4 bytes = size of BPAR block>
+<0x FF FF FF FF>
+...4	<-- 'position of last read' related
+...4	<-- 'position of last read' related
+...4
+<0x FF FF FF FF>
+...4
+...4
+...4	<-- 'position of last read' related
+...(rest of size of BPAR block, if bigger than 0x20)
+[if (size of BPAR block) mod 32 != 0]
+<0x FF FF FF FF>
+[fi]
+
+// BEGINNING OF USER DATA:
+// Order of {NOTE,MARK,CORRECTION,DRAWING} :
+// starts with user data at the end of the file,
+// going backwards to the beginning of the file:
+//--------------------------------------------------------------------
+[for each {NOTE,MARK,CORRECTION,DRAWING}]
+//-------------------------------
+[if NOTE]
+DATA
+<0x 4 bytes = size of DATA block>
+[if EBAR]	// this block can appear, or not... ???
+	EBAR
+	...various {4 x byte} ???
+[fi EBAR]
+EBVS
+<0x 00 00 00 03> ???
+<0x 4 bytes = IDENTIFIER> ???
+[<0x 00 00 00 01>, or nothing at all] ???
+<0x 00 00 00 08>
+<0x FF FF FF FF>
+<0x 00 00 00 00>
+<0x 00 00 00 10>
+...(rest of size of DATA block)
+<0x FD EA = PAD? (ýê)>
+DATA
+<0x 4 bytes = size of <marked text (see 3rd note)> >
+<marked text (see 3rd note)>
+[if (size of <marked text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
+[fi]
+DATA
+<0x 4 bytes = size of <note text (see 3rd note)> >
+<note text (see 3rd note)>
+[if (size of <note text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
+[fi]
+[fi NOTE]
+//-------------------------------
+[if MARK || BOOKMARK]
+DATA
+<0x 4 bytes = size of <marked text (see 3rd note)> >
+<marked text (see 3rd note)>
+[if (size of <marked text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
+[fi]
+DATA
+<0x 4 bytes = size of DATA block>
+[if EBAR]	// this block can appear, or not... ???
+	EBAR
+	...various {4 x byte} ???
+[fi EBAR]
+EBVS
+<0x 00 00 00 03> ???
+<0x 4 bytes = IDENTIFIER> ???
+[<0x 00 00 00 01>, or nothing at all] ???
+<0x 00 00 00 08>
+<0x FF FF FF FF>
+<0x 00 00 00 00>
+<0x 00 00 00 10>
+...(rest of size of DATA block)
+<0x FD EA = PAD? (ýê)>
+[fi MARK || BOOKMARK]
+//-------------------------------
+[if CORRECTION]
+DATA
+<0x 4 bytes = size of DATA block>
+[if EBAR]	// this block can appear, or not... ???
+	EBAR
+	...various {4 x byte} ???
+[fi EBAR]
+EBVS
+<0x 00 00 00 03> ???
+<0x 4 bytes = IDENTIFIER> ???
+[<0x 00 00 00 01>, or nothing at all] ???
+<0x 00 00 00 08>
+<0x FF FF FF FF>
+<0x 00 00 00 00>
+<0x 00 00 00 10>
+...(rest of size of DATA block)
+<0x FD EA = PAD? (ýê)>
+DATA
+<0x 4 bytes = size of <marked text (see 3rd note)> >
+<marked text (see 3rd note)>
+[if (size of <marked text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
+[fi]
+DATA
+<0x 4 bytes = size of <note text (see 3rd note)> >
+<note text (see 3rd note)>
+[if (size of <note text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
+[fi]
+[fi CORRECTION]
+//-------------------------------
+[if DRAWING]
+DATA
+<0x 4 bytes = size of raw data>
+ADQM
+	// NOTE: background color is stored in corresponding BKMK.
+	[begin DRAWING format]
+		...4 = <0x 00 00 00 01> ???
+		<0x 4 bytes = X POSITION OF UPPER LEFT CORNER??? > 
+		<0x 4 bytes = Y POSITION OF UPPER LEFT CORNER??? > 
+		<0x 4 bytes = X SIZE in pixels > 
+		<0x 4 bytes = Y SIZE in pixels > 
+		...4 = <0x 00 00 00 00> ???
+		<0x 4 bytes = number of STROKES>
+		[if "number of STROKES" == 0]
+			<0x 00 00 00 00>
+			[end DRAWING format]	
+		[fi]
+		[for each STROKE]
+			<0x 00 00 00 01> ???
+			<0x 4 bytes> = 
+				Stroke's beginning position in list of coordinates.
+			<0x 4 bytes> = 
+				Stroke's ending position in list of coordinates.
+			<0x 00 RR GG BB> = RRGGBB color of stroke.
+		[next STROKE]
+		<0x 4 bytes> = number of coordinate pairs in array of coordinates.
+		// NOTE: each stroke is formed out of at least three 
+		// coordinate pairs: begin, {next point}(1-n), end point.
+		[for each COORDINATE]
+			<0x 4 bytes> = X coordinate
+			<0x 4 bytes> = Y coordinate
+		[next COORDINATE]
+	[end DRAWING format]
+[if (size of <marked text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
+[fi]
+DATA
+<0x 4 bytes = size of <marked text (see 3rd note)> >
+<marked text (see 3rd note)>
+[if (size of <marked text (see 3rd note)>) mod 4 !=0]
+<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
+[fi]
+DATA
+<0x 4 bytes = size of DATA block>
+[if EBAR]	// this block can appear, or not... ???
+	EBAR
+	...various {4 x byte} ???
+[fi EBAR]
+EBVS
+<0x 00 00 00 03>
+<0x 4 bytes = IDENTIFIER>
+[<0x 00 00 00 01>, or nothing at all] ???
+<0x 00 00 00 08>
+<0x FF FF FF FF>
+<0x 00 00 00 00>
+<0x 00 00 00 10>
+...(size of DATA block - 30)
+<0x FD EA = PAD? (ýê)>
+[fi DRAWING]
+//-------------------------------
+[next {NOTE,MARK,CORRECTION,DRAWING}]
+
+// AUTHOR (if any)
+//--------------------------------------------------------------------
+[if AUTHOR]
+AUTH
+<0x 4 bytes = size of AUTHOR block>
+<text (see 3rd note)>
+[fi AUTHOR]
+//--------------------------------------------------------------------
+// TITLE (if any)
+//--------------------------------------------------------------------
+[if TITLE]
+TITL
+<0x 4 bytes = size of TITLE block>
+<text (see 3rd note)>
+[fi TITLE]
+//--------------------------------------------------------------------
+// GENRE (if any)
+//--------------------------------------------------------------------
+[if GENRE]
+GENR
+<0x 4 bytes = size of GENRE block>
+<text (see 3rd note)>
+[fi GENRE]
+//--------------------------------------------------------------------
+// ABSTRACT (if any)
+//--------------------------------------------------------------------
+[if ABSTRACT]
+ABST
+<0x 4 bytes = size of ABSTRACT block>
+<text (see 3rd note)>
+[fi ABSTRACT]
+//--------------------------------------------------------------------
+
+// FINAL DATA
+// Note: 'FINAL DATA' can occur anytime between these marks: 
+//   AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,...
+//--------------------------------------------------------------------
+DATA
+<0x 4 bytes = size of EBVS block>
+[if EBAR]	// this block can appear, or not... ???
+	EBAR
+	...various {4 x byte} ???
+[fi EBAR]
+EBVS
+<0x 00 00 00 03> || <0x 00 00 00 04> 
+<0x 4 bytes || 8 bytes = IDENTIFIER>
+<0x 00 00 00 08>
+<0x FF FF FF FF>
+<0x 00 00 00 00>
+<0x 00 00 00 10>
+...(size of EBVS block - 30) :
+	...4	<-- 'position of last read' related
+	...various {4 x byte} ???
+	...4	<-- 'position of last read' related
+	...4
+	...4
+	...4
+<0x FD EA = PAD? (ýê)>
+//--------------------------------------------------------------------
+
+// CATEGORY (if any)
+//--------------------------------------------------------------------
+[if CATEGORY]
+CATE
+<0x 4 bytes = size of CATEGORY block>
+<text (see 3rd note)>
+[fi CATEGORY]
+//--------------------------------------------------------------------
+// COVER (if any)
+//--------------------------------------------------------------------
+[if COVER]
+COVE
+<0x 4 bytes = size of COVER block>
+<text (see 3rd note)>
+[fi COVER]
+//--------------------------------------------------------------------
+// PUBLISHER (if any)
+//--------------------------------------------------------------------
+[if PUBLISHER]
+PUBL
+<0x 4 bytes = size of PUBLISHER block>
+<text (see 3rd note)>
+[fi PUBLISHER]
+//--------------------------------------------------------------------
+
+
+// FINAL GROUP OF MARKS
+// Order of {NOTE,MARK,CORRECTION} : 
+// starts with user data at the beginning of the file,
+// going forwards to the end:
+//--------------------------------------------------------------------
+[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
+BKMK
+<0x 4 bytes = size of BKMK>
+<0x 4 bytes = TEXT position of the beginning of {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}>
+//-------------------------------
+[if DRAWING]
+<0x FF FF FF FF>
+[else]
+<0x 4 bytes = TEXT position of the end of {NOTE,MARK,CORRECTION,BOOKMARK}>
+[fi DRAWING]
+...4
+...4
+//-------------------------------
+[if NOTE]
+	<0x xx xx xx (20)?>, xxxxxx=>RRGGBB color ???
+	<0x 00 00 00 02>
+[fi NOTE]
+[if MARK]
+	<0x xx xx xx (0F/00)??>, xxxxxx=>RRGGBB color ???
+	<0x 00 00 00 04>
+[fi MARK]
+[if CORRECTION]
+	<0x xx xx xx (6F)?>, xxxxxx=>RRGGBB color ???
+	<0x 00 00 00 02>
+[fi CORRECTION]
+[if DRAWING]
+	<0x xx xx xx (0F)?>, xxxxxx=>RRGGBB DRAWING's background color.
+	<0x 00 00 00 08>
+[fi DRAWING]
+[if BOOKMARK]
+	<0x xx xx xx 00>
+	<0x 00 00 00 01>
+[fi BOOKMARK]
+	// this one is a strange type of mark whose use has not yet been identified:
+	[if UNKNOWN_TYPE_YET_1]
+		<0x xx xx xx 00>
+		<0x 00 00 40 00>
+	[fi UNKNOWN_TYPE_YET_1]
+
+//-------------------------------
+[if BOOKMARK || (NOTE "without stored marked text")]
+	<0x FF FF FF FF>
+[else]
+	<0x 4 bytes = DATA pointer in INDEXES>
+[fi BOOKMARK]
+[if DRAWING || MARK]
+	<0x FF FF FF FF>
+[else]
+	<0x 4 bytes = DATA pointer in INDEXES>
+[fi]
+<0x 4 bytes = DATA pointer in INDEXES>
+[if DRAWING]
+	<0x 4 bytes = DATA pointer in INDEXES>
+[else]
+	<0x FF FF FF FF>
+[fi]
+//-------------------------------
+<0x FF FF FF FF>
+<0x FF FF FF FF>
+[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
+//--------------------------------------------------------------------
+
+[if length % 32 bit != 0] ???
+	<0x FF FF FF FF>
+[fi]
+
+// END OF FILE
+
+// by idleloop@yahoo.com, v0.2.e, 12/2009
+// http://www.angelfire.com/ego2/idleloop
--- a/format_docs/pdb/mobi.txt
+++ b/format_docs/pdb/mobi.txt
@ -0,0 +1,341 @@
+from (http://wiki.mobileread.com/wiki/MOBI)
+
+About
+-----
+
+MOBI is the format used by the MobiPocket Reader. It may have a .mobi
+extension or it may have a .prc extension. The extension can be changed by the
+user to either of the accepted forms. In either case it may be DRM protected or
+non-DRM. The .prc extension is used because PalmOS doesn't support any file
+extensions except .prc or .pdb. Note that Mobipocket prohibits its DRM format
+from being used on dedicated eBook readers that support other DRM formats.
+
+
+Description
+-----------
+
+The MOBI format was originally an extension of the PalmDOC format, adding
+certain HTML-like tags to the data. Many MOBI formatted documents still use
+this form. However, there is also a high compression version of this file
+format that compresses data to a larger degree in a proprietary manner. There
+are some third party programs that can read eBooks in the original MOBI format
+but only a few third party programs that can read eBooks in the new
+compressed form. The higher compression mode uses a Huffman coding scheme
+that has been called the Huff/cdic algorithm.
+
+From time to time features have been added to the format, so new files may
+have problems if you try to read them with an older reader. Currently the
+source files follow the guidelines in the Open eBook format.
+
+Note that AZW for the Amazon Kindle is the same format as MOBI except that it
+uses a slightly different DRM scheme.
+
+
+Format
+------
+
+Like PalmDOC, the Mobipocket file format is that of a standard Palm Database
+Format file. The header of that format includes the name of the database
+(usually the book title and sometimes a portion of the author's name) which is
+up to 31 bytes of data. The files are identified as Creator ID of MOBI and a
+Type of BOOK.
+
+
+PalmDOC Header
+--------------
+
+The first record in the Palm Database Format gives more information about the
+Mobipocket file. The first 16 bytes are almost identical to the first 16 bytes
+of a PalmDOC format file.
+
+bytes   content             comments
+2       Compression         1 = no compression, 2 = PalmDOC compression,
+                            17480 = HUFF/CDIC compression.
+2       Unused              Always zero
+4       text length         Uncompressed length of the entire text of the book
+2       record count        Number of PDB records used for the text of the book.
+2       record size         Maximum size of each record containing text, always
+                            4096.
+4       Current Position    Current reading position, as an offset into the
+                            uncompressed text
+
+There are two differences from a Palm DOC file. There's an additional
+compression type (17480), and the Current Position bytes are used for a
+different purpose:
+
+bytes   content             comments
+2       Encryption Type     0 = no encryption, 1 = Old Mobipocket Encryption,
+                            2 = Mobipocket Encryption.
+2       Unknown             Usually zero
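
The 16 bytes described by the two tables above can be unpacked in one step. This is a sketch: the `PalmDocHeader` tuple and its field names are invented for this example, and big-endian byte order is assumed from the usual Palm database convention.

```python
import struct
from collections import namedtuple

# Field names are labels for the table rows above, chosen for this example.
PalmDocHeader = namedtuple(
    "PalmDocHeader",
    "compression unused text_length record_count record_size "
    "encryption_type unknown",
)


def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Unpack the first 16 bytes of record 0 of a Mobipocket file."""
    # Assumption: all values are unsigned and big-endian.
    # Layout: 2 + 2 + 4 + 2 + 2 + 2 + 2 = 16 bytes.
    return PalmDocHeader._make(struct.unpack_from(">HHIHHHH", record0, 0))
```

A reader would typically branch on `compression` (1, 2 or 17480) and refuse to continue when `encryption_type` is non-zero.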
+
+The old Mobipocket Encryption scheme only allows the file to be registered
+with one PID, unlike the current encryption scheme that allows multiple PIDs to
+be used in a single file. Unless specifically mentioned, all the encryption
+information on this page refers to the current scheme.
+
+
+MOBI Header
+-----------
+
+Most Mobipocket files also have a MOBI header in record 0 that follows these
+16 bytes, and newer formats also have an EXTH header following the MOBI header,
+again all in record 0 of the PDB file format.
+
+The MOBI header is of variable length and is not documented. Some fields have
+been tentatively identified as follows:
+
+offset  bytes   content                 comments
+16      4       identifier              The characters M O B I
+20      4       header length           The length of the MOBI header, including
+                                        the previous 4 bytes
+24      4       Mobi type               The kind of Mobipocket file this is
+                                            2 Mobipocket Book
+                                            3 PalmDoc Book
+                                            4 Audio
+                                            257 News
+                                            258 News_Feed
+                                            259 News_Magazine
+                                            513 PICS
+                                            514 WORD
+                                            515 XLS
+                                            516 PPT
+                                            517 TEXT
+                                            518 HTML
+28      4       text Encoding           1252 = CP1252 (WinLatin1); 65001 = UTF-8
+32      4       Unique-ID               Some kind of unique ID number (random?)
+36      4       Generator version       Potentially the version of the
+                                        Mobipocket-generation tool. Always >=
+                                        the value of the "format version" field
+                                        and <= the version of mobigen used to
+                                        produce the file.
+40      40      Reserved                All 0xFF. In case of a dictionary, or
+                                        some newer file formats, a few bytes are
+                                        used from this range of 40 0xFFs
+80      4       First Non-book index?   First record number (starting with 0)
+                                        that's not the book's text
+84      4       Full Name Offset        Offset in record 0 (not from start of
+                                        file) of the full name of the book
+88      4       Full Name Length        Length in bytes of the full name of the
+                                        book
+92      4       Language                Book language code. Low byte is main
+                                        language 09= English, next byte is
+                                        dialect, 08 = British, 04 = US
+96      4       Input Language          Input language for a dictionary
+100     4       Output Language         Output language for a dictionary
+104     4       Format version          Potentially the version of the
+                                        Mobipocket format used in this file.
+                                        Always >= 1 and <= the value of the
+                                        "generator version" field.
+108     4       First Image record      First record number (starting with 0)
+                                        that contains an image. Image records
+                                        should be sequential. If there are
+                                        no images this will be 0xffffffff.
+112     4       HUFF record             Record containing Huff information
+                                        used in HUFF/CDIC decompression.
+116     4       HUFF count              Number of Huff records.
+120     4       DATP record             Unknown. Record starts with DATP.
+124     4       DATP count              Number of DATP records.
+128     4       EXTH flags              Bitfield. If bit 6 (0x40) is set, then
+                                        there's an EXTH record
+The following fields are only present if the MOBI header is long enough.
+132     36      ?                       36 unknown bytes, if MOBI is long enough
+168     4       DRM Offset              Offset to DRM key info in DRMed files.
+                                        0xFFFFFFFF if no DRM
+172     4       DRM Count               Number of entries in DRM info.
+176     4       DRM Size                Number of bytes in DRM info.
+180     4       DRM Flags               Some flags concerning the DRM info.
+184     2       ?
+186     2       Last Image record       Possibly the number of the last image
+                                        record. If there are no images in the
+                                        book this will be 0xffff.
+188     4       ?
+192     4       FCIS record             Unknown. Record starts with FCIS.
+196     4       ?
+200     4       FLIS record             Unknown. Record starts with FLIS.
+204     ?       ?                       Bytes to the end of the MOBI header,
+                                        including the following if the header
+                                        length >= 228. ( 244 from start of
+                                        record)
+242     2       Extra Data Flags        A set of binary flags, some of which
+                                        indicate extra data at the end of each
+                                        text block. This only seems to be valid
+                                        for Mobipocket format version 5 and 6
+                                        (and higher?), when the header length
+                                        is 228 (0xE4) or 232 (0xE8).
+
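A few of the fields in the table above can be read with fixed offsets into record 0. This is a hedged sketch only: the dictionary keys and the function name are made up for illustration, only a subset of the fields is read, and decoding the full name with the text encoding field is an assumption rather than something the table states.

```python
import struct

# (offset in record 0, name) for some 4-byte big-endian fields from the table.
# The names are illustrative, not official.
MOBI_FIELDS = [
    (20, "header_length"),
    (24, "mobi_type"),
    (28, "text_encoding"),
    (80, "first_non_book_index"),
    (84, "full_name_offset"),
    (88, "full_name_length"),
    (108, "first_image_record"),
    (128, "exth_flags"),
]
CODECS = {1252: "cp1252", 65001: "utf-8"}


def parse_mobi_header(record0: bytes) -> dict:
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI identifier at offset 16")
    (header_length,) = struct.unpack_from(">I", record0, 20)
    header_end = 16 + header_length  # the length is counted from the identifier
    fields = {}
    for offset, name in MOBI_FIELDS:
        if offset + 4 <= header_end:  # skip fields that a short header lacks
            (fields[name],) = struct.unpack_from(">I", record0, offset)
    fields["has_exth"] = bool(fields.get("exth_flags", 0) & 0x40)
    # Extra Data Flags only exist when the header reaches offset 244.
    fields["extra_data_flags"] = 0
    if header_end >= 244:
        (fields["extra_data_flags"],) = struct.unpack_from(">H", record0, 242)
    if "full_name_offset" in fields and "full_name_length" in fields:
        start = fields["full_name_offset"]
        raw = record0[start:start + fields["full_name_length"]]
        # Assumption: the name uses the encoding named in the header.
        codec = CODECS.get(fields.get("text_encoding"), "cp1252")
        fields["full_name"] = raw.decode(codec, errors="replace")
    return fields
```

Reading by offset rather than with one large struct format keeps the sketch tolerant of the variable header length.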
+
+EXTH Header
+-----------
+
+If the MOBI header indicates that there's an EXTH header, it follows immediately
+after the MOBI header. Since the MOBI header is of variable length, this isn't
+at any fixed offset in record 0. Note that some readers will ignore any EXTH
+header info if the Mobipocket version number specified in the MOBI header is 2
+or less (perhaps 3 or less).
+
+The EXTH header is also undocumented, so some of this is guesswork.
+
+bytes   content             comments
+4       identifier          The characters E X T H
+4       header length       The length of the EXTH header, including the
+                            previous 4 bytes
+4       record count        The number of records in the EXTH header. The rest
+                            of the EXTH header consists of repeated EXTH
+                            records to the end of the EXTH length.
+        EXTH record start   Repeat until done.
+4       record type         EXTH record type. Just a number identifying what's
+                            stored in the record
+4       record length       Length of the EXTH record = L, including the 8
+                            bytes in the type and length fields
+L-8     record data         Data.
+        EXTH record end     Repeat until done.
+
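The EXTH layout above amounts to a 12-byte preamble followed by type/length/data records. A minimal sketch of walking those records follows; the function name and the exth_start parameter are assumptions for illustration, and the caller is expected to have located the EXTH header (immediately after the MOBI header) already.

```python
import struct


def iter_exth_records(record0: bytes, exth_start: int):
    """Yield (record_type, data) pairs from the EXTH header at exth_start."""
    ident, header_length, count = struct.unpack_from(">4sII", record0, exth_start)
    if ident != b"EXTH":
        raise ValueError("no EXTH identifier at the given offset")
    pos = exth_start + 12
    end = exth_start + header_length
    for _ in range(count):
        if pos + 8 > end:
            break  # fewer records than the count field promised
        rec_type, rec_len = struct.unpack_from(">II", record0, pos)
        if rec_len < 8 or pos + rec_len > end:
            break  # malformed record; stop rather than read out of bounds
        # The length L includes the 8 bytes of the type and length fields.
        yield rec_type, record0[pos + 8:pos + rec_len]
        pos += rec_len
```

Record data is returned as raw bytes, since the table does not say how each record type is encoded.
+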
+There are lots of different EXTH record types. Ones found so far in Mobipocket
+files are listed here, with possible meanings. Hopefully the table will be
+filled in as more information comes to light.
+
+record type    usual length     name             comments
+1                               drm_server_id
+2                               drm_commerce_id
+3                               drm_ebookbase_book_id
+100                             author
+101                             publisher
+102                             imprint
+103                             description
+104                             isbn
+105                             subject
+106                             publishingdate
+107                             review
+108                             contributor
+109                             rights
+110                             subjectcode
+111                             type
+112                             source
+113                             asin
+114                             versionnumber
+115                             sample
+116                             startreading
+118                             retail price (as text)
+119                             retail price currency (as text)
+201                             coveroffset
+202                             thumboffset
+203                             hasfakecover
+204                             204 Unknown
+205                             205 Unknown
+206                             206 Unknown
+207                             207 Unknown
+208                             208 Unknown
+300                             300 Unknown
+401                             clippinglimit
+402                             publisherlimit
+403                             403 Unknown
+404                             404 ttsflag
+501            4                cdetype          PDOC - Personal Doc;
+                                                 EBOK - ebook;
+502                             lastupdatetime
+503                             updatedtitle
+
+And now, at the end of Record 0 of the PDB file format, we usually get the full
+name of the book, the offset of which is given in the MOBI header.
+
+
+Variable-width integers
+-----------------------
+
+Some parts of the Mobipocket format encode data as variable-width integers.
+These integers are represented big-endian with 7 bits per byte in bits 1-7. They
+may be either forward-encoded, in which case only the LSB has bit 8 set, or
+backward-encoded, in which case only the MSB has bit 8 set. For example, the
+number 0x11111 would be represented forward-encoded as:
+
+    0x04 0x22 0x91
+
+And backward-encoded as: 
+
+    0x84 0x22 0x11
+
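The two encodings above can be sketched as follows. The function names are illustrative assumptions. The decoders return the value together with the number of bytes consumed, because a variable-width integer usually sits inside a larger buffer.

```python
def encode_varint(value: int, forward: bool = True) -> bytes:
    """Encode value big-endian, 7 bits per byte, with one marker bit (0x80)."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = []
    while True:
        groups.append(value & 0x7F)
        value >>= 7
        if value == 0:
            break
    groups.reverse()  # most significant 7-bit group first
    if forward:
        groups[-1] |= 0x80  # forward: marker on the least significant byte
    else:
        groups[0] |= 0x80   # backward: marker on the most significant byte
    return bytes(groups)


def decode_forward(data: bytes, pos: int = 0):
    """Read left to right from pos; return (value, bytes_consumed)."""
    value = 0
    consumed = 0
    for byte in data[pos:]:
        value = (value << 7) | (byte & 0x7F)
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("unterminated forward-encoded integer")


def decode_backward(data: bytes):
    """Read right to left from the end of data; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    consumed = 0
    for byte in reversed(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("unterminated backward-encoded integer")


print(encode_varint(0x11111).hex())                 # 042291
print(encode_varint(0x11111, forward=False).hex())  # 842211
```

A backward-encoded integer is read from the end of a buffer, which is why decode_backward takes no start position.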
+
+Trailing entries
+----------------
+
+The Extra Data Flags field of the MOBI header indicates which, if any, trailing
+entries are appended to the end of each text record. Each set bit in the field
+indicates a trailing entry. The entries appear to occur in bit-order; e.g.,
+trailing entry 1 immediately follows the text content and entry 16 occurs at
+the very end of the record. The effect and exact details of most of these
+entries are unknown. The trailing entries indicated by bits 2-16 appear to
+follow a common format. That format is:
+
+    <data><size>
+
+Where <size> is the size of the entire trailing entry (including the size of
+<size>) as a backward-encoded Mobipocket variable-width integer.
+
+Only a few bits have been identified:
+
+bit     Data at end of records
+0x0001  Multi-byte character overlaps
+0x0002  Some data to help with indexing
+0x0004  Some data about uncrossable breaks
+
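Because each of the entries for bits 2-16 ends with its own size, they can be removed by working backwards from the end of the record. The sketch below assumes the entries really are laid out in bit order, as the text above suggests. The function names are illustrative, and the entry for bit 1 (0x0001) is deliberately left in place because it uses a different format.

```python
def backward_varint(data: bytes, end: int) -> int:
    """Read a backward-encoded variable-width integer that ends at data[end-1]."""
    value = 0
    shift = 0
    pos = end
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:  # the most significant byte carries the marker bit
            return value
    raise ValueError("no terminating byte found for trailing entry size")


def strip_generic_trailing_entries(record: bytes, extra_data_flags: int) -> bytes:
    """Remove the trailing entries signalled by bits 2-16 of the flags field."""
    end = len(record)
    # The last entry in the record belongs to the highest set bit, so walk
    # from mask 0x8000 down to mask 0x0002.
    for bit in range(15, 0, -1):
        if extra_data_flags & (1 << bit):
            size = backward_varint(record, end)
            if size == 0 or size > end:
                raise ValueError("implausible trailing entry size")
            end -= size  # <size> counts the whole entry, itself included
    return record[:end]
```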
+
+Multibyte character overlap
+---------------------------
+
+When bit 1 of the Extra Data Flags field is set, each record is followed by a
+trailing entry containing any extra bytes necessary to complete a multibyte
+character which crosses the record boundary. The bytes do not participate in
+compression regardless of which compression scheme is used for the file.
+However, unlike the trailing data bytes, the multibytes (including the count
+byte) do get included in any encryption. The overlapping bytes then re-appear
+as normal content at the beginning of the following record. The trailing entry
+ends with a byte containing a count of the overlapping bytes plus additional
+flags.
+
+offset  bytes   content                 comments
+0       0-3     N terminal bytes of a
+                multibyte character
+N       1       Size & flags            bits 1-2 encode N, use of bits 3-8 is
+                                        unknown
+
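The multibyte overlap entry can be split off by reading its final byte. This sketch assumes that any other trailing entries have already been removed, so that the overlap entry is the last thing in the record. The function name is an illustrative assumption.

```python
def split_multibyte_overlap(record: bytes, extra_data_flags: int):
    """Return (content, overlap_bytes) for one text record."""
    if not (extra_data_flags & 0x0001) or not record:
        return record, b""
    n = record[-1] & 0x03  # bits 1-2 of the final byte encode N
    entry_size = n + 1     # N overlap bytes plus the size & flags byte
    if entry_size > len(record):
        raise ValueError("overlap entry is larger than the record")
    content_end = len(record) - entry_size
    return record[:content_end], record[content_end:len(record) - 1]


# "caf" plus the first byte of a two-byte UTF-8 character, then the overlap
# entry: the second byte of that character and a count byte of 1.
content, overlap = split_multibyte_overlap(b"caf\xc3" + b"\xa9" + b"\x01", 0x0001)
print((content + overlap).decode("utf-8"))
```

Since the overlap bytes are not compressed, the entry has to be removed before the remaining content is handed to the decompressor.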
+
+PalmDOC Compression
+-------------------
+
+PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
+text. The format does not allow for any text formatting. This keeps files small,
+in keeping with the Palm philosophy. However, extensions to the format can use
+tags, such as HTML or PML, to include formatting within text. These extensions
+to PalmDoc are not interchangeable and are the basis for most eBook Reader
+formats on Palm devices.
+
+LZ77 algorithms achieve compression by replacing portions of the data with
+references to matching data that has already passed through both encoder and
+decoder. A match is encoded by a pair of numbers called a length-distance pair,
+which is equivalent to the statement "each of the next length characters is
+equal to the character exactly distance characters behind it in the uncompressed
+stream." (The "distance" is sometimes called the "offset" instead.)
+
+In the PalmDoc format, a length-distance pair is always encoded by a two-byte
+sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
+the distance, 3 go to encoding the length, and the remaining two are used to
+make sure the decoder can identify the first byte as the beginning of such a
+two-byte sequence. The exact algorithm needed to decode the compressed text can
+be found on the PalmDOC page.
+
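The bit split described above can be sketched as follows. Two details are not stated in this section and are assumptions taken from the usual description of PalmDOC decoding: the two marker bits are "10" (first byte in the range 0x80 to 0xbf), and the 3-bit length field is stored with a bias of 3. The function names are illustrative.

```python
def unpack_pair(first: int, second: int):
    """Split a two-byte sequence into (distance, length)."""
    if first & 0xC0 != 0x80:  # assumption: marker bits are binary 10
        raise ValueError("not the first byte of a length-distance pair")
    value = ((first << 8) | second) & 0x3FFF  # drop the two marker bits
    distance = value >> 3                     # upper 11 bits
    length = (value & 0x07) + 3               # lower 3 bits, assumed bias of 3
    return distance, length


def apply_pair(output: bytearray, distance: int, length: int) -> None:
    """Append length bytes, each copied from distance bytes behind the end."""
    if distance == 0 or distance > len(output):
        raise ValueError("distance points outside the decoded data")
    for _ in range(length):
        # Copying one byte at a time lets a match overlap its own output.
        output.append(output[-distance])


out = bytearray(b"ab")
apply_pair(out, *unpack_pair(0x80, 0x13))
print(out.decode("ascii"))  # abababab
```

The byte-at-a-time copy is what makes a length greater than the distance work, as in the example where a distance of 2 produces 6 new characters.
+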
+PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
+upon independently.
+
+PalmDOC does have support for bookmarks. These pointers are named and refer to
+an offset location in a file. If the file is edited these locations may no
+longer refer to the correct locations. Some reading programs allow the user to
+enter or edit these bookmarks while others treat them as a TOC. Some reading
+programs may ignore them entirely. They are stored at the end of the file itself
+so the full file needs to be scanned when loaded to find them. 
+
+
+MBP
+---
+
+This is the extension used for an auxiliary side file that accompanies
+MOBI-formatted eBooks. The file stores metadata used by the library software as
+well as user-entered data such as bookmarks, annotations and the last read
+position. It is created automatically by the reader program when the eBook is
+first opened and has a .mbp extension. The library management software in
+MobiPocket uses this file to get information displayed in the library window,
+such as title and author, so that it won't have to open the larger eBook file.
+
--- a/format_docs/pdb/palmdoc.txt
+++ b/format_docs/pdb/palmdoc.txt
@ -0,0 +1,25 @@
+PalmDoc Format
+--------------
+
+The format is that of a standard Palm Database Format file. The header of that
+format includes the name of the database (usually the book title and sometimes
+a portion of the author's name) which is up to 31 bytes of data. This string of
+characters is terminated with a 0 in the C style. The files are identified by a
+Creator ID of REAd and a Type of TEXt.
+
+
+Record 0
+--------
+
+The first record in the Palm Database Format gives more information about the
+PalmDOC file, and contains 16 bytes.
+
+bytes   content             comments 
+
+2       Compression         1 == no compression, 2 = PalmDOC compression (see below)
+2       Unused              Always zero
+4       text length         Uncompressed length of the entire text of the book
+2       record count        Number of PDB records used for the text of the book.
+2       record size         Maximum size of each record containing text, always 4096
+4       Current Position    Current reading position, as an offset into the uncompressed text
+
--- a/format_docs/pdb/pdb_format.txt
+++ b/format_docs/pdb/pdb_format.txt
@ -0,0 +1,104 @@
+Format
+------
+
+A PDB file can be broken into multiple parts: the header, record 0 and data.
+Values stored within the various parts are in big-endian byte order. The data
+part is broken down into multiple sections. The section count and offsets
+are referenced in the PDB header. Sections can be no more than 65505 bytes in
+length.
+
+
+Layout
+------
+
+PDB files take the format: DB header, followed by record 0, which contains
+format-specific information, followed by data.
+
+    DB Header
+0   Record 0
+.
+.   Data (broken down into sections)
+.
+
+
+Palm Database Header Format
+
+bytes   content             comments 
+
+32      name                database name. This name is 0 terminated in the
+                            field and will be used as the file name on a
+                            computer. For eBooks this usually contains the
+                            title and may have the author depending on the
+                            length available.
+
+2       attributes          bit field.
+                            0x0002 Read-Only
+                            0x0004 Dirty AppInfoArea
+                            0x0008 Backup this database (i.e. no conduit exists)
+                            0x0010 (16 decimal) Okay to install newer over
+                                    existing copy, if present on PalmPilot
+                            0x0020 (32 decimal) Force the PalmPilot to reset
+                                    after this database is installed
+                            0x0040 (64 decimal) Don't allow copy of file to be
+                                    beamed to other Pilot.
+
+2       version             file version
+
+4       creation date       No. of seconds since start of January 1, 1904.
+
+4       modification date   No. of seconds since start of January 1, 1904.
+
+4       last backup date    No. of seconds since start of January 1, 1904.
+
+4       modificationNumber
+
+4       appInfoID           offset to start of Application Info (if present)
+                            or null
+
+4       sortInfoID          offset to start of Sort Info (if present) or null
+
+4       type                See above table. (For Applications this data will
+                            be 'appl')
+
+4       creator             See above table. This program will be launched if
+                            the file is tapped
+
+4       uniqueIDseed        used internally to identify record
+
+4       nextRecordListID    Only used when in-memory on Palm OS. Always set to
+                            zero in stored files.
+
+2       number of Records   number of records in the file - N
+
+8N      record Info List
+
+        start of record
+        info entry          Repeat N times to end of record info entry
+
+4       record Data Offset  the offset from the start of the PDB of this record
+
+1       record Attributes   bit field. The least significant four bits are used
+                            to represent the category values. These are the
+                            categories used to split the databases for viewing
+                            on the screen. A few of the 16 categories are
+                            pre-defined but the user can add their own. There
+                            is an undefined category for use if the user or
+                            programmer hasn't set this.
+                            0x10 (16 decimal) Secret record bit.
+                            0x20 (32 decimal) Record in use (busy bit).
+                            0x40 (64 decimal) Dirty record bit.
+                            0x80 (128, unsigned decimal) Delete record on
+                                  next HotSync.
+
+3       UniqueID            The unique ID for this record. Often just a
+                            sequential count from 0
+
+        end of record
+        info entry
+
+2?      Gap to data         traditionally 2 zero bytes to Info or raw data
+
+?       Records             The actual data in the file. AppInfoArea (if
+                            present), SortInfoArea (if present) and then
+                            records sequentially
+
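The header table above adds up to 78 fixed bytes followed by 8 bytes per record. A minimal parsing sketch follows. The dictionary keys and function names are illustrative assumptions, while the field sizes, the big-endian order and the 1904 date base come from the table.

```python
import struct
from datetime import datetime, timedelta

PALM_EPOCH = datetime(1904, 1, 1)
# name, attributes, version, three dates, modificationNumber, appInfoID,
# sortInfoID, type, creator, uniqueIDseed, nextRecordListID, record count
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")  # 78 bytes in total


def palm_date(seconds: int) -> datetime:
    """Convert seconds since the start of January 1, 1904 to a datetime."""
    return PALM_EPOCH + timedelta(seconds=seconds)


def parse_pdb_header(data: bytes) -> dict:
    (name, attributes, version, created, modified, backed_up, _mod_number,
     app_info, sort_info, type_code, creator, _seed, _next_list,
     count) = PDB_HEADER.unpack_from(data, 0)
    records = []
    for i in range(count):
        # Each entry: 4-byte offset, 1-byte attributes, 3-byte unique ID.
        offset, packed = struct.unpack_from(">II", data, PDB_HEADER.size + 8 * i)
        records.append({
            "offset": offset,
            "attributes": packed >> 24,
            "unique_id": packed & 0xFFFFFF,
        })
    return {
        "name": name.split(b"\0", 1)[0],  # the name is 0 terminated
        "attributes": attributes,
        "version": version,
        "created": palm_date(created),
        "modified": palm_date(modified),
        "last_backup": palm_date(backed_up),
        "app_info_offset": app_info,
        "sort_info_offset": sort_info,
        "type": type_code,
        "creator": creator,
        "records": records,
    }
```

The end of a record is not stored, so its length would have to be derived from the offset of the next record or, for the last record, from the file size.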
--- a/format_docs/pdb/pdb_types.txt
+++ b/format_docs/pdb/pdb_types.txt
@ -0,0 +1,34 @@
+Palm Database File Code
+-----------------------
+
+Reader                      Type Code
+
+Adobe Reader                .pdfADBE
+PalmDOC                     TEXtREAd
+BDicty                      BVokBDIC
+DB (Database program)       DB99DBOS
+eReader                     PNRdPPrs
+eReader                     DataPPrs
+FireViewer (ImageViewer)    vIMGView
+HanDBase                    PmDBPmDB
+InfoView                    InfoINDB
+iSilo                       ToGoToGo
+iSilo 3                     SDocSilX
+JFile                       JbDbJBas
+JFile Pro                   JfDbJFil
+LIST                        DATALSdb
+MobileDB                    Mdb1Mdb1
+MobiPocket                  BOOKMOBI
+Plucker                     DataPlkr
+QuickSheet                  DataSprd
+SuperMemo                   SM01SMem
+TealDoc                     TEXtTlDc
+TealInfo                    InfoTlIf
+TealMeal                    DataTlMl
+TealPaint                   DataTlPt
+ThinkDB                     dataTDBP
+Tides                       TdatTide
+TomeRaider                  ToRaTRPW
+Weasel                      zTXTGPlm
+WordSmith                   BDOCWrdS 
+
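Each code in the table above is the 4-byte type followed by the 4-byte creator. Adding up the field sizes of the Palm Database header puts these 8 bytes at offsets 60 to 67 of the file. The sketch below uses that to identify a file; the function name is an illustrative assumption and only a handful of the codes are included.

```python
# A subset of the table above: type + creator, as they appear in the file.
KNOWN_CODES = {
    b".pdfADBE": "Adobe Reader",
    b"TEXtREAd": "PalmDOC",
    b"PNRdPPrs": "eReader",
    b"BOOKMOBI": "MobiPocket",
    b"DataPlkr": "Plucker",
    b"zTXTGPlm": "Weasel",
}


def identify_pdb(data: bytes) -> str:
    """Name the reader for a PDB file from its type and creator fields."""
    if len(data) < 68:
        raise ValueError("too short to contain a Palm Database header")
    code = data[60:68]  # type (4 bytes) followed by creator (4 bytes)
    return KNOWN_CODES.get(code, "unknown")
```

The codes are case sensitive, so the comparison is done on the raw bytes.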
--- a/format_docs/pdb/plucker.html
+++ b/format_docs/pdb/plucker.html
--- a/format_docs/pdb/pml.txt
+++ b/format_docs/pdb/pml.txt
@ -0,0 +1,936 @@
+Palm Markup Language
+--------------------
+
+This page explains how to use the Palm Markup Language (PML) to specify
+formatting and other information in a text file for later reading using the
+eReader.
+
+PML commands start with a backslash, "\", and usually consist of a single
+character after that. Some PML commands are paired, such as those that specify
+italicized text. Other commands are directives, such as the "\p", which
+specifies a page break. PML is not meant to be an industrial-strength markup
+language, but it is easy to understand, easy to parse, and creates high-quality
+electronic books.
+
+Since PML and Palm DropBook are not without flaws, there is a page of Tips and
+Pitfalls.
+
+
+Let's Dive Right In
+-------------------
+
+palmsample.txt contains examples of formatting text, specifying chapters, etc.
+Use it to start from, or just as an example when making your own books.
+
+The following table specifies the Palm Markup Language commands, and what
+they do.
+
+\p                              New page
+\x                              New chapter; also causes a new page break.
+                                Enclose chapter title (and any style codes)
+                                with \x and \x
+\Xn                             New chapter, indented n levels (n between 0 and
+                                4 inclusive) in the Chapter dialog; doesn't
+                                cause a page break. Enclose chapter title (and
+                                any style codes) with \Xn and \Xn
+\Cn="Chapter title"             Insert "Chapter title" into the chapter
+                                listing, with level n (like \Xn). The text is
+                                not shown on the page and does not force a page
+                                break. This can sometimes be useful to insert a
+                                chapter mark at the beginning of an
+                                introduction to the chapter, for example.
+\c                              Center this block of text; close with \c on
+                                beginning of line
+\r                              Right justify text block; close with \r on
+                                beginning of line
+\i                              Italicize block; close with \i
+\u                              Underline block; close with \u
+\o                              Overstrike block; close with \o
+\v                              Invisible text; close with \v (can be used for
+                                comments)
+\t                              Indent block. Start at beginning of a line,
+                                close with \t at end of a line
+\T="50%"                        Indents the specified percentage of the screen
+                                width, 50% in this case. If the current drawing
+                                position is already past the specified screen
+                                location, this tag is ignored.
+\w="50%"                        Embed a horizontal rule of a given percentage
+                                width of the screen, in this case 50%. This tag
+                                causes a line break before and after it. The
+                                rule is centered. The percent sign is mandatory.
+\n                              Switch to the "normal" font, which is specified
+                                by the user
+\s                              Switch to stdFont; close with \s to revert to
+                                normal font
+\b                              Switch to boldFont; close with \b to revert to
+                                normal font (deprecated; use \B instead)
+\l                              Switch to largeFont; close with \l to revert to
+                                normal font
+\B                              Mark text as bold. Unlike the \b tag, \B
+                                doesn't change the font, so you can have large
+                                bold text. You cannot mix \b and \B in the same
+                                PML file.
+\Sp                             Mark text as superscript. Should not be mixed
+                                with other styles such as bold, italic, etc.
+                                Enclose superscripted text with \Sp.
+\Sb                             Mark text as subscript. Should not be mixed
+                                with other styles such as bold, italic, etc.
+                                Enclose subscripted text with \Sb.
+\k                              Make enclosed text into small-caps; close with
+                                \k. Any characters enclosed in \k tags
+                                (including those with accents) are made
+                                uppercase and are rendered at a smaller point
+                                size than a regular uppercase character.
+\\                              Represents a single backslash
+\aXXX                           Insert non-ASCII character whose Windows 1252
+                                code is decimal XXX. See the PML character
+                                table for details.
+\UXXXX                          Insert non-ASCII character whose Unicode code
+                                is hexadecimal XXXX. See the Extended PML
+                                character table for details.
+\m="imagename.png"              Insert the named image. See the section on
+                                Images below.
+\q="#linkanchor"Some text\q     Reference a link anchor which is at another
+                                spot in the document. The string after the
+                                anchor specification and before the trailing \q
+                                is underlined or otherwise shown to be a link
+                                when viewing the document.
+\Q="linkanchor"                 Specify a link anchor in the document.
+\-                              Insert a soft hyphen. A soft hyphen shows up
+                                only if it is necessary to break a word across
+                                a line.
+\Fn="footnote1"1\Fn             Link the "1" to a footnote whose name is
+                                footnote1, tagged at the end of the PML
+                                document. See the section on Footnotes and
+                                Sidebars below.
+\Sd="sidebar1"Sidebar\Sd        Link the "Sidebar" text to a sidebar whose name
+                                is sidebar1, tagged at the end of the PML
+                                document. See the section on Footnotes and
+                                Sidebars below.
+\I                              Mark as a reference index item. Enclose index
+                                item (and any style codes) with \I and \I. See
+                                Creating Dictionaries for more information.
+
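The three character escapes in the table (\\, \aXXX and \UXXXX) can be expanded with a single left-to-right pass. This is a sketch for illustration only: the function name is an assumption, no other PML command is interpreted, and an \aXXX value above 255 is left untouched because the table does not say how it should be handled.

```python
import re

# \aXXX (three decimal digits), \UXXXX (four hex digits) or a doubled backslash.
_ESCAPE = re.compile(r"\\(?:a(\d{3})|U([0-9A-Fa-f]{4})|(\\))")


def expand_pml_characters(text: str) -> str:
    """Replace PML character escapes with the characters they stand for."""
    def replace(match):
        decimal, hexadecimal, _backslash = match.groups()
        if decimal is not None:
            code = int(decimal)
            if code > 255:
                return match.group(0)  # not a single Windows 1252 byte
            return bytes([code]).decode("cp1252", errors="replace")
        if hexadecimal is not None:
            return chr(int(hexadecimal, 16))
        return "\\"
    return _ESCAPE.sub(replace, text)
```

Handling the doubled backslash in the same pattern matters: it stops a sequence such as \\a147 from being read as a backslash followed by a character escape.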
+
+Examples
+--------
+
+\pThis is a new page
+
+\xChapter III\x
+
+\X1Chapter III, part A\X1
+
+\p\C="Introduction"The following story is one of my favorites...
+
+\cProperty of
+Gateway Senior High School
+\c
+
+\rJustify my love
+\r
+
+This stuff is \ireally\i cool.
+
+I just read \uMoby Dick.\u
+
+This is a \obig\o mistake.
+
+Copyright 1917\v Date of magazine serialization \v
+
+\tOnce upon a time
+there was a wicked queen
+called Esmerelda.\t
+
+Mammals:\T="40%"Lions
+\T="40%"Tigers
+\T="40%"Bears
+
+He walked away.
+\w="80%"
+Later that day, he ran into an old friend.
+
+\nIn the normal ways...
+
+The \stitle page\s should be formatted...
+
+I just \bcan't\b believe that you...
+
+This \lREALLY\l is a large tiger...
+
+This \Bbold\B text can be either \l\Blarge bold\B\l or \s\Bsmall bold\B\s.
+
+e\Spx + 2\Sp = 9
+
+C\Sb2\SbH\Sb3\SbO\Sb2\Sb should be used in moderation.
+
+See also \kanteater\k.
+
+The DOS prompt said "C:\\windows\\"
+
+The man said \a147Yeah.\a148
+
+Arrows can point \U2190 left or right \U2192.
+
+A Yield sign looks like this: \m="yieldsign.png".
+
+See the \q="#detailedinstructions"Detailed Instructions\q for how to install your eBook.
+
+\Q="detailedinstructions"\bDetailed Instructions\b - This section
+describes how to install an eBook to your handheld device.
+
+Very long words like anti\-dis\-establish\-ment\-arian\-ism may benefit from
+the use of soft hyphens.
+
+The Emerson case\Fn="emerson"[1]\Fn will be very important...
+
+For more information, see the \Sd="moreinfo"sidebar\Sd.
+
+\I\Baardvark\B\I \in.\i a large burrowing nocturnal mammal that feeds especially on termites and ants
+
+
+Footnotes and Sidebars
+----------------------
+
+Footnotes and Sidebars are specified with an XML-like syntax at the end of the
+PML document. For example,
+
+<sidebar id="sidebar1">
+Here's some \itext\i for a sidebar.
+</sidebar>
+
+would specify the sidebar to be displayed when the user taps on a sidebar link
+in the text that was specified using the \Sd tag.
+
+Any text or PML placed after the first footnote or sidebar is ignored as part
+of the book text.
+
+Sidebars and footnotes can include most PML features, but there are some PML
+tags that cannot be used inside of a sidebar or footnote.
+
+These include:
+Chapters    \x, \X, \C
+Links       \q, \Q
+Footnotes   \Fn
+Sidebars    \Sd
+
+See the palmsample.txt file for examples of how to use many of the PML tags.
+
+
+Images
+------
+
+The following rules are intended to guarantee that images in your eBook will be
+viewable on all platforms that eReader runs on.
+
+On low-resolution Palm OS handhelds, an image wider than 158 pixels or taller
+than 148 pixels will be represented in the text by a thumbnail that the user
+can tap to view the entire image. Images smaller than 158 x 148 will be
+presented in-line with the text.
+
+On high-resolution Palm OS handhelds (those having screens of 320x320 pixels or
+more), images smaller than 158 by 148 pixels will be pixel-doubled. Images
+larger than 158x148 may be shown in-line with the text, if they will fit on
+the screen.
+
+On non-Palm OS platforms, small images will be scaled up appropriately. Large
+images will be scaled down to fit on the page; in this case the user can tap on
+the image to view the entire image and zoom in or out.
+
+For DropBook to find the image, it must be present in a directory whose name
+matches that of the PML text file. For example, if "pmlsample.txt" contains a
+reference to an image called "intro.png", then there must be a directory called
+"pmlsample_img" that contains intro.png. The directory's name is the name of
+the PML file (without the .txt extension) with "_img" appended.
+
+Images must be in PNG format and cannot be filtered or interlaced. Image depth
+must be 8 bits or less. Any color table may be used for color images.
+
+Image files must be less than or equal to 65505 bytes in size, since they are
+embedded into the .pdb format of the book; Palm database records are limited to
+65505 bytes in length. Since images are compressed, the actual image displayed
+by the reader may be much larger than 64K.
+
+Any or all of these restrictions may eventually be removed.
+
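Some of the image rules above can be checked before running DropBook. The sketch below is an assumption-laden helper, not part of any eReader tool: the function names are made up, and it only tests the directory naming rule, the 65505-byte limit, the bit depth and the interlace flag, which it reads from the fixed positions of the PNG IHDR chunk. It does not check scanline filtering.

```python
import struct
from pathlib import PurePath

MAX_RECORD_BYTES = 65505
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_directory(pml_path: str) -> str:
    """Name of the directory DropBook searches: PML name without .txt + _img."""
    path = PurePath(pml_path)
    return str(path.with_name(path.stem + "_img"))


def check_png(data: bytes) -> list:
    """Return a list of rule violations for the raw bytes of an image file."""
    problems = []
    if len(data) > MAX_RECORD_BYTES:
        problems.append("larger than 65505 bytes")
    if len(data) < 29 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        problems.append("not a PNG file")
        return problems
    # IHDR: width, height, bit depth, colour type, compression, filter, interlace
    _w, _h, depth, _colour, _comp, _filter, interlace = struct.unpack_from(
        ">IIBBBBB", data, 16)
    if depth > 8:
        problems.append("image depth is more than 8 bits")
    if interlace != 0:
        problems.append("image is interlaced")
    return problems


print(image_directory("pmlsample.txt"))  # pmlsample_img
```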
+
+Adding a Title, Cover Art, and Other Meta-information to Your eBook
+-------------------------------------------------------------------
+
+DropBook normally presents a dialog in which the title and other information
+for the eBook may be specified. This information may be embedded in the PML
+file instead.
+
+To specify the eBook title as it will appear in the Open dialog on the
+handheld, place a block of invisible comment text at the beginning of the file
+using \v tags. Inside this comment block, put the string TITLE="My eBook",
+where "My eBook" is replaced with the name of your eBook. It should look
+something like this:
+
+\vTITLE="Palm Sample Document"\v
+
+You can also specify the author using the AUTHOR meta-tag, the publisher with
+PUBLISHER, copyright information with COPYRIGHT, and the eBook ISBN with EISBN.
+A fully-specified set of meta-information might appear in PML as:
+
+\vTITLE="Palm Sample Document" AUTHOR="Sam Morgenstern" PUBLISHER="eReader.com"
+EISBN="X-XXXX-XXXX" COPYRIGHT="Copyright \a169 2004 by Sam Morgenstern"\v
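
As an illustration, the comment block can be assembled programmatically. The helper below is a hypothetical sketch, not part of DropBook. The text does not say how a double quote inside a value should be written, so the sketch simply rejects such values.

```python
ALLOWED = ("TITLE", "AUTHOR", "PUBLISHER", "COPYRIGHT", "EISBN")

def meta_block(**fields):
    """Build a \\v...\\v comment block holding the given meta-tags."""
    parts = []
    for name, value in fields.items():
        tag = name.upper()
        if tag not in ALLOWED:
            raise ValueError("unknown meta-tag: " + tag)
        if '"' in value:
            # Escaping rules are not documented, so refuse rather than guess.
            raise ValueError("double quotes are not handled in this sketch")
        parts.append('%s="%s"' % (tag, value))
    return "\\v" + " ".join(parts) + "\\v"

print(meta_block(title="Palm Sample Document", author="Sam Morgenstern"))
# \vTITLE="Palm Sample Document" AUTHOR="Sam Morgenstern"\v
```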
+
+Cover art: If an image named "cover.png" is present in the eBook, it is assumed
+to be the cover art for the eBook. See the rules for images for sizing and
+other information.
+
+Some or all of this information may appear in the book information dialog in
+eReader, and may be used for other purposes in future products.
+
+
+Creating Dictionaries
+---------------------
+
+The \I PML tag is used to delimit an index item. Example: \Iaardvark\I
+
+Each entry must start in the normal font. If DropBook shows an error beginning
+with "No styles permitted before...", there is probably a missing end style tag
+before the text shown in the error message.
+
+Links, chapters and other PML structures are not permitted in dictionaries.
+Images, however, are.
+
+A special dictionary entry, "(Front matter)", is shown before other entries in
+the list of entries, and should be used to include pronunciation symbols and
+other front matter.
+
+Note that use of dictionaries requires eReader Pro.
+
+
+Tips and Pitfalls
+-----------------
+
+This page explains some common mistakes, some bugs in DropBook and/or the
+eReader, and some techniques that will allow you to create quality electronic
+books for the eReader.
+
+    * Check out the Converting to Palm eBooks page for some pointers on
+      converting text from various formats into the Palm Markup Language.
+    * Use a return at the end of each paragraph, not each line.
+    * An extra return between paragraphs is easier to read than paragraph
+      indentation.
+    * The eReader doesn't display empty lines at the top of a page. If you need
+      to have some "empty" lines at the top of a page, put a space on each line.
+    * Don't use tables if you can possibly avoid it.
+
+      None of the fonts that the eReader supports are monospaced, so tables can
+      be difficult to represent. Break out the information in another way, or
+      use the \T tag, but beware of tables that look great on a Palm OS
+      handheld but not on a Pocket PC or vice versa.
+
+    * The Reader breaks lines on spaces, dashes or underscores. This has
+      several implications.
+
+         1. Don't fill more than a line with spaces, dashes or underscores.
+            There's a bug (which will be fixed in a future release) which
+            causes MakeBook to hang on such a line. Note that in the large
+            font, the number of spaces, dashes or underscores will be much
+            smaller than in the small font.
+         2. A string such as He shouted "Wait!--" may place the last quote on
+            the beginning of a line, since the line would break after the
+            second dash. Prevent this by using the PML string: He shouted
+            "Wait!\a150\a150". The non-breaking dash, code 150, will not break
+            a line. Use \a160 for a non-breaking space. Even better: use \a151,
+            a long dash, instead of two short dashes.
+
+    * The justification codes \c and \r (center and right justification) must
+      have closing codes on the beginning of the line following the justified
+      text.
+    * The indentation tag \t must have a closing tag at the end of a line of
+      the indented text.
+    * Use \s (small font) in the title page(s) of books to force the page(s) to
+      format nicely. Other than that, \n, \s and \l should rarely be necessary;
+      the font size used for most text display should be chosen by the user.
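
Two of the line-breaking tips above lend themselves to a small preprocessing sketch. Both function names are hypothetical. The limit of 20 characters is an arbitrary assumption, since the number of characters that fill a line depends on the font and the device.

```python
import re

def protect_dashes(text):
    """Replace each double hyphen with the long dash \\a151."""
    return text.replace("--", "\\a151")

def long_runs(text, limit=20):
    """Return runs of spaces, dashes or underscores of at least `limit` characters."""
    pattern = re.compile(r"[ _-]{%d,}" % limit)
    return [m.group(0) for m in pattern.finditer(text)]

print(protect_dashes('He shouted "Wait!--"'))  # He shouted "Wait!\a151"
```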
+
+
+Converting Uncommon Characters to PML
+-------------------------------------
+
+Use this chart to convert uncommon characters to their Palm Markup Language
+(PML) equivalent. Most characters are simply represented as themselves in PML
+and don't require this chart. But some uncommon characters can only be
+represented in PML by their "\aXXX" syntax. Use this chart to look up that
+"\aXXX" syntax.
+
+For example, if you wanted to write the following phrase in PML:
+
+    Copyright © 1999 by Samuel Morgenstern
+
+In PML, you would write it as:
+
+    Copyright \a169 1999 by Samuel Morgenstern
+
+Char    HTML # Code HTML Char Code  PML Char Code  Description
+
+        &#32;       -               Normal space
+!       &#33;       -       !       Exclamation
+"       &#34;       &quot;  "       Double quote
+#       &#35;       -       #       Hash
+$       &#36;       -       $       Dollar
+%       &#37;       -       %       Percent
+&       &#38;       &amp;   &       Ampersand
+'       &#39;       -       '       Apostrophe
+(       &#40;       -       (       Open bracket
+)       &#41;       -       )       Close bracket
+*       &#42;       -       *       Asterisk
++       &#43;       -       +       Plus sign
+,       &#44;       -       ,       Comma
+-       &#45;       -       -       Minus sign
+.       &#46;       -       .       Period
+/       &#47;       -       /       Forward slash
+0       &#48;       -       0       Digit 0
+1       &#49;       -       1       Digit 1
+2       &#50;       -       2       Digit 2
+3       &#51;       -       3       Digit 3
+4       &#52;       -       4       Digit 4
+5       &#53;       -       5       Digit 5
+6       &#54;       -       6       Digit 6
+7       &#55;       -       7       Digit 7
+8       &#56;       -       8       Digit 8
+9       &#57;       -       9       Digit 9
+:       &#58;       -       :       Colon
+;       &#59;       -       ;       Semicolon
+<       &#60;       &lt;    <       Less than
+=       &#61;       -       =       Equals
+>       &#62;       &gt;    >       Greater than
+?       &#63;       -       ?       Question mark
+@       &#64;       -       @       At sign
+A       &#65;       -       A       A
+B       &#66;       -       B       B
+C       &#67;       -       C       C
+D       &#68;       -       D       D
+E       &#69;       -       E       E
+F       &#70;       -       F       F
+G       &#71;       -       G       G
+H       &#72;       -       H       H
+I       &#73;       -       I       I
+J       &#74;       -       J       J
+K       &#75;       -       K       K
+L       &#76;       -       L       L
+M       &#77;       -       M       M
+N       &#78;       -       N       N
+O       &#79;       -       O       O
+P       &#80;       -       P       P
+Q       &#81;       -       Q       Q
+R       &#82;       -       R       R
+S       &#83;       -       S       S
+T       &#84;       -       T       T
+U       &#85;       -       U       U
+V       &#86;       -       V       V
+W       &#87;       -       W       W
+X       &#88;       -       X       X
+Y       &#89;       -       Y       Y
+Z       &#90;       -       Z       Z
+[       &#91;       -       [       Open square bracket
+\       &#92;       -       \\       Backslash
+]       &#93;       -       ]       Close square bracket
+^       &#94;       -       ^       Caret
+_       &#95;       -       _       Underscore
+`       &#96;       -       `       Grave accent
+a       &#97;       -       a       a
+b       &#98;       -       b       b
+c       &#99;       -       c       c
+d       &#100;       -       d       d
+e       &#101;       -       e       e
+f       &#102;       -       f       f
+g       &#103;       -       g       g
+h       &#104;       -       h       h
+i       &#105;       -       i       i
+j       &#106;       -       j       j
+k       &#107;       -       k       k
+l       &#108;       -       l       l
+m       &#109;       -       m       m
+n       &#110;       -       n       n
+o       &#111;       -       o       o
+p       &#112;       -       p       p
+q       &#113;       -       q       q
+r       &#114;       -       r       r
+s       &#115;       -       s       s
+t       &#116;       -       t       t
+u       &#117;       -       u       u
+v       &#118;       -       v       v
+w       &#119;       -       w       w
+x       &#120;       -       x       x
+y       &#121;       -       y       y
+z       &#122;       -       z       z
+{       &#123;       -       {       Left brace
+|       &#124;       -       |       Vertical bar
+}       &#125;       -       }       Right brace
+~       &#126;       -       ~       Tilde
+
+        &#160;       &nbsp;     \a160       Non-breaking space
+        &#161;       &iexcl;    \a161       Inverted exclamation
+        &#162;       &cent;     \a162       Cent sign
+        &#163;       &pound;    \a163       Pound sign
+        &#164;       &curren;   \a164       Currency sign
+        &#165;       &yen;      \a165       Yen sign
+        &#166;       &brvbar;   \a166       Broken bar
+        &#167;       &sect;     \a167       Section sign
+        &#168;       &uml;      \a168       Umlaut or diaeresis
+        &#169;       &copy;     \a169       Copyright sign
+        &#170;       &ordf;     \a170       Feminine ordinal
+        &#171;       &laquo;    \a171       Left angle quotes
+        &#172;       &not;      \a172       Logical not sign
+        &#173;       &shy;      \a173       Soft hyphen
+        &#174;       &reg;      \a174       Registered trademark
+        &#175;       &macr;     \a175       Spacing macron
+        &#176;       &deg;      \a176       Degree sign
+        &#177;       &plusmn;   \a177       Plus-minus sign
+        &#178;       &sup2;     \a178       Superscript 2
+        &#179;       &sup3;     \a179       Superscript 3
+        &#180;       &acute;    \a180       Spacing acute
+        &#181;       &micro;    \a181       Micro sign
+        &#182;       &para;     \a182       Paragraph sign
+        &#183;       &middot;   \a183       Middle dot
+        &#184;       &cedil;    \a184       Spacing cedilla
+        &#185;       &sup1;     \a185       Superscript 1
+        &#186;       &ordm;     \a186       Masculine ordinal
+        &#187;       &raquo;    \a187       Right angle quotes
+        &#188;       &frac14;   \a188       One quarter
+        &#189;       &frac12;   \a189       One half
+        &#190;       &frac34;   \a190       Three quarters
+        &#191;       &iquest;   \a191       Inverted question mark
+        &#192;       &Agrave;   \a192       A grave
+        &#193;       &Aacute;   \a193       A acute
+        &#194;       &Acirc;    \a194       A circumflex
+        &#195;       &Atilde;   \a195       A tilde
+        &#196;       &Auml;     \a196       A diaeresis
+        &#197;       &Aring;    \a197       A ring
+        &#198;       &AElig;    \a198       AE ligature
+        &#199;       &Ccedil;   \a199       C cedilla
+        &#200;       &Egrave;   \a200       E grave
+        &#201;       &Eacute;   \a201       E acute
+        &#202;       &Ecirc;    \a202       E circumflex
+        &#203;       &Euml;     \a203       E diaeresis
+        &#204;       &Igrave;   \a204       I grave
+        &#205;       &Iacute;   \a205       I acute
+        &#206;       &Icirc;    \a206       I circumflex
+        &#207;       &Iuml;     \a207       I diaeresis
+        &#208;       &ETH;      \a208       Eth
+        &#209;       &Ntilde;   \a209       N tilde
+        &#210;       &Ograve;   \a210       O grave
+        &#211;       &Oacute;   \a211       O acute
+        &#212;       &Ocirc;    \a212       O circumflex
+        &#213;       &Otilde;   \a213       O tilde
+        &#214;       &Ouml;     \a214       O diaeresis
+        &#215;       &times;    \a215       Multiplication sign
+        &#216;       &Oslash;   \a216       O slash
+        &#217;       &Ugrave;   \a217       U grave
+        &#218;       &Uacute;   \a218       U acute
+        &#219;       &Ucirc;    \a219       U circumflex
+        &#220;       &Uuml;     \a220       U diaeresis
+        &#221;       &Yacute;   \a221       Y acute
+        &#222;       &THORN;    \a222       THORN
+        &#223;       &szlig;    \a223       sharp s
+        &#224;       &agrave;   \a224       a grave
+        &#225;       &aacute;   \a225       a acute
+        &#226;       &acirc;    \a226       a circumflex
+        &#227;       &atilde;   \a227       a tilde
+        &#228;       &auml;     \a228       a diaeresis
+        &#229;       &aring;    \a229       a ring
+        &#230;       &aelig;    \a230       ae ligature
+        &#231;       &ccedil;   \a231       c cedilla
+        &#232;       &egrave;   \a232       e grave
+        &#233;       &eacute;   \a233       e acute
+        &#234;       &ecirc;    \a234       e circumflex
+        &#235;       &euml;     \a235       e diaeresis
+        &#236;       &igrave;   \a236       i grave
+        &#237;       &iacute;   \a237       i acute
+        &#238;       &icirc;    \a238       i circumflex
+        &#239;       &iuml;     \a239       i diaeresis
+        &#240;       &eth;      \a240       eth
+        &#241;       &ntilde;   \a241       n tilde
+        &#242;       &ograve;   \a242       o grave
+        &#243;       &oacute;   \a243       o acute
+        &#244;       &ocirc;    \a244       o circumflex
+        &#245;       &otilde;   \a245       o tilde
+        &#246;       &ouml;     \a246       o diaeresis
+        &#247;       &divide;   \a247       division sign
+        &#248;       &oslash;   \a248       o slash
+        &#249;       &ugrave;   \a249       u grave
+        &#250;       &uacute;   \a250       u acute
+        &#251;       &ucirc;    \a251       u circumflex
+        &#252;       &uuml;     \a252       u diaeresis
+        &#253;       &yacute;   \a253       y acute
+        &#254;       &thorn;    \a254       thorn
+        &#255;       &yuml;     \a255       y diaeresis
+,       &#8218;      &sbquo;    \a130       single low quote
+        &#402;       &fnof;     \a131       Scripted f
+        &#8222;      &bdquo;    \a132       low quote
+        &#8230;      &hellip;   \a133       Ellipsis
+        &#8224;      &dagger;   \a134       Dagger
+        &#8225;      &Dagger;   \a135       Double dagger
+        &#352;       &Scaron;   \a138       Large S w/inverted caret
+<       &#8249;      &lsaquo;   \a139       single left angle quote
+        &#338;       &OElig;    \a140       Large combined oe
+        &#8216;      &lsquo;    \a145       Open single smart quote
+        &#8217;      &rsquo;    \a146       Close single smart quote
+        &#8220;      &ldquo;    \a147       Open double smart quote
+        &#8221;      &rdquo;    \a148       Close double smart quote
+        &#8226;      &bull;     \a149       Bullet
+        &#8211;      &ndash;    \a150       Small dash (en dash)
+        &#8212;      &mdash;    \a151       Large dash (em dash)
+        &#8482;      &trade;    \a153       Trademark
+        &#353;       &scaron;   \a154       Small S w/inverted caret
+>       &#8250;      &rsaquo;   \a155       single right angle quote
+        &#339;       &oelig;    \a156       Small combined oe
+        &#376;       &Yuml;     \a159       Large Y with diaeresis
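
The \a codes in the chart coincide with Windows-1252 byte values (169 for the copyright sign, 150 and 151 for the dashes, 130 for the single low quote). Assuming that holds for every row, the lookup can be sketched in Python as follows. The function name to_pml is hypothetical, and the sketch may emit codes for Windows-1252 characters that the chart does not list.

```python
def to_pml(text):
    """Convert plain text to PML, using \\aXXX for non-ASCII characters."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")      # a literal backslash is written as \\
        elif ord(ch) < 128:
            out.append(ch)          # ASCII characters represent themselves
        else:
            try:
                code = ch.encode("cp1252")[0]
            except UnicodeEncodeError:
                raise ValueError("no \\aXXX code for %r" % ch)
            out.append("\\a%03d" % code)
    return "".join(out)

print(to_pml("Copyright © 1999 by Samuel Morgenstern"))
# Copyright \a169 1999 by Samuel Morgenstern
```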
+
+
+Extended Character Set
+----------------------
+
+In addition to the special characters supported by earlier versions of eReader
+(which can be accessed using the \a### tag), all versions of eReader Pro and
+eReader version 2.4 and later include support for additional special characters
+and symbols. These symbols can be accessed using the \U#### tag, where #### are
+four hexadecimal digits giving the Unicode encoding of the special character.
+
+Only the limited subset of Unicode characters given in the table below is
+supported. In addition, some of the characters that are included in the table
+are not present in eReader Pro versions prior to 2.4. To ensure that the
+characters are displayed correctly, books using these tags should be read using
+eReader or eReader Pro version 2.4 or later.
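
Forming the tag itself is mechanical. The hypothetical helper below writes the code point as four uppercase hexadecimal digits, following the style used in the table. It does not check whether the character belongs to the supported subset.

```python
def to_u_tag(ch):
    """Return the \\U#### tag for a single character."""
    code = ord(ch)
    if code > 0xFFFF:
        # The tag has room for exactly four hexadecimal digits.
        raise ValueError("code point does not fit in four hex digits")
    return "\\U%04X" % code

print(to_u_tag("ā"))  # \U0101
```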
+
+On Palm OS handhelds these special symbols are only available in one size,
+matching the "Small" font. For best results on Palm OS handhelds the \U tag
+should only be used inside blocks set to the "Small" font by way of \s tags.
+On Palm OS handhelds these special characters are not affected by the font tags
+(\s, \l, \b and \n), the bold style tag (\B), or the small caps style tag (\k).
+
+If the \U characters are not showing up correctly using eReader on your Windows
+desktop or laptop, the cause is that the fonts for eReader are not installed
+properly. To fix this, go to the directory C:\Windows\Fonts\ and double-click
+each font whose name starts with "Maynard". This opens each font and allows the
+system to register it. Close the windows that were opened as a result, and the
+problem should be resolved.
+
+Char     HTML Code     PML Code     Description
+
+Latin Extended-A
+Ā     &#256;     \U0100     LATIN CAPITAL LETTER A WITH MACRON
+ā     &#257;     \U0101     LATIN SMALL LETTER A WITH MACRON
+Ă     &#258;     \U0102     LATIN CAPITAL LETTER A WITH BREVE
+ă     &#259;     \U0103     LATIN SMALL LETTER A WITH BREVE
+ą     &#261;     \U0105     LATIN SMALL LETTER A WITH OGONEK
+ć     &#263;     \U0107     LATIN SMALL LETTER C WITH ACUTE
+Č     &#268;     \U010C     LATIN CAPITAL LETTER C WITH CARON
+č     &#269;     \U010D     LATIN SMALL LETTER C WITH CARON
+Ē     &#274;     \U0112     LATIN CAPITAL LETTER E WITH MACRON
+ē     &#275;     \U0113     LATIN SMALL LETTER E WITH MACRON
+ĕ     &#277;     \U0115     LATIN SMALL LETTER E WITH BREVE
+ė     &#279;     \U0117     LATIN SMALL LETTER E WITH DOT ABOVE
+ę     &#281;     \U0119     LATIN SMALL LETTER E WITH OGONEK
+ě     &#283;     \U011B     LATIN SMALL LETTER E WITH CARON
+ĝ     &#285;     \U011D     LATIN SMALL LETTER G WITH CIRCUMFLEX
+ğ     &#287;     \U011F     LATIN SMALL LETTER G WITH BREVE
+Ī     &#298;     \U012A     LATIN CAPITAL LETTER I WITH MACRON
+ī     &#299;     \U012B     LATIN SMALL LETTER I WITH MACRON
+ĭ     &#301;     \U012D     LATIN SMALL LETTER I WITH BREVE
+į     &#303;     \U012F     LATIN SMALL LETTER I WITH OGONEK
+ı     &#305;     \U0131     LATIN SMALL LETTER DOTLESS I
+Ł     &#321;     \U0141     LATIN CAPITAL LETTER L WITH STROKE
+ł     &#322;     \U0142     LATIN SMALL LETTER L WITH STROKE
+ń     &#324;     \U0144     LATIN SMALL LETTER N WITH ACUTE
+ň     &#328;     \U0148     LATIN SMALL LETTER N WITH CARON
+ŋ     &#331;     \U014B     LATIN SMALL LETTER ENG
+Ō     &#332;     \U014C     LATIN CAPITAL LETTER O WITH MACRON
+ō     &#333;     \U014D     LATIN SMALL LETTER O WITH MACRON
+ŏ     &#335;     \U014F     LATIN SMALL LETTER O WITH BREVE
+ő     &#337;     \U0151     LATIN SMALL LETTER O WITH DOUBLE ACUTE
+ŕ     &#341;     \U0155     LATIN SMALL LETTER R WITH ACUTE
+ř     &#345;     \U0159     LATIN SMALL LETTER R WITH CARON
+Ś     &#346;     \U015A     LATIN CAPITAL LETTER S WITH ACUTE
+ś     &#347;     \U015B     LATIN SMALL LETTER S WITH ACUTE
+ş     &#351;     \U015F     LATIN SMALL LETTER S WITH CEDILLA
+ţ     &#355;     \U0163     LATIN SMALL LETTER T WITH CEDILLA
+ũ     &#361;     \U0169     LATIN SMALL LETTER U WITH TILDE
+ū     &#363;     \U016B     LATIN SMALL LETTER U WITH MACRON
+ŭ     &#365;     \U016D     LATIN SMALL LETTER U WITH BREVE
+ŷ     &#375;     \U0177     LATIN SMALL LETTER Y WITH CIRCUMFLEX
+ź     &#378;     \U017A     LATIN SMALL LETTER Z WITH ACUTE
+Ž     &#381;     \U017D     LATIN CAPITAL LETTER Z WITH CARON
+ž     &#382;     \U017E     LATIN SMALL LETTER Z WITH CARON
+Latin Extended-B
+    &#447;     \U01BF     LATIN LETTER WYNN
+    &#462;     \U01CE     LATIN SMALL LETTER A WITH CARON
+    &#464;     \U01D0     LATIN SMALL LETTER I WITH CARON
+    &#466;     \U01D2     LATIN SMALL LETTER O WITH CARON
+    &#468;     \U01D4     LATIN SMALL LETTER U WITH CARON
+    &#481;     \U01E1     LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON
+    &#483;     \U01E3     LATIN SMALL LETTER AE WITH MACRON
+    &#487;     \U01E7     LATIN SMALL LETTER G WITH CARON
+    &#491;     \U01EB     LATIN SMALL LETTER O WITH OGONEK
+    &#496;     \U01F0     LATIN SMALL LETTER J WITH CARON
+    &#519;     \U0207     LATIN SMALL LETTER E WITH INVERTED BREVE
+    &#541;     \U021D     LATIN SMALL LETTER YOGH
+    &#551;     \U0227     LATIN SMALL LETTER A WITH DOT ABOVE
+    &#559;     \U022F     LATIN SMALL LETTER O WITH DOT ABOVE
+    &#563;     \U0233     LATIN SMALL LETTER Y WITH MACRON
+IPA Extensions
+    &#593;     \U0251     LATIN SMALL LETTER SCRIPT A
+    &#594;     \U0252     LATIN SMALL LETTER TURNED SCRIPT A
+    &#596;     \U0254     LATIN SMALL LETTER OPEN O
+    &#601;     \U0259     LATIN SMALL LETTER SCHWA
+    &#604;     \U025C     LATIN SMALL LETTER REVERSED OPEN E
+    &#613;     \U0265     LATIN SMALL LETTER TURNED H
+    &#618;     \U026A     LATIN LETTER SMALL CAPITAL I
+    &#626;     \U0272     LATIN SMALL LETTER N WITH LEFT HOOK
+    &#643;     \U0283     LATIN SMALL LETTER ESH
+    &#649;     \U0289     LATIN SMALL LETTER U BAR
+    &#650;     \U028A     LATIN SMALL LETTER UPSILON
+    &#652;     \U028C     LATIN SMALL LETTER TURNED V
+    &#655;     \U028F     LATIN LETTER SMALL CAPITAL Y
+    &#658;     \U0292     LATIN SMALL LETTER EZH
+    &#660;     \U0294     LATIN LETTER GLOTTAL STOP
+    &#668;     \U029C     LATIN LETTER SMALL CAPITAL H
+Spacing Modifier Letters
+    &#702;     \U02BE     MODIFIER LETTER RIGHT HALF RING
+    &#703;     \U02BF     MODIFIER LETTER LEFT HALF RING
+ˇ   &#711;     \U02C7     CARON
+    &#712;     \U02C8     MODIFIER LETTER VERTICAL LINE
+    &#716;     \U02CC     MODIFIER LETTER LOW VERTICAL LINE
+    &#720;     \U02D0     MODIFIER LETTER TRIANGULAR COLON
+˘   &#728;     \U02D8     BREVE
+˙   &#729;     \U02D9     DOT ABOVE
+Greek and Coptic
+Α     &#913;     \U0391     GREEK CAPITAL LETTER ALPHA
+Β     &#914;     \U0392     GREEK CAPITAL LETTER BETA
+Γ     &#915;     \U0393     GREEK CAPITAL LETTER GAMMA
+Δ     &#916;     \U0394     GREEK CAPITAL LETTER DELTA
+Ε     &#917;     \U0395     GREEK CAPITAL LETTER EPSILON
+Ζ     &#918;     \U0396     GREEK CAPITAL LETTER ZETA
+Η     &#919;     \U0397     GREEK CAPITAL LETTER ETA
+Θ     &#920;     \U0398     GREEK CAPITAL LETTER THETA
+Ι     &#921;     \U0399     GREEK CAPITAL LETTER IOTA
+Κ     &#922;     \U039A     GREEK CAPITAL LETTER KAPPA
+Λ     &#923;     \U039B     GREEK CAPITAL LETTER LAMBDA
+Μ     &#924;     \U039C     GREEK CAPITAL LETTER MU
+Ν     &#925;     \U039D     GREEK CAPITAL LETTER NU
+Ξ     &#926;     \U039E     GREEK CAPITAL LETTER XI
+Ο     &#927;     \U039F     GREEK CAPITAL LETTER OMICRON
+Π     &#928;     \U03A0     GREEK CAPITAL LETTER PI
+Ρ     &#929;     \U03A1     GREEK CAPITAL LETTER RHO
+Σ     &#931;     \U03A3     GREEK CAPITAL LETTER SIGMA
+Τ     &#932;     \U03A4     GREEK CAPITAL LETTER TAU
+Υ     &#933;     \U03A5     GREEK CAPITAL LETTER UPSILON
+Φ     &#934;     \U03A6     GREEK CAPITAL LETTER PHI
+Χ     &#935;     \U03A7     GREEK CAPITAL LETTER CHI
+Ψ     &#936;     \U03A8     GREEK CAPITAL LETTER PSI
+Ω     &#937;     \U03A9     GREEK CAPITAL LETTER OMEGA
+α     &#945;     \U03B1     GREEK SMALL LETTER ALPHA
+β     &#946;     \U03B2     GREEK SMALL LETTER BETA
+γ     &#947;     \U03B3     GREEK SMALL LETTER GAMMA
+δ     &#948;     \U03B4     GREEK SMALL LETTER DELTA
+ε     &#949;     \U03B5     GREEK SMALL LETTER EPSILON
+ζ     &#950;     \U03B6     GREEK SMALL LETTER ZETA
+η     &#951;     \U03B7     GREEK SMALL LETTER ETA
+θ     &#952;     \U03B8     GREEK SMALL LETTER THETA
+ι     &#953;     \U03B9     GREEK SMALL LETTER IOTA
+κ     &#954;     \U03BA     GREEK SMALL LETTER KAPPA
+λ     &#955;     \U03BB     GREEK SMALL LETTER LAMBDA
+μ     &#956;     \U03BC     GREEK SMALL LETTER MU
+ν     &#957;     \U03BD     GREEK SMALL LETTER NU
+ξ     &#958;     \U03BE     GREEK SMALL LETTER XI
+ο     &#959;     \U03BF     GREEK SMALL LETTER OMICRON
+π     &#960;     \U03C0     GREEK SMALL LETTER PI
+ρ     &#961;     \U03C1     GREEK SMALL LETTER RHO
+ς     &#962;     \U03C2     GREEK SMALL LETTER FINAL SIGMA
+σ     &#963;     \U03C3     GREEK SMALL LETTER SIGMA
+τ     &#964;     \U03C4     GREEK SMALL LETTER TAU
+υ     &#965;     \U03C5     GREEK SMALL LETTER UPSILON
+φ     &#966;     \U03C6     GREEK SMALL LETTER PHI
+χ     &#967;     \U03C7     GREEK SMALL LETTER CHI
+ψ     &#968;     \U03C8     GREEK SMALL LETTER PSI
+ω     &#969;     \U03C9     GREEK SMALL LETTER OMEGA
+      &#977;     \U03D1     GREEK THETA SYMBOL
+      &#989;     \U03DD     GREEK SMALL LETTER DIGAMMA
+Hebrew
+א     &#1488;     \U05D0     HEBREW LETTER ALEPH
+ב     &#1489;     \U05D1     HEBREW LETTER BET
+ג     &#1490;     \U05D2     HEBREW LETTER GIMEL
+ד     &#1491;     \U05D3     HEBREW LETTER DALET
+ה     &#1492;     \U05D4     HEBREW LETTER HE
+ו     &#1493;     \U05D5     HEBREW LETTER VAV
+ז     &#1494;     \U05D6     HEBREW LETTER ZAYIN
+ח     &#1495;     \U05D7     HEBREW LETTER HET
+ט     &#1496;     \U05D8     HEBREW LETTER TET
+י     &#1497;     \U05D9     HEBREW LETTER YOD
+ך     &#1498;     \U05DA     HEBREW LETTER FINAL KAF
+כ     &#1499;     \U05DB     HEBREW LETTER KAF
+ל     &#1500;     \U05DC     HEBREW LETTER LAMED
+ם     &#1501;     \U05DD     HEBREW LETTER FINAL MEM
+מ     &#1502;     \U05DE     HEBREW LETTER MEM
+ן     &#1503;     \U05DF     HEBREW LETTER FINAL NUN
+נ     &#1504;     \U05E0     HEBREW LETTER NUN
+ס     &#1505;     \U05E1     HEBREW LETTER SAMEKH
+ע     &#1506;     \U05E2     HEBREW LETTER AYIN
+ף     &#1507;     \U05E3     HEBREW LETTER FINAL PE
+פ     &#1508;     \U05E4     HEBREW LETTER PE
+ץ     &#1509;     \U05E5     HEBREW LETTER FINAL TSADI
+צ     &#1510;     \U05E6     HEBREW LETTER TSADI
+ק     &#1511;     \U05E7     HEBREW LETTER QOF
+ר     &#1512;     \U05E8     HEBREW LETTER RESH
+ת     &#1514;     \U05EA     HEBREW LETTER TAV
+Latin Extended Additional
+    &#7691;     \U1E0B     LATIN SMALL LETTER D WITH DOT ABOVE
+    &#7693;     \U1E0D     LATIN SMALL LETTER D WITH DOT BELOW
+    &#7703;     \U1E17     LATIN SMALL LETTER E WITH MACRON AND ACUTE
+    &#7714;     \U1E22     LATIN CAPITAL LETTER H WITH DOT ABOVE
+    &#7716;     \U1E24     LATIN CAPITAL LETTER H WITH DOT BELOW
+    &#7717;     \U1E25     LATIN SMALL LETTER H WITH DOT BELOW
+    &#7723;     \U1E2B     LATIN SMALL LETTER H WITH BREVE BELOW
+    &#7731;     \U1E33     LATIN SMALL LETTER K WITH DOT BELOW
+    &#7735;     \U1E37     LATIN SMALL LETTER L WITH DOT BELOW
+    &#7745;     \U1E41     LATIN SMALL LETTER M WITH DOT ABOVE
+    &#7747;     \U1E43     LATIN SMALL LETTER M WITH DOT BELOW
+    &#7749;     \U1E45     LATIN SMALL LETTER N WITH DOT ABOVE
+    &#7751;     \U1E47     LATIN SMALL LETTER N WITH DOT BELOW
+    &#7763;     \U1E53     LATIN SMALL LETTER O WITH MACRON AND ACUTE
+    &#7769;     \U1E59     LATIN SMALL LETTER R WITH DOT ABOVE
+    &#7770;     \U1E5A     LATIN CAPITAL LETTER R WITH DOT BELOW
+    &#7771;     \U1E5B     LATIN SMALL LETTER R WITH DOT BELOW
+    &#7777;     \U1E61     LATIN SMALL LETTER S WITH DOT ABOVE
+    &#7779;     \U1E63     LATIN SMALL LETTER S WITH DOT BELOW
+    &#7787;     \U1E6B     LATIN SMALL LETTER T WITH DOT ABOVE
+    &#7789;     \U1E6D     LATIN SMALL LETTER T WITH DOT BELOW
+    &#7791;     \U1E6F     LATIN SMALL LETTER T WITH LINE BELOW
+    &#7825;     \U1E91     LATIN SMALL LETTER Z WITH CIRCUMFLEX
+    &#7827;     \U1E93     LATIN SMALL LETTER Z WITH DOT BELOW
+    &#7830;     \U1E96     LATIN SMALL LETTER H WITH LINE BELOW
+    &#7841;     \U1EA1     LATIN SMALL LETTER A WITH DOT BELOW
+    &#7885;     \U1ECD     LATIN SMALL LETTER O WITH DOT BELOW
+    &#7929;     \U1EF9     LATIN SMALL LETTER Y WITH TILDE
+General Punctuation
+-   &#8209;     \U2011     NON-BREAKING HYPHEN
+    &#8248;     \U2038     CARET
+    &#8253;     \U203D     INTERROBANG
+    &#8258;     \U2042     ASTERISM
+Arrows
+←   &#8592;     \U2190     LEFTWARDS ARROW
+→   &#8594;     \U2192     RIGHTWARDS ARROW
+Mathematical Operators
+∂   &#8706;     \U2202     PARTIAL DIFFERENTIAL
+√   &#8730;     \U221A     SQUARE ROOT
+∞   &#8734;     \U221E     INFINITY
+∥   &#8741;     \U2225     PARALLEL TO
+∫   &#8747;     \U222B     INTEGRAL
+≠   &#8800;     \U2260     NOT EQUAL TO
+    &#8852;     \U2294     SQUARE CUP
+    &#8853;     \U2295     CIRCLED PLUS
+    &#8942;     \U22EE     VERTICAL ELLIPSIS
+Enclosed Alphanumerics
+    &#9418;     \U24CA     CIRCLED LATIN CAPITAL LETTER U
+Miscellaneous Symbols
+☜   &#9756;     \U261C     WHITE LEFT POINTING INDEX
+☞   &#9758;     \U261E     WHITE RIGHT POINTING INDEX
+    &#9791;     \U263F     MERCURY
+    &#9792;     \U2640     FEMALE SIGN
+    &#9794;     \U2642     MALE SIGN
+    &#9795;     \U2643     JUPITER
+    &#9796;     \U2644     SATURN
+    &#9797;     \U2645     URANUS
+    &#9798;     \U2646     NEPTUNE
+    &#9799;     \U2647     PLUTO
+    &#9824;     \U2660     BLACK SPADE SUIT
+    &#9825;     \U2661     WHITE HEART SUIT
+    &#9826;     \U2662     WHITE DIAMOND SUIT
+    &#9827;     \U2663     BLACK CLUB SUIT
+    &#9837;     \U266D     MUSIC FLAT SIGN
+    &#9838;     \U266E     MUSIC NATURAL SIGN
+    &#9839;     \U266F     MUSIC SHARP SIGN
+Dingbats
+    &#10003;     \U2713     CHECK MARK
+    &#10016;     \U2720     MALTESE CROSS
+Private Use Area
+    -     \UE000     LATIN SMALL LETTER A WITH MACRON AND ACUTE
+    -     \UE001     LATIN SMALL LETTER A WITH MACRON AND TILDE
+    -     \UE002     LATIN SMALL LETTER A WITH VERTICAL LINE ABOVE
+    -     \UE003     LATIN CAPITAL LETTER C WITH MACRON
+    -     \UE004     LATIN SMALL LETTER C WITH MACRON
+    -     \UE005     LATIN SMALL LETTER C WITH BREVE
+    -     \UE006     LATIN SMALL LETTER C WITH DOT BELOW
+    -     \UE007     LATIN SMALL LIGATURE CH
+    -     \UE008     LATIN CAPITAL LETTER D WITH MACRON
+    -     \UE009     LATIN SMALL LETTER E WITH BAR BELOW
+    -     \UE00A     LATIN SMALL LETTER E WITH TILDE
+    -     \UE00B     LATIN SMALL LETTER E WITH MACRON AND BREVE
+    -     \UE00C     LATIN SMALL LETTER E WITH TILDE AND DOT ABOVE
+    -     \UE00D     LATIN SMALL LETTER E WITH HOOK RIGHT BELOW
+    -     \UE00E     LATIN SMALL LETTER G WITH INVERTED BREVE
+    -     \UE00F     LATIN SMALL LETTER I WITH INVERTED BREVE BELOW
+    -     \UE010     LATIN SMALL LETTER I WITH MACRON AND ACUTE
+    -     \UE011     LATIN SMALL LETTER K WITH CIRCUMFLEX
+    -     \UE012     LATIN SMALL LETTER K WITH BREVE
+    -     \UE013     LATIN SMALL LETTER K WITH INVERTED BREVE
+    -     \UE014     LATIN SMALL LIGATURE KH
+    -     \UE015     LATIN CAPITAL LETTER L WITH MACRON
+    -     \UE016     LATIN SMALL LETTER L WITH TILDE
+    -     \UE017     LATIN SMALL LETTER L WITH INVERTED BREVE
+    -     \UE018     LATIN CAPITAL LETTER M WITH MACRON
+    -     \UE019     LATIN SMALL LETTER M WITH MACRON
+    -     \UE01A     LATIN SMALL LETTER M WITH TILDE
+    -     \UE01B     LATIN SMALL LETTER O WITH CEDILLA
+    -     \UE01C     LATIN SMALL LETTER O WITH MACRON AND CIRCUMFLEX
+    -     \UE01E     LATIN SMALL LIGATURE OI
+    -     \UE01F     LATIN SMALL LIGATURE OO
+    -     \UE020     LATIN SMALL LIGATURE OO WITH MACRON
+    -     \UE021     LATIN SMALL LIGATURE OU
+    -     \UE022     LATIN SMALL LETTER OPEN O WITH ACUTE
+    -     \UE023     LATIN SMALL LETTER R WITH DIAERESIS
+    -     \UE024     LATIN SMALL LETTER R WITH CIRCUMFLEX
+    -     \UE025     LATIN SMALL LETTER R WITH RING BELOW
+    -     \UE026     LATIN SMALL LETTER S WITH VERTICAL LINE ABOVE
+    -     \UE027     LATIN SMALL LETTER S WITH OGONEK
+    -     \UE028     LATIN SMALL LETTER S WITH COMMA
+    -     \UE02A     LATIN SMALL LETTER S WITH BREVE
+    -     \UE02B     LATIN SMALL LIGATURE SH
+    -     \UE02C     LATIN SMALL LIGATURE TH
+    -     \UE02D     LATIN SMALL LETTER U WITH MACRON AND ACUTE
+    -     \UE02E     LATIN CAPITAL LETTER V WITH MACRON
+    -     \UE02F     LATIN CAPITAL LETTER X WITH MACRON
+    -     \UE030     LATIN SMALL LETTER X WITH CIRCUMFLEX
+    -     \UE031     LATIN SMALL LETTER Y WITH BREVE
+    -     \UE032     LATIN SMALL LIGATURE ZH
+    -     \UE033     LATIN SMALL LETTER TURNED E WITH ACUTE
+    -     \UE034     LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX
+    -     \UE035     GREEK SMALL LETTER ALPHA WITH GRAVE
+    -     \UE036     MUSICAL SYMBOL SEGNO
+    -     \UE037     MUSICAL SYMBOL FERMATA
+    -     \UE038     MUSICAL SYMBOL CRESCENDO
+    -     \UE039     MUSICAL SYMBOL DECRESCENDO
+    -     \UE03A     MUSICAL SYMBOL DOUBLE SHARP
+    -     \UE03B     MUSICAL SYMBOL BREVE
+    -     \UE03C     MUSICAL SYMBOL DOWN BOW
+    -     \UE03D     MUSICAL SYMBOL UP BOW
+    -     \UE03E     MUSICAL SYMBOL BREVE ALTERNATE
+    -     \UE03F     PRINTING SYMBOL DELE
+    -     \UE040     PRINTING SYMBOL FRACTIONAL EM
+    -     \UE041     INVERTED ASTERISM
+    -     \UE042     LATIN SMALL LETTER SCHWA SUPERSCRIPT
+    -     \UE043     LATIN SMALL LETTER TURNED Y
+    -     \UE044     LATIN SMALL LIGATURE OE WITH MACRON
+    -     \UE045     SQUARE ROOT WITH BAR
+    -     \UE046     LATIN SMALL LETTER U WITH DOT ABOVE
+    -     \UE047     LATIN SMALL LIGATURE UE
+    -     \UE048     LATIN SMALL LIGATURE UE WITH MACRON
+    -     \UE049     LATIN SMALL LETTER OPEN O WITH TILDE
+    -     \UE04A     LATIN SMALL LETTER T WITH CARON BELOW
+    -     \UE04B     LATIN SMALL LETTER SCRIPT A WITH TILDE
+    -     \UE04C     GREEK SMALL LETTER EPSILON WITH TILDE
+    -     \UE04D     LATIN SMALL LIGATURE OE WITH TILDE
+    -     \UE04E     MODIFIER LETTER DOUBLE VERTICAL LINE
+    -     \UE04F     DOUBLE HYPHEN
+    -     \UE050     LATIN SMALL LETTER SCHWA WITH DOT ABOVE
+    -     \UE051     LATIN SMALL LETTER SCHWA WITH MACRON
+Alphabetic Presentation Forms
+ﬂ     &#64258;     \UFB02     LATIN SMALL LIGATURE FL
+שׁ     &#64298;     \UFB2A     HEBREW LETTER SHIN WITH SHIN DOT
+שׂ     &#64299;     \UFB2B     HEBREW LETTER SHIN WITH SIN DOT
+
--- a/format_docs/pdb/ztxt.txt
+++ b/format_docs/pdb/ztxt.txt
@ -0,0 +1,226 @@
+The zTXT Format
+---------------
+
+The zTXT format is relatively straightforward. The simplest zTXT contains a
+Palm database header, followed by zTXT record #0, followed by the compressed
+data. The compressed data can be in one of two formats: one long data stream,
+or split into chunks for random access. If there are any bookmarks, they occupy
+the record immediately after the compressed data. If there are any annotations,
+the annotation index occupies the record immediately after the bookmarks with
+each annotation in the index having a record immediately after the annotation
+index. Here are diagrams of a simple zTXT and a full featured zTXT:
+
+    DB Header
+0   Record 0
+1
+2
+3
+... Compressed Data
+36
+37
+38
+
+    DB Header
+0   Record 0
+1
+2
+3
+... Compressed Data
+36
+37
+38
+39  Bookmarks
+40  Annotation Index
+41  Annotation 1
+42  Annotation 2
+43  Annotation 3
+
+
+Compression Modes
+-----------------
+
+zTXT version 1.40 and later supports two modes of compression. Mode 1 is a
+random access mode, and mode 2 consists of one long data stream. Both modes
+work on 8K (the default record size) blocks of text.
+
+Please note, however, that as of Weasel Reader version 1.60 the old style
+(mode 2) zTXT format is no longer supported. makeztxt and libztxt still support
+creating these documents for backwards compatibility, but you should not use
+mode 2 if possible.
+
+
+Mode 1
+------
+
+In mode one, 8K blocks of text are compressed into an equal number of blocks of
+compressed data. Using the Z_FULL_FLUSH flush mode with zLib allows for random
+access among the blocks of data. In order for this to function, the first block
+must be decompressed first, and after that any block in the file may be
+decompressed in any order. In mode 1, the blocks of compressed data will likely
+not all have the same size.
+
+
+Mode 2
+------
+
+In zTXT versions before 1.40, this was the only method of compression. This
+mode involves compressing the entire input buffer into a single output buffer
+and then splitting the resulting buffer into 8K segments. This mode requires
+that all of the compressed data be decompressed in one pass. Since there are no
+real 'blocks' of data, the resulting output can be of any blocksize, though
+typically the default of 8K should be fine. The advantage to mode 2 is that it
+will give about 10% - 15% more compression.
+
+
+zTXT Record #0 Definition (version 1.44)
+----------------------------------------
+
+Record 0 provides all of the information about the zTXT contents. Be sure it is
+correct, lest fiery death rain down upon your program.
+
+typedef struct zTXT_record0Type {
+  UInt16        version;
+  UInt16        numRecords;
+  UInt32        size;
+  UInt16        recordSize;
+  UInt16        numBookmarks;
+  UInt16        bookmarkRecord;
+  UInt16        numAnnotations;
+  UInt16        annotationRecord;
+  UInt8         flags;
+  UInt8         reserved;
+  UInt32        crc32;
+  UInt8         padding[0x20 - 24];
+} zTXT_record0;
+
+
+Structure Elements
+------------------
+
+UInt16        version;
+
+This is mostly just informational. Your program can figure out what features
+might be available from the version. However, the remaining parts of the
+structure are designed such that their value will be 0 if that particular
+feature is not present, so that is the correct way to test. The version is
+stored as two 8 bit integers. For example, version 1.42 is 0x012A.
+
+UInt16        numRecords;
+
+This is the number of DATA records only and does not include record 0,
+bookmarks, or annotations. With compression mode 1, this is also the number of
+uncompressed text records. With mode 2, you must decompress the file to figure
+out how many text records there will be.
+
+UInt32        size;
+
+The size in bytes of the uncompressed data in the zTXT. Check this value with
+the amount of free storage memory on the Palm to make sure there's enough room
+to decompress the data in full or in part.
+
+UInt16        recordSize;
+
+recordSize is the size in bytes of a text record. This field is important, as
+the size of text and decompression buffers is based on this value. It is used
+by Weasel to navigate through the text so it can map absolute offsets to record
+numbers. 8192 is the default. With compression mode 1, this is the amount of
+data inside each compressed record (except maybe the last one), but the actual
+compressed records will likely have varying sizes. In mode 2, both compressed
+records and the resulting text records are all of this size (except, again, the
+last record).
+
+UInt16        numBookmarks;
+
+The definitive count of how many bookmarks are stored in the bookmark index
+record. See the section on bookmarks below.
+
+UInt16        bookmarkRecord;
+
+If there are any bookmarks, this is set to the record index number that
+contains the bookmark listing, otherwise it is 0.
+
+UInt16        numAnnotations;
+
+Like the bookmark count, this is the definitive count of how many annotations
+are in the annotation index and how many annotation records follow it. See the
+section on annotations below.
+
+UInt16        annotationRecord;
+
+If there are any annotations, this is set to the record index number that
+contains the annotation index, otherwise it is 0.
+
+UInt8         flags;
+
+These flags indicate various features of the zTXT database. flags is a bitmask
+and at present the only two defined bits are:
+
+ZTXT_RANDOMACCESS (0x01)
+    If the zTXT was compressed according to the method in mode 1, then it
+    supports random access and this should be set.
+ZTXT_NONUNIFORM (0x02)
+    Setting this bit indicates that the text records within the zTXT database
+    are not of uniform length. That is, when the blocks of text are
+    decompressed they will not have identical block sizes. If this is not set,
+    the compressed blocks are assumed to all have the same size when
+    decompressed (typically 8K) except for the last block which can be smaller.
+
+UInt32        crc32;
+
+A CRC32 value for checking data integrity. This value is computed over all text
+data records only and does not include record 0 nor any bookmark/annotation
+records. The current implementation in makeztxt/Weasel computes this value
+using the crc32 function in zLib which should be the standard CRC32 definition.
+
+UInt8         padding[0x20 - 24];
+
+zTXT record zero is 32 bytes in length, so the unused portion is padded.
+
+
+zTXT Bookmarks
+--------------
+
+zTXT bookmarks are stored in a simple array in a record at the end of a zTXT.
+The format is as follows:
+
+#define MAX_BMRK_LENGTH         20
+
+typedef struct GPlmMarkType {
+  UInt32        offset;
+  Char          title[MAX_BMRK_LENGTH];
+} GPlmMark;
+
+In the structure, offset is counted as an absolute offset into the text. The
+bookmarks must be sorted in ascending order.
+
+If there are no bookmarks, then the bookmark index does not exist. When the
+user creates the first bookmark, the record containing the index will then be
+created. If there are annotations, when the bookmark record is created it must
+go before the annotation index. This will require incrementing annotationRecord
+in record 0 to point to the new record index.
+
+Similarly, when all bookmarks are deleted the bookmark index record is also
+deleted. If there are annotations, annotationRecord in record 0 must be
+decremented to point to the new index.
+
+
+zTXT Annotations
+----------------
+
+zTXT annotations have a format almost identical to that of the bookmark index:
+
+typedef struct GPlmAnnotationType {
+  UInt32        offset;
+  Char          title[MAX_BMRK_LENGTH];
+} GPlmAnnotation;
+
+Like the bookmarks, offset is an absolute offset into the text. The annotation
+index is organized just as the bookmarks are, as a single array in a record.
+Note that this structure does NOT store the actual annotation text.
+
+The text of each annotation is stored in its own record immediately following
+the index. So, the first annotation in the index will occupy the first record
+following the index, and the second annotation will be in the second record
+following the index, and so on. The text of each annotation is limited to
+4096 bytes.
+
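The record 0 layout described in ztxt.txt can be read with Python's struct module. This is only a sketch: it assumes the fields are stored big-endian (the usual convention for Palm OS databases, which ztxt.txt itself does not state), and the names `parse_record0`, `version_tuple` and `record_for_offset` are hypothetical helpers, not part of makeztxt, libztxt or calibre. The offset-to-record mapping assumes data records start at record 1, as in the diagrams, and only holds when ZTXT_NONUNIFORM is not set.

```python
import struct
from collections import namedtuple

# Assumption: big-endian fields, as is usual for Palm OS databases.
# The 11 fields occupy 24 bytes; record 0 itself is padded to 32 bytes.
RECORD0 = struct.Struct('>HHIHHHHHBBI')

ZTXT_RANDOMACCESS = 0x01
ZTXT_NONUNIFORM = 0x02

Record0 = namedtuple('Record0', [
    'version', 'num_records', 'size', 'record_size',
    'num_bookmarks', 'bookmark_record',
    'num_annotations', 'annotation_record',
    'flags', 'reserved', 'crc32',
])


def parse_record0(raw):
    """Unpack the first 24 bytes of zTXT record 0 (hypothetical helper)."""
    if len(raw) < RECORD0.size:
        raise ValueError('record 0 is too short')
    return Record0(*RECORD0.unpack_from(raw, 0))


def version_tuple(rec):
    """Split the version into (major, minor), e.g. 0x012A -> (1, 42)."""
    return rec.version >> 8, rec.version & 0xFF


def record_for_offset(rec, offset):
    """Map an absolute text offset to a PDB record index.

    Assumes uniform text records and that data records begin at record 1.
    """
    if rec.flags & ZTXT_NONUNIFORM:
        raise ValueError('text records are not of uniform length')
    if not 0 <= offset < rec.size:
        raise ValueError('offset is outside the text')
    return 1 + offset // rec.record_size
```

As the format notes recommend, a reader would test `num_bookmarks` and `num_annotations` for 0 rather than infer features from the version number.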
--- a/format_docs/rb.txt
+++ b/format_docs/rb.txt
@ -0,0 +1,303 @@
+Rocket eBook File Format
+------------------------
+
+from http://rbmake.sourceforge.net/rb_format.html
+
+
+Overview
+--------
+
+This document attempts to describe the format of a .rb file -- the book
+format that is downloaded into NuvoMedia's <http://www.nuvomedia.com>
+hand-held wonder, the Rocket eBook
+<http://www.rocket-ebook.com/enter.html>.
+
+*Note:* All multi-byte integers are stored in Vax/Intel order (the
+opposite of network byte order). Most integers are 4 bytes (an int32),
+but there are some minor exceptions (as detailed below).
+
+Also, the following document refers to the .rb file sections as "pages".
+
+
+Details
+-------
+
+The first 4 bytes of the file seem to be a magic number (in hex): B0 0C
+B0 0C. I like to think of this as a hexadecimal pun on the word "book"
+(repeated). [Matt Greenwood has reported seeing a magic number of "B0 0C
+F0 0D" in another type of ReB-related file -- i.e. "book food".]
+
+The next two bytes appear to be a version number, currently "02 00". I
+assume this means major version 2, minor version 0.
+
+The next 4 bytes are the string "NUVO", followed by 4 bytes of 00h. (I
+have also seen an old title that had 0s in place of the "NUVO".)
+
+This brings us up to offset 0Eh, at which point we have a 4-byte
+representation of the date the book was created (Matt Greenwood pointed
+this out to me -- thanks!). The year is encoded as an int16. An older
+version of the RocketLibrary was encoding the year's full value (e.g.
+1999 was "CF 07" and 2000 was "D0 07"), but a more recent version is now
+using the tm_year value verbatim -- i.e. it's storing 100 for the year
+2000 ("64 00"). The year is followed by an int8 for the 1-relative month
+number, and an int8 for the day of the month.
+
+After that is 6 bytes of 00h. These may be reserved for setting the time
+of creation (at a guess).
+
+Then, at offset 18h, we have an int32 that contains the absolute offset
+of the "Table of Contents" (the directory of the pages contained within
+this .rb file). In all of the .rb files I've seen, this remains
+constant with a value of 128h. However, I have tested an atypical .rb
+file where I placed the ToC at the end of the file (after all the file
+contents), and it worked fine. (I've chosen not to build any books in
+such a non-standard format, however.)
+
+Immediately following this is an int32 with the length of the .rb file
+(so we can check if the file is complete or not).
+
+All the bytes from here (offset 20h) up to offset 128h appear to only be
+used by an encrypted title. In a non-encrypted title, they are always 0.
+
+The table of contents typically comes next (at offset 128h). It starts
+with an int32 count of the number of "page" entries (.rb-file sections)
+in the ToC. Each entry consists of a name (zero-padded to 32 bytes),
+followed by 3 int32s: the length of this entry's data segment, the
+absolute offset of the data in the .rb file, and a flag. The known flag
+values are: 1 (encrypted), 2 (info page), and 8 (deflated). The names
+are tweaked as needed to ensure that they are all unique. The current
+RocketWriter software uses a unique 6-digit number, a dash, up to 8
+characters from the filename, and then the re-mapped suffix for the data
+(.html, .hidx, .png, .info, etc.). My rbmake library simply ensures that
+the names are no longer than 15 characters (not counting the suffix) and
+are all unique.
+
+Often the first item in the ToC is the info page, but it doesn't have to
+be. This page of information contains NAME=VALUE pairs that note the
+author, title, what the root-page's name is, etc. (See appendix A). This
+data is never encrypted nor compressed, so this entry's flag value is
+always "2".
+
+An image page is always stored as a B&W image in PNG format. Since it
+has its own compression, it is stored without any additional attempt at
+deflation. I have also never seen an encrypted image, so its flag value
+is always 0.
+
+An HTML page contains the tags and text that were re-written into a
+consistent syntax (this presumably makes the HTML renderer in the ReB
+itself simpler). HTML pages are typically compressed (See appendix B).
+Every HTML page appears to use the suffix .html no matter what the file
+name was on import (but I have seen older files with .htm used as the
+suffix, so the rocket appears to support both).
+
+For every HTML page there is a corresponding .hidx page that contains a
+summary of the paragraph formatting and the position of the anchor names
+in the associated .html page (See appendix C). This page is sometimes
+compressed, depending on length (See appendix B).
+
+There are also reference titles that have a .hkey page that contains a
+list of words that can be looked up in the associated .html page (See
+appendix D).
+
+Immediately following the ToC is the data for each piece mentioned in
+the ToC, in the same order as it appeared in the ToC.
+
+Finally, the end of the file appears to be padded with 20 bytes of 01h.
+
+
+Appendix A: Info Page Format
+----------------------------
+
+The info page consists of a series of lines that contain "NAME=VALUE"
+strings. Each line is terminated by a single newline. Here are the
+values that the RocketWriter generates:
+
+    COMMENT=Info file for <title>
+    TYPE=2
+    TITLE=<title>
+    AUTHOR=<author>
+    URL=ebook:<long, unique string used for the file's name by the librarian>
+    GENERATOR=<e.g. RocketLibrarian 1.3.216>
+    PARSE=1
+    OUTPUT=1
+    BODY=<name of root HTML page (as it appears in the ToC)>
+    MENUMARK=menumark.html
+    SuggestedRetailPrice=<usually empty>
+
+Encrypted titles have a few more entries (including those listed above):
+
+    ISBN=<ISBN number, including dashes>
+    REVISION=<digits>
+    TITLE_LANGUAGE=<en-us>
+    PUB_NAME=<Publisher's name>
+    PUBSERVER_ID=<digits>
+    GENERATOR=<e.g. RocketPress 1.3.121>
+    VERSION=<digits>
+    USERNAME=<rocket-ID>
+    COPY_ID=<digits>
+    COPYRIGHT=<copyright>
+    COPYTITLE=<another copyright?>
+
+A reference title also has an indication that there is a .hkey page
+present, and may also have a GENRE of "Reference":
+
+    HKEY=1
+    GENRE=Reference
+
+
+Appendix B: The format of compressed data
+-----------------------------------------
+
+Compressed pages have a data section in the .rb file with the following
+format:
+
+The first int32 is a count of the number of 4096-byte chunks of data we
+broke the uncompressed page into (the last chunk can be shorter than
+4096 bytes, of course).
+
+This is immediately followed by an int32 with the length of the entire
+uncompressed data.
+
+After this there are <count> int32s that indicate the size of each
+chunk's compressed data.
+
+Following these length int32s is the output from a deflation (the
+algorithm used in gzip) for each 4096-byte chunk of the original data.
+It appears that you must use a window-bit size of 13 and a compression
+level of "best" to be compatible with the Rocket eBook's system software.
+
+
+Appendix C: HTML-index Page Format
+----------------------------------
+
+The .hidx page's purpose is to allow the renderer to quickly look up the
+format of each paragraph (useful for random access to the data), and the
+position of the anchor names.
+
+The first section lists the various paragraph-producing tags. It is
+headed by a line of "[tags <count>]", where <count> is the number of
+tags that follow this header. The tags are listed one per line, and have
+an implied enumeration from 0 to N-1 (which the other tags and the
+upcoming paragraph sections reference).
+
+The first tag is typically (always?) "<HTML> -1". The number trailing
+the tag indicates what other tag (or sequence of tags, one per line) in
+which we are nested. So, if we have a <BR> nested inside a <P
+ALIGN="center">, it would be listed separately from a <BR> that was
+nested inside a normal paragraph, and each one would have a different
+trailing index number.
+
+Following the tag section is the paragraph section. The heading is
+"[paragraphs <count>]", and is followed by a line for each paragraph.
+These lines consist of a character offset into the .html page for the
+start of the paragraph followed by a 0-relative offset into the tag
+section (indicating what kind of formatting to use for the indicated
+paragraph).
+
+The paragraph-section character offsets point to the first bit of text
+after the associated tag.
+
+The last section details the anchor names. The heading is
+"[names <count>]", and each item that follows is a quoted string of the
+anchor name, followed by a character offset into the .html page where
+we'll find that name. If there are no names in the associated HTML
+section, the heading is included with a 0 count (i.e. "[names 0]").
+
+The name-section character offsets point to the start of the anchor tag
+(not after the tag, like the offsets in the "paragraphs" section).
+
+The lines are terminated by newlines (in standard unix fashion).
+
+For example:
+
+    [tags 10]
+    <HTML> -1
+    <BODY> 0
+    <P ALIGN="right"> 1
+    <P ALIGN="left"> 1
+    <P> 1
+    <H3 ALIGN="center"> 1
+    <P ALIGN="center"> 1
+    <BR> 6
+    <H2 ALIGN="center"> 1
+    <BR> 1
+
+    [paragraphs 42]
+    160 9
+    164 9
+    184 8
+    220 8
+    261 6
+    316 5
+    359 1
+    379 6
+    410 6
+    460 7
+    511 7
+    564 7
+    616 7
+    668 7
+    720 7
+    773 7
+    827 7
+    880 7
+    933 7
+    988 7
+    1043 7
+    1100 7
+    1157 7
+    1214 7
+    1270 7
+    1328 7
+    1385 7
+    1442 7
+    1497 7
+    1556 7
+    1561 7
+    1635 1
+    1656 5
+    1690 6
+    1737 7
+    1773 5
+    1798 4
+    1826 3
+    2663 1
+    2668 4
+    2689 2
+    2730 8
+
+    [names 1]
+    "ch1" 2689
+
+
+Appendix D: HTML-key Page Format
+--------------------------------
+
+The .hkey page contains a list of words, one per line, sorted in a
+strict ASCII sequence, each one followed by a tab and the offset in the
+.html page of the word's data. I presume that the .hkey page must share
+the same name prefix as its related .html page.
+
+If the names contain high-bit characters, they are translated into
+regular ASCII in the .hkey file, since this allows the user to search
+for the words using unaccented characters.
+
+The lines are terminated with a newline (in standard unix fashion).
+
+An example:
+
+    a	5
+    apple	38
+    b	84
+    book	104
+
+Each of these offsets points to a paragraph tag in the associated .html
+page. I have only seen this sequence of tags used so far:
+
+    <P><BIG><B>word</B></BIG> other stuff</P>
+
+I have seen multiple <B>...</B> tags in the middle of the single set of
+<BIG>...</BIG> tags, but this is the basic tag format.
+
+The offset in the .hkey page points to the start of the <P> tag.
+
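The compressed page layout from Appendix B of rb.txt (chunk count, total uncompressed length, per-chunk compressed sizes, then the deflated chunks) can be sketched in Python as below. The function names `pack_rb_page` and `unpack_rb_page` are hypothetical. The sketch also assumes each chunk is a complete zlib-wrapped stream, which the document does not state explicitly; it only mentions a window-bit size of 13 and the "best" compression level.

```python
import struct
import zlib

CHUNK = 4096  # uncompressed chunk size given in Appendix B


def pack_rb_page(data):
    """Build a compressed page body from raw bytes (hypothetical helper)."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    compressed = []
    for chunk in chunks:
        # Level 9 ("best") and 13 window bits, as the appendix suggests.
        # Assumption: every chunk is its own zlib stream.
        comp = zlib.compressobj(9, zlib.DEFLATED, 13)
        compressed.append(comp.compress(chunk) + comp.flush())
    # All integers are little-endian int32s.
    header = struct.pack('<II', len(chunks), len(data))
    sizes = struct.pack('<%dI' % len(compressed),
                        *[len(c) for c in compressed])
    return header + sizes + b''.join(compressed)


def unpack_rb_page(blob):
    """Reverse pack_rb_page and check the stored total length."""
    count, total = struct.unpack_from('<II', blob, 0)
    sizes = struct.unpack_from('<%dI' % count, blob, 8)
    pos = 8 + 4 * count
    parts = []
    for size in sizes:
        parts.append(zlib.decompress(blob[pos:pos + size]))
        pos += size
    data = b''.join(parts)
    if len(data) != total:
        raise ValueError('uncompressed length does not match the header')
    return data
```

Compressing each chunk independently is what lets a reader jump to any 4096-byte chunk using the size table, at the cost of a somewhat lower compression ratio than one long stream.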
--- a/format_docs/tcr.txt
+++ b/format_docs/tcr.txt
@ -0,0 +1,56 @@
+About
+-----
+
+Text compression format that can be decompressed starting at any point.
+Little-endian byte ordering is used.
+
+
+Header
+------
+
+TCR files always start with:
+
+!!8-Bit!!
+
+
+Layout
+------
+
+Header
+256 key dictionary
+compressed text
+
+
+Dictionary
+----------
+
+A dictionary mapping keys to replacement strings. There are a total of 256 keys,
+0 - 255. Each string is preceded with one byte that represents the length of
+the string.
+
+
+Compressed text
+---------------
+
+The compressed text is a series of values 0-255 which correspond to a key and
+thus a string. Reassembling is replacing each key in the compressed text with
+its corresponding string.
+
+
+Compressor
+----------
+
+From Andrew Giddings TCR.c (http://www.cix.co.uk/~gidds/Software/TCR.html):
+
+The TCR compression format is easy to describe: after the fixed header is a
+dictionary of 256 strings, each preceded by a length byte.  The rest of the
+file is a list of codes from this dictionary.
+
+The compressor works by starting with each code defined as itself.  While
+there's an unused code, it finds the most common two-code combination, and
+creates a new code for it, replacing all occurrences in the text with the
+new code.
+
+It also searches for codes that are always followed by another, which it can
+merge, possibly freeing up some.
+
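Decoding as described in tcr.txt (fixed header, 256 length-prefixed dictionary strings, then one code byte per dictionary entry) can be sketched as follows. The function name `tcr_decompress` is hypothetical and the sketch only covers the decoding side; it makes no claim about how calibre or TCR.c implement it.

```python
MAGIC = b'!!8-Bit!!'


def tcr_decompress(blob):
    """Expand a TCR file held in memory (hypothetical helper)."""
    if not blob.startswith(MAGIC):
        raise ValueError('not a TCR file')
    data = bytearray(blob)
    pos = len(MAGIC)

    # Read the 256 dictionary strings, each preceded by a length byte.
    table = []
    for _ in range(256):
        if pos >= len(data):
            raise ValueError('truncated dictionary')
        length = data[pos]
        table.append(bytes(data[pos + 1:pos + 1 + length]))
        pos += 1 + length

    # Every remaining byte is a key; replace it with its string.
    return b''.join(table[code] for code in data[pos:])
```

Because each code expands independently of its neighbours, decoding can begin at any code byte once the dictionary has been read, which matches the "decompressed starting at any point" property noted in the About section.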
--- a/resources/catalog/stylesheet.css
+++ b/resources/catalog/stylesheet.css
@ -52,6 +52,17 @@ p.formats {
 	text-indent: 0.0in;
 	}

+/*
+* 	Minimize widows and orphans by logically grouping chunks
+*   Some reports of problems with Sony (ADE) ereaders
+*	   ADE: page-break-inside:avoid;
+*	iBooks: display:inline-block;
+*		    width:100%;
+*/
+div.author_logical_group {
+	page-break-inside:avoid;
+	}
+
 div.description > p:first-child {
 	margin: 0 0 0 0;
 	text-indent: 0em;
@ -62,27 +73,19 @@ div.description {
 	text-indent: 1em;
 	}

-/*
-* 	Attempt to minimize widows and orphans by logically grouping chunks
-* 	Recommend enabling for iPad
-*   Some reports of problems with Sony ereaders, presumably ADE engines
-*/
-/*
-div.logical_group {
-	display:inline-block;
-	width:100%;
+div.initial_letter {
+	page-break-before:always;
 	}
-*/

-p.date_index {
+p.author_title_letter_index {
 	font-size:x-large;
 	text-align:center;
 	font-weight:bold;
-	margin-top:1em;
+	margin-top:0px;
 	margin-bottom:0px;
 	}

-p.letter_index {
+p.date_index {
 	font-size:x-large;
 	text-align:center;
 	font-weight:bold;
@ -99,6 +102,14 @@ p.series {
 	text-indent:-2em;
 	}

+p.series_letter_index {
+	font-size:x-large;
+	text-align:center;
+	font-weight:bold;
+	margin-top:1em;
+	margin-bottom:0px;
+	}
+
 p.read_book {
 	text-align:left;
 	margin-top:0px;
--- a/resources/recipes/msnsankei.recipe
+++ b/resources/recipes/msnsankei.recipe
@ -13,15 +13,12 @@ class MSNSankeiNewsProduct(BasicNewsRecipe):
    description     = 'Products release from Japan'
    oldest_article = 7
    max_articles_per_feed = 100
-    encoding       = 'Shift_JIS'
+    encoding       = 'utf-8'
    language       = 'ja'
    cover_url       = 'http://sankei.jp.msn.com/images/common/sankeShinbunLogo.jpg'
    masthead_url = 'http://sankei.jp.msn.com/images/common/sankeiNewsLogo.gif'

    feeds          = [(u'\u65b0\u5546\u54c1', u'http://sankei.jp.msn.com/rss/news/release.xml')]

-    remove_tags_before = dict(id="__r_article_title__")
-    remove_tags_after  = dict(id="ajax_release_news")
-    remove_tags = [{'class':"parent chromeCustom6G"},
-                              dict(id="RelatedImg")
-                            ]
+    remove_tags_before = dict(id="NewsTitle")
+    remove_tags_after  = dict(id="RelatedTitle")
--- a/resources/recipes/theonion.recipe
+++ b/resources/recipes/theonion.recipe
@ -1,7 +1,5 @@
-#!/usr/bin/env  python
-
 __license__   = 'GPL v3'
-__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
+__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>'

 '''
 theonion.com
@ -15,26 +13,39 @@ class TheOnion(BasicNewsRecipe):
    description           = "America's finest news source"
    oldest_article        = 2
    max_articles_per_feed = 100
-    publisher             = u'Onion, Inc.'
-    category              = u'humor, news, USA'    
-    language = 'en'
-
+    publisher             = 'Onion, Inc.'
+    category              = 'humor, news, USA'
+    language              = 'en'
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
-    remove_javascript     = True
-    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"' 
+    publication_type      = 'newsportal'
+    masthead_url          = 'http://o.onionstatic.com/img/headers/onion_190.png'
+    extra_css             = """
+                                body{font-family: Helvetica,Arial,sans-serif}
+                                .section_title{color: gray; text-transform: uppercase}
+                                .title{font-family: Georgia,serif}
+                                .meta{color: gray; display: inline}
+                                .has_caption{display: block}
+                                .caption{font-size: x-small; color: gray; margin-bottom: 0.8em}
+                            """

-    html2lrf_options = [
-                          '--comment'       , description
-                        , '--category'      , category
-                        , '--publisher'     , publisher
-                        ]
-
-    keep_only_tags = [dict(name='div', attrs={'id':'main'})]
+    conversion_options = {
+                          'comment'  : description
+                        , 'tags'     : category
+                        , 'publisher': publisher
+                        , 'language' : language
+                        }

+    keep_only_tags = [
+                         dict(name='h2', attrs={'class':['section_title','title']})
+                        ,dict(attrs={'class':['main_image','meta','article_photo_lead','article_body']})
+                        ,dict(attrs={'id':['entries']})
+                     ]
+    remove_attributes=['lang','rel']
+    remove_tags_after = dict(attrs={'class':['article_body','feature_content']})
    remove_tags = [
-                     dict(name=['object','link','iframe','base'])
+                     dict(name=['object','link','iframe','base','meta'])
                    ,dict(name='div', attrs={'class':['toolbar_side','graphical_feature','toolbar_bottom']})
                    ,dict(name='div', attrs={'id':['recent_slider','sidebar','pagination','related_media']})
                  ]
@ -44,3 +55,28 @@ class TheOnion(BasicNewsRecipe):
              (u'Daily'  , u'http://feeds.theonion.com/theonion/daily' )
             ,(u'Sports' , u'http://feeds.theonion.com/theonion/sports' )
            ]
+
+    def get_article_url(self, article):
+        artl = BasicNewsRecipe.get_article_url(self, article)
+        if artl.startswith('http://www.theonion.com/audio/'):
+           artl = None
+        return artl
+
+    def preprocess_html(self, soup):
+        for item in soup.findAll(style=True):
+            del item['style']
+        for item in soup.findAll('a'):
+            limg = item.find('img')
+            if item.string is not None:
+               str = item.string
+               item.replaceWith(str)
+            else:
+               if limg:
+                  item.name  = 'div'
+                  item.attrs = []
+                  if not limg.has_key('alt'):
+                     limg['alt'] = 'image'
+               else:
+                   str = self.tag_to_string(item)
+                   item.replaceWith(str)
+        return soup
--- a/src/calibre/devices/nook/driver.py
+++ b/src/calibre/devices/nook/driver.py
@ -89,21 +89,21 @@ class NOOK_COLOR(NOOK):
    BCD         = [0x216]
    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'EBOOK_DISK'

-    EBOOK_DIR_MAIN = 'My Files/Books'
+    EBOOK_DIR_MAIN = 'My Files'

-    '''
    def create_upload_path(self, path, mdata, fname, create_dirs=True):
        filepath = NOOK.create_upload_path(self, path, mdata, fname,
-                create_dirs=create_dirs)
-        edm = self.EBOOK_DIR_MAIN.replace('/', os.sep)
-        npath = os.path.join(edm, _('News')) + os.sep
-        if npath in filepath:
-            filepath = filepath.replace(npath, os.sep.join('My Files',
-                'Magazines')+os.sep)
-            filedir = os.path.dirname(filepath)
-            if create_dirs and not os.path.exists(filedir):
-                os.makedirs(filedir)
+                create_dirs=False)
+        edm = self.EBOOK_DIR_MAIN
+        subdir = 'Books'
+        if mdata.tags:
+            if _('News') in mdata.tags:
+                subdir = 'Magazines'
+        filepath = filepath.replace(os.sep+edm+os.sep,
+                os.sep+edm+os.sep+subdir+os.sep)
+        filedir = os.path.dirname(filepath)
+        if create_dirs and not os.path.exists(filedir):
+            os.makedirs(filedir)

        return filepath
-    '''

--- a/src/calibre/ebooks/fb2/fb2ml.py
+++ b/src/calibre/ebooks/fb2/fb2ml.py
@ -71,19 +71,28 @@ class FB2MLizer(object):
            return u'<?xml version="1.0" encoding="UTF-8"?>' + output

    def clean_text(self, text):
+        # Condense empty paragraphs into a line break. 
+        text = re.sub(r'(?miu)(<p>\s*</p>\s*){3,}', '<p><empty-line /></p>', text)
+        # Remove empty paragraphs.
        text = re.sub(r'(?miu)<p>\s*</p>', '', text)
+        # Clean up paragraph endings.
        text = re.sub(r'(?miu)\s*</p>', '</p>', text)
+        # Put paragraphs following a paragraph on a separate line.
        text = re.sub(r'(?miu)</p>\s*<p>', '</p>\n\n<p>', text)

+        # Remove empty title elements.
        text = re.sub(r'(?miu)<title>\s*</title>', '', text)
        text = re.sub(r'(?miu)\s+</title>', '</title>', text)

+        # Remove empty sections.
        text = re.sub(r'(?miu)<section>\s*</section>', '', text)
+        # Clean up section starts and ends.
        text = re.sub(r'(?miu)\s*</section>', '\n</section>', text)
        text = re.sub(r'(?miu)</section>\s*', '</section>\n\n', text)
        text = re.sub(r'(?miu)\s*<section>', '\n<section>', text)
        text = re.sub(r'(?miu)<section>\s*', '<section>\n', text)
-        text = re.sub(r'(?miu)</section><section>', '</section>\n\n<section>', text)
+        # Put sections followed by sections on a separate line.
+        text = re.sub(r'(?miu)</section>\s*<section>', '</section>\n\n<section>', text)

        if self.opts.insert_blank_line:
            text = re.sub(r'(?miu)</p>', '</p><empty-line />', text)
@ -338,6 +347,11 @@ class FB2MLizer(object):
        tags = []
        # First tag in tree
        tag = barename(elem_tree.tag)
+        # Number of blank lines above tag
+        try:
+            ems = int(round((float(style.marginTop) / style.fontSize) - 1))
+        except:
+            ems = 0

        # Convert TOC entries to <title>s and add <section>s
        if self.opts.sectionize == 'toc':
@ -370,7 +384,9 @@ class FB2MLizer(object):
                fb2_out.append('<section>')
                self.section_level += 1

-        # Process the XHTML tag if it needs to be converted to an FB2 tag.
+        # Process the XHTML tag and its styles, converting them to FB2 tags.
+        # Use independent if statements rather than an if/elif chain: there can
+        # be only one XHTML tag, but it can have multiple styles.
        if tag == 'img':
            if elem_tree.attrib.get('src', None):
                # Only write the image tag if it is in the manifest.
@ -381,7 +397,11 @@ class FB2MLizer(object):
                    fb2_out += p_txt
                    tags += p_tag
                    fb2_out.append('<image xlink:href="#%s" />' % self.image_hrefs[page.abshref(elem_tree.attrib['src'])])
-        elif tag == 'br':
+        if tag in ('br', 'hr') or ems:
+            if not ems:
+                multiplier = 1
+            else:
+                multiplier = ems
            if self.in_p:
                closed_tags = []
                open_tags = tag_stack+tags
@ -391,52 +411,38 @@ class FB2MLizer(object):
                    closed_tags.append(t)
                    if t == 'p':
                        break
-                fb2_out.append('<empty-line />')
+                fb2_out.append('<empty-line />' * multiplier)
                closed_tags.reverse()
                for t in closed_tags:
                    fb2_out.append('<%s>' % t)
            else:
-                fb2_out.append('<empty-line />')
-        elif tag in ('div', 'li', 'p'):
+                fb2_out.append('<empty-line />' * multiplier)
+        if tag in ('div', 'li', 'p'):
            p_text, added_p = self.close_open_p(tag_stack+tags)
            fb2_out += p_text
            if added_p:
                tags.append('p')
-        elif tag == 'b':
+        if tag == 'b' or style['font-weight'] in ('bold', 'bolder'):
            s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
            fb2_out += s_out
            tags += s_tags
-        elif tag == 'i':
+        if tag == 'i' or style['font-style'] == 'italic':
            s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
            fb2_out += s_out
            tags += s_tags
-        elif tag in ('del', 'strike'):
+        if tag in ('del', 'strike') or style['text-decoration'] == 'line-through':
            s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
            fb2_out += s_out
            tags += s_tags
-        elif tag == 'sub':
+        if tag == 'sub':
            s_out, s_tags = self.handle_simple_tag('sub', tag_stack+tags)
            fb2_out += s_out
            tags += s_tags
-        elif tag == 'sup':
+        if tag == 'sup':
            s_out, s_tags = self.handle_simple_tag('sup', tag_stack+tags)
            fb2_out += s_out
            tags += s_tags

-        # Processes style information.
-        if style['font-style'] == 'italic':
-            s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
-            fb2_out += s_out
-            tags += s_tags
-        elif style['font-weight'] in ('bold', 'bolder'):
-            s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
-            fb2_out += s_out
-            tags += s_tags
-        elif style['text-decoration'] == 'line-through':
-            s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
-            fb2_out += s_out
-            tags += s_tags
-
        # Process element text.
        if hasattr(elem_tree, 'text') and elem_tree.text:
            if not self.in_p:
--- a/src/calibre/ebooks/oeb/stylizer.py
+++ b/src/calibre/ebooks/oeb/stylizer.py
@ -633,7 +633,7 @@ class Style(object):
    def lineHeight(self):
        if self._lineHeight is None:
            result = None
-            parent = self._getparent()
+            parent = self._get_parent()
            if 'line-height' in self._style:
                lineh = self._style['line-height']
                if lineh == 'normal':
--- a/src/calibre/ebooks/txt/txtml.py
+++ b/src/calibre/ebooks/txt/txtml.py
@ -67,10 +67,11 @@ class TXTMLizer(object):
        output.append(self.get_toc())
        for item in self.oeb_book.spine:
            self.log.debug('Converting %s to TXT...' % item.href)
-            stylizer = Stylizer(item.data, item.href, self.oeb_book, self.opts, self.opts.output_profile)
-            content = unicode(etree.tostring(item.data.find(XHTML('body')), encoding=unicode))
+            content = unicode(etree.tostring(item.data, encoding=unicode))
            content = self.remove_newlines(content)
-            output += self.dump_text(etree.fromstring(content), stylizer, item)
+            content = etree.fromstring(content)
+            stylizer = Stylizer(content, item.href, self.oeb_book, self.opts, self.opts.output_profile)
+            output += self.dump_text(content.find(XHTML('body')), stylizer, item)
            output += '\n\n\n\n\n\n'
        output = u''.join(output)
        output = u'\n'.join(l.rstrip() for l in output.splitlines())
@ -219,11 +220,16 @@ class TXTMLizer(object):
        if tag in SPACE_TAGS:
            text.append(u' ')

-        # Scene breaks.
+        # Hard scene breaks.
        if tag == 'hr':
            text.append('\n\n* * *\n\n')
-        elif style['margin-top']:
-            text.append('\n\n' + '\n' * round(style['margin-top']))
+        # Soft scene breaks.
+        try:
+            ems = int(round((float(style.marginTop) / style.fontSize) - 1))
+            if ems:
+                text.append('\n' * ems)
+        except:
+            pass

        # Process tags that contain text.
        if hasattr(elem, 'text') and elem.text:
--- a/src/calibre/gui2/dialogs/metadata_bulk.ui
+++ b/src/calibre/gui2/dialogs/metadata_bulk.ui
@ -492,8 +492,7 @@ title and author are swapped before the title case is set</string>
             <item>
              <widget class="QCheckBox" name="update_title_sort">
               <property name="toolTip">
-                <string>Recompute the title sort value and store it in title sort.
-This will happen after any title case changes</string>
+                <string>Update title sort based on the current title. This will be applied only after other changes to title.</string>
               </property>
               <property name="text">
                <string>Update &amp;title sort</string>
--- a/src/calibre/library/caches.py
+++ b/src/calibre/library/caches.py
@ -420,7 +420,8 @@ class ResultCache(SearchQueryParser): # {{{
            return candidates - res
        return res

-    def get_matches(self, location, query, allow_recursion=True, candidates=None):
+    def get_matches(self, location, query, candidates=None,
+            allow_recursion=True):
        matches = set([])
        if candidates is None:
            candidates = self.universal_set()
@ -434,8 +435,8 @@ class ResultCache(SearchQueryParser): # {{{
            if isinstance(location, list):
                if allow_recursion:
                    for loc in location:
-                        matches |= self.get_matches(loc, query, candidates,
-                                                    allow_recursion=False)
+                        matches |= self.get_matches(loc, query,
+                                candidates=candidates, allow_recursion=False)
                    return matches
                raise ParseException(query, len(query), 'Recursive query group detected', self)

--- a/src/calibre/library/catalog.py
+++ b/src/calibre/library/catalog.py
@ -1841,8 +1841,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                body.insert(btc,pTag)
                btc += 1

-            # <p class="letter_index">
-            # <p class="book_title">
            divTag = Tag(soup, "div")
            dtc = 0
            current_letter = ""
@ -1870,11 +1868,12 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                        divTag.insert(dtc, divRunningTag)
                        dtc += 1
                    divRunningTag = Tag(soup, 'div')
-                    divRunningTag['class'] = "logical_group"
+                    if dtc > 0:
+                        divRunningTag['class'] = "initial_letter"
                    drtc = 0
                    current_letter = self.letter_or_symbol(book['title_sort'][0])
                    pIndexTag = Tag(soup, "p")
-                    pIndexTag['class'] = "letter_index"
+                    pIndexTag['class'] = "author_title_letter_index"
                    aTag = Tag(soup, "a")
                    aTag['name'] = "%s" % self.letter_or_symbol(book['title_sort'][0])
                    pIndexTag.insert(0,aTag)
@ -1982,8 +1981,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
            body.insert(btc, aTag)
            btc += 1

-            # <p class="letter_index">
-            # <p class="author_index">
            divTag = Tag(soup, "div")
            dtc = 0
            divOpeningTag = None
@ -2017,10 +2014,11 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                    current_letter = self.letter_or_symbol(book['author_sort'][0].upper())
                    author_count = 0
                    divOpeningTag = Tag(soup, 'div')
-                    divOpeningTag['class'] = "logical_group"
+                    if dtc > 0:
+                        divOpeningTag['class'] = "initial_letter"
                    dotc = 0
                    pIndexTag = Tag(soup, "p")
-                    pIndexTag['class'] = "letter_index"
+                    pIndexTag['class'] = "author_title_letter_index"
                    aTag = Tag(soup, "a")
                    aTag['name'] = "%sauthors" % self.letter_or_symbol(current_letter)
                    pIndexTag.insert(0,aTag)
@ -2032,16 +2030,21 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                    # Start a new author
                    current_author = book['author']
                    author_count += 1
-                    if author_count == 2:
+                    if author_count >= 2:
                        # Add divOpeningTag to divTag, kill divOpeningTag
-                        divTag.insert(dtc, divOpeningTag)
-                        dtc += 1
-                        divOpeningTag = None
-                        dotc = 0
+                        if divOpeningTag:
+                            divTag.insert(dtc, divOpeningTag)
+                            dtc += 1
+                            divOpeningTag = None
+                            dotc = 0
+
+                        # Create a divRunningTag for the next author
+                        if author_count > 2:
+                            divTag.insert(dtc, divRunningTag)
+                            dtc += 1

-                        # Create a divRunningTag for the rest of the authors in this letter
                        divRunningTag = Tag(soup, 'div')
-                        divRunningTag['class'] = "logical_group"
+                        divRunningTag['class'] = "author_logical_group"
                        drtc = 0

                    non_series_books = 0
@ -2373,8 +2376,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                body.insert(btc,pTag)
                btc += 1

-            # <p class="letter_index">
-            # <p class="author_index">
            divTag = Tag(soup, "div")
            dtc = 0

@ -2558,8 +2559,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
            body.insert(btc, aTag)
            btc += 1

-            # <p class="letter_index">
-            # <p class="author_index">
            divTag = Tag(soup, "div")
            dtc = 0

@ -2661,8 +2660,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
            body.insert(btc, aTag)
            btc += 1

-            # <p class="letter_index">
-            # <p class="author_index">
            divTag = Tag(soup, "div")
            dtc = 0
            current_letter = ""
@ -2677,7 +2674,7 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
                    # Start a new letter with Index letter
                    current_letter = self.letter_or_symbol(sort_title[0].upper())
                    pIndexTag = Tag(soup, "p")
-                    pIndexTag['class'] = "letter_index"
+                    pIndexTag['class'] = "series_letter_index"
                    aTag = Tag(soup, "a")
                    aTag['name'] = "%s_series" % self.letter_or_symbol(current_letter)
                    pIndexTag.insert(0,aTag)
--- a/src/calibre/library/custom_columns.py
+++ b/src/calibre/library/custom_columns.py
@ -457,7 +457,7 @@ class CustomColumns(object):
        if num is not None:
            data = self.custom_column_num_map[num]
        if data['datatype'] == 'composite':
-            return set()
+            return set([])
        if not data['editable']:
            raise ValueError('Column %r is not editable'%data['label'])
        table, lt = self.custom_table_names(data['num'])
@ -468,7 +468,7 @@ class CustomColumns(object):
        if data['datatype'] == 'series' and extra is None:
            (val, extra) = self._get_series_values(val)

-        books_to_refresh = set()
+        books_to_refresh = set([])
        if data['normalized']:
            if data['datatype'] == 'enumeration' and (
                    val and val not in data['display']['enum_values']):
@ -497,7 +497,7 @@ class CustomColumns(object):
                    ex = existing[idx]
                    xid = self.conn.get(
                        'SELECT id FROM %s WHERE value=?'%table, (ex,), all=False)
-                    if ex != x:
+                    if allow_case_change and ex != x:
                        case_change = True
                        self.conn.execute(
                            'UPDATE %s SET value=? WHERE id=?'%table, (x, xid))
--- a/src/calibre/library/database2.py
+++ b/src/calibre/library/database2.py
@ -1636,7 +1636,8 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        if not authors:
            authors = [_('Unknown')]
        self.conn.execute('DELETE FROM books_authors_link WHERE book=?',(id,))
-        books_to_refresh = set()
+        books_to_refresh = set([])
+        final_authors = []
        for a in authors:
            case_change = False
            if not a:
@ -1648,13 +1649,17 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
            if aus:
                aid, name = aus[0]
                # Handle change of case
-                if allow_case_change and name != a:
-                    self.conn.execute('''UPDATE authors
-                                         SET name=? WHERE id=?''', (a, aid))
-                    case_change = True
+                if name != a:
+                    if allow_case_change:
+                        self.conn.execute('''UPDATE authors
+                                            SET name=? WHERE id=?''', (a, aid))
+                        case_change = True
+                    else:
+                        a = name
            else:
                aid = self.conn.execute('''INSERT INTO authors(name)
                                           VALUES (?)''', (a,)).lastrowid
+            final_authors.append(a.replace('|', ','))
            try:
                self.conn.execute('''INSERT INTO books_authors_link(book, author)
                                     VALUES (?,?)''', (id, aid))
@ -1668,7 +1673,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        self.conn.execute('UPDATE books SET author_sort=? WHERE id=?',
                          (ss, id))
        self.data.set(id, self.FIELD_MAP['authors'],
-                      ','.join([a.replace(',', '|') for a in authors]),
+                      ','.join([a.replace(',', '|') for a in final_authors]),
                      row_is_id=True)
        self.data.set(id, self.FIELD_MAP['author_sort'], ss, row_is_id=True)
        aum = self.authors_with_sort_strings(id, index_is_id=True)
@ -1716,6 +1721,10 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
            title = title.decode(preferred_encoding, 'replace')
        self.conn.execute('UPDATE books SET title=? WHERE id=?', (title, id))
        self.data.set(id, self.FIELD_MAP['title'], title, row_is_id=True)
+        ts = self.conn.get('SELECT sort FROM books WHERE id=?', (id,),
+                all=False)
+        if ts:
+            self.data.set(id, self.FIELD_MAP['sort'], ts, row_is_id=True)
        return True

    def set_title(self, id, title, notify=True, commit=True):
@ -1768,10 +1777,13 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
                                    WHERE name=?''', (publisher,))
            if pubx:
                aid, cur_name = pubx[0]
-                if allow_case_change and publisher != cur_name:
-                    self.conn.execute('''UPDATE publishers SET name=?
+                if publisher != cur_name:
+                    if allow_case_change:
+                        self.conn.execute('''UPDATE publishers SET name=?
                                         WHERE id=?''', (publisher, aid))
-                    case_change = True
+                        case_change = True
+                    else:
+                        publisher = cur_name
            else:
                aid = self.conn.execute('''INSERT INTO publishers(name)
                                           VALUES (?)''', (publisher,)).lastrowid
@ -2163,7 +2175,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
                                 FROM books_tags_link WHERE tag=tags.id) < 1''')
        otags = self.get_tags(id)
        tags = self.cleanup_tags(tags)
-        books_to_refresh = set()
+        books_to_refresh = set([])
        for tag in (set(tags)-otags):
            case_changed = False
            tag = tag.strip()
@ -2258,7 +2270,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
                             WHERE (SELECT COUNT(id) FROM books_series_link
                                    WHERE series=series.id) < 1''')
        (series, idx) = self._get_series_values(series)
-        books_to_refresh = set()
+        books_to_refresh = set([])
        if series:
            case_change = False
            if not isinstance(series, unicode):
@ -2268,9 +2280,12 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
            sx = self.conn.get('SELECT id,name from series WHERE name=?', (series,))
            if sx:
                aid, cur_name = sx[0]
-                if allow_case_change and cur_name != series:
-                    self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
-                    case_change = True
+                if cur_name != series:
+                    if allow_case_change:
+                        self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
+                        case_change = True
+                    else:
+                        series = cur_name
            else:
                aid = self.conn.execute('INSERT INTO series(name) VALUES (?)', (series,)).lastrowid
            self.conn.execute('INSERT INTO books_series_link(book, series) VALUES (?,?)', (id, aid))