mirror of
				https://github.com/kovidgoyal/calibre.git
				synced 2025-10-24 23:38:55 -04:00 
			
		
		
		
	
		
			
				
	
	
		
			310 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			310 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| About
 | ||
| -----
 | ||
| 
 | ||
| The eReader format has evolved and changed over time. Subsequently, there are
 | ||
| multiple versions of the eReader format. There are also two different tools
 | ||
| that can create eReader files. The official tools are Makebook and Dropbook.
 | ||
| Dropbook is the newer official tool that has replaced Makebook. However,
 | ||
| Makebook is still in wide use because it supports a wider range of platforms
 | ||
| than Dropbook. Dropbook is a GUI application that only runs on Windows and
 | ||
| Apple’s OS X.
 | ||
| 
 | ||
| 
 | ||
| PDB Identity
 | ||
| -------
 | ||
| 
 | ||
| PNRdPPrs
 | ||
| 
 | ||
| 
 | ||
| 202 and 132 headers
 | ||
| -----------------------------------------
 | ||
| 
 | ||
| Older files have a record 0 size of 202 and occasionally 116. Newer files have
 | ||
| a record 0 size of 132. As of this writing the 202 files only support text and
 | ||
| images. The image format in the 202 files is the same as the 132 files. The 132
 | ||
| files support a number of additional features.
 | ||
| 
 | ||
| 
 | ||
| Record 0, eReader header (202)
 | ||
| ------------------
 | ||
| 
 | ||
| Note all values are in 2 byte increments. Like values are condensed into a
 | ||
| range. The range can be broken into 2 byte sections which represent the actual
 | ||
| stored values.
 | ||
| 
 | ||
| bytes       content             comments
 | ||
| 
 | ||
| 0-2         Version             Non-DRM books 2 and 4.
 | ||
| 2-8         Garbage
 | ||
| 8-10        Non-Text Offset     Start of Non text area (images) will run to the
 | ||
|                                 end of the section list.
 | ||
| 10-14       Unknown
 | ||
| 14-24       Garbage
 | ||
| 24-28       Unknown
 | ||
| 28-98       Garbage
 | ||
| 98-100      Unknown
 | ||
| 100-110     Garbage
 | ||
| 110-114     Unknown
 | ||
| 114-116     Garbage
 | ||
| 116-202     Unknown
 | ||
| 
 | ||
| * Garbage: Intentionally random values.
 | ||
| 
 | ||
| 
 | ||
| Text Records (202)
 | ||
| ------------------
 | ||
| 
 | ||
| Text starts with section 1 and continues until the section indicated by the
 | ||
| Non-Text Offset. All text records are PalmDoc compressed.
 | ||
| 
 | ||
| Each character in the compressed data is xored with 0xA5.
 | ||
| 
 | ||
| A decompression example in sudo Python:
 | ||
| 
 | ||
| for num in range(1, Non-Text Offset):
 | ||
|     text += decompress_pamldoc(''.join([chr(ord(x) ^ 0xA5) for x in section_data(num)])).decode('cp1252', 'replace')
 | ||
| 
 | ||
| 
 | ||
| Dropbook 132 files
 | ||
| ------------------
 | ||
| 
 | ||
| The following sections apply to the newer Dropbook created files.
 | ||
| 
 | ||
| 
 | ||
| Record 0, eReader header (132)
 | ||
| ----------------------------
 | ||
| 
 | ||
| This is only for 132 byte header files created by Dropbook.
 | ||
| 
 | ||
| bytes   content                     comments
 | ||
| 
 | ||
| 0-2     compression                 Specifies compression and drm. 2 = palmdoc,
 | ||
|                                     10 = zlib. 260 and 272 = DRM
 | ||
| 2-6     unknown                     Value of 0 is used
 | ||
| 6-8     encoding                    Always 25152 (0x6240). All text must be
 | ||
|                                     encoded as Latin-1 cp1252
 | ||
| 8-10    Number of small pages       The number of small font pages. If page
 | ||
|                                     index is not build in then 0.
 | ||
| 10-12   Number of large pages       The number of large font pages. If page
 | ||
|                                     index is not build in then 0.
 | ||
| 12-14   Non-Text record start       The location of the first non text records.
 | ||
|                                     record 1 to this value minus 1 are all text
 | ||
|                                     records
 | ||
| 14-16   Number of chapters          The number of chapter index records
 | ||
|                                     contained in the file
 | ||
| 16-18   Number of small index       The number of small font page index records
 | ||
|                                     contained in the file
 | ||
| 18-20   Number of large index       The number of large font page index records
 | ||
|                                     contained in the file
 | ||
| 20-22   Number of images            The number of images contained in the file
 | ||
| 22-24   Number of links             The number of links contained in the file
 | ||
| 24-26   Metadata available          Is there a metadata record in the file?
 | ||
|                                     0 = None, 1 = There is a metadata record
 | ||
| 26-28   Unknown                     Value of 0 is used
 | ||
| 28-30   Number of Footnotes         The number of footnote records in the file
 | ||
| 30-32   Number of Sidebars          The number of sidebar records in the file
 | ||
| 32-34   Chapter index record start  The location of chapter index records. If
 | ||
|                                     there are no chapters use the value for the
 | ||
|                                     Last data record.
 | ||
| 34-36   2560                        Magic value that must be set to 2560
 | ||
| 36-38   Small page index start      The location of small font page index
 | ||
|                                     records. If page table is not built in use
 | ||
|                                     the value for the Last data record.
 | ||
| 38-40   Large page index start      The location of large font page index
 | ||
|                                     records. If page table is not built in use
 | ||
|                                     the value for the Last data record.
 | ||
| 40-42   Image data record start     The location of the first image record. If
 | ||
|                                     there are no images use the value for the
 | ||
|                                     Last data record.
 | ||
| 42-44   Links record start          The location of the first link index
 | ||
|                                     record. If there are no links use the value
 | ||
|                                     for the Last data record.
 | ||
| 44-46   Metadata record start       The location of the metadata record. If
 | ||
|                                     there is no metadata use the value for the
 | ||
|                                     Last data record.
 | ||
| 46-48   Unknown                     Value of 0 is used
 | ||
| 48-50   Footnote record start       The location of the first footnote record.
 | ||
|                                     If there are no footnotes use the value for
 | ||
|                                     the Last data record.
 | ||
| 50-52   Sidebar record start        The location of the first sidebar record.
 | ||
|                                     If there are no sidebars use the value for
 | ||
|                                     the Last data record.
 | ||
| 52-54   Last data record            The location of the last data record
 | ||
| 54-132  Unknown                     Value of 0 is used
 | ||
| 
 | ||
| Note: All values are in 2 byte increments. All bytes in the table that have a
 | ||
| range larger than 2 can be broken into 2 byte segments and have different
 | ||
| values set for each grouping.
 | ||
| 
 | ||
| 
 | ||
| Records Order
 | ||
| -------------
 | ||
| 
 | ||
| Though the order of this sections is described in eReader header,
 | ||
| DropBook makes the following order:
 | ||
| 
 | ||
|    1. eReader Header
 | ||
|    2. Compressed text
 | ||
|    3. Small font page index
 | ||
|    4. Large font page index
 | ||
|    5. Chapter index
 | ||
|    6. Links index
 | ||
|    7. Images
 | ||
|    8. (Extrapolation: there should be one more record type here though it has
 | ||
|        not yet been uncovered what it might be).
 | ||
|    9. Metadata
 | ||
|   10. Sidebar records
 | ||
|   11. Footnote records
 | ||
|   12. Text block size record
 | ||
|   13. "MeTaInFo\x00" word record 
 | ||
| 
 | ||
| 
 | ||
| Text Records
 | ||
| ------------
 | ||
| 
 | ||
| All text records use cp1252  encoding (although eReader documents talk about
 | ||
| UTF-8 as well). Their total compressed size is unknown however, anything below
 | ||
| 3560 Bytes is known to work. The text will be either zlib or palmdoc
 | ||
| compressed. Use the compression value from the eReader header to determine
 | ||
| which. All text utalizes the Palm Markup Language (PML) for formatting.
 | ||
| 
 | ||
| Starting with DropBook 1.6.0 text is divided into 8KB (8192 bytes) blocks
 | ||
| trimming the end to the closest space character and then being compressed.
 | ||
| Earlier version of DropBook 1.5.2 tries to behave the same way, though
 | ||
| sometimes it trims the block in unexpected place.
 | ||
| 
 | ||
| 
 | ||
| Chapter Index Records
 | ||
| ---------------------
 | ||
| 
 | ||
| Each chapter record corresponds to 1 chapter and points at the place in the
 | ||
| book. Chapter record takes a form of 'offset name\x00' First 4 bytes are offset
 | ||
| of the original pml file where the chapter index points to (offset of
 | ||
| the \x|\X?|\C? tags). Then without a space goes a name of a chapter in chapter
 | ||
| index. It should contain only text, all formatting tags should be removed.
 | ||
| \U and \a tags are not permitted in chapter name. To maintain sub-chapters
 | ||
| 4*n spaces (\x20) are added to the beginning of the name, where "n" is level of
 | ||
| chapter: 0 for \x tag and N for \CN="" and \XN tags. And then an ending
 | ||
| \x00 symbol.
 | ||
| 
 | ||
| 
 | ||
| Image Records
 | ||
| -------------
 | ||
| 
 | ||
| Image records must be smaller than 65505 Bytes. They must also be 8bit PNG
 | ||
| images.
 | ||
| 
 | ||
| An image record takes the form 'PNG name\x00... image_data'
 | ||
| 
 | ||
| bytes   content         comments
 | ||
| 
 | ||
| 0-4     PNG             There must be a space after PNG.
 | ||
| 4-36    image name.     The image name must be 32 exactly 32 Bytes long. Pad
 | ||
|                         the right side of the name with \x00 characters for
 | ||
|                         names shorter than 32 characters.
 | ||
| 36-58   Unknown	
 | ||
| 58-60   width           Width of an image
 | ||
| 60-62   height          Height of an image
 | ||
| 62-?    The image data  raw image data in 8 bit PNG format
 | ||
| 
 | ||
| Note: DropBooks seems to change something in png raw data. Like reencoding or
 | ||
| something, but plain insertion of png image there still works. 
 | ||
| 
 | ||
| 
 | ||
| Links Records
 | ||
| -------------
 | ||
| 
 | ||
| Links records are constructed the same way as chapter ones. Each link anchor
 | ||
| record corresponds to 1 link anchor and points at the place in the book. Link
 | ||
| record takes a form of 'offset name\x00' First 4 bytes are offset of the
 | ||
| original pml file where the link anchor points to (offset of the \Q tag). Then
 | ||
| without a space goes a name of a link anchor. It should contain only text, all
 | ||
| formatting tags should be removed. \U and \a tags are not permitted in link
 | ||
| anchor name. And then an ending \x00 symbol.
 | ||
| 
 | ||
| 
 | ||
| Footnote Records
 | ||
| ----------------
 | ||
| 
 | ||
| The first footnote record is a \x00 separated list of footnote ids. All
 | ||
| subsequent footnote records are the footnote text corresponding to the id's
 | ||
| position in the list. Footnote text is compressed in the same manner as normal
 | ||
| text records
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| footnote section 1 = 'notice1\x00notice2\x00notice3\x00'
 | ||
| footnote section 2 = 'Text for notice 1'
 | ||
| footnote section 3 = 'Text for notice 2'
 | ||
| footnote section 4 = 'Text for notice 3'
 | ||
| 
 | ||
| Starting with Dropbook 1.5.2 first record looks a bit different. It is sequence
 | ||
| of \x00\x01 then 1 byte of footnote id length, then footnote id then \x00.
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| footnote section 1 = '\x00\x01\x07notice1\x00\x00\x01\x0Afootnote10\x00'
 | ||
| 
 | ||
| 
 | ||
| Sidebar Records
 | ||
| ---------------
 | ||
| 
 | ||
| The first sidebar record is a \x00 separated list of sidebar ids. All
 | ||
| subsequent sidebar records are the sidebar text corresponding to the id's
 | ||
| position in the list. Sidebar text is compressed in the same manner as normal
 | ||
| text records
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| sidebar section 1 = 'notice1\x00notice2\x00notice3\x00'
 | ||
| sidebar section 2 = 'Text for notice 1'
 | ||
| sidebar section 3 = 'Text for notice 2'
 | ||
| sidebar section 4 = 'Text for notice 3'
 | ||
| 
 | ||
| Starting with Dropbook 1.5.2 first record looks a bit different. It is sequence
 | ||
| of \x00\x01 then 1 byte of sidebar's id length, then sidebar's id then \x00.
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| sidebar section 1 = '\x00\x01\x07notice1\x00\x00\x01\x09sidebar10\x00'
 | ||
| 
 | ||
| 
 | ||
| Metadata Record
 | ||
| ---------------
 | ||
| 
 | ||
| \x00 separated list of string.
 | ||
| 
 | ||
| Metadata takes the form:
 | ||
| 
 | ||
|   title\x00
 | ||
|   author\x00
 | ||
|   copyright\x00
 | ||
|   publisher\x00
 | ||
|   isbn\x00
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| Gibraltar Earth\x00Michael McCollum\x001999\x00Sci Fi Arizona\x001929381255\x00
 | ||
| 
 | ||
| The metadata record is always followed by a record which contains 'MeTaInFo\x00'
 | ||
| 
 | ||
| Note: Starting with DropBook 1.5.2 'MeTaInFo\x00' is not following Metadata
 | ||
| Record. It is a separate record that ends the file and there are some more
 | ||
| records between Metadata record and 'MeTaInFo\x00' record.
 | ||
| 
 | ||
| 
 | ||
| Text Sizes Record
 | ||
| -----------------
 | ||
| 
 | ||
| There is a special record that contains the initial size of all text blocks
 | ||
| before compression. It is just a sequence of 2-byte blocks which are containing
 | ||
| the sizes.
 | ||
| 
 | ||
| E.G.
 | ||
| 
 | ||
| \x1F\xFB\x20\x00\x20\x00\x1F\xFE\x1F\xFD\x09\x46
 | ||
| 
 | ||
| Note: By this we can judge that theoretical maximum of initial block size is
 | ||
| 65535 bytes. 
 | ||
| 
 |