mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-07-09 03:04:10 -04:00
Merge from trunk
This commit is contained in:
commit
90ef9949ca
54
format_docs/compression/palmdoc.txt
Normal file
@ -0,0 +1,54 @@
About
-----

PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
text. The format does not allow for any text formatting. This keeps files
small, in keeping with the Palm philosophy. However, extensions to the format
can use tags, such as HTML or PML, to include formatting within text. These
extensions to PalmDoc are not interchangeable and are the basis for most eBook
Reader formats on Palm devices.

LZ77 algorithms achieve compression by replacing portions of the data with
references to matching data that has already passed through both encoder and
decoder. A match is encoded by a pair of numbers called a length-distance pair,
which is equivalent to the statement "each of the next length characters is
equal to the character exactly distance characters behind it in the
uncompressed stream." (The "distance" is sometimes called the "offset" instead.)

In the PalmDoc format, a length-distance pair is always encoded by a two-byte
sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
the distance, 3 go to encoding the length, and the remaining two are used to
make sure the decoder can identify the first byte as the beginning of such a
two-byte sequence.
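
The bit layout described above can be sketched in Python as follows. This is
an illustration only: the function name is hypothetical, and the "+ 3" applied
to the stored length follows the decoding rules given later in this document.

```python
def split_length_distance(first, second):
    """Split a PalmDoc two-byte sequence into (distance, length)."""
    # The two marker bits '10' put the first byte in the range 0x80..0xBF.
    if not 0x80 <= first <= 0xBF:
        raise ValueError("first byte does not start a length-distance pair")
    # Drop the two marker bits, leaving 14 payload bits.
    payload = ((first << 8) | second) & 0x3FFF
    distance = payload >> 3        # upper 11 bits
    length = (payload & 0x07) + 3  # lower 3 bits, stored as (length - 3)
    return distance, length
```

For example, the bytes 0x80 0x0B carry the payload 0b1011, which splits into a
distance of 1 and a stored length of 3 (that is, 6 bytes to copy).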

PalmDoc combines LZ77 with a simple kind of byte pair compression.


PalmDoc files are decoded as follows:
-------------------------------------

Read a byte from the compressed stream. If the byte is

0x00: "1 literal" copy that byte unmodified to the decompressed stream.

0x09 to 0x7f: "1 literal" copy that byte unmodified to the decompressed stream.

0x01 to 0x08: "literals": the byte is interpreted as a count from 1 to 8, and
that many literals are copied unmodified from the compressed stream to the
decompressed stream.

0x80 to 0xbf: "length, distance" pair: the 2 leftmost bits of this byte ('10')
are discarded, and the following 6 bits are combined with the 8 bits of the
next byte to make a 14 bit "distance, length" item. Those 14 bits are broken
into 11 bits of distance backwards from the current location in the
uncompressed text, and 3 bits of length to copy from that point
(copying n+3 bytes, 3 to 10 bytes).

0xc0 to 0xff: "byte pair": this byte is decoded into 2 characters: a space
character, and a letter formed from this byte XORed with 0x80.

Repeat from the beginning until there are no more bytes in the compressed file.

PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
upon independently.
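
The decoding rules above can be sketched in Python roughly as follows. This is
a minimal illustration for a single block, not a reference implementation: the
function name is hypothetical, and the handling of truncated input and invalid
distances is an assumption, since the text does not cover error cases.

```python
def decompress_palmdoc(data):
    """Decompress one PalmDoc-compressed block (bytes in, bytes out)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            # Single literal, copied unmodified.
            out.append(c)
        elif 0x01 <= c <= 0x08:
            # The next c bytes are literals.
            out += data[i:i + c]
            i += c
        elif 0x80 <= c <= 0xBF:
            if i >= n:
                break  # assumption: silently stop on a truncated pair
            payload = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance = payload >> 3
            length = (payload & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid back-reference distance")
            # Copy byte by byte: source and destination may overlap.
            for _ in range(length):
                out.append(out[-distance])
        else:
            # 0xC0..0xFF: a space followed by the byte XORed with 0x80.
            out.append(0x20)
            out.append(c ^ 0x80)
    return bytes(out)
```

For instance, the input b"abc\x80\x1b" holds three literals followed by a pair
with distance 3 and length 6, so it expands to b"abcabcabc".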
|
||||||
|
|
3217
format_docs/compression/zip.txt
Normal file
File diff suppressed because it is too large
309
format_docs/pdb/ereader.txt
Normal file
@ -0,0 +1,309 @@
About
-----

The eReader format has evolved and changed over time. Consequently, there are
multiple versions of the eReader format. There are also two different tools
that can create eReader files. The official tools are Makebook and Dropbook.
Dropbook is the newer official tool that has replaced Makebook. However,
Makebook is still in wide use because it supports a wider range of platforms
than Dropbook. Dropbook is a GUI application that only runs on Windows and
Apple’s OS X.


PDB Identity
------------

PNRdPPrs
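
A quick way to test for this identity, sketched in Python. It relies on the
standard Palm database header layout, where the 4-byte type and 4-byte creator
fields sit at offsets 60 and 64; the function name is only illustrative.

```python
def is_ereader_pdb(pdb_header):
    """Return True if the PDB header carries the eReader identity."""
    # Type ("PNRd") and creator ("PPrs") are adjacent in the PDB header.
    return pdb_header[60:68] == b"PNRdPPrs"
```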
|
||||||
|
|
||||||
|
|
||||||
|
202 and 132 headers
|
||||||
|
-----------------------------------------
|
||||||
|
|
||||||
|
Older files have a record 0 size of 202 and occasionally 116. Newer files have
|
||||||
|
a record 0 size of 132. As of this writing the 202 files only support text and
|
||||||
|
images. The image format in the 202 files is the same as the 132 files. The 132
|
||||||
|
files support a number of additional features.
|
||||||
|
|
||||||
|
|
||||||
|
Record 0, eReader header (202)
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Note: all values are in 2 byte increments. Like values are condensed into a
range. The range can be broken into 2 byte sections which represent the actual
stored values.

bytes    content          comments

0-2      Version          2 and 4 for non-DRM books.
2-8      Garbage
8-10     Non-Text Offset  Start of the non-text area (images), which runs to
                          the end of the section list.
10-14    Unknown
14-24    Garbage
24-28    Unknown
28-98    Garbage
98-100   Unknown
100-110  Garbage
110-114  Unknown
114-116  Garbage
116-202  Unknown

* Garbage: Intentionally random values.
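
The two known fields of the 202 byte header could be read as sketched below.
Big-endian byte order is an assumption based on Palm conventions (the table
does not state it), and the function name is hypothetical.

```python
import struct


def parse_header_202(record0):
    """Return (version, non_text_offset) from a 202 byte record 0."""
    if len(record0) < 10:
        raise ValueError("record 0 is too short")
    # Assumption: values are big-endian unsigned 16-bit integers.
    (version,) = struct.unpack_from(">H", record0, 0)
    (non_text_offset,) = struct.unpack_from(">H", record0, 8)
    return version, non_text_offset
```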
|
||||||
|
|
||||||
|
|
||||||
|
Text Records (202)
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Text starts with section 1 and continues until the section indicated by the
|
||||||
|
Non-Text Offset. All text records are PalmDoc compressed.
|
||||||
|
|
||||||
|
Each character in the compressed data is xored with 0xA5.
|
||||||
|
|
||||||
|
A decompression example in pseudo-Python:

    for num in range(1, non_text_offset):
        text += decompress_palmdoc(bytes(b ^ 0xA5 for b in section_data(num))).decode('cp1252', 'replace')
|
||||||
|
|
||||||
|
|
||||||
|
Dropbook 132 files
|
||||||
|
------------------
|
||||||
|
|
||||||
|
The following sections apply to the newer Dropbook created files.
|
||||||
|
|
||||||
|
|
||||||
|
Record 0, eReader header (132)
|
||||||
|
----------------------------
|
||||||
|
|
||||||
|
This is only for 132 byte header files created by Dropbook.
|
||||||
|
|
||||||
|
bytes    content                     comments

0-2      compression                 Specifies compression and DRM. 2 = palmdoc,
                                     10 = zlib. 260 and 272 = DRM
2-6      unknown                     Value of 0 is used
6-8      encoding                    Always 25152 (0x6240). All text must be
                                     encoded as Latin-1 cp1252
8-10     Number of small pages       The number of small font pages. If page
                                     index is not built in then 0.
10-12    Number of large pages       The number of large font pages. If page
                                     index is not built in then 0.
12-14    Non-Text record start       The location of the first non-text record.
                                     Records 1 to this value minus 1 are all
                                     text records
14-16    Number of chapters          The number of chapter index records
                                     contained in the file
16-18    Number of small index       The number of small font page index records
                                     contained in the file
18-20    Number of large index       The number of large font page index records
                                     contained in the file
20-22    Number of images            The number of images contained in the file
22-24    Number of links             The number of links contained in the file
24-26    Metadata available          Is there a metadata record in the file?
                                     0 = None, 1 = There is a metadata record
26-28    Unknown                     Value of 0 is used
28-30    Number of Footnotes         The number of footnote records in the file
30-32    Number of Sidebars          The number of sidebar records in the file
32-34    Chapter index record start  The location of chapter index records. If
                                     there are no chapters use the value for the
                                     Last data record.
34-36    2560                        Magic value that must be set to 2560
36-38    Small page index start      The location of small font page index
                                     records. If page table is not built in use
                                     the value for the Last data record.
38-40    Large page index start      The location of large font page index
                                     records. If page table is not built in use
                                     the value for the Last data record.
40-42    Image data record start     The location of the first image record. If
                                     there are no images use the value for the
                                     Last data record.
42-44    Links record start          The location of the first link index
                                     record. If there are no links use the value
                                     for the Last data record.
44-46    Metadata record start       The location of the metadata record. If
                                     there is no metadata use the value for the
                                     Last data record.
46-48    Unknown                     Value of 0 is used
48-50    Footnote record start       The location of the first footnote record.
                                     If there are no footnotes use the value for
                                     the Last data record.
50-52    Sidebar record start        The location of the first sidebar record.
                                     If there are no sidebars use the value for
                                     the Last data record.
52-54    Last data record            The location of the last data record
54-132   Unknown                     Value of 0 is used
|
||||||
|
|
||||||
|
Note: All values are in 2 byte increments. All bytes in the table that have a
|
||||||
|
range larger than 2 can be broken into 2 byte segments and have different
|
||||||
|
values set for each grouping.
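
The table above maps naturally onto a list of (name, offset) pairs. The sketch
below reads every known field as a big-endian unsigned 16-bit integer. The
byte order is an assumption based on Palm conventions, and the field names are
invented for this illustration rather than taken from any tool.

```python
import struct

# (name, byte offset) for each known 2 byte field of the 132 byte header.
HEADER_132_FIELDS = [
    ("compression", 0), ("encoding", 6),
    ("small_pages", 8), ("large_pages", 10),
    ("non_text_start", 12), ("chapter_count", 14),
    ("small_index_count", 16), ("large_index_count", 18),
    ("image_count", 20), ("link_count", 22),
    ("has_metadata", 24), ("footnote_count", 28),
    ("sidebar_count", 30), ("chapter_index_start", 32),
    ("magic_2560", 34), ("small_index_start", 36),
    ("large_index_start", 38), ("image_start", 40),
    ("links_start", 42), ("metadata_start", 44),
    ("footnote_start", 48), ("sidebar_start", 50),
    ("last_data_record", 52),
]


def parse_header_132(record0):
    """Return a dict of the known fields of a 132 byte record 0."""
    if len(record0) != 132:
        raise ValueError("expected a 132 byte header")
    return {
        name: struct.unpack_from(">H", record0, offset)[0]
        for name, offset in HEADER_132_FIELDS
    }
```

A caller would typically check the compression value (2 or 10) before choosing
a decompressor, and treat 260 and 272 as DRM-protected.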
|
||||||
|
|
||||||
|
|
||||||
|
Records Order
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Although the order of these sections is described in the eReader header,
DropBook writes them in the following order:
|
||||||
|
|
||||||
|
1. eReader Header
|
||||||
|
2. Compressed text
|
||||||
|
3. Small font page index
|
||||||
|
4. Large font page index
|
||||||
|
5. Chapter index
|
||||||
|
6. Links index
|
||||||
|
7. Images
|
||||||
|
8. (Extrapolation: there should be one more record type here though it has
|
||||||
|
not yet been uncovered what it might be).
|
||||||
|
9. Metadata
|
||||||
|
10. Sidebar records
|
||||||
|
11. Footnote records
|
||||||
|
12. Text block size record
|
||||||
|
13. "MeTaInFo\x00" word record
|
||||||
|
|
||||||
|
|
||||||
|
Text Records
|
||||||
|
------------
|
||||||
|
|
||||||
|
All text records use cp1252 encoding (although eReader documents talk about
UTF-8 as well). Their maximum compressed size is unknown; however, anything
below 3560 bytes is known to work. The text will be either zlib or PalmDoc
compressed. Use the compression value from the eReader header to determine
which. All text uses the Palm Markup Language (PML) for formatting.

Starting with DropBook 1.6.0, text is divided into 8KB (8192 byte) blocks,
each trimmed at the end to the closest space character and then compressed.
The earlier DropBook 1.5.2 tries to behave the same way, though it sometimes
trims the block in an unexpected place.
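
One possible reading of the block splitting described above, sketched in
Python. The exact DropBook behaviour is not documented here, so this is a
guess: the space is kept at the end of the block, and a block with no usable
space is cut at the full block size. The function name is hypothetical.

```python
def split_text_blocks(text, block_size=8192):
    """Split encoded text (bytes) into blocks of at most block_size bytes."""
    blocks = []
    pos = 0
    while pos < len(text):
        end = min(pos + block_size, len(text))
        if end < len(text):
            # Trim back to the closest space inside the block, if any.
            cut = text.rfind(b" ", pos, end)
            if cut > pos:
                end = cut + 1  # assumption: the space stays with this block
        blocks.append(text[pos:end])
        pos = end
    return blocks
```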
|
||||||
|
|
||||||
|
|
||||||
|
Chapter Index Records
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Each chapter record corresponds to one chapter and points to its place in the
book. A chapter record takes the form 'offset name\x00'. The first 4 bytes are
the offset in the original PML file that the chapter index points to (the
offset of the \x|\X?|\C? tag). The chapter name, as shown in the chapter
index, follows immediately, without a separating space. It should contain only
text; all formatting tags should be removed. \U and \a tags are not permitted
in the chapter name. To represent sub-chapters, 4*n spaces (\x20) are added to
the beginning of the name, where "n" is the level of the chapter: 0 for the \x
tag and N for the \CN="" and \XN tags. The record ends with a \x00 byte.
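
A chapter record could be unpacked as sketched below. The offset is assumed to
be a big-endian unsigned 32-bit integer and the name is assumed to be cp1252
text; neither is stated explicitly above, and the function name is
hypothetical.

```python
import struct


def parse_chapter_record(record):
    """Return (offset, level, name) from a chapter index record."""
    # Assumption: the 4 byte offset is big-endian.
    (offset,) = struct.unpack_from(">I", record, 0)
    raw_name = record[4:].split(b"\x00", 1)[0]
    name = raw_name.decode("cp1252", "replace")
    stripped = name.lstrip(" ")
    # Each nesting level adds 4 leading spaces to the name.
    level = (len(name) - len(stripped)) // 4
    return offset, level, stripped
```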
|
||||||
|
|
||||||
|
|
||||||
|
Image Records
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Image records must be smaller than 65505 bytes. They must also be 8-bit PNG
images.

An image record takes the form 'PNG name\x00... image_data'

bytes    content         comments

0-4      PNG             There must be a space after PNG.
4-36     image name      The image name must be exactly 32 bytes long. Pad
                         the right side of the name with \x00 characters for
                         names shorter than 32 characters.
36-58    Unknown
58-60    width           Width of an image
60-62    height          Height of an image
62-?     The image data  Raw image data in 8-bit PNG format

Note: DropBook seems to alter the raw PNG data, possibly by re-encoding it,
but inserting a plain PNG image still works.
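
Following the layout above, an image record could be split apart as in this
sketch. Width and height are assumed to be big-endian unsigned 16-bit values
and the name is assumed to be cp1252 text; the function name is illustrative.

```python
import struct


def parse_image_record(record):
    """Return (name, width, height, png_data) from an image record."""
    if record[:4] != b"PNG ":
        raise ValueError("not an image record")
    # The name field is 32 bytes, right-padded with \x00.
    name = record[4:36].rstrip(b"\x00").decode("cp1252", "replace")
    # Assumption: width and height are big-endian.
    width, height = struct.unpack_from(">HH", record, 58)
    return name, width, height, record[62:]
```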
|
||||||
|
|
||||||
|
|
||||||
|
Links Records
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Link records are constructed the same way as chapter records. Each link anchor
record corresponds to one link anchor and points to its place in the book. A
link record takes the form 'offset name\x00'. The first 4 bytes are the offset
in the original PML file that the link anchor points to (the offset of the \Q
tag). The name of the link anchor follows immediately, without a separating
space. It should contain only text; all formatting tags should be removed. \U
and \a tags are not permitted in the link anchor name. The record ends with a
\x00 byte.
|
||||||
|
|
||||||
|
|
||||||
|
Footnote Records
|
||||||
|
----------------
|
||||||
|
|
||||||
|
The first footnote record is a \x00 separated list of footnote ids. All
subsequent footnote records are the footnote text corresponding to the id's
position in the list. Footnote text is compressed in the same manner as normal
text records.

E.G.

footnote section 1 = 'notice1\x00notice2\x00notice3\x00'
footnote section 2 = 'Text for notice 1'
footnote section 3 = 'Text for notice 2'
footnote section 4 = 'Text for notice 3'

Starting with Dropbook 1.5.2, the first record looks a bit different. It is a
sequence of entries, each consisting of \x00\x01, then 1 byte giving the
footnote id length, then the footnote id, then \x00.

E.G.

footnote section 1 = '\x00\x01\x07notice1\x00\x00\x01\x0Afootnote10\x00'
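
Both layouts of the id list can be handled by one small parser, sketched here.
Telling the two apart by the leading \x00\x01 marker and decoding ids as
cp1252 are assumptions, and the function name is hypothetical.

```python
def parse_id_list(record):
    """Return the list of ids from the first footnote record."""
    ids = []
    if record.startswith(b"\x00\x01"):
        # Dropbook 1.5.2 and later: \x00\x01, length byte, id, \x00.
        pos = 0
        while record[pos:pos + 2] == b"\x00\x01" and pos + 2 < len(record):
            length = record[pos + 2]
            start = pos + 3
            ids.append(record[start:start + length].decode("cp1252", "replace"))
            pos = start + length + 1  # skip the trailing \x00
    else:
        # Older layout: ids separated by \x00.
        ids = [p.decode("cp1252", "replace") for p in record.split(b"\x00") if p]
    return ids
```

The footnote text for the id at position i in this list is then the i-th
record after the id list record.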
|
||||||
|
|
||||||
|
|
||||||
|
Sidebar Records
|
||||||
|
---------------
|
||||||
|
|
||||||
|
The first sidebar record is a \x00 separated list of sidebar ids. All
subsequent sidebar records are the sidebar text corresponding to the id's
position in the list. Sidebar text is compressed in the same manner as normal
text records.

E.G.

sidebar section 1 = 'notice1\x00notice2\x00notice3\x00'
sidebar section 2 = 'Text for notice 1'
sidebar section 3 = 'Text for notice 2'
sidebar section 4 = 'Text for notice 3'

Starting with Dropbook 1.5.2, the first record looks a bit different. It is a
sequence of entries, each consisting of \x00\x01, then 1 byte giving the
sidebar id length, then the sidebar id, then \x00.

E.G.

sidebar section 1 = '\x00\x01\x07notice1\x00\x00\x01\x09sidebar10\x00'
|
||||||
|
|
||||||
|
|
||||||
|
Metadata Record
|
||||||
|
---------------
|
||||||
|
|
||||||
|
A \x00 separated list of strings.
|
||||||
|
|
||||||
|
Metadata takes the form:
|
||||||
|
|
||||||
|
title\x00
|
||||||
|
author\x00
|
||||||
|
copyright\x00
|
||||||
|
publisher\x00
|
||||||
|
isbn\x00
|
||||||
|
|
||||||
|
E.G.
|
||||||
|
|
||||||
|
Gibraltar Earth\x00Michael McCollum\x001999\x00Sci Fi Arizona\x001929381255\x00
|
||||||
|
|
||||||
|
The metadata record is always followed by a record which contains 'MeTaInFo\x00'.

Note: Starting with DropBook 1.5.2, 'MeTaInFo\x00' no longer directly follows
the Metadata record. It is a separate record that ends the file, and there are
some more records between the Metadata record and the 'MeTaInFo\x00' record.
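
The metadata record described above could be turned into a dictionary as
sketched below. Decoding the strings as cp1252 is an assumption carried over
from the text records, and the function name is hypothetical.

```python
def parse_metadata(record):
    """Return a dict of the five metadata strings, in their stored order."""
    keys = ("title", "author", "copyright", "publisher", "isbn")
    values = record.split(b"\x00")
    # zip() stops after the five known fields and ignores the empty tail.
    return {k: v.decode("cp1252", "replace") for k, v in zip(keys, values)}
```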
|
||||||
|
|
||||||
|
|
||||||
|
Text Sizes Record
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
There is a special record that contains the initial size of all text blocks
before compression. It is simply a sequence of 2-byte values, one per text
block, holding the sizes.
|
||||||
|
|
||||||
|
E.G.
|
||||||
|
|
||||||
|
\x1F\xFB\x20\x00\x20\x00\x1F\xFE\x1F\xFD\x09\x46
|
||||||
|
|
||||||
|
Note: Because each size is stored in 2 bytes, the theoretical maximum initial
block size is 65535 bytes.
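
Reading the sizes back is a one-liner with struct, sketched here. Big-endian
order is an assumption, although it is the reading under which the example
above yields sizes close to 8192. The function name is hypothetical.

```python
import struct


def parse_text_sizes(record):
    """Return the uncompressed size of each text block."""
    count = len(record) // 2
    # Assumption: each size is a big-endian unsigned 16-bit integer.
    return list(struct.unpack(">%dH" % count, record[:count * 2]))
```

Applied to the example record above, this gives 8187, 8192, 8192, 8190, 8189
and 2374 bytes.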
|
||||||
|
|
414
format_docs/pdb/mbp.txt
Normal file
@ -0,0 +1,414 @@
|
|||||||
|
// BEGINNING OF FILE
|
||||||
|
// NOTES:
|
||||||
|
// 1* Numeric data stored as big endian, 32 bits.
|
||||||
|
// 2* Data padded to 16 bits limits. (Sometimes to 32 bits limits?)
|
||||||
|
// 3* Text stored seems to be an 8 bit encoding padded to 16 bits
|
||||||
|
// (may be "ISO-8859-1"?, or may be just a local machine character set?)
|
||||||
|
// 4* I initially used the term "MARK" where I should have used "HIGHLIGHT";
// bear that in mind (it was a poor choice of name when I started reversing)
|
||||||
|
|
||||||
|
<0x 31 bytes = book_title_PAR + 0x00 PAD if (book_title_PAR < 31) >
|
||||||
|
<0x 00>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
...4
|
||||||
|
...4
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
BPAR
|
||||||
|
MOBI
|
||||||
|
<0x 4 bytes = Next free pointer identifier>
|
||||||
|
// Note: pointer identifiers aren't always consecutive,
|
||||||
|
// so this number is usually bigger than the # of index entries
|
||||||
|
<0x 00 00>
|
||||||
|
<0x 4 bytes = Number of index entries>
|
||||||
|
<0x 4 bytes = Position of BPAR>
|
||||||
|
<0x 00 00 00 00> // BPAR pointer identifier = 0x0
|
||||||
|
|
||||||
|
|
||||||
|
// INDEXES:
|
||||||
|
// Order of Indexes: from the beginning of this MBP file,
|
||||||
|
// forward to the end of the file.
|
||||||
|
// Nevertheless, see these comments for order relative to:
|
||||||
|
// "BEGINNING OF USER DATA": order of Data marks.
|
||||||
|
// "FINAL GROUP OF MARKS": order of final marks.
|
||||||
|
[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
|
||||||
|
AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
|
||||||
|
...}
|
||||||
|
|| "last DATA"]
|
||||||
|
// Note: Pointer identifiers to DATA's assigned so the number
|
||||||
|
// shrinks as the table grows down.
|
||||||
|
[if NOTE || CORRECTION]
|
||||||
|
<0x 4 bytes = Position of DATA....EBVS>
|
||||||
|
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
|
||||||
|
[fi NOTE || CORRECTION]
|
||||||
|
<0x 4 bytes = Position of DATA>
|
||||||
|
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
|
||||||
|
[if NOTE || CORRECTION]
|
||||||
|
<0x 4 bytes = Position of DATA>
|
||||||
|
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
|
||||||
|
[fi NOTE || CORRECTION]
|
||||||
|
[if MARK || DRAWING || BOOKMARK]
|
||||||
|
<0x 4 bytes = Position of DATA....EBVS>
|
||||||
|
<0x 4 bytes = Pointer identifier, used by BKMK blocks>
|
||||||
|
[fi MARK || DRAWING || BOOKMARK]
|
||||||
|
[if AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
|
||||||
|
<0x 4 bytes = Position of [AUTH || TITL || CATE || GENR || ABST || COVE || PUBL] >
|
||||||
|
<0x 4 bytes = Pointer identifier>
|
||||||
|
[fi AUTHOR || TITLE || CATEGORY || GENRE || ABSTRACT || COVER || PUBLISHER]
|
||||||
|
[if last DATA] // there's always a last piece of DATA (not user data?)
|
||||||
|
<0x 4 bytes = Position of last DATA>
|
||||||
|
<0x 4 bytes = Pointer identifier> // usually <0x 00 00 00 01>
|
||||||
|
[fi last DATA]
|
||||||
|
[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK,
|
||||||
|
AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,
|
||||||
|
...}
|
||||||
|
|| "last DATA"]
|
||||||
|
|
||||||
|
|
||||||
|
[for each {NOTE,MARK,CORRECTION,DRAWING}]
|
||||||
|
<0x 4 bytes = Position of BKMK>
|
||||||
|
<0x 4 bytes = Pointer identifier>
|
||||||
|
// Note: pointer identifiers for BKMK's are usually the smallest
|
||||||
|
// of all the identifiers associated to an annotation. All
|
||||||
|
// other DATA references in INDEXES table associated to this
|
||||||
|
// BKMK, have bigger pointer identifiers.
|
||||||
|
// Note: Pointer identifiers to BKMK's assigned so the number
|
||||||
|
// grows as the table grows down.
|
||||||
|
[next {NOTE,MARK,CORRECTION,DRAWING}]
|
||||||
|
|
||||||
|
|
||||||
|
<0x 2 bytes random PAD>
|
||||||
|
BPAR
|
||||||
|
<0x 4 bytes = size of BPAR block>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
...4 <-- 'position of last read' related
|
||||||
|
...4 <-- 'position of last read' related
|
||||||
|
...4
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
...4
|
||||||
|
...4
|
||||||
|
...4 <-- 'position of last read' related
|
||||||
|
...(rest of size of BPAR block, if bigger than 0x20)
|
||||||
|
[if (size of BPAR block) mod 32 != 0]
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[fi]
|
||||||
|
|
||||||
|
// BEGINNING OF USER DATA:
|
||||||
|
// Order of {NOTE,MARK,CORRECTION,DRAWING} :
|
||||||
|
// starts with user data at the end of the file,
|
||||||
|
// going backwards to the beginning of the file:
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[for each {NOTE,MARK,CORRECTION,DRAWING}]
|
||||||
|
//-------------------------------
|
||||||
|
[if NOTE]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of DATA block>
|
||||||
|
[if EBAR] // this block can appear, or not... ???
|
||||||
|
EBAR
|
||||||
|
...various {4 x byte} ???
|
||||||
|
[fi EBAR]
|
||||||
|
EBVS
|
||||||
|
<0x 00 00 00 03> ???
|
||||||
|
<0x 4 bytes = IDENTIFIER> ???
|
||||||
|
[<0x 00 00 00 01>, or nothing at all] ???
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 10>
|
||||||
|
...(rest of size of DATA block)
|
||||||
|
<0x FD EA = PAD? (ýê)>
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <marked text (see 3rd note)> >
|
||||||
|
<marked text (see 3rd note)>
|
||||||
|
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <note text (see 3rd note)> >
|
||||||
|
<note text (see 3rd note)>
|
||||||
|
[if (size of <note text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
[fi NOTE]
|
||||||
|
//-------------------------------
|
||||||
|
[if MARK || BOOKMARK]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <marked text (see 3rd note)> >
|
||||||
|
<marked text (see 3rd note)>
|
||||||
|
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of DATA block>
|
||||||
|
[if EBAR] // this block can appear, or not... ???
|
||||||
|
EBAR
|
||||||
|
...various {4 x byte} ???
|
||||||
|
[fi EBAR]
|
||||||
|
EBVS
|
||||||
|
<0x 00 00 00 03> ???
|
||||||
|
<0x 4 bytes = IDENTIFIER> ???
|
||||||
|
[<0x 00 00 00 01>, or nothing at all] ???
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 10>
|
||||||
|
...(rest of size of DATA block)
|
||||||
|
<0x FD EA = PAD? (ýê)>
|
||||||
|
[fi MARK || BOOKMARK]
|
||||||
|
//-------------------------------
|
||||||
|
[if CORRECTION]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of DATA block>
|
||||||
|
[if EBAR] // this block can appear, or not... ???
|
||||||
|
EBAR
|
||||||
|
...various {4 x byte} ???
|
||||||
|
[fi EBAR]
|
||||||
|
EBVS
|
||||||
|
<0x 00 00 00 03> ???
|
||||||
|
<0x 4 bytes = IDENTIFIER> ???
|
||||||
|
[<0x 00 00 00 01>, or nothing at all] ???
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 10>
|
||||||
|
...(rest of size of DATA block)
|
||||||
|
<0x FD EA = PAD? (ýê)>
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <marked text (see 3rd note)> >
|
||||||
|
<marked text (see 3rd note)>
|
||||||
|
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <note text (see 3rd note)> >
|
||||||
|
<note text (see 3rd note)>
|
||||||
|
[if (size of <note text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <note text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
[fi CORRECTION]
|
||||||
|
//-------------------------------
|
||||||
|
[if DRAWING]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of raw data>
|
||||||
|
ADQM
|
||||||
|
// NOTE: background color is stored in corresponding BKMK.
|
||||||
|
[begin DRAWING format]
|
||||||
|
...4 = <0x 00 00 00 01> ???
|
||||||
|
<0x 4 bytes = X POSITION OF UPPER LEFT CORNER??? >
|
||||||
|
<0x 4 bytes = Y POSITION OF UPPER LEFT CORNER??? >
|
||||||
|
<0x 4 bytes = X SIZE in pixels >
|
||||||
|
<0x 4 bytes = Y SIZE in pixels >
|
||||||
|
...4 = <0x 00 00 00 00> ???
|
||||||
|
<0x 4 bytes = number of STROKES>
|
||||||
|
[if "number of STROKES" == 0]
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
[end DRAWING format]
|
||||||
|
[fi]
|
||||||
|
[for each STROKE]
|
||||||
|
<0x 00 00 00 01> ???
|
||||||
|
<0x 4 bytes> =
|
||||||
|
Stroke's beginning position in list of coordinates.
|
||||||
|
<0x 4 bytes> =
|
||||||
|
Stroke's ending position in list of coordinates.
|
||||||
|
<0x 00 RR GG BB> = RRGGBB color of stroke.
|
||||||
|
[next STROKE]
|
||||||
|
<0x 4 bytes> = number of coordinate pairs in array of coordinates.
|
||||||
|
// NOTE: each stroke is formed out of at least three
|
||||||
|
// coordinate pairs: begin, {next point}(1-n), end point.
|
||||||
|
[for each COORDINATE]
|
||||||
|
<0x 4 bytes> = X coordinate
|
||||||
|
<0x 4 bytes> = Y coordinate
|
||||||
|
[next COORDINATE]
|
||||||
|
[end DRAWING format]
|
||||||
|
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of <marked text (see 3rd note)> >
|
||||||
|
<marked text (see 3rd note)>
|
||||||
|
[if (size of <marked text (see 3rd note)>) mod 4 !=0]
|
||||||
|
<0x random PAD until (size of <marked text (see 3rd note)>) mod 4 ==0>
|
||||||
|
[fi]
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of DATA block>
|
||||||
|
[if EBAR] // this block can appear, or not... ???
|
||||||
|
EBAR
|
||||||
|
...various {4 x byte} ???
|
||||||
|
[fi EBAR]
|
||||||
|
EBVS
|
||||||
|
<0x 00 00 00 03>
|
||||||
|
<0x 4 bytes = IDENTIFIER>
|
||||||
|
[<0x 00 00 00 01>, or nothing at all] ???
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 10>
|
||||||
|
...(size of DATA block - 30)
|
||||||
|
<0x FD EA = PAD? (ýê)>
|
||||||
|
[fi DRAWING]
|
||||||
|
//-------------------------------
|
||||||
|
[next {NOTE,MARK,CORRECTION,DRAWING}]
|
||||||
|
|
||||||
|
// AUTHOR (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if AUTHOR]
|
||||||
|
AUTH
|
||||||
|
<0x 4 bytes = size of AUTHOR block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi AUTHOR]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
// TITLE (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if TITLE]
|
||||||
|
TITL
|
||||||
|
<0x 4 bytes = size of TITLE block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi TITLE]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
// GENRE (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if GENRE]
|
||||||
|
GENR
|
||||||
|
<0x 4 bytes = size of GENRE block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi GENRE]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
// ABSTRACT (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if ABSTRACT]
|
||||||
|
ABST
|
||||||
|
<0x 4 bytes = size of ABSTRACT block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi ABSTRACT]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
|
||||||
|
// FINAL DATA
|
||||||
|
// Note: 'FINAL DATA' can occur anytime between these marks:
|
||||||
|
// AUTHOR,TITLE,CATEGORY,GENRE,ABSTRACT,COVER,PUBLISHER,...
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
DATA
|
||||||
|
<0x 4 bytes = size of EBVS block>
|
||||||
|
[if EBAR] // this block can appear, or not... ???
|
||||||
|
EBAR
|
||||||
|
...various {4 x byte} ???
|
||||||
|
[fi EBAR]
|
||||||
|
EBVS
|
||||||
|
<0x 00 00 00 03> || <0x 00 00 00 04>
|
||||||
|
<0x 4 bytes || 8 bytes = IDENTIFIER>
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x 00 00 00 00>
|
||||||
|
<0x 00 00 00 10>
|
||||||
|
...(size of EBVS block - 30) :
|
||||||
|
...4 <-- 'position of last read' related
|
||||||
|
...various {4 x byte} ???
|
||||||
|
...4 <-- 'position of last read' related
|
||||||
|
...4
|
||||||
|
...4
|
||||||
|
...4
|
||||||
|
<0x FD EA = PAD? (ýê)>
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
|
||||||
|
// CATEGORY (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if CATEGORY]
|
||||||
|
CATE
|
||||||
|
<0x 4 bytes = size of CATEGORY block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi CATEGORY]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
// COVER (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if COVER]
|
||||||
|
COVE
|
||||||
|
<0x 4 bytes = size of COVER block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi COVER]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
// PUBLISHER (if any)
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[if PUBLISHER]
|
||||||
|
PUBL
|
||||||
|
<0x 4 bytes = size of PUBLISHER block>
|
||||||
|
<text (see 3rd note)>
|
||||||
|
[fi PUBLISHER]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
// FINAL GROUP OF MARKS
|
||||||
|
// Order of {NOTE,MARK,CORRECTION} :
|
||||||
|
// starts with user data at the beginning of the file,
|
||||||
|
// going forwards to the end:
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
[for each {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
|
||||||
|
BKMK
|
||||||
|
<0x 4 bytes = size of BKMK>
|
||||||
|
<0x 4 bytes = TEXT position of the beginning of {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}>
|
||||||
|
//-------------------------------
|
||||||
|
[if DRAWING]
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[else]
|
||||||
|
<0x 4 bytes = TEXT position of the end of {NOTE,MARK,CORRECTION,BOOKMARK}>
|
||||||
|
[fi DRAWING]
|
||||||
|
...4
|
||||||
|
...4
|
||||||
|
//-------------------------------
|
||||||
|
[if NOTE]
|
||||||
|
<0x xx xx xx (20)?>, xxxxxx=>RRGGBB color ???
|
||||||
|
<0x 00 00 00 02>
|
||||||
|
[fi NOTE]
|
||||||
|
[if MARK]
|
||||||
|
<0x xx xx xx (0F/00)??>, xxxxxx=>RRGGBB color ???
|
||||||
|
<0x 00 00 00 04>
|
||||||
|
[fi MARK]
|
||||||
|
[if CORRECTION]
|
||||||
|
<0x xx xx xx (6F)?>, xxxxxx=>RRGGBB color ???
|
||||||
|
<0x 00 00 00 02>
|
||||||
|
[fi CORRECTION]
|
||||||
|
[if DRAWING]
|
||||||
|
<0x xx xx xx (0F)?>, xxxxxx=>RRGGBB DRAWING's background color.
|
||||||
|
<0x 00 00 00 08>
|
||||||
|
[fi DRAWING]
|
||||||
|
[if BOOKMARK]
|
||||||
|
<0x xx xx xx 00>
|
||||||
|
<0x 00 00 00 01>
|
||||||
|
[fi BOOKMARK]
|
||||||
|
// this one is a strange type of mark whose use has not yet been identified:
|
||||||
|
[if UNKNOWN_TYPE_YET_1]
|
||||||
|
<0x xx xx xx 00>
|
||||||
|
<0x 00 00 40 00>
|
||||||
|
[fi UNKNOWN_TYPE_YET_1]
|
||||||
|
|
||||||
|
//-------------------------------
|
||||||
|
[if BOOKMARK || (NOTE "without stored marked text")]
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[else]
|
||||||
|
<0x 4 bytes = DATA pointer in INDEXES>
|
||||||
|
[fi BOOKMARK]
|
||||||
|
[if DRAWING || MARK]
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[else]
|
||||||
|
<0x 4 bytes = DATA pointer in INDEXES>
|
||||||
|
[fi]
|
||||||
|
<0x 4 bytes = DATA pointer in INDEXES>
|
||||||
|
[if DRAWING]
|
||||||
|
<0x 4 bytes = DATA pointer in INDEXES>
|
||||||
|
[else]
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[fi]
|
||||||
|
//-------------------------------
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[next {NOTE,MARK,CORRECTION,DRAWING,BOOKMARK}]
|
||||||
|
//--------------------------------------------------------------------
|
||||||
|
|
||||||
|
[if length % 32 bit != 0] ???
|
||||||
|
<0x FF FF FF FF>
|
||||||
|
[fi]
|
||||||
|
|
||||||
|
// END OF FILE
|
||||||
|
|
||||||
|
// by idleloop@yahoo.com, v0.2.e, 12/2009
|
||||||
|
// http://www.angelfire.com/ego2/idleloop
|
341
format_docs/pdb/mobi.txt
Normal file
@ -0,0 +1,341 @@
|
|||||||
|
from (http://wiki.mobileread.com/wiki/MOBI)
|
||||||
|
|
||||||
|
About
|
||||||
|
-----
|
||||||
|
|
||||||
|
MOBI is the format used by the MobiPocket Reader. It may have a .mobi
|
||||||
|
extension or it may have a .prc extension. The extension can be changed by the
|
||||||
|
user to either of the accepted forms. In either case it may be DRM protected or
|
||||||
|
non-DRM. The .prc extension is used because the PalmOS doesn't support any file
|
||||||
|
extensions except .prc or .pdb. Note that Mobipocket prohibits their DRM format
|
||||||
|
to be used on dedicated eBook readers that support other DRM formats.
|
||||||
|
|
||||||
|
|
||||||
|
Description
|
||||||
|
-----------
|
||||||
|
|
||||||
|
MOBI format was originally an extension of the PalmDOC format by adding
|
||||||
|
certain HTML-like tags to the data. Many MOBI formatted documents still use
|
||||||
|
this form. However there is also a high compression version of this file format
|
||||||
|
that compresses data to a larger degree in a proprietary manner. There are some
|
||||||
|
third party programs that can read the eBooks in the original MOBI format but
|
||||||
|
there are only a few third party programs that can read the eBooks in the new
|
||||||
|
compressed form. The higher compression mode uses a Huffman coding scheme
|
||||||
|
that has been called the Huff/cdic algorithm.
|
||||||
|
|
||||||
|
From time to time features have been added to the format so new files may have
|
||||||
|
problems if you try and read them with a down level reader. Currently the
|
||||||
|
source files follow the guidelines in the Open eBook format.
|
||||||
|
|
||||||
|
Note that AZW for the Amazon Kindle is the same format as MOBI except that it
|
||||||
|
uses a slightly different DRM scheme.
|
||||||
|
|
||||||
|
|
||||||
|
Format
|
||||||
|
------
|
||||||
|
|
||||||
|
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database
|
||||||
|
Format file. The header of that format includes the name of the database
|
||||||
|
(usually the book title and sometimes a portion of the author's name) which is
|
||||||
|
up to 31 bytes of data. The files are identified as Creator ID of MOBI and a
|
||||||
|
Type of BOOK.
|
||||||
|
|
||||||
|
|
||||||
|
PalmDOC Header
|
||||||
|
--------------
|
||||||
|
|
||||||
|
The first record in the Palm Database Format gives more information about the
|
||||||
|
Mobipocket file. The first 16 bytes are almost identical to the first sixteen
|
||||||
|
bytes of a PalmDOC format file.
|
||||||
|
|
||||||
|
bytes content comments
|
||||||
|
2 Compression 1 = no compression, 2 = PalmDOC compression,
|
||||||
|
17480 = HUFF/CDIC compression.
|
||||||
|
2 Unused Always zero
|
||||||
|
4 text length Uncompressed length of the entire text of the book
|
||||||
|
2 record count Number of PDB records used for the text of the book.
|
||||||
|
2 record size Maximum size of each record containing text, always
|
||||||
|
4096.
|
||||||
|
4 Current Position Current reading position, as an offset into the
|
||||||
|
uncompressed text
|
||||||
|
|
||||||
|
There are two differences from a Palm DOC file. There's an additional
|
||||||
|
compression type (17480), and the Current Position bytes are used for a
|
||||||
|
different purpose:
|
||||||
|
|
||||||
|
bytes content comments
|
||||||
|
2 Encryption Type 0 = no encryption, 1 = Old Mobipocket Encryption,
|
||||||
|
2 = Mobipocket Encryption.
|
||||||
|
2 Unknown Usually zero
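
The two tables above can be turned into a small parser. The following is only a
sketch under stated assumptions: the function and field names are made up for
illustration, and it assumes the last four bytes hold the Encryption Type and
the Unknown field, as they do in a Mobipocket file. All values are big-endian,
as elsewhere in the PDB format.

```python
import struct
from collections import namedtuple

# Field names are illustrative; the layout follows the two tables above.
PalmDocHeader = namedtuple(
    "PalmDocHeader",
    "compression unused text_length record_count record_size encryption unknown",
)

# 17480 is 0x4448, i.e. the ASCII bytes "DH".
COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}


def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Unpack the first 16 bytes of record 0 (2+2+4+2+2+2+2 bytes, big-endian)."""
    if len(record0) < 16:
        raise ValueError("record 0 is shorter than 16 bytes")
    return PalmDocHeader(*struct.unpack_from(">HHIHHHH", record0, 0))
```

For a plain PalmDOC file the last two fields would instead be read together as
the 4-byte Current Position.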
|
||||||
|
|
||||||
|
The old Mobipocket Encryption scheme only allows the file to be registered
|
||||||
|
with one PID, unlike the current encryption scheme that allows multiple PIDs to
|
||||||
|
be used in a single file. Unless specifically mentioned, all the encryption
|
||||||
|
information on this page refers to the current scheme.
|
||||||
|
|
||||||
|
|
||||||
|
MOBI Header
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Most Mobipocket files also have a MOBI header in record 0 that follows these
|
||||||
|
16 bytes, and newer formats also have an EXTH header following the MOBI header,
|
||||||
|
again all in record 0 of the PDB file format.
|
||||||
|
|
||||||
|
The MOBI header is of variable length and is not documented. Some fields have
|
||||||
|
been tentatively identified as follows:
|
||||||
|
|
||||||
|
offset bytes content comments
|
||||||
|
16 4 identifier The characters M O B I
|
||||||
|
20 4 header length The length of the MOBI header, including
|
||||||
|
the previous 4 bytes
|
||||||
|
24 4 Mobi type The kind of Mobipocket file this is
|
||||||
|
2 Mobipocket Book
|
||||||
|
3 PalmDoc Book
|
||||||
|
4 Audio
|
||||||
|
257 News
|
||||||
|
258 News_Feed
|
||||||
|
259 News_Magazine
|
||||||
|
513 PICS
|
||||||
|
514 WORD
|
||||||
|
515 XLS
|
||||||
|
516 PPT
|
||||||
|
517 TEXT
|
||||||
|
518 HTML
|
||||||
|
28 4 text Encoding 1252 = CP1252 (WinLatin1); 65001 = UTF-8
|
||||||
|
32 4 Unique-ID Some kind of unique ID number (random?)
|
||||||
|
36 4 Generator version Potentially the version of the
|
||||||
|
Mobipocket-generation tool. Always >=
|
||||||
|
the value of the "format version" field
|
||||||
|
and <= the version of mobigen used to
|
||||||
|
produce the file.
|
||||||
|
40 40 Reserved All 0xFF. In case of a dictionary, or
|
||||||
|
some newer file formats, a few bytes are
|
||||||
|
used from this range of 40 0xFFs
|
||||||
|
80 4 First Non-book index? First record number (starting with 0)
|
||||||
|
that's not the book's text
|
||||||
|
84 4 Full Name Offset Offset in record 0 (not from start of
|
||||||
|
file) of the full name of the book
|
||||||
|
88 4 Full Name Length Length in bytes of the full name of the
|
||||||
|
book
|
||||||
|
92 4 Language Book language code. Low byte is main
|
||||||
|
language 09= English, next byte is
|
||||||
|
dialect, 08 = British, 04 = US
|
||||||
|
96 4 Input Language Input language for a dictionary
|
||||||
|
100 4 Output Language Output language for a dictionary
|
||||||
|
104 4 Format version Potentially the version of the
|
||||||
|
Mobipocket format used in this file.
|
||||||
|
Always >= 1 and <= the value of the
|
||||||
|
"generator version" field.
|
||||||
|
108 4 First Image record First record number (starting with 0)
|
||||||
|
that contains an image. Image records
|
||||||
|
should be sequential. If there are
|
||||||
|
no images this will be 0xffffffff.
|
||||||
|
112 4 HUFF record Record containing Huff information
|
||||||
|
used in HUFF/CDIC decompression.
|
||||||
|
116 4 HUFF count Number of Huff records.
|
||||||
|
120 4 DATP record Unknown: Record starts with DATP.
|
||||||
|
124 4 DATP count Number of DATP records.
|
||||||
|
128 4 EXTH flags Bitfield. if bit 6, 0x40 is set, then
|
||||||
|
there's an EXTH record
|
||||||
|
The following fields are only present if the MOBI header is long enough.
|
||||||
|
132 36 ? 36 unknown bytes, if MOBI is long enough
|
||||||
|
168 4 DRM Offset Offset to DRM key info in DRMed files.
|
||||||
|
0xFFFFFFFF if no DRM
|
||||||
|
172 4 DRM Count Number of entries in DRM info.
|
||||||
|
174 4 DRM Size Number of bytes in DRM info.
|
||||||
|
176 4 DRM Flags Some flags concerning the DRM info.
|
||||||
|
180 6 ?
|
||||||
|
186 2 Last Image record Possible value with the last image
|
||||||
|
record. If there are no images in the
|
||||||
|
book this will be 0xffff.
|
||||||
|
188 4 ?
|
||||||
|
192 4 FCIS record Unknown. Record starts with FCIS.
|
||||||
|
196 4 ?
|
||||||
|
200 4 FLIS record Unknown. Record starts with FLIS.
|
||||||
|
204 ? ? Bytes to the end of the MOBI header,
|
||||||
|
including the following if the header
|
||||||
|
length >= 228. ( 244 from start of
|
||||||
|
record)
|
||||||
|
242 2 Extra Data Flags A set of binary flags, some of which
|
||||||
|
indicate extra data at the end of each
|
||||||
|
text block. This only seems to be valid
|
||||||
|
for Mobipocket format version 5 and 6
|
||||||
|
(and higher?), when the header length
|
||||||
|
is 228 (0xE4) or 232 (0xE8).
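
As an illustration of how the offsets in the table might be used, here is a
hedged sketch that reads a handful of the tentatively identified fields. The
function name, the dictionary keys and the cp1252 fallback for unknown
encodings are assumptions for the example, not part of the format.

```python
import struct

# Encoding values from the table above, mapped to Python codec names.
ENCODINGS = {1252: "cp1252", 65001: "utf-8"}


def parse_mobi_header(record0: bytes) -> dict:
    """Read a few tentatively identified MOBI header fields from record 0."""
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI identifier at offset 16")
    # Offsets 20, 24, 28: header length, Mobi type, text encoding.
    header_length, mobi_type, encoding = struct.unpack_from(">III", record0, 20)
    # Offsets 84, 88, 92: full name offset, full name length, language.
    name_offset, name_length, language = struct.unpack_from(">III", record0, 84)
    # Offset 128: EXTH flags.
    (exth_flags,) = struct.unpack_from(">I", record0, 128)
    codec = ENCODINGS.get(encoding, "cp1252")  # fallback is an assumption
    raw_name = record0[name_offset:name_offset + name_length]
    return {
        "header_length": header_length,
        "mobi_type": mobi_type,
        "encoding": encoding,
        "full_name": raw_name.decode(codec, "replace"),
        "main_language": language & 0xFF,     # low byte, e.g. 0x09 = English
        "dialect": (language >> 8) & 0xFF,    # next byte, e.g. 0x08 = British
        "has_exth": bool(exth_flags & 0x40),  # bit 6 set => EXTH present
    }
```

The full name offset is relative to the start of record 0, so the same buffer
can be sliced directly.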
|
||||||
|
|
||||||
|
|
||||||
|
EXTH Header
|
||||||
|
-----------
|
||||||
|
|
||||||
|
If the MOBI header indicates that there's an EXTH header, it follows immediately
|
||||||
|
after the MOBI header. Since the MOBI header is of variable length, this isn't
|
||||||
|
at any fixed offset in record 0. Note that some readers will ignore any EXTH
|
||||||
|
header info if the mobipocket version number specified in the MOBI header is 2
|
||||||
|
or less (perhaps 3 or less).
|
||||||
|
|
||||||
|
The EXTH header is also undocumented, so some of this is guesswork.
|
||||||
|
|
||||||
|
bytes content comments
|
||||||
|
4 identifier the characters E X T H
|
||||||
|
4 header length the length of the EXTH header, including the previous 4 bytes
|
||||||
|
4 record Count The number of records in the EXTH header. The rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length.
|
||||||
|
EXTH record start Repeat until done.
|
||||||
|
4 record type Exth Record type. Just a number identifying what's stored in the record
|
||||||
|
4 record length length of EXTH record = L , including the 8 bytes in the type and length fields
|
||||||
|
L-8 record data Data.
|
||||||
|
EXTH record end Repeat until done.
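
The record layout above lends itself to a simple loop. This is a minimal
sketch, assuming the caller already knows where the EXTH header starts (the
offset parameter and the function name are illustrative). It walks the records
by count and does not rely on the exact meaning of the header length field,
which is part of the guesswork mentioned above.

```python
import struct


def parse_exth(data: bytes, offset: int = 0) -> list:
    """Return (record_type, record_data) pairs from an EXTH header at `offset`."""
    ident, header_length, count = struct.unpack_from(">4sII", data, offset)
    if ident != b"EXTH":
        raise ValueError("no EXTH identifier at the given offset")
    records = []
    pos = offset + 12  # skip identifier, header length and record count
    for _ in range(count):
        if pos + 8 > len(data):
            raise ValueError("EXTH record runs past the end of the data")
        rec_type, rec_len = struct.unpack_from(">II", data, pos)
        if rec_len < 8:
            raise ValueError("EXTH record length smaller than its own header")
        # rec_len (L) includes the 8 bytes of type and length, so data is L-8.
        records.append((rec_type, data[pos + 8:pos + rec_len]))
        pos += rec_len
    return records
```

Record data is returned as raw bytes; how to decode it depends on the record
type and on the text encoding given in the MOBI header.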
|
||||||
|
|
||||||
|
There are lots of different EXTH record types. Ones found so far in Mobipocket
|
||||||
|
files are listed here, with possible meanings. Hopefully the table will be
|
||||||
|
filled in as more information comes to light.
|
||||||
|
|
||||||
|
record type usual length name comments
|
||||||
|
1 drm_server_id
|
||||||
|
2 drm_commerce_id
|
||||||
|
3 drm_ebookbase_book_id
|
||||||
|
100 author
|
||||||
|
101 publisher
|
||||||
|
102 imprint
|
||||||
|
103 description
|
||||||
|
104 isbn
|
||||||
|
105 subject
|
||||||
|
106 publishingdate
|
||||||
|
107 review
|
||||||
|
108 contributor
|
||||||
|
109 rights
|
||||||
|
110 subjectcode
|
||||||
|
111 type
|
||||||
|
112 source
|
||||||
|
113 asin
|
||||||
|
114 versionnumber
|
||||||
|
115 sample
|
||||||
|
116 startreading
|
||||||
|
118 retail price (as text)
|
||||||
|
119 retail price currency (as text)
|
||||||
|
201 coveroffset
|
||||||
|
202 thumboffset
|
||||||
|
203 hasfakecover
|
||||||
|
204 204 Unknown
|
||||||
|
205 205 Unknown
|
||||||
|
206 206 Unknown
|
||||||
|
207 207 Unknown
|
||||||
|
208 208 Unknown
|
||||||
|
300 300 Unknown
|
||||||
|
401 clippinglimit
|
||||||
|
402 publisherlimit
|
||||||
|
403 403 Unknown
|
||||||
|
404 404 ttsflag
|
||||||
|
501 4 cdetype PDOC - Personal Doc;
|
||||||
|
EBOK - ebook;
|
||||||
|
502 lastupdatetime
|
||||||
|
503 updatedtitle
|
||||||
|
|
||||||
|
And now, at the end of Record 0 of the PDB file format, we usually get the full
|
||||||
|
file name, the offset of which is given in the MOBI header.
|
||||||
|
|
||||||
|
|
||||||
|
Variable-width integers
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
Some parts of the Mobipocket format encode data as variable-width integers.
|
||||||
|
These integers are represented big-endian with 7 bits per byte in bits 1-7. They
|
||||||
|
may be either forward-encoded, in which case only the LSB has bit 8 set, or
|
||||||
|
backward-encoded, in which case only the MSB has bit 8 set. For example, the
|
||||||
|
number 0x11111 would be represented forward-encoded as:
|
||||||
|
|
||||||
|
0x04 0x22 0x91
|
||||||
|
|
||||||
|
And backward-encoded as:
|
||||||
|
|
||||||
|
0x84 0x22 0x11
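
The two encodings can be sketched in a few lines. The function names below are
illustrative only; the logic follows the description above (7 payload bits per
byte, most significant group first, bit 8 marking the last byte when reading
forward or the first byte when reading backward).

```python
def encode_varint(value: int, forward: bool = True) -> bytes:
    """Encode a non-negative integer as a Mobipocket variable-width integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        groups.append(value & 0x7F)  # collect 7-bit groups, least significant first
        value >>= 7
        if value == 0:
            break
    groups.reverse()  # big-endian: most significant group first
    if forward:
        groups[-1] |= 0x80  # forward-encoded: the last byte (LSB) has bit 8 set
    else:
        groups[0] |= 0x80   # backward-encoded: the first byte (MSB) has bit 8 set
    return bytes(groups)


def decode_forward(data: bytes, pos: int = 0):
    """Read forward from `pos`; return (value, number of bytes consumed)."""
    value = 0
    consumed = 0
    for byte in data[pos:]:
        value = (value << 7) | (byte & 0x7F)
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("no terminating byte found")


def decode_backward(data: bytes):
    """Read backward from the end of `data`; return (value, bytes consumed)."""
    value = 0
    shift = 0
    consumed = 0
    for byte in reversed(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("no terminating byte found")
```

Backward decoding starts at the end of a buffer, which is what makes it usable
for size fields stored at the end of a record.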
|
||||||
|
|
||||||
|
|
||||||
|
Trailing entries
|
||||||
|
----------------
|
||||||
|
|
||||||
|
The Extra Data Flags field of the MOBI header indicates which, if any, trailing
|
||||||
|
entries are appended to the end of each text record. Each set bit in the field
|
||||||
|
indicates a trailing entry. The entries appear to occur in bit-order; e.g.,
|
||||||
|
trailing entry 1 immediately follows the text content and entry 16 occurs at
|
||||||
|
the very end of the record. The effect and exact details of most of these
|
||||||
|
entries are unknown. The trailing entries indicated by bits 2-16 appear to
|
||||||
|
follow a common format. That format is:
|
||||||
|
|
||||||
|
<data><size>
|
||||||
|
|
||||||
|
Where <size> is the size of the entire trailing entry (including the size of
|
||||||
|
<size>) as a backward-encoded Mobipocket variable-width integer.
|
||||||
|
|
||||||
|
Only a few bits have been identified:
|
||||||
|
|
||||||
|
bit Data at end of records
|
||||||
|
0x0001 Multi-byte character overlaps
|
||||||
|
0x0002 Some data to help with indexing
|
||||||
|
0x0004 Some data about uncrossable breaks
|
||||||
|
|
||||||
|
|
||||||
|
Multibyte character overlap
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
When bit 1 of the Extra Data Flags field is set, each record is followed by a
|
||||||
|
trailing entry containing any extra bytes necessary to complete a multibyte
|
||||||
|
character which crosses the record boundary. The bytes do not participate in
|
||||||
|
compression regardless of which compression scheme is used for the file. However,
|
||||||
|
unlike the trailing data bytes, the multibytes (including the count byte) do
|
||||||
|
get included in any encryption. The overlapping bytes then re-appear as normal
|
||||||
|
content at the beginning of the following record. The trailing entry ends with
|
||||||
|
a byte containing a count of the overlapping bytes plus additional flags.
|
||||||
|
|
||||||
|
offset bytes content comments
|
||||||
|
0 0-3 N terminal bytes
|
||||||
|
of a multibyte
|
||||||
|
character
|
||||||
|
N 1 Size & flags bits 1-2 encode N, use of bits 3-8 is unknown
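
Putting the trailing-entry format and the multibyte overlap entry together, a
reader has to strip these entries before decompressing a text record. The
sketch below shows one way to do that, working from the end of the record
towards the text. The function names are hypothetical, and the code assumes
the layout described above holds for every flagged bit.

```python
def trailing_size(record: bytes, flags: int) -> int:
    """Total number of trailing-entry bytes at the end of a text record."""

    def entry_size(end: int) -> int:
        # Backward-encoded variable-width integer whose last byte is record[end - 1].
        value = 0
        shift = 0
        while end > 0:
            byte = record[end - 1]
            value |= (byte & 0x7F) << shift
            shift += 7
            end -= 1
            if byte & 0x80:  # the most significant byte carries bit 8
                break
        return value

    total = 0
    # Bits 16 down to 2 (masks 0x8000 .. 0x0002): entry 16 sits at the very end.
    for bit in range(15, 0, -1):
        if flags & (1 << bit):
            total += entry_size(len(record) - total)
    # Bit 1: multibyte overlap entry, N data bytes plus one size-and-flags byte.
    if flags & 0x0001:
        total += (record[len(record) - total - 1] & 0x03) + 1
    return total


def strip_trailing_entries(record: bytes, flags: int) -> bytes:
    """Return only the text content of a record."""
    return record[:len(record) - trailing_size(record, flags)]
```

For example, with flags 0x0003 a record ending in an overlap entry and an
indexing entry loses both, leaving only the (possibly compressed) text bytes.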
|
||||||
|
|
||||||
|
|
||||||
|
PalmDOC Compression
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed
|
||||||
|
text. The format does not allow for any text formatting. This keeps files small,
|
||||||
|
in keeping with the Palm philosophy. However, extensions to the format can use
|
||||||
|
tags, such as HTML or PML, to include formatting within text. These extensions
|
||||||
|
to PalmDoc are not interchangeable and are the basis for most eBook Reader
|
||||||
|
formats on Palm devices.
|
||||||
|
|
||||||
|
LZ77 algorithms achieve compression by replacing portions of the data with
|
||||||
|
references to matching data that has already passed through both encoder and
|
||||||
|
decoder. A match is encoded by a pair of numbers called a length-distance pair,
|
||||||
|
which is equivalent to the statement "each of the next length characters is
|
||||||
|
equal to the character exactly distance characters behind it in the uncompressed
|
||||||
|
stream." (The "distance" is sometimes called the "offset" instead.)
|
||||||
|
|
||||||
|
In the PalmDoc format, a length-distance pair is always encoded by a two-byte
|
||||||
|
sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding
|
||||||
|
the distance, 3 go to encoding the length, and the remaining two are used to
|
||||||
|
make sure the decoder can identify the first byte as the beginning of such a
|
||||||
|
two-byte sequence. The exact algorithm needed to decode the compressed text can
|
||||||
|
be found on the PalmDOC page.
|
||||||
|
|
||||||
|
PalmDOC data is always divided into 4096 byte blocks and the blocks are acted
|
||||||
|
upon independently.
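
For orientation, here is a sketch of a decoder for a single block. The byte
ranges used below (0x00 and 0x09-0x7F literal, 0x01-0x08 copy that many raw
bytes, 0x80-0xBF length-distance pair, 0xC0-0xFF space plus character) are the
commonly described PalmDOC scheme and are not spelled out in this section, so
treat them as an assumption and check them against the PalmDOC page. The
function name is illustrative.

```python
def palmdoc_decompress(block: bytes) -> bytes:
    """Decompress one PalmDOC-compressed block (each block is independent)."""
    out = bytearray()
    i = 0
    n = len(block)
    while i < n:
        c = block[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            out.append(c)              # literal byte
        elif c <= 0x08:
            out += block[i:i + c]      # copy the next c bytes unchanged
            i += c
        elif c <= 0xBF:
            if i >= n:
                raise ValueError("truncated length-distance pair")
            # Two bytes: drop the top 2 marker bits, leaving 14 bits.
            pair = ((c << 8) | block[i]) & 0x3FFF
            i += 1
            distance = pair >> 3           # upper 11 bits
            length = (pair & 0x07) + 3     # lower 3 bits, stored minus 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid distance in length-distance pair")
            for _ in range(length):
                out.append(out[-distance])  # byte-wise so overlapping copies work
        else:
            out.append(0x20)           # byte pair: a space ...
            out.append(c ^ 0x80)       # ... followed by the character
    return bytes(out)
```

Copying one byte at a time matters when the length exceeds the distance, since
the source range then overlaps the bytes being written.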
|
||||||
|
|
||||||
|
PalmDOC does have support for bookmarks. These pointers are named and refer to
|
||||||
|
an offset location in a file. If the file is edited these locations may no
|
||||||
|
longer refer to the correct locations. Some reading programs allow the user to
|
||||||
|
enter or edit these bookmarks while others treat them as a TOC. Some reading
|
||||||
|
programs may ignore them entirely. They are stored at the end of the file itself
|
||||||
|
so the full file needs to be scanned when loaded to find them.
|
||||||
|
|
||||||
|
|
||||||
|
MBP
|
||||||
|
---
|
||||||
|
|
||||||
|
This is the extension used on a side file (auxiliary) for MOBI formatted eBooks.
|
||||||
|
It is used to store metadata used by the library software and also to store
|
||||||
|
user entered data like bookmarks, annotations, last read position. This file is
|
||||||
|
created automatically by the reader program when the eBook is first opened and
|
||||||
|
has a .mbp extension. The Library management software in MobiPocket uses this
|
||||||
|
file to get information displayed in the library window such as title and author
|
||||||
|
so that it won't have to open the larger eBook file.
|
||||||
|
|
25
format_docs/pdb/palmdoc.txt
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
PalmDoc Format
|
||||||
|
--------------
|
||||||
|
|
||||||
|
The format is that of a standard Palm Database Format file. The header of that
|
||||||
|
format includes the name of the database (usually the book title and sometimes
|
||||||
|
a portion of the author's name) which is up to 31 bytes of data. This string of
|
||||||
|
characters is terminated with a 0 in the C style. The files are identified as
|
||||||
|
Creator ID of REAd and a Type of TEXt.
|
||||||
|
|
||||||
|
|
||||||
|
Record 0
|
||||||
|
--------
|
||||||
|
|
||||||
|
The first record in the Palm Database Format gives more information about the
|
||||||
|
PalmDOC file, and contains 16 bytes.
|
||||||
|
|
||||||
|
bytes content comments
|
||||||
|
|
||||||
|
2 Compression 1 = no compression, 2 = PalmDOC compression (see below)
|
||||||
|
2 Unused Always zero
|
||||||
|
4 text length Uncompressed length of the entire text of the book
|
||||||
|
2 record count Number of PDB records used for the text of the book.
|
||||||
|
2 record size Maximum size of each record containing text, always 4096
|
||||||
|
4 Current Position Current reading position, as an offset into the uncompressed text
|
||||||
|
|
104
format_docs/pdb/pdb_format.txt
Normal file
@ -0,0 +1,104 @@
|
|||||||
|
Format
|
||||||
|
------
|
||||||
|
|
||||||
|
A PDB file can be broken into multiple parts: the header, record 0 and data.
|
||||||
|
Values stored within the various parts are in big-endian byte order. The data
|
||||||
|
part is broken down into multiple sections. The section count and offsets
|
||||||
|
are referenced in the PDB header. Sections can be no more than 65505 bytes in
|
||||||
|
length.
|
||||||
|
|
||||||
|
|
||||||
|
Layout
|
||||||
|
------
|
||||||
|
|
||||||
|
PDB files take the format: DB header, followed by record 0, which contains
format-specific information, followed by data.
|
||||||
|
|
||||||
|
DB Header
|
||||||
|
0 Record 0
|
||||||
|
.
|
||||||
|
. Data (broken down into sections)
|
||||||
|
.
|
||||||
|
|
||||||
|
|
||||||
|
Palm Database Header Format
|
||||||
|
|
||||||
|
bytes content comments
|
||||||
|
|
||||||
|
32 name database name. This name is 0 terminated in the
|
||||||
|
field and will be used as the file name on a
|
||||||
|
computer. For eBooks this usually contains the
|
||||||
|
title and may have the author depending on the
|
||||||
|
length available.
|
||||||
|
|
||||||
|
2 attributes bit field.
|
||||||
|
0x0002 Read-Only
|
||||||
|
0x0004 Dirty AppInfoArea
|
||||||
|
0x0008 Backup this database (i.e. no conduit exists)
|
||||||
|
0x0010 (16 decimal) Okay to install newer over
|
||||||
|
existing copy, if present on PalmPilot
|
||||||
|
0x0020 (32 decimal) Force the PalmPilot to reset
|
||||||
|
after this database is installed
|
||||||
|
0x0040 (64 decimal) Don't allow copy of file to be
|
||||||
|
beamed to other Pilot.
|
||||||
|
|
||||||
|
2 version file version
|
||||||
|
|
||||||
|
4 creation date No. of seconds since start of January 1, 1904.
|
||||||
|
|
||||||
|
4 modification date No. of seconds since start of January 1, 1904.
|
||||||
|
|
||||||
|
4 last backup date No. of seconds since start of January 1, 1904.
|
||||||
|
|
||||||
|
4 modificationNumber
|
||||||
|
|
||||||
|
4 appInfoID offset to start of Application Info (if present)
|
||||||
|
or null
|
||||||
|
|
||||||
|
4 sortInfoID offset to start of Sort Info (if present) or null
|
||||||
|
|
||||||
|
4 type See above table. (For Applications this data will
|
||||||
|
be 'appl')
|
||||||
|
|
||||||
|
4 creator See above table. This program will be launched if
|
||||||
|
the file is tapped
|
||||||
|
|
||||||
|
4 uniqueIDseed used internally to identify record
|
||||||
|
|
||||||
|
4 nextRecordListID Only used when in-memory on Palm OS. Always set to
|
||||||
|
zero in stored files.
|
||||||
|
|
||||||
|
2 number of Records number of records in the file - N
|
||||||
|
|
||||||
|
8N record Info List
|
||||||
|
|
||||||
|
start of record
|
||||||
|
info entry Repeat N times to end of record info entry
|
||||||
|
|
||||||
|
4 record Data Offset the offset from the start of the PDB of this record
|
||||||
|
|
||||||
|
1 record Attributes bit field. The least significant four bits are used
|
||||||
|
to represent the category values. These are the
|
||||||
|
categories used to split the databases for viewing
|
||||||
|
on the screen. A few of the 16 categories are
|
||||||
|
pre-defined but the user can add their own. There
|
||||||
|
is an undefined category for use if the user or
|
||||||
|
programmer hasn't set this.
|
||||||
|
0x10 (16 decimal) Secret record bit.
|
||||||
|
0x20 (32 decimal) Record in use (busy bit).
|
||||||
|
0x40 (64 decimal) Dirty record bit.
|
||||||
|
0x80 (128, unsigned decimal) Delete record on
|
||||||
|
next HotSync.
|
||||||
|
|
||||||
|
3 UniqueID The unique ID for this record. Often just a
|
||||||
|
sequential count from 0
|
||||||
|
|
||||||
|
end of record
|
||||||
|
info entry
|
||||||
|
|
||||||
|
2? Gap to data traditionally 2 zero bytes to Info or raw data
|
||||||
|
|
||||||
|
? Records The actual data in the file. AppInfoArea (if
|
||||||
|
present), SortInfoArea (if present) and then
|
||||||
|
records sequentially
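
The header table above maps directly onto a fixed 78-byte structure followed
by 8 bytes per record. The following sketch parses both. The function name,
the dictionary keys and the cp1252 decoding of the database name are
assumptions made for the example.

```python
import struct
from datetime import datetime, timedelta

PALM_EPOCH = datetime(1904, 1, 1)
# name, attributes, version, 3 dates, modificationNumber, appInfoID,
# sortInfoID, type, creator, uniqueIDseed, nextRecordListID, record count.
HEADER_FORMAT = ">32sHHIIIIII4s4sIIH"  # 78 bytes in total


def parse_pdb_header(data: bytes) -> dict:
    """Parse the Palm Database header and the record info list."""
    (name, attributes, version, created, modified, backed_up, mod_number,
     app_info, sort_info, type_, creator, uid_seed, next_list,
     num_records) = struct.unpack_from(HEADER_FORMAT, data, 0)
    records = []
    pos = struct.calcsize(HEADER_FORMAT)
    for _ in range(num_records):
        # 4-byte offset, then 1 attribute byte and a 3-byte unique ID.
        offset, attr_and_id = struct.unpack_from(">II", data, pos)
        records.append({
            "offset": offset,
            "attributes": attr_and_id >> 24,
            "unique_id": attr_and_id & 0xFFFFFF,
        })
        pos += 8
    return {
        # The name is 0 terminated inside its 32-byte field.
        "name": name.split(b"\0", 1)[0].decode("cp1252", "replace"),
        "attributes": attributes,
        "version": version,
        "created": PALM_EPOCH + timedelta(seconds=created),
        "modified": PALM_EPOCH + timedelta(seconds=modified),
        "type": type_.decode("ascii", "replace"),
        "creator": creator.decode("ascii", "replace"),
        "records": records,
    }
```

The length of a record is not stored; it can be derived from the offset of the
next record, or from the file size for the last one.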
|
||||||
|
|
34
format_docs/pdb/pdb_types.txt
Normal file
@ -0,0 +1,34 @@
|
|||||||
|
Palm Database File Code
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
Reader                    Type  Code

Adobe Reader              .pdf  ADBE
PalmDOC                   TEXt  REAd
BDicty                    BVok  BDIC
DB (Database program)     DB99  DBOS
eReader                   PNRd  PPrs
eReader                   Data  PPrs
FireViewer (ImageViewer)  vIMG  View
HanDBase                  PmDB  PmDB
InfoView                  Info  INDB
iSilo                     ToGo  ToGo
iSilo 3                   SDoc  SilX
JFile                     JbDb  JBas
JFile Pro                 JfDb  JFil
LIST                      DATA  LSdb
MobileDB                  Mdb1  Mdb1
MobiPocket                BOOK  MOBI
Plucker                   Data  Plkr
QuickSheet                Data  Sprd
SuperMemo                 SM01  SMem
TealDoc                   TEXt  TlDc
TealInfo                  Info  TlIf
TealMeal                  Data  TlMl
TealPaint                 Data  TlPt
ThinkDB                   data  TDBP
Tides                     Tdat  Tide
TomeRaider                ToRa  TRPW
Weasel                    zTXT  GPlm
WordSmith                 BDOC  WrdS
|
||||||
|
|
2122
format_docs/pdb/plucker.html
Normal file
File diff suppressed because it is too large
936
format_docs/pdb/pml.txt
Normal file
@ -0,0 +1,936 @@
|
|||||||
|
Palm Markup Language
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
This page explains how to use the Palm Markup Language (PML) to specify
|
||||||
|
formatting and other information in a text file for later reading using the
|
||||||
|
eReader.
|
||||||
|
|
||||||
|
PML commands start with a backslash, "\", and usually consist of a single
|
||||||
|
character after that. Some PML commands are paired, such as those that specify
|
||||||
|
italicized text. Other commands are directives, such as the "\p", which
|
||||||
|
specifies a page break. PML is not meant to be an industrial-strength markup
|
||||||
|
language, but it is easy to understand, easy to parse, and creates high-quality
|
||||||
|
electronic books.
|
||||||
|
|
||||||
|
Since PML and Palm DropBook are not without flaws, there is a page of Tips and
|
||||||
|
Pitfalls.
|
||||||
|
|
||||||
|
|
||||||
|
Let's Dive Right In
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
palmsample.txt contains examples of formatting text, specifying chapters, etc.
|
||||||
|
Use it to start from, or just as an example when making your own books.
|
||||||
|
|
||||||
|
The following table specifies the Palm Markup Language commands, and what
|
||||||
|
they do.
|
||||||
|
|
||||||
|
\p New page
|
||||||
|
\x New chapter; also causes a new page break.
|
||||||
|
Enclose chapter title (and any style codes)
|
||||||
|
with \x and \x
|
||||||
|
\Xn New chapter, indented n levels (n between 0 and
|
||||||
|
4 inclusive) in the Chapter dialog; doesn't
|
||||||
|
cause a page break. Enclose chapter title (and
|
||||||
|
any style codes) with \Xn and \Xn
|
||||||
|
\Cn="Chapter title" Insert "Chapter title" into the chapter
|
||||||
|
listing, with level n (like \Xn). The text is
|
||||||
|
not shown on the page and does not force a page
|
||||||
|
break. This can sometimes be useful to insert a
|
||||||
|
chapter mark at the beginning of an
|
||||||
|
introduction to the chapter, for example.
|
||||||
|
\c Center this block of text; close with \c on
|
||||||
|
beginning of line
|
||||||
|
\r Right justify text block; close with \r on
|
||||||
|
beginning of line
|
||||||
|
\i Italicize block; close with \i
|
||||||
|
\u Underline block; close with \u
|
||||||
|
\o Overstrike block; close with \o
|
||||||
|
\v Invisible text; close with \v (can be used for
|
||||||
|
comments)
|
||||||
|
\t Indent block. Start at beginning of a line,
|
||||||
|
close with \t at end of a line
|
||||||
|
\T="50%" Indents the specified percentage of the screen
|
||||||
|
width, 50% in this case. If the current drawing
|
||||||
|
position is already past the specified screen
|
||||||
|
location, this tag is ignored.
|
||||||
|
\w="50%" Embed a horizontal rule of a given percentage
|
||||||
|
width of the screen, in this case 50%. This tag
|
||||||
|
causes a line break before and after it. The
|
||||||
|
rule is centered. The percent sign is mandatory.
|
||||||
|
\n Switch to the "normal" font, which is specified
|
||||||
|
by the user
|
||||||
|
\s Switch to stdFont; close with \s to revert to
|
||||||
|
normal font
|
||||||
|
\b Switch to boldFont; close with \b to revert to
|
||||||
|
normal font (deprecated; use \B instead)
|
||||||
|
\l Switch to largeFont; close with \l to revert to
|
||||||
|
normal font
|
||||||
|
\B Mark text as bold. Unlike the \b tag, \B
|
||||||
|
doesn't change the font, so you can have large
|
||||||
|
bold text. You cannot mix \b and \B in the same
|
||||||
|
PML file.
|
||||||
|
\Sp Mark text as superscript. Should not be mixed
|
||||||
|
with other styles such as bold, italic, etc.
|
||||||
|
Enclose superscripted text with \Sp.
|
||||||
|
\Sb Mark text as subscript. Should not be mixed
|
||||||
|
with other styles such as bold, italic, etc.
|
||||||
|
Enclose subscripted text with \Sb.
|
||||||
|
\k Make enclosed text into small-caps; close with
|
||||||
|
\k. Any characters enclosed in \k tags
|
||||||
|
(including those with accents) are made
|
||||||
|
uppercase and are rendered at a smaller point
|
||||||
|
size than a regular uppercase character.
|
||||||
|
\\ Represents a single backslash
|
||||||
|
\aXXX Insert non-ASCII character whose Windows 1252
|
||||||
|
code is decimal XXX. See the PML character
|
||||||
|
table for details.
|
||||||
|
\UXXXX Insert non-ASCII character whose Unicode code
|
||||||
|
is hexadecimal XXXX. See the Extended PML
|
||||||
|
character table for details.
|
||||||
|
\m="imagename.png" Insert the named image. See the section on
|
||||||
|
Images below.
|
||||||
|
\q="#linkanchor"Some text\q Reference a link anchor which is at another
|
||||||
|
spot in the document. The string after the
|
||||||
|
anchor specification and before the trailing \q
|
||||||
|
is underlined or otherwise shown to be a link
|
||||||
|
when viewing the document.
|
||||||
|
\Q="linkanchor" Specify a link anchor in the document.
|
||||||
|
\- Insert a soft hyphen. A soft hyphen shows up
|
||||||
|
only if it is necessary to break a word across
|
||||||
|
a line.
|
||||||
|
\Fn="footnote1"1\Fn Link the "1" to a footnote whose name is
|
||||||
|
footnote1, tagged at the end of the PML
|
||||||
|
document. See the section on Footnotes and
|
||||||
|
Sidebars below.
|
||||||
|
\Sd="sidebar1"Sidebar\Sd Link the "Sidebar" text to a sidebar whose name
|
||||||
|
is sidebar1, tagged at the end of the PML
|
||||||
|
document. See the section on Footnotes and
|
||||||
|
Sidebars below.
|
||||||
|
\I Mark as a reference index item. Enclose index
|
||||||
|
item (and any style codes) with \I and \I. See
|
||||||
|
Creating Dictionaries for more information.
|
||||||
|


Examples
--------

\pThis is a new page

\xChapter III\x

\X1Chapter III, part A\X1

\p\C="Introduction"The following story is one of my favorites...

\cProperty of
Gateway Senior High School
\c

\rJustify my love
\r

This stuff is \ireally\i cool.

I just read \uMoby Dick.\u

This is a \obig\o mistake.

Copyright 1917\v Date of magazine serialization \v

\tOnce upon a time
there was a wicked queen
called Esmerelda.\t

Mammals:\T="40%"Lions
\T="40%"Tigers
\T="40%"Bears

He walked away.
\w="80%"
Later that day, he ran into an old friend.

\nIn the normal ways...

The \stitle page\s should be formatted...

I just \bcan't\b believe that you...

This \lREALLY\l is a large tiger...

This \Bbold\B text can be either \l\Blarge bold\B\l or \s\Bsmall bold\B\s.

e\Spx + 2\Sp = 9

C\Sb2\SbH\Sb3\SbO\Sb2\Sb should be used in moderation.

See also \kanteater\k.

The DOS prompt said "C:\\windows\\"

The man said \a147Yeah.\a148

Arrows can point \U2190 left or right \U2192.

A Yield sign looks like this: \m="yieldsign.png".

See the \q="#detailedinstructions"Detailed Instructions\q for how to install your eBook.

\Q="detailedinstructions"\bDetailed Instructions\b - This section
describes how to install an eBook to your handheld device.

Very long words like anti\-dis\-establish\-ment\-arian\-ism may benefit from
the use of soft hyphens.

The Emerson case\Fn="emerson"[1]\Fn will be very important...

For more information, see the \Sd="moreinfo"sidebar\Sd.

\I\Baardvark\B\I \in.\i a large burrowing nocturnal mammal that feeds especially on termites and ants
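
The examples above show that style tags such as \i, \u, \o and \B act as
toggles: the first occurrence turns a style on and the next one turns it off.
The following Python sketch illustrates that idea by converting four of these
tags to HTML. The function name render_toggles and the choice of HTML elements
are assumptions made for illustration, not part of PML or of any eReader tool.
Every other tag is left untouched, and no HTML escaping is done.

```python
import re

# Hypothetical mapping from a few PML toggle tags to HTML element names.
TOGGLE_TAGS = {"i": "i", "u": "u", "o": "s", "B": "b"}

# Match an escaped backslash first so that "\\i" is not mistaken for "\i".
TOKEN = re.compile(r"\\(\\|[iuoB])")


def render_toggles(pml):
    """Convert \\i, \\u, \\o and \\B toggles to HTML open/close tags."""
    open_tags = set()
    out = []
    pos = 0
    for match in TOKEN.finditer(pml):
        out.append(pml[pos:match.start()])
        token = match.group(1)
        if token == "\\":
            # "\\" in PML stands for one literal backslash.
            out.append("\\")
        else:
            name = TOGGLE_TAGS[token]
            if name in open_tags:
                open_tags.remove(name)
                out.append("</%s>" % name)
            else:
                open_tags.add(name)
                out.append("<%s>" % name)
        pos = match.end()
    out.append(pml[pos:])
    return "".join(out)


print(render_toggles(r"This stuff is \ireally\i cool."))
```

A real converter would need to handle the remaining tags (for example \Sp, \Sb
and the tags that take ="..." arguments) and report toggles left open.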


Footnotes and Sidebars
----------------------

Footnotes and Sidebars are specified with an XML-like syntax at the end of the
PML document. For example,

<sidebar id="sidebar1">
Here's some \itext\i for a sidebar.
</sidebar>

would specify the sidebar to be displayed when the user taps on a sidebar link
in the text that was specified using the \Sd tag.

Any text or PML placed after the first footnote or sidebar is not treated as
part of the book text.

Sidebars and footnotes can include most PML features, but some PML tags cannot
be used inside a sidebar or footnote. These include:

    Chapters   \x, \X, \C
    Links      \q, \Q
    Footnotes  \Fn
    Sidebars   \Sd

See the palmsample.txt file for examples of how to use many of the PML tags.
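
As a rough illustration of the layout described in this section, the Python
sketch below separates the book text from the footnote and sidebar blocks that
follow it. The function name split_notes is hypothetical. The text above only
shows the <sidebar id="..."> form, so treating <footnote id="..."> the same way
is an assumption.

```python
import re

# <sidebar id="..."> ... </sidebar> is taken from the example in the text;
# the matching <footnote ...> form is assumed to look the same.
BLOCK_RE = re.compile(
    r'<(sidebar|footnote)\s+id="([^"]*)">\s*(.*?)\s*</\1>', re.DOTALL)


def split_notes(pml):
    """Return (book_text, notes), where notes maps (kind, id) to content."""
    first = BLOCK_RE.search(pml)
    if first is None:
        return pml, {}
    # Everything from the first block onward is not part of the book text.
    body = pml[:first.start()]
    notes = {}
    for match in BLOCK_RE.finditer(pml, first.start()):
        notes[(match.group(1), match.group(2))] = match.group(3)
    return body, notes


sample = (
    'See the \\Sd="sidebar1"sidebar\\Sd.\n'
    '<sidebar id="sidebar1">\n'
    "Here's some \\itext\\i for a sidebar.\n"
    "</sidebar>\n"
)
body, notes = split_notes(sample)
print(notes[("sidebar", "sidebar1")])
```

Keying the result by both kind and id keeps a footnote and a sidebar with the
same name apart. Whether real tools allow such a clash is not stated here.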


Images
------

The following rules are intended to guarantee that images in your eBook will be
viewable on all platforms that eReader runs on.

On low-resolution Palm OS handhelds, an image wider than 158 pixels or taller
than 148 pixels will be represented in the text by a thumbnail that the user
can tap to view the entire image. Images smaller than 158x148 pixels will be
presented in-line with the text.

On high-resolution Palm OS handhelds (those having screens of 320x320 pixels or
more), images smaller than 158x148 pixels will be pixel-doubled. Images larger
than 158x148 pixels may be shown in-line with the text, if they will fit on
the screen.

On non-Palm OS platforms, small images will be scaled up appropriately. Large
images will be scaled down to fit on the page; in this case the user can tap on
the image to view the entire image and zoom in or out.

For DropBook to find the image, it must be present in a directory whose name
matches that of the PML text file. For example, if "pmlsample.txt" contains a
reference to an image called "intro.png", then there must be a directory called
"pmlsample_img" that contains intro.png. The directory's name is the name of
the PML file (without the .txt extension) with "_img" appended.

Images must be in PNG format and cannot be filtered or interlaced. Image depth
must be 8 bits or less. Any color table may be used for color images.

Image files must be no larger than 65505 bytes, because each image is embedded
in the book's .pdb file and Palm database records are limited to 65505 bytes in
length. Since images are compressed, the actual image displayed by the reader
may be much larger than 64K.

Any or all of these restrictions may eventually be removed.
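
The naming and size rules in this section can be checked mechanically. The
Python sketch below derives the image directory name from a PML file name and
inspects a PNG header for the size, bit depth and interlacing limits given
above. The function names are hypothetical, and the header layout used is that
of the standard PNG IHDR chunk. The "cannot be filtered" rule is not checked,
since this section does not say how it is to be detected.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_IMAGE_BYTES = 65505  # Palm database record limit quoted in the text


def image_dir_for(pml_path):
    """pmlsample.txt -> pmlsample_img, in the same parent directory."""
    path = Path(pml_path)
    return path.with_name(path.stem + "_img")


def check_png_bytes(data):
    """Return a list of rule violations found in the raw PNG file bytes."""
    problems = []
    if len(data) > MAX_IMAGE_BYTES:
        problems.append("file is larger than 65505 bytes")
    # 8-byte signature, 4-byte length, 4-byte type, 13-byte IHDR, 4-byte CRC.
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        problems.append("not a PNG file")
        return problems
    (_width, _height, depth, _color_type,
     _compression, _filter_method, interlace) = struct.unpack(
        ">IIBBBBB", data[16:29])
    if depth > 8:
        problems.append("bit depth is greater than 8")
    if interlace != 0:
        problems.append("image is interlaced")
    return problems


print(image_dir_for("pmlsample.txt"))
```

To check a file on disk, pass Path(name).read_bytes() to check_png_bytes.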


Adding a Title, Cover Art, and Other Meta-information to Your eBook
-------------------------------------------------------------------

DropBook normally presents a dialog in which the title and other information
for the eBook may be specified. This information may be embedded in the PML
file instead.

To specify the eBook title as it will appear in the Open dialog on the
handheld, place a block of invisible comment text at the beginning of the file
using \v tags. Inside this comment block, put the string TITLE="My eBook",
where "My eBook" is replaced with the name of your eBook. It should look
something like this:

\vTITLE="Palm Sample Document"\v

You can also specify the author using the AUTHOR meta-tag, the publisher with
PUBLISHER, copyright information with COPYRIGHT, and the eBook ISBN with EISBN.
A fully-specified set of meta-information might appear in PML as:

\vTITLE="Palm Sample Document" AUTHOR="Sam Morgenstern" PUBLISHER="eReader.com"
EISBN="X-XXXX-XXXX" COPYRIGHT="Copyright \a169 2004 by Sam Morgenstern"\v

Cover art: If an image named "cover.png" is present in the eBook, it is assumed
to be the cover art for the eBook. See the rules for images for sizing and
other information.

Some or all of this information may appear in the book information dialog in
eReader, and may be used for other purposes in future products.
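
The meta-information block is regular enough to build and read with a few
lines of code. The Python sketch below shows one way to do both. The function
names are hypothetical, and the sketch assumes that values contain neither
double quotes nor a \v sequence, which this section does not address.

```python
import re

# Meta-tags named in this section, in the order used by its example.
META_KEYS = ("TITLE", "AUTHOR", "PUBLISHER", "EISBN", "COPYRIGHT")


def build_meta_block(**fields):
    """Return a \\v...\\v comment block holding the given meta-tags."""
    parts = ['%s="%s"' % (key, fields[key])
             for key in META_KEYS if key in fields]
    return "\\v" + " ".join(parts) + "\\v"


def parse_meta_block(pml):
    """Read KEY="value" pairs from a \\v block at the start of the file."""
    match = re.match(r"\s*\\v(.*?)\\v", pml, re.DOTALL)
    if match is None:
        return {}
    return dict(re.findall(r'([A-Z]+)="([^"]*)"', match.group(1)))


block = build_meta_block(TITLE="Palm Sample Document",
                         AUTHOR="Sam Morgenstern")
print(block)
print(parse_meta_block(block + "\n\\pThis is a new page"))
```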


Creating Dictionaries
---------------------

The \I PML tag is used to delimit an index item. Example: \Iaardvark\I

Each entry must start in the normal font. If DropBook shows an error beginning
with "No styles permitted before...", there is probably a missing end style tag
before the text shown in the error message.

Links, chapters and other PML structures are not permitted in dictionaries.
Images, however, are.

A special dictionary entry, "(Front matter)" is shown before other entries in
the list of entries, and should be used to include pronunciation symbols and
other front matter.

Note that use of dictionaries requires eReader Pro.
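
To review the headwords of a dictionary before running it through DropBook, the
index items can be pulled out with a regular expression. The Python sketch
below is one hypothetical way to do that. The function name index_items is an
assumption, and the removal of style codes is a simplification that only knows
about single-letter tags plus \Sp and \Sb.

```python
import re

INDEX_ITEM = re.compile(r"\\I(.*?)\\I", re.DOTALL)
# Simplified: strips \Sp, \Sb and any single-letter tag such as \B or \i.
STYLE_CODE = re.compile(r"\\(?:Sp|Sb|[A-Za-z])")


def index_items(pml):
    """Return the text of each \\I...\\I item with style codes removed."""
    return [STYLE_CODE.sub("", raw).strip()
            for raw in INDEX_ITEM.findall(pml)]


entry = r"\I\Baardvark\B\I \in.\i a large burrowing nocturnal mammal"
print(index_items(entry))
```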


Tips and Pitfalls
-----------------

This page explains some common mistakes, some bugs in DropBook and/or the
eReader, and some techniques that will allow you to create quality electronic
books for the eReader.

  * Check out the Converting to Palm eBooks page for some pointers on
    converting text from various formats into the Palm Markup Language.
  * Use a return at the end of each paragraph, not each line.
  * An extra return between paragraphs reads more easily than paragraph
    indentation.
  * The eReader doesn't display empty lines at the top of a page. If you need
    to have some "empty" lines at the top of a page, put a space on each line.
  * Don't use tables if you can possibly avoid it.

    None of the fonts that the eReader supports are monospaced, so tables can
    be difficult to represent. Break out the information in another way, or
    use the \T tag, but beware of tables that look great on a Palm OS
    handheld but not on a Pocket PC or vice versa.

  * The Reader breaks lines on spaces, dashes or underscores. This has
    several implications.

    1. Don't fill more than a line with spaces, dashes or underscores.
       There's a bug (which will be fixed in a future release) which
       causes MakeBook to hang on such a line. Note that in the large
       font, the number of spaces, dashes or underscores that fit on a
       line will be much smaller than in the small font.
    2. A string such as He shouted "Wait!--" may place the last quote at
       the beginning of a line, since the line would break after the
       second dash. Prevent this by using the PML string: He shouted
       "Wait!\a150\a150". The non-breaking dash, code 150, will not break
       a line. Use \a160 for a non-breaking space. Even better: use \a151,
       a long dash, instead of two short dashes.

  * The justification codes \c and \r (center and right justification) must
    have closing codes at the beginning of the line following the justified
    text.
  * The indentation tag \t must have a closing tag at the end of a line of
    the indented text.
  * Use \s (small font) in the title page(s) of books to force the page(s) to
    format nicely. Other than that, \n, \s and \l should rarely be necessary;
    the font size used for most text display should be chosen by the user.
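
Two of the tips above lend themselves to a quick automated pass over the source
text. The Python sketch below replaces each double dash with the long dash
\a151 and lists lines made up entirely of spaces, dashes or underscores. Both
function names are hypothetical, and the min_len threshold is an arbitrary
assumption, since the number of characters that fills a line depends on the
font and the device.

```python
import re


def tidy_dashes(text):
    """Replace each exact pair of dashes with the PML long dash \\a151."""
    # Longer runs of dashes are left alone for find_filler_lines to flag.
    return re.sub(r"(?<!-)--(?!-)", lambda match: "\\a151", text)


def find_filler_lines(text, min_len=20):
    """Return 1-based numbers of lines holding only spaces, '-' or '_'."""
    return [number
            for number, line in enumerate(text.splitlines(), 1)
            if len(line) >= min_len and line.strip(" -_") == ""]


print(tidy_dashes('He shouted "Wait!--"'))
print(find_filler_lines("Title\n" + "-" * 30 + "\nBody text"))
```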


Converting Uncommon Characters to PML
-------------------------------------

Use this chart to convert uncommon characters to their Palm Markup Language
(PML) equivalent. Most characters are simply represented as themselves in PML
and don't require this chart. But some uncommon characters can only be
represented in PML by their "\aXXX" syntax. Use this chart to look up that
"\aXXX" syntax.

For example, suppose you wanted to include the following phrase:

Copyright © 1999 by Samuel Morgenstern

In PML, you would write it as:

Copyright \a169 1999 by Samuel Morgenstern
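
The chart lookup can also be done in code. The \aXXX codes listed in the chart
appear to coincide with the Windows-1252 (cp1252) byte values, for example 169
for the copyright sign and 147 for the opening double quote, so the Python
sketch below relies on that encoding. This correspondence is an observation
about the chart, not something the PML documentation states, and the function
name to_pml_chars is hypothetical.

```python
def to_pml_chars(text):
    """Rewrite non-ASCII characters as \\aXXX and escape backslashes."""
    out = []
    for ch in text:
        if ch == "\\":
            # A literal backslash is written as two backslashes in PML.
            out.append("\\\\")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Assumption: \aXXX codes match cp1252 byte values. Characters
            # outside cp1252 raise UnicodeEncodeError here.
            code = ch.encode("cp1252")[0]
            out.append("\\a%03d" % code)
    return "".join(out)


print(to_pml_chars("Copyright \u00a9 1999 by Samuel Morgenstern"))
```

Note that this must be applied to plain text before PML tags are added,
because it escapes every backslash it sees.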
|
||||||
|
Char HTML # Code HTML Char Code PML Char Code Description
|
||||||
|
|
||||||
|
  - Normal space
|
||||||
|
! ! - ! Exclamation
|
||||||
|
" " " " Double quote
|
||||||
|
# # - # Hash
|
||||||
|
$ $ - $ Dollar
|
||||||
|
% % - % Percent
|
||||||
|
& & & & Ampersand
|
||||||
|
' ' - ' Apostrophe
|
||||||
|
( ( - ( Open bracket
|
||||||
|
) ) - ) Close bracket
|
||||||
|
* * - * Asterisk
|
||||||
|
+ + - + Plus sign
|
||||||
|
, , - , Comma
|
||||||
|
- - - - Minus sign
|
||||||
|
. . - . Period
|
||||||
|
/ / - / Forward slash
|
||||||
|
0 0 - 0 Digit 0
|
||||||
|
1 1 - 1 Digit 1
|
||||||
|
2 2 - 2 Digit 2
|
||||||
|
3 3 - 3 Digit 3
|
||||||
|
4 4 - 4 Digit 4
|
||||||
|
5 5 - 5 Digit 5
|
||||||
|
6 6 - 6 Digit 6
|
||||||
|
7 7 - 7 Digit 7
|
||||||
|
8 8 - 8 Digit 8
|
||||||
|
9 9 - 9 Digit 9
|
||||||
|
: : - : Colon
|
||||||
|
; ; - ; Semicolon
|
||||||
|
< < < Less than
|
||||||
|
= = - = Equals
|
||||||
|
> > > Greater than
|
||||||
|
? ? - ? Question mark
|
||||||
|
@ @ - @ At sign
|
||||||
|
A A - A A
|
||||||
|
B B - B B
|
||||||
|
C C - C C
|
||||||
|
D D - D D
|
||||||
|
E E - E E
|
||||||
|
F F - F F
|
||||||
|
G G - G G
|
||||||
|
H H - H H
|
||||||
|
I I - I I
|
||||||
|
J J - J J
|
||||||
|
K K - K K
|
||||||
|
L L - L L
|
||||||
|
M M - M M
|
||||||
|
N N - N N
|
||||||
|
O O - O O
|
||||||
|
P P - P P
|
||||||
|
Q Q - Q Q
|
||||||
|
R R - R R
|
||||||
|
S S - S S
|
||||||
|
T T - T T
|
||||||
|
U U - U U
|
||||||
|
V V - V V
|
||||||
|
W W - W W
|
||||||
|
X X - X X
|
||||||
|
Y Y - Y Y
|
||||||
|
Z Z - Z Z
|
||||||
|
[ [ - [ Open square bracket
|
||||||
|
\ \ - \\ Backslash
|
||||||
|
] ] - ] Close square bracket
|
||||||
|
^ ^ - ^ Caret
|
||||||
|
_ _ - _ Underscore
|
||||||
|
` ` - ` Grave accent
|
||||||
|
a a - a a
|
||||||
|
b b - b b
|
||||||
|
c c - c c
|
||||||
|
d d - d d
|
||||||
|
e e - e e
|
||||||
|
f f - f f
|
||||||
|
g g - g g
|
||||||
|
h h - h h
|
||||||
|
i i - i i
|
||||||
|
j j - j j
|
||||||
|
k k - k k
|
||||||
|
l l - l l
|
||||||
|
m m - m m
|
||||||
|
n n - n n
|
||||||
|
o o - o o
|
||||||
|
p p - p p
|
||||||
|
q q - q q
|
||||||
|
r r - r r
|
||||||
|
s s - s s
|
||||||
|
t t - t t
|
||||||
|
u u - u u
|
||||||
|
v v - v v
|
||||||
|
w w - w w
|
||||||
|
x x - x x
|
||||||
|
y y - y y
|
||||||
|
z z - z z
|
||||||
|
{ { - { Left brace
|
||||||
|
| | - | Vertical bar
|
||||||
|
} } - } Right brace
|
||||||
|
~ ~ - ~ Tilde
|
||||||
|
|
||||||
|
  \a160 Non-breaking space
|
||||||
|
¡ ¡ \a161 Inverted exclamation
|
||||||
|
¢ ¢ \a162 Cent sign
|
||||||
|
£ £ \a163 Pound sign
|
||||||
|
¤ ¤ \a164 Currency sign
|
||||||
|
¥ ¥ \a165 Yen sign
|
||||||
|
¦ ¦ \a166 Broken bar
|
||||||
|
§ § \a167 Section sign
|
||||||
|
¨ ¨ \a168 Umlaut or diaeresis
|
||||||
|
© © \a169 Copyright sign
|
||||||
|
ª ª \a170 Feminine ordinal
|
||||||
|
« « \a171 Left angle quotes
|
||||||
|
¬ ¬ \a172 Logical not sign
|
||||||
|
­ ­ \a173 Soft hyphen
|
||||||
|
® ® \a174 Registered trademark
|
||||||
|
¯ ¯ \a175 Spacing macron
|
||||||
|
° ° \a176 Degree sign
|
||||||
|
± ± \a177 Plus-minus sign
|
||||||
|
² ² \a178 Superscript 2
|
||||||
|
³ ³ \a179 Superscript 3
|
||||||
|
´ ´ \a180 Spacing acute
|
||||||
|
µ µ \a181 Micro sign
|
||||||
|
¶ ¶ \a182 Paragraph sign
|
||||||
|
· · \a183 Middle dot
|
||||||
|
¸ ¸ \a184 Spacing cedilla
|
||||||
|
¹ ¹ \a185 Superscript 1
|
||||||
|
º º \a186 Masculine ordinal
|
||||||
|
» » \a187 Right angle quotes
|
||||||
|
¼ ¼ \a188 One quarter
|
||||||
|
½ ½ \a189 One half
|
||||||
|
¾ ¾ \a190 Three quarters
|
||||||
|
¿ ¿ \a191 Inverted question mark
|
||||||
|
À À \a192 A grave
|
||||||
|
Á Á \a193 A acute
|
||||||
|
  \a194 A circumflex
|
||||||
|
à à \a195 A tilde
|
||||||
|
Ä Ä \a196 A diaeresis
|
||||||
|
Å Å \a197 A ring
|
||||||
|
Æ &AElig; \a198 AE ligature
|
||||||
|
Ç Ç \a199 C cedilla
|
||||||
|
È È \a200 E grave
|
||||||
|
É É \a201 E acute
|
||||||
|
Ê Ê \a202 E circumflex
|
||||||
|
Ë Ë \a203 E diaeresis
|
||||||
|
Ì Ì \a204 I grave
|
||||||
|
Í Í \a205 I acute
|
||||||
|
Î Î \a206 I circumflex
|
||||||
|
Ï Ï \a207 I diaeresis
|
||||||
|
Ð Ð \a208 Eth
|
||||||
|
Ñ Ñ \a209 N tilde
|
||||||
|
Ò Ò \a210 O grave
|
||||||
|
Ó Ó \a211 O acute
|
||||||
|
Ô Ô \a212 O circumflex
|
||||||
|
Õ Õ \a213 O tilde
|
||||||
|
Ö Ö \a214 O diaeresis
|
||||||
|
× × \a215 Multiplication sign
|
||||||
|
Ø Ø \a216 O slash
|
||||||
|
Ù Ù \a217 U grave
|
||||||
|
Ú Ú \a218 U acute
|
||||||
|
Û Û \a219 U circumflex
|
||||||
|
Ü Ü \a220 U diaeresis
|
||||||
|
Ý Ý \a221 Y acute
|
||||||
|
Þ Þ \a222 THORN
|
||||||
|
ß ß \a223 sharp s
|
||||||
|
à à \a224 a grave
|
||||||
|
á á \a225 a acute
|
||||||
|
â â \a226 a circumflex
|
||||||
|
ã ã \a227 a tilde
|
||||||
|
ä ä \a228 a diaeresis
|
||||||
|
å å \a229 a ring
|
||||||
|
æ æ \a230 ae ligature
|
||||||
|
ç ç \a231 c cedilla
|
||||||
|
è è \a232 e grave
|
||||||
|
é é \a233 e acute
|
||||||
|
ê ê \a234 e circumflex
|
||||||
|
ë ë \a235 e diaeresis
|
||||||
|
ì ì \a236 i grave
|
||||||
|
í í \a237 i acute
|
||||||
|
î î \a238 i circumflex
|
||||||
|
ï ï \a239 i diaeresis
|
||||||
|
ð ð \a240 eth
|
||||||
|
ñ ñ \a241 n tilde
|
||||||
|
ò ò \a242 o grave
|
||||||
|
ó ó \a243 o acute
|
||||||
|
ô ô \a244 o circumflex
|
||||||
|
õ õ \a245 o tilde
|
||||||
|
ö ö \a246 o diaeresis
|
||||||
|
÷ ÷ \a247 division sign
|
||||||
|
ø ø \a248 o slash
|
||||||
|
ù ù \a249 u grave
|
||||||
|
ú ú \a250 u acute
|
||||||
|
û û \a251 u circumflex
|
||||||
|
ü ü \a252 u diaeresis
|
||||||
|
ý ý \a253 y acute
|
||||||
|
þ þ \a254 thorn
|
||||||
|
ÿ ÿ \a255 y diaeresis
|
||||||
|
, ‚ ‚ \a130 single low quote
|
||||||
|
ƒ ƒ \a131 Scripted f
|
||||||
|
„ „ \a132 low quote
|
||||||
|
… … \a133 Ellipsis
|
||||||
|
† † \a134 Dagger
|
||||||
|
‡ &Dagger; \a135 Double dagger
|
||||||
|
Š Š \a138 Large S w/inverted caret
|
||||||
|
< ‹ ‹ \a139 single left angle quote
|
||||||
|
Œ Œ \a140 Large combined oe
|
||||||
|
‘ ‘ \a145 Open single smart quote
|
||||||
|
’ ’ \a146 Close single smart quote
|
||||||
|
“ “ \a147 Open double smart quote
|
||||||
|
” ” \a148 Close double smart quote
|
||||||
|
• • \a149 Bullet
|
||||||
|
– – \a150 Small dash (en dash)
|
||||||
|
— — \a151 Large dash (em dash)
|
||||||
|
™ ™ \a153 Trademark
|
||||||
|
š š \a154 Small S w/inverted caret
|
||||||
|
> › › \a155 single right angle quote
|
||||||
|
œ œ \a156 Small combined oe
|
||||||
|
Ÿ Ÿ \a159 Large Y with diaeresis
|
||||||
|


Extended Character Set
----------------------

In addition to the special characters supported by earlier versions of eReader
(which can be accessed using the \a### tag), all versions of eReader Pro and
eReader version 2.4 and later include support for additional special characters
and symbols. These symbols can be accessed using the \U#### tag, where #### are
four hexadecimal digits giving the Unicode code point of the special character.

Only the limited subset of Unicode characters given in the table below is
supported. In addition, some of the characters that are included in the table
are not present in eReader Pro versions prior to 2.4. To ensure that the
characters are displayed correctly, books using these tags should be read using
eReader or eReader Pro version 2.4 or later.

On Palm OS handhelds these special symbols are only available in one size,
matching the "Small" font. For best results on Palm OS handhelds the \U tag
should only be used inside blocks set to the "Small" font by way of \s tags.
On Palm OS handhelds these special characters are not affected by the font tags
(\s, \l, \b and \n), the bold style tag (\B), or the small caps style tag (\k).

If the \U characters are not showing up correctly using eReader on your Windows
desktop or laptop, the fonts for eReader were probably not installed properly.
To fix this, go to the directory C:\Windows\Fonts\ and double-click each font
whose name starts with "Maynard". This opens each font and allows the system to
register it. Close the windows that were opened as a result, and the problem
should be resolved.
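
Converting between a character and its \U#### tag is a matter of formatting the
code point as four hexadecimal digits. The Python sketch below shows both
directions. The function names are hypothetical, and the code does not check
whether a character belongs to the supported subset listed in the table that
follows.

```python
import re

# Match an escaped backslash first so "\\U2190" is not read as a \U tag.
U_TOKEN = re.compile(r"\\(\\|U[0-9A-Fa-f]{4})")


def to_u_tag(ch):
    """Return the \\U#### tag for one character, e.g. \\U2190."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("\\U tags hold only four hexadecimal digits")
    return "\\U%04X" % code_point


def expand_u_tags(pml):
    """Replace each \\U#### tag with the character it names."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return match.group(0)  # leave escaped backslashes as they are
        return chr(int(token[1:], 16))
    return U_TOKEN.sub(replace, pml)


print(to_u_tag("\u2190"))
print(expand_u_tags("Arrows can point \\U2190 left or right \\U2192."))
```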
|
||||||
|
Char HTML Code PML Code Description
|
||||||
|
|
||||||
|
Latin Extended-A
|
||||||
|
Ā Ā \U0100 LATIN CAPITAL LETTER A WITH MACRON
|
||||||
|
ā ā \U0101 LATIN SMALL LETTER A WITH MACRON
|
||||||
|
Ă Ă \U0102 LATIN CAPITAL LETTER A WITH BREVE
|
||||||
|
ă ă \U0103 LATIN SMALL LETTER A WITH BREVE
|
||||||
|
ą ą \U0105 LATIN SMALL LETTER A WITH OGONEK
|
||||||
|
ć ć \U0107 LATIN SMALL LETTER C WITH ACUTE
|
||||||
|
Č Č \U010C LATIN CAPITAL LETTER C WITH CARON
|
||||||
|
č č \U010D LATIN SMALL LETTER C WITH CARON
|
||||||
|
Ē Ē \U0112 LATIN CAPITAL LETTER E WITH MACRON
|
||||||
|
ē ē \U0113 LATIN SMALL LETTER E WITH MACRON
|
||||||
|
ĕ ĕ \U0115 LATIN SMALL LETTER E WITH BREVE
|
||||||
|
ė ė \U0117 LATIN SMALL LETTER E WITH DOT ABOVE
|
||||||
|
ę ę \U0119 LATIN SMALL LETTER E WITH OGONEK
|
||||||
|
ě ě \U011B LATIN SMALL LETTER E WITH CARON
|
||||||
|
ĝ ĝ \U011D LATIN SMALL LETTER G WITH CIRCUMFLEX
|
||||||
|
ğ ğ \U011F LATIN SMALL LETTER G WITH BREVE
|
||||||
|
Ī Ī \U012A LATIN CAPITAL LETTER I WITH MACRON
|
||||||
|
ī ī \U012B LATIN SMALL LETTER I WITH MACRON
|
||||||
|
ĭ ĭ \U012D LATIN SMALL LETTER I WITH BREVE
|
||||||
|
į į \U012F LATIN SMALL LETTER I WITH OGONEK
|
||||||
|
ı ı \U0131 LATIN SMALL LETTER DOTLESS I
|
||||||
|
Ł Ł \U0141 LATIN CAPITAL LETTER L WITH STROKE
|
||||||
|
ł ł \U0142 LATIN SMALL LETTER L WITH STROKE
|
||||||
|
ń ń \U0144 LATIN SMALL LETTER N WITH ACUTE
|
||||||
|
ň ň \U0148 LATIN SMALL LETTER N WITH CARON
|
||||||
|
ŋ ŋ \U014B LATIN SMALL LETTER ENG
|
||||||
|
Ō Ō \U014C LATIN CAPITAL LETTER O WITH MACRON
|
||||||
|
ō ō \U014D LATIN SMALL LETTER O WITH MACRON
|
||||||
|
ŏ ŏ \U014F LATIN SMALL LETTER O WITH BREVE
|
||||||
|
ő ő \U0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
|
||||||
|
ŕ ŕ \U0155 LATIN SMALL LETTER R WITH ACUTE
|
||||||
|
ř ř \U0159 LATIN SMALL LETTER R WITH CARON
|
||||||
|
Ś Ś \U015A LATIN CAPITAL LETTER S WITH ACUTE
|
||||||
|
ś ś \U015B LATIN SMALL LETTER S WITH ACUTE
|
||||||
|
ş ş \U015F LATIN SMALL LETTER S WITH CEDILLA
|
||||||
|
ţ ţ \U0163 LATIN SMALL LETTER T WITH CEDILLA
|
||||||
|
ũ ũ \U0169 LATIN SMALL LETTER U WITH TILDE
|
||||||
|
ū ū \U016B LATIN SMALL LETTER U WITH MACRON
|
||||||
|
ŭ ŭ \U016D LATIN SMALL LETTER U WITH BREVE
|
||||||
|
ŷ ŷ \U0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX
|
||||||
|
ź ź \U017A LATIN SMALL LETTER Z WITH ACUTE
|
||||||
|
Ž Ž \U017D LATIN CAPITAL LETTER Z WITH CARON
|
||||||
|
ž ž \U017E LATIN SMALL LETTER Z WITH CARON
|
||||||
|
Latin Extended-B
|
||||||
|
ƿ \U01BF LATIN LETTER WYNN
|
||||||
|
ǎ \U01CE LATIN SMALL LETTER A WITH CARON
|
||||||
|
ǐ \U01D0 LATIN SMALL LETTER I WITH CARON
|
||||||
|
ǒ \U01D2 LATIN SMALL LETTER O WITH CARON
|
||||||
|
ǔ \U01D4 LATIN SMALL LETTER U WITH CARON
|
||||||
|
ǡ \U01E1 LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON
|
||||||
|
ǣ \U01E3 LATIN SMALL LETTER AE WITH MACRON
|
||||||
|
ǧ \U01E7 LATIN SMALL LETTER G WITH CARON
|
||||||
|
ǫ \U01EB LATIN SMALL LETTER O WITH OGONEK
|
||||||
|
ǰ \U01F0 LATIN SMALL LETTER J WITH CARON
|
||||||
|
ȇ \U0207 LATIN SMALL LETTER E WITH INVERTED BREVE
|
||||||
|
ȝ \U021D LATIN SMALL LETTER YOGH
|
||||||
|
ȧ \U0227 LATIN SMALL LETTER A WITH DOT ABOVE
|
||||||
|
ȯ \U022F LATIN SMALL LETTER O WITH DOT ABOVE
|
||||||
|
ȳ \U0233 LATIN SMALL LETTER Y WITH MACRON
|
||||||
|
IPA Extensions
|
||||||
|
ɑ \U0251 LATIN SMALL LETTER SCRIPT A
|
||||||
|
ɒ \U0252 LATIN SMALL LETTER TURNED SCRIPT A
|
||||||
|
ɔ \U0254 LATIN SMALL LETTER OPEN O
|
||||||
|
ə \U0259 LATIN SMALL LETTER SCHWA
|
||||||
|
ɜ \U025C LATIN SMALL LETTER REVERSED OPEN E
|
||||||
|
ɥ \U0265 LATIN SMALL LETTER TURNED H
|
||||||
|
ɪ \U026A LATIN LETTER SMALL CAPITAL I
|
||||||
|
ɲ \U0272 LATIN SMALL LETTER N WITH LEFT HOOK
|
||||||
|
ʃ \U0283 LATIN SMALL LETTER ESH
|
||||||
|
ʉ \U0289 LATIN SMALL LETTER U BAR
|
||||||
|
ʊ \U028A LATIN SMALL LETTER UPSILON
|
||||||
|
ʌ \U028C LATIN SMALL LETTER TURNED V
|
||||||
|
ʏ \U028F LATIN LETTER SMALL CAPITAL Y
|
||||||
|
ʒ \U0292 LATIN SMALL LETTER EZH
|
||||||
|
ʔ \U0294 LATIN LETTER GLOTTAL STOP
|
||||||
|
ʜ \U029C LATIN LETTER SMALL CAPITAL H
|
||||||
|
Spacing Modifier Letters
|
||||||
|
ʾ \U02BE MODIFIER LETTER RIGHT HALF RING
|
||||||
|
ʿ \U02BF MODIFIER LETTER LEFT HALF RING
|
||||||
|
ˇ ˇ \U02C7 CARON
|
||||||
|
ˈ \U02C8 MODIFIER LETTER VERTICAL LINE
|
||||||
|
ˌ \U02CC MODIFIER LETTER LOW VERTICAL LINE
|
||||||
|
ː \U02D0 MODIFIER LETTER TRIANGULAR COLON
|
||||||
|
˘ ˘ \U02D8 BREVE
|
||||||
|
˙ ˙ \U02D9 DOT ABOVE
|
||||||
|
Greek and Coptic
|
||||||
|
Α Α \U0391 GREEK CAPITAL LETTER ALPHA
Β Β \U0392 GREEK CAPITAL LETTER BETA
Γ Γ \U0393 GREEK CAPITAL LETTER GAMMA
Δ Δ \U0394 GREEK CAPITAL LETTER DELTA
Ε Ε \U0395 GREEK CAPITAL LETTER EPSILON
Ζ Ζ \U0396 GREEK CAPITAL LETTER ZETA
Η Η \U0397 GREEK CAPITAL LETTER ETA
Θ Θ \U0398 GREEK CAPITAL LETTER THETA
Ι Ι \U0399 GREEK CAPITAL LETTER IOTA
Κ Κ \U039A GREEK CAPITAL LETTER KAPPA
Λ Λ \U039B GREEK CAPITAL LETTER LAMBDA
Μ Μ \U039C GREEK CAPITAL LETTER MU
Ν Ν \U039D GREEK CAPITAL LETTER NU
Ξ Ξ \U039E GREEK CAPITAL LETTER XI
Ο Ο \U039F GREEK CAPITAL LETTER OMICRON
Π Π \U03A0 GREEK CAPITAL LETTER PI
Ρ Ρ \U03A1 GREEK CAPITAL LETTER RHO
Σ Σ \U03A3 GREEK CAPITAL LETTER SIGMA
Τ Τ \U03A4 GREEK CAPITAL LETTER TAU
Υ Υ \U03A5 GREEK CAPITAL LETTER UPSILON
Φ Φ \U03A6 GREEK CAPITAL LETTER PHI
Χ Χ \U03A7 GREEK CAPITAL LETTER CHI
Ψ Ψ \U03A8 GREEK CAPITAL LETTER PSI
Ω Ω \U03A9 GREEK CAPITAL LETTER OMEGA
|
||||||
|
α α \U03B1 GREEK SMALL LETTER ALPHA
|
||||||
|
β β \U03B2 GREEK SMALL LETTER BETA
|
||||||
|
γ γ \U03B3 GREEK SMALL LETTER GAMMA
|
||||||
|
δ δ \U03B4 GREEK SMALL LETTER DELTA
|
||||||
|
ε ε \U03B5 GREEK SMALL LETTER EPSILON
|
||||||
|
ζ ζ \U03B6 GREEK SMALL LETTER ZETA
|
||||||
|
η η \U03B7 GREEK SMALL LETTER ETA
|
||||||
|
θ θ \U03B8 GREEK SMALL LETTER THETA
|
||||||
|
ι ι \U03B9 GREEK SMALL LETTER IOTA
|
||||||
|
κ κ \U03BA GREEK SMALL LETTER KAPPA
|
||||||
|
λ λ \U03BB GREEK SMALL LETTER LAMBDA
|
||||||
|
μ μ \U03BC GREEK SMALL LETTER MU
|
||||||
|
ν ν \U03BD GREEK SMALL LETTER NU
|
||||||
|
ξ ξ \U03BE GREEK SMALL LETTER XI
|
||||||
|
ο ο \U03BF GREEK SMALL LETTER OMICRON
|
||||||
|
π π \U03C0 GREEK SMALL LETTER PI
|
||||||
|
ρ ρ \U03C1 GREEK SMALL LETTER RHO
|
||||||
|
ς ς \U03C2 GREEK SMALL LETTER FINAL SIGMA
|
||||||
|
σ σ \U03C3 GREEK SMALL LETTER SIGMA
|
||||||
|
τ τ \U03C4 GREEK SMALL LETTER TAU
|
||||||
|
υ υ \U03C5 GREEK SMALL LETTER UPSILON
|
||||||
|
φ φ \U03C6 GREEK SMALL LETTER PHI
|
||||||
|
χ χ \U03C7 GREEK SMALL LETTER CHI
|
||||||
|
ψ ψ \U03C8 GREEK SMALL LETTER PSI
|
||||||
|
ω ω \U03C9 GREEK SMALL LETTER OMEGA
|
||||||
|
ϑ \U03D1 GREEK THETA SYMBOL
|
||||||
|
ϝ \U03DD GREEK SMALL LETTER DIGAMMA
|
||||||
|
Hebrew
|
||||||
|
א א \U05D0 HEBREW LETTER ALEPH
|
||||||
|
ב ב \U05D1 HEBREW LETTER BET
|
||||||
|
ג ג \U05D2 HEBREW LETTER GIMEL
|
||||||
|
ד ד \U05D3 HEBREW LETTER DALET
|
||||||
|
ה ה \U05D4 HEBREW LETTER HE
|
||||||
|
ו ו \U05D5 HEBREW LETTER VAV
|
||||||
|
ז ז \U05D6 HEBREW LETTER ZAYIN
|
||||||
|
ח ח \U05D7 HEBREW LETTER HET
|
||||||
|
ט ט \U05D8 HEBREW LETTER TET
|
||||||
|
י י \U05D9 HEBREW LETTER YOD
|
||||||
|
ך ך \U05DA HEBREW LETTER FINAL KAF
|
||||||
|
כ כ \U05DB HEBREW LETTER KAF
|
||||||
|
ל ל \U05DC HEBREW LETTER LAMED
|
||||||
|
ם ם \U05DD HEBREW LETTER FINAL MEM
|
||||||
|
מ מ \U05DE HEBREW LETTER MEM
|
||||||
|
ן ן \U05DF HEBREW LETTER FINAL NUN
|
||||||
|
נ נ \U05E0 HEBREW LETTER NUN
|
||||||
|
ס ס \U05E1 HEBREW LETTER SAMEKH
|
||||||
|
ע ע \U05E2 HEBREW LETTER AYIN
|
||||||
|
ף ף \U05E3 HEBREW LETTER FINAL PE
|
||||||
|
פ פ \U05E4 HEBREW LETTER PE
|
||||||
|
ץ ץ \U05E5 HEBREW LETTER FINAL TSADI
|
||||||
|
צ צ \U05E6 HEBREW LETTER TSADI
|
||||||
|
ק ק \U05E7 HEBREW LETTER QOF
|
||||||
|
ר ר \U05E8 HEBREW LETTER RESH
|
||||||
|
ת ת \U05EA HEBREW LETTER TAV
|
||||||
|
Latin Extended Additional
|
||||||
|
ḋ \U1E0B LATIN SMALL LETTER D WITH DOT ABOVE
|
||||||
|
ḍ \U1E0D LATIN SMALL LETTER D WITH DOT BELOW
|
||||||
|
ḗ \U1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
|
||||||
|
Ḣ \U1E22 LATIN CAPITAL LETTER H WITH DOT ABOVE
|
||||||
|
Ḥ \U1E24 LATIN CAPITAL LETTER H WITH DOT BELOW
|
||||||
|
ḥ \U1E25 LATIN SMALL LETTER H WITH DOT BELOW
|
||||||
|
ḫ \U1E2B LATIN SMALL LETTER H WITH BREVE BELOW
|
||||||
|
ḳ \U1E33 LATIN SMALL LETTER K WITH DOT BELOW
|
||||||
|
ḷ \U1E37 LATIN SMALL LETTER L WITH DOT BELOW
|
||||||
|
ṁ \U1E41 LATIN SMALL LETTER M WITH DOT ABOVE
|
||||||
|
ṃ \U1E43 LATIN SMALL LETTER M WITH DOT BELOW
|
||||||
|
ṅ \U1E45 LATIN SMALL LETTER N WITH DOT ABOVE
|
||||||
|
ṇ \U1E47 LATIN SMALL LETTER N WITH DOT BELOW
|
||||||
|
ṓ \U1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
|
||||||
|
ṙ \U1E59 LATIN SMALL LETTER R WITH DOT ABOVE
|
||||||
|
Ṛ \U1E5A LATIN CAPITAL LETTER R WITH DOT BELOW
|
||||||
|
ṛ \U1E5B LATIN SMALL LETTER R WITH DOT BELOW
|
||||||
|
ṡ \U1E61 LATIN SMALL LETTER S WITH DOT ABOVE
|
||||||
|
ṣ \U1E63 LATIN SMALL LETTER S WITH DOT BELOW
|
||||||
|
ṫ \U1E6B LATIN SMALL LETTER T WITH DOT ABOVE
|
||||||
|
ṭ \U1E6D LATIN SMALL LETTER T WITH DOT BELOW
|
||||||
|
ṯ \U1E6F LATIN SMALL LETTER T WITH LINE BELOW
|
||||||
|
ẑ \U1E91 LATIN SMALL LETTER Z WITH CIRCUMFLEX
|
||||||
|
ẓ \U1E93 LATIN SMALL LETTER Z WITH DOT BELOW
|
||||||
|
ẖ \U1E96 LATIN SMALL LETTER H WITH LINE BELOW
|
||||||
|
ạ \U1EA1 LATIN SMALL LETTER A WITH DOT BELOW
|
||||||
|
ọ \U1ECD LATIN SMALL LETTER O WITH DOT BELOW
|
||||||
|
ỹ \U1EF9 LATIN SMALL LETTER Y WITH TILDE
|
||||||
|
General Punctuation
|
||||||
|
- ‑ \U2011 NON-BREAKING HYPHEN
|
||||||
|
‸ \U2038 CARET
|
||||||
|
‽ \U203D INTERROBANG
|
||||||
|
⁂ \U2042 ASTERISM
|
||||||
|
Arrows
|
||||||
|
← ← \U2190 LEFTWARDS ARROW
|
||||||
|
→ → \U2192 RIGHTWARDS ARROW
|
||||||
|
Mathematical Operators
|
||||||
|
∂ ∂ \U2202 PARTIAL DIFFERENTIAL
|
||||||
|
√ √ \U221A SQUARE ROOT
|
||||||
|
∞ ∞ \U221E INFINITY
|
||||||
|
∥ ∥ \U2225 PARALLEL TO
|
||||||
|
∫ ∫ \U222B INTEGRAL
|
||||||
|
≠ ≠ \U2260 NOT EQUAL TO
|
||||||
|
⊔ \U2294 SQUARE CUP
|
||||||
|
⊕ \U2295 CIRCLED PLUS
|
||||||
|
⋮ \U22EE VERTICAL ELLIPSIS
|
||||||
|
Enclosed Alphanumerics
|
||||||
|
Ⓤ \U24CA CIRCLED LATIN CAPITAL LETTER U
|
||||||
|
Miscellaneous Symbols
|
||||||
|
☜ ☜ \U261C WHITE LEFT POINTING INDEX
|
||||||
|
☞ ☞ \U261E WHITE RIGHT POINTING INDEX
|
||||||
|
☿ \U263F MERCURY
|
||||||
|
♀ \U2640 FEMALE SIGN
|
||||||
|
♂ \U2642 MALE SIGN
|
||||||
|
♃ \U2643 JUPITER
|
||||||
|
♄ \U2644 SATURN
|
||||||
|
♅ \U2645 URANUS
|
||||||
|
♆ \U2646 NEPTUNE
|
||||||
|
♇ \U2647 PLUTO
|
||||||
|
♠ \U2660 BLACK SPADE SUIT
|
||||||
|
♡ \U2661 WHITE HEART SUIT
|
||||||
|
♢ \U2662 WHITE DIAMOND SUIT
|
||||||
|
♣ \U2663 BLACK CLUB SUIT
|
||||||
|
♭ \U266D MUSIC FLAT SIGN
|
||||||
|
♮ \U266E MUSIC NATURAL SIGN
|
||||||
|
♯ \U266F MUSIC SHARP SIGN
|
||||||
|
Dingbats
|
||||||
|
✓ \U2713 CHECK MARK
|
||||||
|
✠ \U2720 MALTESE CROSS
|
||||||
|
Private Use Area
|
||||||
|
- \UE000 LATIN SMALL LETTER A WITH MACRON AND ACUTE
|
||||||
|
- \UE001 LATIN SMALL LETTER A WITH MACRON AND TILDE
|
||||||
|
- \UE002 LATIN SMALL LETTER A WITH VERTICAL LINE ABOVE
|
||||||
|
- \UE003 LATIN CAPITAL LETTER C WITH MACRON
|
||||||
|
- \UE004 LATIN SMALL LETTER C WITH MACRON
|
||||||
|
- \UE005 LATIN SMALL LETTER C WITH BREVE
|
||||||
|
- \UE006 LATIN SMALL LETTER C WITH DOT BELOW
|
||||||
|
- \UE007 LATIN SMALL LIGATURE CH
|
||||||
|
- \UE008 LATIN CAPITAL LETTER D WITH MACRON
|
||||||
|
- \UE009 LATIN SMALL LETTER E WITH BAR BELOW
|
||||||
|
- \UE00A LATIN SMALL LETTER E WITH TILDE
|
||||||
|
- \UE00B LATIN SMALL LETTER E WITH MACRON AND BREVE
|
||||||
|
- \UE00C LATIN SMALL LETTER E WITH TILDE AND DOT ABOVE
|
||||||
|
- \UE00D LATIN SMALL LETTER E WITH HOOK RIGHT BELOW
|
||||||
|
- \UE00E LATIN SMALL LETTER G WITH INVERTED BREVE
|
||||||
|
- \UE00F LATIN SMALL LETTER I WITH INVERTED BREVE BELOW
|
||||||
|
- \UE010 LATIN SMALL LETTER I WITH MACRON AND ACUTE
|
||||||
|
- \UE011 LATIN SMALL LETTER K WITH CIRCUMFLEX
|
||||||
|
- \UE012 LATIN SMALL LETTER K WITH BREVE
|
||||||
|
- \UE013 LATIN SMALL LETTER K WITH INVERTED BREVE
|
||||||
|
- \UE014 LATIN SMALL LIGATURE KH
|
||||||
|
- \UE015 LATIN CAPITAL LETTER L WITH MACRON
|
||||||
|
- \UE016 LATIN SMALL LETTER L WITH TILDE
|
||||||
|
- \UE017 LATIN SMALL LETTER L WITH INVERTED BREVE
|
||||||
|
- \UE018 LATIN CAPITAL LETTER M WITH MACRON
|
||||||
|
- \UE019 LATIN SMALL LETTER M WITH MACRON
|
||||||
|
- \UE01A LATIN SMALL LETTER M WITH TILDE
|
||||||
|
- \UE01B LATIN SMALL LETTER O WITH CEDILLA
|
||||||
|
- \UE01C LATIN SMALL LETTER O WITH MACRON AND CIRCUMFLEX
|
||||||
|
- \UE01E LATIN SMALL LIGATURE OI
|
||||||
|
- \UE01F LATIN SMALL LIGATURE OO
|
||||||
|
- \UE020 LATIN SMALL LIGATURE OO WITH MACRON
|
||||||
|
- \UE021 LATIN SMALL LIGATURE OU
|
||||||
|
- \UE022 LATIN SMALL LETTER OPEN O WITH ACUTE
|
||||||
|
- \UE023 LATIN SMALL LETTER R WITH DIAERESIS
|
||||||
|
- \UE024 LATIN SMALL LETTER R WITH CIRCUMFLEX
|
||||||
|
- \UE025 LATIN SMALL LETTER R WITH RING BELOW
|
||||||
|
- \UE026 LATIN SMALL LETTER S WITH VERTICAL LINE ABOVE
|
||||||
|
- \UE027 LATIN SMALL LETTER S WITH OGONEK
|
||||||
|
- \UE028 LATIN SMALL LETTER S WITH COMMA
|
||||||
|
- \UE02A LATIN SMALL LETTER S WITH BREVE
|
||||||
|
- \UE02B LATIN SMALL LIGATURE SH
|
||||||
|
- \UE02C LATIN SMALL LIGATURE TH
|
||||||
|
- \UE02D LATIN SMALL LETTER U WITH MACRON AND ACUTE
|
||||||
|
- \UE02E LATIN CAPITAL LETTER V WITH MACRON
|
||||||
|
- \UE02F LATIN CAPITAL LETTER X WITH MACRON
|
||||||
|
- \UE030 LATIN SMALL LETTER X WITH CIRCUMFLEX
|
||||||
|
- \UE031 LATIN SMALL LETTER Y WITH BREVE
|
||||||
|
- \UE032 LATIN SMALL LIGATURE ZH
|
||||||
|
- \UE033 LATIN SMALL LETTER TURNED E WITH ACUTE
|
||||||
|
- \UE034 LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX
|
||||||
|
- \UE035 GREEK SMALL LETTER ALPHA WITH GRAVE
|
||||||
|
- \UE036 MUSICAL SYMBOL SEGNO
|
||||||
|
- \UE037 MUSICAL SYMBOL FERMATA
|
||||||
|
- \UE038 MUSICAL SYMBOL CRESCENDO
|
||||||
|
- \UE039 MUSICAL SYMBOL DECRESCENDO
|
||||||
|
- \UE03A MUSICAL SYMBOL DOUBLE SHARP
|
||||||
|
- \UE03B MUSICAL SYMBOL BREVE
|
||||||
|
- \UE03C MUSICAL SYMBOL DOWN BOW
|
||||||
|
- \UE03D MUSICAL SYMBOL UP BOW
|
||||||
|
- \UE03E MUSICAL SYMBOL BREVE ALTERNATE
|
||||||
|
- \UE03F PRINTING SYMBOL DELE
|
||||||
|
- \UE040 PRINTING SYMBOL FRACTIONAL EM
|
||||||
|
- \UE041 INVERTED ASTERISM
|
||||||
|
- \UE042 LATIN SMALL LETTER SCHWA SUPERSCRIPT
|
||||||
|
- \UE043 LATIN SMALL LETTER TURNED Y
|
||||||
|
- \UE044 LATIN SMALL LIGATURE OE WITH MACRON
|
||||||
|
- \UE045 SQUARE ROOT WITH BAR
|
||||||
|
- \UE046 LATIN SMALL LETTER U WITH DOT ABOVE
|
||||||
|
- \UE047 LATIN SMALL LIGATURE UE
|
||||||
|
- \UE048 LATIN SMALL LIGATURE UE WITH MACRON
|
||||||
|
- \UE049 LATIN SMALL LETTER OPEN O WITH TILDE
|
||||||
|
- \UE04A LATIN SMALL LETTER T WITH CARON BELOW
|
||||||
|
- \UE04B LATIN SMALL LETTER SCRIPT A WITH TILDE
|
||||||
|
- \UE04C GREEK SMALL LETTER EPSILON WITH TILDE
|
||||||
|
- \UE04D LATIN SMALL LIGATURE OE WITH TILDE
|
||||||
|
- \UE04E MODIFIER LETTER DOUBLE VERTICAL LINE
|
||||||
|
- \UE04F DOUBLE HYPHEN
|
||||||
|
- \UE050 LATIN SMALL LETTER SCHWA WITH DOT ABOVE
|
||||||
|
- \UE051 LATIN SMALL LETTER SCHWA WITH MACRON
|
||||||
|
Alphabetic Presentation Forms
|
||||||
|
fl fl \UFB02 LATIN SMALL LIGATURE FL
|
||||||
|
שׁ שׁ \UFB2A HEBREW LETTER SHIN WITH SHIN DOT
|
||||||
|
שׂ שׂ \UFB2B HEBREW LETTER SHIN WITH SIN DOT
|
||||||
|
|
226
format_docs/pdb/ztxt.txt
Normal file
@ -0,0 +1,226 @@
|
|||||||
|
The zTXT Format
|
||||||
|
---------------
|
||||||
|
|
||||||
|
The zTXT format is relatively straightforward. The simplest zTXT contains a
|
||||||
|
Palm database header, followed by zTXT record #0, followed by the compressed
|
||||||
|
data. The compressed data can be in one of two formats: one long data stream,
|
||||||
|
or split into chunks for random access. If there are any bookmarks, they occupy
|
||||||
|
the record immediately after the compressed data. If there are any annotations,
|
||||||
|
the annotation index occupies the record immediately after the bookmarks with
|
||||||
|
each annotation in the index having a record immediately after the annotation
|
||||||
|
index. Here are diagrams of a simple zTXT and a full-featured zTXT:
|
||||||
|
|
||||||
|
DB Header
|
||||||
|
0 Record 0
|
||||||
|
1
|
||||||
|
2
|
||||||
|
3
|
||||||
|
... Compressed Data
|
||||||
|
36
|
||||||
|
37
|
||||||
|
38
|
||||||
|
|
||||||
|
DB Header
|
||||||
|
0 Record 0
|
||||||
|
1
|
||||||
|
2
|
||||||
|
3
|
||||||
|
... Compressed Data
|
||||||
|
36
|
||||||
|
37
|
||||||
|
38
|
||||||
|
39 Bookmarks
|
||||||
|
40 Annotation Index
|
||||||
|
41 Annotation 1
|
||||||
|
42 Annotation 2
|
||||||
|
43 Annotation 3
|
||||||
|
|
||||||
|
|
||||||
|
Compression Modes
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
zTXT version 1.40 and later supports two modes of compression. Mode 1 is a
|
||||||
|
random access mode, and mode 2 consists of one long data stream. Both modes
|
||||||
|
work on 8K (the default record size) blocks of text.
|
||||||
|
|
||||||
|
Please note, however, that as of Weasel Reader version 1.60 the old style
|
||||||
|
(mode 2) zTXT format is no longer supported. makeztxt and libztxt still support
|
||||||
|
creating these documents for backwards compatibility, but you should avoid
|
||||||
|
mode 2 if possible.
|
||||||
|
|
||||||
|
|
||||||
|
Mode 1
|
||||||
|
------
|
||||||
|
|
||||||
|
In mode one, 8K blocks of text are compressed into an equal number of blocks of
|
||||||
|
compressed data. Using the Z_FULL_FLUSH flush mode with zLib allows for random
|
||||||
|
access among the blocks of data. In order for this to function, the first block
|
||||||
|
must be decompressed first, and after that any block in the file may be
|
||||||
|
decompressed in any order. In mode 1, the blocks of compressed data will likely
|
||||||
|
not all have the same size.
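
The random-access behaviour described above can be tried out with Python's zlib module. This is only a sketch under stated assumptions: the helper names (compress_mode1, open_reader) are made up for illustration, every block is ended with Z_FULL_FLUSH and the stream is never finished with Z_FINISH, and nothing here claims to match what makeztxt or Weasel actually do.

```python
import zlib

BLOCK = 8192  # default zTXT record size


def compress_mode1(text, block_size=BLOCK):
    """Compress text into one record per block, full-flushing after each block."""
    comp = zlib.compressobj(zlib.Z_BEST_COMPRESSION)
    records = []
    for i in range(0, len(text), block_size):
        chunk = comp.compress(text[i:i + block_size])
        # Z_FULL_FLUSH resets the compressor's history, so later blocks
        # never refer back to data from earlier blocks.
        chunk += comp.flush(zlib.Z_FULL_FLUSH)
        records.append(chunk)
    return records


def open_reader(records):
    """Return a function that decompresses any single block on demand."""
    # The first record carries the zlib stream header, so it has to be
    # fed to the decompressor before any other record can be read.
    primed = zlib.decompressobj()
    primed.decompress(records[0])

    def read_block(n):
        if n == 0:
            return zlib.decompressobj().decompress(records[0])
        # Work on a copy so the primed state can be reused for every block.
        return primed.copy().decompress(records[n])

    return read_block
```

After open_reader has consumed the first record, read_block(3) can be called before read_block(1); each call returns only the text of the requested block.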
|
||||||
|
|
||||||
|
|
||||||
|
Mode 2
|
||||||
|
------
|
||||||
|
|
||||||
|
In zTXT versions before 1.40, this was the only method of compression. This
|
||||||
|
mode involves compressing the entire input buffer into a single output buffer
|
||||||
|
and then splitting the resulting buffer into 8K segments. This mode requires
|
||||||
|
that all of the compressed data be decompressed in one pass. Since there are no
|
||||||
|
real 'blocks' of data, the resulting output can be of any blocksize, though
|
||||||
|
typically the default of 8K should be fine. The advantage to mode 2 is that it
|
||||||
|
will give about 10% - 15% more compression.
|
||||||
|
|
||||||
|
|
||||||
|
zTXT Record #0 Definition (version 1.44)
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
Record 0 provides all of the information about the zTXT contents. Be sure it is
|
||||||
|
correct, lest fiery death rain down upon your program.
|
||||||
|
|
||||||
|
typedef struct zTXT_record0Type {
|
||||||
|
UInt16 version;
|
||||||
|
UInt16 numRecords;
|
||||||
|
UInt32 size;
|
||||||
|
UInt16 recordSize;
|
||||||
|
UInt16 numBookmarks;
|
||||||
|
UInt16 bookmarkRecord;
|
||||||
|
UInt16 numAnnotations;
|
||||||
|
UInt16 annotationRecord;
|
||||||
|
UInt8 flags;
|
||||||
|
UInt8 reserved;
|
||||||
|
UInt32 crc32;
|
||||||
|
UInt8 padding[0x20 - 24];
|
||||||
|
} zTXT_record0;
|
||||||
|
|
||||||
|
|
||||||
|
Structure Elements
|
||||||
|
------------------
|
||||||
|
|
||||||
|
UInt16 version;
|
||||||
|
|
||||||
|
This is mostly just informational. Your program can figure out what features
|
||||||
|
might be available from the version. However, the remaining parts of the
|
||||||
|
structure are designed such that their value will be 0 if that particular
|
||||||
|
feature is not present, so that is the correct way to test. The version is
|
||||||
|
stored as two 8-bit integers (major, then minor). For example, version 1.42 is 0x012A.
|
||||||
|
|
||||||
|
UInt16 numRecords;
|
||||||
|
|
||||||
|
This is the number of DATA records only and does not include record 0,
|
||||||
|
bookmarks, or annotations. With compression mode 1, this is also the number of
|
||||||
|
uncompressed text records. With mode 2, you must decompress the file to figure
|
||||||
|
out how many text records there will be.
|
||||||
|
|
||||||
|
UInt32 size;
|
||||||
|
|
||||||
|
The size in bytes of the uncompressed data in the zTXT. Check this value against
|
||||||
|
the amount of free storage memory on the Palm to make sure there's enough room
|
||||||
|
to decompress the data in full or in part.
|
||||||
|
|
||||||
|
UInt16 recordSize;
|
||||||
|
|
||||||
|
recordSize is the size in bytes of a text record. This field is important, as
|
||||||
|
the size of text and decompression buffers is based on this value. It is used
|
||||||
|
by Weasel to navigate through the text so it can map absolute offsets to record
|
||||||
|
numbers. 8192 is the default. With compression mode 1, this is the amount of
|
||||||
|
data inside each compressed record (except maybe the last one), but the actual
|
||||||
|
compressed records will likely have varying sizes. In mode 2, both compressed
|
||||||
|
records and the resulting text records are all of this size (except, again, the
|
||||||
|
last record).
|
||||||
|
|
||||||
|
UInt16 numBookmarks;
|
||||||
|
|
||||||
|
The definitive count of how many bookmarks are stored in the bookmark index
|
||||||
|
record. See the section on bookmarks below.
|
||||||
|
|
||||||
|
UInt16 bookmarkRecord;
|
||||||
|
|
||||||
|
If there are any bookmarks, this is set to the record index number that
|
||||||
|
contains the bookmark listing, otherwise it is 0.
|
||||||
|
|
||||||
|
UInt16 numAnnotations;
|
||||||
|
|
||||||
|
Like the bookmark count, this is the definitive count of how many annotations
|
||||||
|
are in the annotation index and how many annotation records follow it. See the
|
||||||
|
section on annotations below.
|
||||||
|
|
||||||
|
UInt16 annotationRecord;
|
||||||
|
|
||||||
|
If there are any annotations, this is set to the record index number that
|
||||||
|
contains the annotation index, otherwise it is 0.
|
||||||
|
|
||||||
|
UInt8 flags;
|
||||||
|
|
||||||
|
These flags indicate various features of the zTXT database. flags is a bitmask
|
||||||
|
and at present the only two defined bits are:
|
||||||
|
|
||||||
|
ZTXT_RANDOMACCESS (0x01)
|
||||||
|
If the zTXT was compressed according to the method in mode 1, then it
|
||||||
|
supports random access and this should be set.
|
||||||
|
ZTXT_NONUNIFORM (0x02)
|
||||||
|
Setting this bit indicates that the text records within the zTXT database
|
||||||
|
are not of uniform length. That is, when the blocks of text are
|
||||||
|
decompressed they will not have identical block sizes. If this is not set,
|
||||||
|
the compressed blocks are assumed to all have the same size when
|
||||||
|
decompressed (typically 8K) except for the last block which can be smaller.
|
||||||
|
|
||||||
|
UInt32 crc32;
|
||||||
|
|
||||||
|
A CRC32 value for checking data integrity. This value is computed over all text
|
||||||
|
data records only and does not include record 0 or any bookmark/annotation
|
||||||
|
records. The current implementation in makeztxt/Weasel computes this value
|
||||||
|
using the crc32 function in zLib which should be the standard CRC32 definition.
|
||||||
|
|
||||||
|
UInt8 padding[0x20 - 24];
|
||||||
|
|
||||||
|
zTXT record zero is 32 bytes in length, so the unused portion is padded.
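
As a rough illustration of the layout above, record 0 can be unpacked with Python's struct module. The big-endian byte order (">") is an assumption based on Palm OS conventions and is not stated in this document; the names RECORD0, Record0 and parse_record0 are hypothetical.

```python
import struct
from collections import namedtuple

# Field order follows zTXT_record0Type. ">" means big-endian with no
# alignment padding (an assumption, see the note above). "8x" skips the
# padding[0x20 - 24] bytes.
RECORD0 = struct.Struct(">HHLHHHHHBBL8x")

Record0 = namedtuple(
    "Record0",
    "version num_records size record_size num_bookmarks bookmark_record "
    "num_annotations annotation_record flags reserved crc32",
)

ZTXT_RANDOMACCESS = 0x01
ZTXT_NONUNIFORM = 0x02


def parse_record0(raw):
    """Return the parsed header and the version as a (major, minor) pair."""
    rec = Record0._make(RECORD0.unpack(raw[:RECORD0.size]))
    # The version is two 8-bit integers: high byte major, low byte minor.
    major, minor = divmod(rec.version, 256)
    return rec, (major, minor)
```

A reader would then test rec.flags & ZTXT_RANDOMACCESS to see whether mode 1 was used, and check bookmark_record and annotation_record against 0 rather than relying on the version.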
|
||||||
|
|
||||||
|
|
||||||
|
zTXT Bookmarks
|
||||||
|
--------------
|
||||||
|
|
||||||
|
zTXT bookmarks are stored in a simple array in a record at the end of a zTXT.
|
||||||
|
The format is as follows:
|
||||||
|
|
||||||
|
#define MAX_BMRK_LENGTH 20
|
||||||
|
|
||||||
|
typedef struct GPlmMarkType {
|
||||||
|
UInt32 offset;
|
||||||
|
Char title[MAX_BMRK_LENGTH];
|
||||||
|
} GPlmMark;
|
||||||
|
|
||||||
|
In the structure, offset is counted as an absolute offset into the text. The
|
||||||
|
bookmarks must be sorted in ascending order.
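
A minimal sketch of reading the bookmark array, assuming big-endian fields (Palm OS convention, not stated here) and Latin-1 titles padded with NUL bytes (also an assumption). The helper names are hypothetical. The second function shows how an absolute offset maps to a data record when all text records have the uniform recordSize.

```python
import struct

MAX_BMRK_LENGTH = 20

# One GPlmMark entry: UInt32 offset followed by a 20-byte title (24 bytes).
MARK = struct.Struct(">L%ds" % MAX_BMRK_LENGTH)


def parse_bookmarks(raw, num_bookmarks):
    """Return a list of (offset, title) pairs from the bookmark record."""
    marks = []
    for i in range(num_bookmarks):
        offset, title = MARK.unpack_from(raw, i * MARK.size)
        # Assumption: the title is NUL-terminated/padded Latin-1 text.
        title = title.split(b"\x00", 1)[0].decode("latin-1")
        marks.append((offset, title))
    return marks


def record_for_offset(offset, record_size=8192):
    """Map an absolute text offset to a record index (uniform records only)."""
    # Record 0 is the header, so the text data starts at record 1.
    return 1 + offset // record_size
```

The count passed in should come from numBookmarks in record 0, which the text above calls the definitive count.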
|
||||||
|
|
||||||
|
If there are no bookmarks, then the bookmark index does not exist. When the
|
||||||
|
user creates the first bookmark, the record containing the index will then be
|
||||||
|
created. If there are annotations, when the bookmark record is created it must
|
||||||
|
go before the annotation index. This will require incrementing annotationRecord
|
||||||
|
in record 0 to point to the new record index.
|
||||||
|
|
||||||
|
Similarly, when all bookmarks are deleted the bookmark index record is also
|
||||||
|
deleted. If there are annotations, annotationRecord in record 0 must be
|
||||||
|
decremented to point to the new index.
|
||||||
|
|
||||||
|
|
||||||
|
zTXT Annotations
|
||||||
|
----------------
|
||||||
|
|
||||||
|
zTXT annotations have a format almost identical to that of the bookmark index:
|
||||||
|
|
||||||
|
typedef struct GPlmAnnotationType {
|
||||||
|
UInt32 offset;
|
||||||
|
Char title[MAX_BMRK_LENGTH];
|
||||||
|
} GPlmAnnotation;
|
||||||
|
|
||||||
|
Like the bookmarks, offset is an absolute offset into the text. The annotation
|
||||||
|
index is organized just as the bookmarks are, as a single array in a record.
|
||||||
|
Note that this structure does NOT store the actual annotation text.
|
||||||
|
|
||||||
|
The text of each annotation is stored in its own record immediately following
|
||||||
|
the index. So, the first annotation in the index will occupy the first record
|
||||||
|
following the index, and the second annotation will be in the second record
|
||||||
|
following the index, and so on. The text of each annotation is limited to
|
||||||
|
4096 bytes.
|
||||||
|
|
303
format_docs/rb.txt
Normal file
@ -0,0 +1,303 @@
|
|||||||
|
Rocket eBook File Format
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
from http://rbmake.sourceforge.net/rb_format.html
|
||||||
|
|
||||||
|
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
This document attempts to describe the format of a .rb file -- the book
|
||||||
|
format that is downloaded into NuvoMedia's <http://www.nuvomedia.com>
|
||||||
|
hand-held wonder, the Rocket eBook
|
||||||
|
<http://www.rocket-ebook.com/enter.html>.
|
||||||
|
|
||||||
|
*Note:* All multi-byte integers are stored in Vax/Intel order (the
|
||||||
|
opposite of network byte order). Most integers are 4 bytes (an int32),
|
||||||
|
but there are some minor exceptions (as detailed below).
|
||||||
|
|
||||||
|
Also, the following document refers to the .rb file sections as "pages".
|
||||||
|
|
||||||
|
|
||||||
|
Details
|
||||||
|
-------
|
||||||
|
|
||||||
|
The first 4 bytes of the file seem to be a magic number (in hex): B0 0C
|
||||||
|
B0 0C. I like to think of this as a hexadecimal pun on the word "book"
|
||||||
|
(repeated). [Matt Greenwood has reported seeing a magic number of "B0 0C
|
||||||
|
F0 0D" in another type of ReB-related file -- i.e. "book food".]
|
||||||
|
|
||||||
|
The next two bytes appear to be a version number, currently "02 00". I
|
||||||
|
assume this means major version 2, minor version 0.
|
||||||
|
|
||||||
|
The next 4 bytes are the string "NUVO", followed by 4 bytes of 00h. (I
|
||||||
|
have also seen an old title that had 0s in place of the "NUVO".)
|
||||||
|
|
||||||
|
This brings us up to offset 0Eh, at which point we have a 4-byte
|
||||||
|
representation of the date the book was created (Matt Greenwood pointed
|
||||||
|
this out to me -- thanks!). The year is encoded as an int16. An older
|
||||||
|
version of the RocketLibrary encoded the year's full value (e.g.
|
||||||
|
1999 was "CF 07" and 2000 was "D0 07"), but a more recent version is now
|
||||||
|
using the tm_year value verbatim -- i.e. it's storing 100 for the year
|
||||||
|
2000 ("64 00"). The year is followed by an int8 for the 1-relative month
|
||||||
|
number, and an int8 for the day of the month.
|
||||||
|
|
||||||
|
After that is 6 bytes of 00h. These may be reserved for setting the time
|
||||||
|
of creation (at a guess).
|
||||||
|
|
||||||
|
Then, at offset 18h, we have an int32 that contains the absolute offset
|
||||||
|
of the "Table of Contents" (the directory of the pages contained within
|
||||||
|
this .rb file). In all of the .rb files I've seen, this remains
|
||||||
|
constant with a value of 128h. However, I have tested an atypical .rb
|
||||||
|
file where I placed the ToC at the end of the file (after all the file
|
||||||
|
contents), and it worked fine. (I've chosen not to build any books in
|
||||||
|
such a non-standard format, however.)
|
||||||
|
|
||||||
|
Immediately following this is an int32 with the length of the .rb file
|
||||||
|
(so we can check if the file is complete or not).
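
The fixed part of the header described so far (offsets 00h through 1Fh) could be read along these lines. This is a sketch, not rbmake's code; the function name and the returned dictionary keys are hypothetical.

```python
import struct

# Little-endian, no padding:
#   00h magic (4)   04h version (2)   06h "NUVO" (4)   0Ah zeros (4)
#   0Eh year int16  10h month int8    11h day int8     12h zeros (6)
#   18h ToC offset int32              1Ch file length int32
RB_HEADER = struct.Struct("<4s2s4s4xHBB6xII")


def parse_rb_header(data):
    (magic, version, vendor, year, month, day,
     toc_offset, file_length) = RB_HEADER.unpack_from(data, 0)
    if magic != b"\xb0\x0c\xb0\x0c":
        raise ValueError("not an .rb file")
    if year < 1900:
        # Newer writers store tm_year verbatim (years since 1900).
        year += 1900
    return {
        "version": (version[0], version[1]),
        "vendor": vendor,
        "date": (year, month, day),
        "toc_offset": toc_offset,
        "file_length": file_length,
        # The stored length lets us detect a truncated file.
        "complete": file_length == len(data),
    }
```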
|
||||||
|
|
||||||
|
All the bytes from here (offset 20h) up to offset 128h appear to only be
|
||||||
|
used by an encrypted title. In a non-encrypted title, they are always 0.
|
||||||
|
|
||||||
|
The table of contents typically comes next (at offset 128h). It starts
|
||||||
|
with an int32 count of the number of "page" entries (.rb-file sections)
|
||||||
|
in the ToC. Each entry consists of a name (zero-padded to 32 bytes),
|
||||||
|
followed by 3 int32s: the length of this entry's data segment, the
|
||||||
|
absolute offset of the data in the .rb file, and a flag. The known flag
|
||||||
|
values are: 1 (encrypted), 2 (info page), and 8 (deflated). The names
|
||||||
|
are tweaked as needed to ensure that they are all unique. The current
|
||||||
|
RocketWriter software uses a unique 6-digit number, a dash, up to 8
|
||||||
|
characters from the filename, and then the re-mapped suffix for the data
|
||||||
|
(.html, .hidx, .png, .info, etc.). My rbmake library simply ensures that
|
||||||
|
the names are no longer than 15 characters (not counting the suffix) and
|
||||||
|
are all unique.
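
The ToC layout above (a count followed by 44-byte entries) can be walked as follows. This is a sketch with hypothetical names; it treats the flag as a plain value, since the text only lists the known values and does not say whether they combine as bits.

```python
import struct

# One ToC entry: 32-byte zero-padded name, then length, offset and flag.
ENTRY = struct.Struct("<32sIII")

FLAG_ENCRYPTED = 1
FLAG_INFO = 2
FLAG_DEFLATED = 8


def parse_toc(data, toc_offset=0x128):
    """Return a list of (name, length, offset, flag) tuples."""
    (count,) = struct.unpack_from("<I", data, toc_offset)
    entries = []
    pos = toc_offset + 4
    for _ in range(count):
        name, length, offset, flag = ENTRY.unpack_from(data, pos)
        name = name.rstrip(b"\x00").decode("ascii", "replace")
        entries.append((name, length, offset, flag))
        pos += ENTRY.size
    return entries
```

The toc_offset argument should normally come from the int32 at offset 18h rather than the 128h default, since the ToC may legally live elsewhere.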
|
||||||
|
|
||||||
|
Often the first item in the ToC is the info page, but it doesn't have to
|
||||||
|
be. This page of information contains NAME=VALUE pairs that note the
|
||||||
|
author, title, what the root-page's name is, etc. (See appendix A). This
|
||||||
|
data is never encrypted nor compressed, so this entry's flag value is
|
||||||
|
always "2".
|
||||||
|
|
||||||
|
An image page is always stored as a B&W image in PNG format. Since it
|
||||||
|
has its own compression, it is stored without any additional attempt at
|
||||||
|
deflation. I have also never seen an encrypted image, so its flag value
|
||||||
|
is always 0.
|
||||||
|
|
||||||
|
An HTML page contains the tags and text that were re-written into a
|
||||||
|
consistent syntax (this presumably makes the HTML renderer in the ReB
|
||||||
|
itself simpler). HTML pages are typically compressed (See appendix B).
|
||||||
|
Every HTML page appears to use the suffix .html no matter what the file
|
||||||
|
name was on import (but I have seen older files with .htm used as the
|
||||||
|
suffix, so the rocket appears to support both).
|
||||||
|
|
||||||
|
For every HTML page there is a corresponding .hidx page that contains a
|
||||||
|
summary of the paragraph formatting and the position of the anchor names
|
||||||
|
in the associated .html page (See appendix C). This page is sometimes
|
||||||
|
compressed, depending on length (See appendix B).
|
||||||
|
|
||||||
|
There are also reference titles that have a .hkey page that contains a
|
||||||
|
list of words that can be looked up in the associated .html page (See
|
||||||
|
appendix D).
|
||||||
|
|
||||||
|
Immediately following the ToC is the data for each piece mentioned in
|
||||||
|
the ToC, in the same order as it appeared in the ToC.
|
||||||
|
|
||||||
|
Finally, the end of the file appears to be padded with 20 bytes of 01h.
|
||||||
|
|
||||||
|
|
||||||
|
Appendix A: Info Page Format
|
||||||
|
----------------------------
|
||||||
|
|
||||||
|
The info page consists of a series of lines that contain "NAME=VALUE"
|
||||||
|
strings. Each line is terminated by a single newline. Here are the
|
||||||
|
values that the RocketWriter generates:
|
||||||
|
|
||||||
|
COMMENT=Info file for <title>
|
||||||
|
TYPE=2
|
||||||
|
TITLE=<title>
|
||||||
|
AUTHOR=<author>
|
||||||
|
URL=ebook:<long, unique string used for the file's name by the librarian>
|
||||||
|
GENERATOR=<e.g. RocketLibrarian 1.3.216>
|
||||||
|
PARSE=1
|
||||||
|
OUTPUT=1
|
||||||
|
BODY=<name of root HTML page (as it appears in the ToC)>
|
||||||
|
MENUMARK=menumark.html
|
||||||
|
SuggestedRetailPrice=<usually empty>
|
||||||
|
|
||||||
|
Encrypted titles have a few more entries (including those listed above):
|
||||||
|
|
||||||
|
ISBN=<ISBN number, including dashes>
|
||||||
|
REVISION=<digits>
|
||||||
|
TITLE_LANGUAGE=<en-us>
|
||||||
|
PUB_NAME=<Publisher's name>
|
||||||
|
PUBSERVER_ID=<digits>
|
||||||
|
GENERATOR=<e.g. RocketPress 1.3.121>
|
||||||
|
VERSION=<digits>
|
||||||
|
USERNAME=<rocket-ID>
|
||||||
|
COPY_ID=<digits>
|
||||||
|
COPYRIGHT=<copyright>
|
||||||
|
COPYTITLE=<another copyright?>
|
||||||
|
|
||||||
|
A reference title also has an indication that there is a .hkey page
|
||||||
|
present, and may also have a GENRE of "Reference":
|
||||||
|
|
||||||
|
HKEY=1
|
||||||
|
GENRE=Reference
|
||||||
|
|
||||||
|
|
||||||
|
Appendix B: The format of compressed data
|
||||||
|
-----------------------------------------
|
||||||
|
|
||||||
|
Compressed pages have a data section in the .rb file with the following
|
||||||
|
format:
|
||||||
|
|
||||||
|
The first int32 is a count of the number of 4096-byte chunks of data we
|
||||||
|
broke the uncompressed page into (the last chunk can be shorter than
|
||||||
|
4096 bytes, of course).
|
||||||
|
|
||||||
|
This is immediately followed by an int32 with the length of the entire
|
||||||
|
uncompressed data.
|
||||||
|
|
||||||
|
After this there are <count> int32s that indicate the size of each
|
||||||
|
chunk's compressed data.
|
||||||
|
|
||||||
|
Following these length int32s is the output from a deflation (the
|
||||||
|
algorithm used in gzip) for each 4096-byte chunk of the original data.
|
||||||
|
It appears that you must use a window-bit size of 13 and a compression
|
||||||
|
level of "best" to be compatible with the Rocket eBook's system software.
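
A round-trip sketch of this layout using Python's zlib module. It assumes each 4096-byte chunk is compressed as its own zlib-wrapped stream, which the description above does not spell out; the function names are hypothetical.

```python
import struct
import zlib

CHUNK = 4096


def pack_page(text):
    """Build a compressed data section from the uncompressed page bytes."""
    chunks = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)]
    packed = []
    for chunk in chunks:
        # Compression level "best" (9) and a window-bit size of 13.
        comp = zlib.compressobj(zlib.Z_BEST_COMPRESSION, zlib.DEFLATED, 13)
        packed.append(comp.compress(chunk) + comp.flush())
    header = struct.pack("<II", len(chunks), len(text))
    sizes = struct.pack("<%dI" % len(packed), *[len(p) for p in packed])
    return header + sizes + b"".join(packed)


def unpack_page(data):
    """Reverse pack_page and verify the stored uncompressed length."""
    count, total = struct.unpack_from("<II", data, 0)
    sizes = struct.unpack_from("<%dI" % count, data, 8)
    pos = 8 + 4 * count
    out = []
    for size in sizes:
        out.append(zlib.decompress(data[pos:pos + size]))
        pos += size
    text = b"".join(out)
    if len(text) != total:
        raise ValueError("uncompressed length mismatch")
    return text
```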
|
||||||
|
|
||||||
|
|
||||||
|
Appendix C: HTML-index Page Format
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
|
The .hidx page's purpose is to allow the renderer to quickly look up the
|
||||||
|
format of each paragraph (useful for random access to the data), and the
|
||||||
|
position of the anchor names.
|
||||||
|
|
||||||
|
The first section lists the various paragraph-producing tags. It is
|
||||||
|
headed by a line of "[tags <count>]", where <count> is the number of
|
||||||
|
tags that follow this header. The tags are listed one per line, and have
|
||||||
|
an implied enumeration from 0 to N-1 (which the other tags and the
|
||||||
|
upcoming paragraph sections reference).
|
||||||
|
|
||||||
|
The first tag is typically (always?) "<HTML> -1". The number trailing
|
||||||
|
the tag indicates what other tag (or sequence of tags, one per line) in
|
||||||
|
which we are nested. So, if we have a <BR> nested inside a <P
|
||||||
|
ALIGN="center">, it would be listed separately from a <BR> that was
|
||||||
|
nested inside a normal paragraph, and each one would have a different
|
||||||
|
trailing index number.
|
||||||
|
|
||||||
|
Following the tag section is the paragraph section. The heading is
|
||||||
|
"[paragraphs <count>]", and is followed by a line for each paragraph.
|
||||||
|
These lines consist of a character offset into the .html page for the
|
||||||
|
start of the paragraph followed by a 0-relative offset into the tag
|
||||||
|
section (indicating what kind of formatting to use for the indicated
|
||||||
|
paragraph).
|
||||||
|
|
||||||
|
The paragraph-section character offsets point to the first bit of text
|
||||||
|
after the associated tag.
|
||||||
|
|
||||||
|
The last section details the anchor names. The heading is
|
||||||
|
"[names <count>]", and each item that follows is a quoted string of the
|
||||||
|
anchor name, followed by a character offset into the .html page where
|
||||||
|
we'll find that name. If there are no names in the associated HTML
|
||||||
|
section, the heading is included with a 0 count (i.e. "[names 0]").
|
||||||
|
|
||||||
|
The name-section character offsets point to the start of the anchor tag
|
||||||
|
(not after the tag, like the offsets in the "paragraphs" section).
|
||||||
|
|
||||||
|
The lines are terminated by newlines (in standard unix fashion).
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[tags 10]
|
||||||
|
<HTML> -1
|
||||||
|
<BODY> 0
|
||||||
|
<P ALIGN="right"> 1
|
||||||
|
<P ALIGN="left"> 1
|
||||||
|
<P> 1
|
||||||
|
<H3 ALIGN="center"> 1
|
||||||
|
<P ALIGN="center"> 1
|
||||||
|
<BR> 6
|
||||||
|
<H2 ALIGN="center"> 1
|
||||||
|
<BR> 1
|
||||||
|
|
||||||
|
[paragraphs 42]
|
||||||
|
160 9
|
||||||
|
164 9
|
||||||
|
184 8
|
||||||
|
220 8
|
||||||
|
261 6
|
||||||
|
316 5
|
||||||
|
359 1
|
||||||
|
379 6
|
||||||
|
410 6
|
||||||
|
460 7
|
||||||
|
511 7
|
||||||
|
564 7
|
||||||
|
616 7
|
||||||
|
668 7
|
||||||
|
720 7
|
||||||
|
773 7
|
||||||
|
827 7
|
||||||
|
880 7
|
||||||
|
933 7
|
||||||
|
988 7
|
||||||
|
1043 7
|
||||||
|
1100 7
|
||||||
|
1157 7
|
||||||
|
1214 7
|
||||||
|
1270 7
|
||||||
|
1328 7
|
||||||
|
1385 7
|
||||||
|
1442 7
|
||||||
|
1497 7
|
||||||
|
1556 7
|
||||||
|
1561 7
|
||||||
|
1635 1
|
||||||
|
1656 5
|
||||||
|
1690 6
|
||||||
|
1737 7
|
||||||
|
1773 5
|
||||||
|
1798 4
|
||||||
|
1826 3
|
||||||
|
2663 1
|
||||||
|
2668 4
|
||||||
|
2689 2
|
||||||
|
2730 8
|
||||||
|
|
||||||
|
[names 1]
|
||||||
|
"ch1" 2689
|
||||||
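
A small parser for the three sections might look like the sketch below. The function name and the shape of the returned dictionary are assumptions, and the declared counts in the headings are not cross-checked against the number of lines read.

```python
import re

SECTION = re.compile(r"\[(tags|paragraphs|names) (\d+)\]")


def parse_hidx(text):
    """Parse a .hidx page into lists of tags, paragraphs and names."""
    result = {"tags": [], "paragraphs": [], "names": []}
    section = None
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        match = SECTION.fullmatch(line)
        if match:
            section = match.group(1)
            continue
        if section is None:
            continue
        # Every data line ends with a number separated by a space.
        head, _, tail = line.rpartition(" ")
        if section == "tags":
            # (tag text, index of the enclosing tag; -1 for the root)
            result["tags"].append((head, int(tail)))
        elif section == "paragraphs":
            # (character offset into the .html page, index into the tag list)
            result["paragraphs"].append((int(head), int(tail)))
        else:
            # (anchor name without quotes, character offset)
            result["names"].append((head.strip('"'), int(tail)))
    return result
```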
|
|
||||||
|
|
||||||
|
Appendix D: HTML-key Page Format
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
The .hkey page contains a list of words, one per line, sorted in a
|
||||||
|
strict ASCII sequence, each one followed by a tab and the offset in the
|
||||||
|
.html page of the word's data. I presume that the .hkey page must share
|
||||||
|
the same name prefix as its related .html page.
|
||||||
|
|
||||||
|
If the names contain high-bit characters, they are translated into
|
||||||
|
regular ASCII in the .hkey file, since this allows the user to search
|
||||||
|
for the words using unaccented characters.
|
||||||
|
|
||||||
|
The lines are terminated with a newline (in standard unix fashion).
|
||||||
|
|
||||||
|
An example:
|
||||||
|
|
||||||
|
a 5
|
||||||
|
apple 38
|
||||||
|
b 84
|
||||||
|
book 104
|
||||||
|
|
||||||
|
Each of these offsets points to a paragraph tag in the associated .html
|
||||||
|
page. I have only seen this sequence of tags used so far:
|
||||||
|
|
||||||
|
<P><BIG><B>word</B></BIG> other stuff</P>
|
||||||
|
|
||||||
|
I have seen multiple <B>...</B> tags in the middle of the single set of
|
||||||
|
<BIG>...</BIG> tags, but this is the basic tag format.
|
||||||
|
|
||||||
|
The offset in the .hkey page points to the start of the <P> tag.
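
Because the word list is sorted in strict ASCII order, a lookup can use binary search. The sketch below uses hypothetical helper names and assumes the word and offset are separated by a single tab, as described above.

```python
import bisect


def parse_hkey(text):
    """Return a list of (word, offset) pairs in file order."""
    index = []
    for line in text.split("\n"):
        if not line:
            continue
        word, _, offset = line.rpartition("\t")
        index.append((word, int(offset)))
    return index


def lookup(index, word):
    """Return the .html offset for word, or None if it is not listed."""
    words = [w for w, _ in index]
    # Python compares ASCII strings by code point, matching the file's order.
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return index[i][1]
    return None
```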
|
||||||
|
|
56
format_docs/tcr.txt
Normal file
@ -0,0 +1,56 @@
|
|||||||
|
About
|
||||||
|
-----
|
||||||
|
|
||||||
|
Text compression format that can be decompressed starting at any point.
|
||||||
|
Little-endian byte ordering is used.
|
||||||
|
|
||||||
|
|
||||||
|
Header
|
||||||
|
------
|
||||||
|
|
||||||
|
TCR files always start with:
|
||||||
|
|
||||||
|
!!8-Bit!!
|
||||||
|
|
||||||
|
|
||||||
|
Layout
|
||||||
|
------
|
||||||
|
|
||||||
|
Header
|
||||||
|
256 key dictionary
|
||||||
|
compressed text
|
||||||
|
|
||||||
|
|
||||||
|
Dictionary
|
||||||
|
----------
|
||||||
|
|
||||||
|
A dictionary of key and replacement string. There are a total of 256 keys,
|
||||||
|
0 - 255. Each string is preceded with one byte that represents the length of
|
||||||
|
the string.
|
||||||
|
|
||||||
|
|
||||||
|
Compressed text
|
||||||
|
---------------
|
||||||
|
|
||||||
|
The compressed text is a series of values 0-255 which correspond to a key and
|
||||||
|
thus a string. Reassembling is replacing each key in the compressed text with
|
||||||
|
its corresponding string.
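
The reassembly step can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the dictionary starts immediately after the 9-byte header, as the layout above suggests.

```python
HEADER = b"!!8-Bit!!"


def tcr_decompress(data):
    """Expand a TCR file (bytes) back into the original text (bytes)."""
    if not data.startswith(HEADER):
        raise ValueError("missing TCR header")
    pos = len(HEADER)
    table = []
    for _ in range(256):
        # Each dictionary string is preceded by a one-byte length.
        length = data[pos]
        table.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    # The rest of the file is a sequence of codes, one byte each.
    return b"".join(table[code] for code in data[pos:])
```

Since every code expands independently of its neighbours, decoding can begin at any byte of the compressed text once the dictionary has been read.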
|
||||||
|
|
||||||
|
|
||||||
|
Compressor
|
||||||
|
----------
|
||||||
|
|
||||||
|
From Andrew Giddings' TCR.c (http://www.cix.co.uk/~gidds/Software/TCR.html):
|
||||||
|
|
||||||
|
The TCR compression format is easy to describe: after the fixed header is a
|
||||||
|
dictionary of 256 strings, each preceded by a length byte. The rest of the
|
||||||
|
file is a list of codes from this dictionary.
|
||||||
|
|
||||||
|
The compressor works by starting with each code defined as itself. While
|
||||||
|
there's an unused code, it finds the most common two-code combination, and
|
||||||
|
creates a new code for it, replacing all occurrences in the text with the
|
||||||
|
new code.
|
||||||
|
|
||||||
|
It also searches for codes that are always followed by another, which it can
|
||||||
|
merge, possibly freeing up some.
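
The pair-merging loop can be illustrated with a simplified sketch. This is not a port of TCR.c: it leaves out the step that merges codes always followed by another, it stops at the first pair that is too long or no longer repeats, and the function name is hypothetical.

```python
from collections import Counter

HEADER = b"!!8-Bit!!"


def tcr_compress(text):
    """Compress bytes into TCR layout using greedy pair merging only."""
    # Start with each code defined as itself.
    table = [bytes([i]) for i in range(256)]
    codes = list(text)
    used = set(codes)
    unused = [c for c in range(256) if c not in used]
    while unused:
        pairs = Counter(zip(codes, codes[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        # Stop when merging no longer helps or the string would not fit
        # behind a one-byte length.
        if count < 2 or len(table[a]) + len(table[b]) > 255:
            break
        new = unused.pop()
        table[new] = table[a] + table[b]
        # Replace every non-overlapping occurrence of (a, b) with the new code.
        merged = []
        i = 0
        while i < len(codes):
            if i + 1 < len(codes) and codes[i] == a and codes[i + 1] == b:
                merged.append(new)
                i += 2
            else:
                merged.append(codes[i])
                i += 1
        codes = merged
    dictionary = b"".join(bytes([len(s)]) + s for s in table)
    return HEADER + dictionary + bytes(codes)
```

Note that the dictionary alone takes at least 512 bytes, so the format only pays off on inputs large enough to amortise it.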
|
||||||
|
|
@ -52,6 +52,17 @@ p.formats {
|
|||||||
text-indent: 0.0in;
|
text-indent: 0.0in;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Minimize widows and orphans by logically grouping chunks
|
||||||
|
* Some reports of problems with Sony (ADE) ereaders
|
||||||
|
* ADE: page-break-inside:avoid;
|
||||||
|
* iBooks: display:inline-block;
|
||||||
|
* width:100%;
|
||||||
|
*/
|
||||||
|
div.author_logical_group {
|
||||||
|
page-break-inside:avoid;
|
||||||
|
}
|
||||||
|
|
||||||
div.description > p:first-child {
|
div.description > p:first-child {
|
||||||
margin: 0 0 0 0;
|
margin: 0 0 0 0;
|
||||||
text-indent: 0em;
|
text-indent: 0em;
|
||||||
@ -62,27 +73,19 @@ div.description {
|
|||||||
text-indent: 1em;
|
text-indent: 1em;
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
div.initial_letter {
|
||||||
* Attempt to minimize widows and orphans by logically grouping chunks
|
page-break-before:always;
|
||||||
* Recommend enabling for iPad
|
|
||||||
* Some reports of problems with Sony ereaders, presumably ADE engines
|
|
||||||
*/
|
|
||||||
/*
|
|
||||||
div.logical_group {
|
|
||||||
display:inline-block;
|
|
||||||
width:100%;
|
|
||||||
}
|
}
|
||||||
*/
|
|
||||||
|
|
||||||
p.date_index {
|
p.author_title_letter_index {
|
||||||
font-size:x-large;
|
font-size:x-large;
|
||||||
text-align:center;
|
text-align:center;
|
||||||
font-weight:bold;
|
font-weight:bold;
|
||||||
margin-top:1em;
|
margin-top:0px;
|
||||||
margin-bottom:0px;
|
margin-bottom:0px;
|
||||||
}
|
}
|
||||||
|
|
||||||
p.letter_index {
|
p.date_index {
|
||||||
font-size:x-large;
|
font-size:x-large;
|
||||||
text-align:center;
|
text-align:center;
|
||||||
font-weight:bold;
|
font-weight:bold;
|
||||||
@ -99,6 +102,14 @@ p.series {
|
|||||||
text-indent:-2em;
|
text-indent:-2em;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
p.series_letter_index {
|
||||||
|
font-size:x-large;
|
||||||
|
text-align:center;
|
||||||
|
font-weight:bold;
|
||||||
|
margin-top:1em;
|
||||||
|
margin-bottom:0px;
|
||||||
|
}
|
||||||
|
|
||||||
p.read_book {
|
p.read_book {
|
||||||
text-align:left;
|
text-align:left;
|
||||||
margin-top:0px;
|
margin-top:0px;
|
||||||
|
@ -13,15 +13,12 @@ class MSNSankeiNewsProduct(BasicNewsRecipe):
|
|||||||
description = 'Products release from Japan'
|
description = 'Products release from Japan'
|
||||||
oldest_article = 7
|
oldest_article = 7
|
||||||
max_articles_per_feed = 100
|
max_articles_per_feed = 100
|
||||||
encoding = 'Shift_JIS'
|
encoding = 'utf-8'
|
||||||
language = 'ja'
|
language = 'ja'
|
||||||
cover_url = 'http://sankei.jp.msn.com/images/common/sankeShinbunLogo.jpg'
|
cover_url = 'http://sankei.jp.msn.com/images/common/sankeShinbunLogo.jpg'
|
||||||
masthead_url = 'http://sankei.jp.msn.com/images/common/sankeiNewsLogo.gif'
|
masthead_url = 'http://sankei.jp.msn.com/images/common/sankeiNewsLogo.gif'
|
||||||
|
|
||||||
feeds = [(u'\u65b0\u5546\u54c1', u'http://sankei.jp.msn.com/rss/news/release.xml')]
|
feeds = [(u'\u65b0\u5546\u54c1', u'http://sankei.jp.msn.com/rss/news/release.xml')]
|
||||||
|
|
||||||
remove_tags_before = dict(id="__r_article_title__")
|
remove_tags_before = dict(id="NewsTitle")
|
||||||
remove_tags_after = dict(id="ajax_release_news")
|
remove_tags_after = dict(id="RelatedTitle")
|
||||||
remove_tags = [{'class':"parent chromeCustom6G"},
|
|
||||||
dict(id="RelatedImg")
|
|
||||||
]
|
|
||||||
|
@ -1,7 +1,5 @@
|
|||||||
#!/usr/bin/env python
|
|
||||||
|
|
||||||
__license__ = 'GPL v3'
|
__license__ = 'GPL v3'
|
||||||
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
|
__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>'
|
||||||
|
|
||||||
'''
|
'''
|
||||||
theonion.com
|
theonion.com
|
||||||
@ -12,35 +10,73 @@ from calibre.web.feeds.news import BasicNewsRecipe
|
|||||||
class TheOnion(BasicNewsRecipe):
|
class TheOnion(BasicNewsRecipe):
|
||||||
title = 'The Onion'
|
title = 'The Onion'
|
||||||
__author__ = 'Darko Miletic'
|
__author__ = 'Darko Miletic'
|
||||||
description = "America's finest news source"
|
description = "America's finest news source"
|
||||||
oldest_article = 2
|
oldest_article = 2
|
||||||
max_articles_per_feed = 100
|
max_articles_per_feed = 100
|
||||||
publisher = u'Onion, Inc.'
|
publisher = 'Onion, Inc.'
|
||||||
category = u'humor, news, USA'
|
category = 'humor, news, USA'
|
||||||
language = 'en'
|
language = 'en'
|
||||||
|
|
||||||
no_stylesheets = True
|
no_stylesheets = True
|
||||||
use_embedded_content = False
|
use_embedded_content = False
|
||||||
encoding = 'utf-8'
|
encoding = 'utf-8'
|
||||||
remove_javascript = True
|
publication_type = 'newsportal'
|
||||||
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
|
masthead_url = 'http://o.onionstatic.com/img/headers/onion_190.png'
|
||||||
|
extra_css = """
|
||||||
html2lrf_options = [
|
body{font-family: Helvetica,Arial,sans-serif}
|
||||||
'--comment' , description
|
.section_title{color: gray; text-transform: uppercase}
|
||||||
, '--category' , category
|
.title{font-family: Georgia,serif}
|
||||||
, '--publisher' , publisher
|
.meta{color: gray; display: inline}
|
||||||
]
|
.has_caption{display: block}
|
||||||
|
.caption{font-size: x-small; color: gray; margin-bottom: 0.8em}
|
||||||
|
"""
|
||||||
|
|
||||||
keep_only_tags = [dict(name='div', attrs={'id':'main'})]
|
conversion_options = {
|
||||||
|
'comment' : description
|
||||||
|
, 'tags' : category
|
||||||
|
, 'publisher': publisher
|
||||||
|
, 'language' : language
|
||||||
|
}
|
||||||
|
|
||||||
|
keep_only_tags = [
|
||||||
|
dict(name='h2', attrs={'class':['section_title','title']})
|
||||||
|
,dict(attrs={'class':['main_image','meta','article_photo_lead','article_body']})
|
||||||
|
,dict(attrs={'id':['entries']})
|
||||||
|
]
|
||||||
|
remove_attributes=['lang','rel']
|
||||||
|
remove_tags_after = dict(attrs={'class':['article_body','feature_content']})
|
||||||
remove_tags = [
|
remove_tags = [
|
||||||
dict(name=['object','link','iframe','base'])
|
dict(name=['object','link','iframe','base','meta'])
|
||||||
,dict(name='div', attrs={'class':['toolbar_side','graphical_feature','toolbar_bottom']})
|
,dict(name='div', attrs={'class':['toolbar_side','graphical_feature','toolbar_bottom']})
|
||||||
,dict(name='div', attrs={'id':['recent_slider','sidebar','pagination','related_media']})
|
,dict(name='div', attrs={'id':['recent_slider','sidebar','pagination','related_media']})
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
feeds = [
|
feeds = [
|
||||||
(u'Daily' , u'http://feeds.theonion.com/theonion/daily' )
|
(u'Daily' , u'http://feeds.theonion.com/theonion/daily' )
|
||||||
,(u'Sports' , u'http://feeds.theonion.com/theonion/sports' )
|
,(u'Sports' , u'http://feeds.theonion.com/theonion/sports' )
|
||||||
]
|
]
|
||||||
|
|
||||||
|
def get_article_url(self, article):
|
||||||
|
artl = BasicNewsRecipe.get_article_url(self, article)
|
||||||
|
if artl.startswith('http://www.theonion.com/audio/'):
|
||||||
|
artl = None
|
||||||
|
return artl
|
||||||
|
|
||||||
|
def preprocess_html(self, soup):
|
||||||
|
for item in soup.findAll(style=True):
|
||||||
|
del item['style']
|
||||||
|
for item in soup.findAll('a'):
|
||||||
|
limg = item.find('img')
|
||||||
|
if item.string is not None:
|
||||||
|
str = item.string
|
||||||
|
item.replaceWith(str)
|
||||||
|
else:
|
||||||
|
if limg:
|
||||||
|
item.name = 'div'
|
||||||
|
item.attrs = []
|
||||||
|
if not limg.has_key('alt'):
|
||||||
|
limg['alt'] = 'image'
|
||||||
|
else:
|
||||||
|
str = self.tag_to_string(item)
|
||||||
|
item.replaceWith(str)
|
||||||
|
return soup
|
||||||
|
@ -89,21 +89,21 @@ class NOOK_COLOR(NOOK):
|
|||||||
BCD = [0x216]
|
BCD = [0x216]
|
||||||
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'EBOOK_DISK'
|
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'EBOOK_DISK'
|
||||||
|
|
||||||
EBOOK_DIR_MAIN = 'My Files/Books'
|
EBOOK_DIR_MAIN = 'My Files'
|
||||||
|
|
||||||
'''
|
|
||||||
def create_upload_path(self, path, mdata, fname, create_dirs=True):
|
def create_upload_path(self, path, mdata, fname, create_dirs=True):
|
||||||
filepath = NOOK.create_upload_path(self, path, mdata, fname,
|
filepath = NOOK.create_upload_path(self, path, mdata, fname,
|
||||||
create_dirs=create_dirs)
|
create_dirs=False)
|
||||||
edm = self.EBOOK_DIR_MAIN.replace('/', os.sep)
|
edm = self.EBOOK_DIR_MAIN
|
||||||
npath = os.path.join(edm, _('News')) + os.sep
|
subdir = 'Books'
|
||||||
if npath in filepath:
|
if mdata.tags:
|
||||||
filepath = filepath.replace(npath, os.sep.join('My Files',
|
if _('News') in mdata.tags:
|
||||||
'Magazines')+os.sep)
|
subdir = 'Magazines'
|
||||||
filedir = os.path.dirname(filepath)
|
filepath = filepath.replace(os.sep+edm+os.sep,
|
||||||
if create_dirs and not os.path.exists(filedir):
|
os.sep+edm+os.sep+subdir+os.sep)
|
||||||
os.makedirs(filedir)
|
filedir = os.path.dirname(filepath)
|
||||||
|
if create_dirs and not os.path.exists(filedir):
|
||||||
|
os.makedirs(filedir)
|
||||||
|
|
||||||
return filepath
|
return filepath
|
||||||
'''
|
|
||||||
|
|
||||||
|
@ -71,19 +71,28 @@ class FB2MLizer(object):
|
|||||||
return u'<?xml version="1.0" encoding="UTF-8"?>' + output
|
return u'<?xml version="1.0" encoding="UTF-8"?>' + output
|
||||||
|
|
||||||
def clean_text(self, text):
|
def clean_text(self, text):
|
||||||
|
# Condense empty paragraphs into a line break.
|
||||||
|
text = re.sub(r'(?miu)(<p>\s*</p>\s*){3,}', '<p><empty-line /></p>', text)
|
||||||
|
# Remove empty paragraphs.
|
||||||
text = re.sub(r'(?miu)<p>\s*</p>', '', text)
|
text = re.sub(r'(?miu)<p>\s*</p>', '', text)
|
||||||
|
# Clean up paragraph endings.
|
||||||
text = re.sub(r'(?miu)\s*</p>', '</p>', text)
|
text = re.sub(r'(?miu)\s*</p>', '</p>', text)
|
||||||
|
# Put paragraphs following a paragraph on a separate line.
|
||||||
text = re.sub(r'(?miu)</p>\s*<p>', '</p>\n\n<p>', text)
|
text = re.sub(r'(?miu)</p>\s*<p>', '</p>\n\n<p>', text)
|
||||||
|
|
||||||
|
# Remove empty title elements.
|
||||||
text = re.sub(r'(?miu)<title>\s*</title>', '', text)
|
text = re.sub(r'(?miu)<title>\s*</title>', '', text)
|
||||||
text = re.sub(r'(?miu)\s+</title>', '</title>', text)
|
text = re.sub(r'(?miu)\s+</title>', '</title>', text)
|
||||||
|
|
||||||
|
# Remove empty sections.
|
||||||
text = re.sub(r'(?miu)<section>\s*</section>', '', text)
|
text = re.sub(r'(?miu)<section>\s*</section>', '', text)
|
||||||
|
# Clean up sections start and ends.
|
||||||
text = re.sub(r'(?miu)\s*</section>', '\n</section>', text)
|
text = re.sub(r'(?miu)\s*</section>', '\n</section>', text)
|
||||||
text = re.sub(r'(?miu)</section>\s*', '</section>\n\n', text)
|
text = re.sub(r'(?miu)</section>\s*', '</section>\n\n', text)
|
||||||
text = re.sub(r'(?miu)\s*<section>', '\n<section>', text)
|
text = re.sub(r'(?miu)\s*<section>', '\n<section>', text)
|
||||||
text = re.sub(r'(?miu)<section>\s*', '<section>\n', text)
|
text = re.sub(r'(?miu)<section>\s*', '<section>\n', text)
|
||||||
text = re.sub(r'(?miu)</section><section>', '</section>\n\n<section>', text)
|
# Put sections followed by sections on a separate line.
|
||||||
|
text = re.sub(r'(?miu)</section>\s*<section>', '</section>\n\n<section>', text)
|
||||||
|
|
||||||
if self.opts.insert_blank_line:
|
if self.opts.insert_blank_line:
|
||||||
text = re.sub(r'(?miu)</p>', '</p><empty-line />', text)
|
text = re.sub(r'(?miu)</p>', '</p><empty-line />', text)
|
||||||
@ -338,6 +347,11 @@ class FB2MLizer(object):
|
|||||||
tags = []
|
tags = []
|
||||||
# First tag in tree
|
# First tag in tree
|
||||||
tag = barename(elem_tree.tag)
|
tag = barename(elem_tree.tag)
|
||||||
|
# Number of blank lines above tag
|
||||||
|
try:
|
||||||
|
ems = int(round((float(style.marginTop) / style.fontSize) - 1))
|
||||||
|
except:
|
||||||
|
ems = 0
|
||||||
|
|
||||||
# Convert TOC entries to <title>s and add <section>s
|
# Convert TOC entries to <title>s and add <section>s
|
||||||
if self.opts.sectionize == 'toc':
|
if self.opts.sectionize == 'toc':
|
||||||
@ -370,7 +384,9 @@ class FB2MLizer(object):
|
|||||||
fb2_out.append('<section>')
|
fb2_out.append('<section>')
|
||||||
self.section_level += 1
|
self.section_level += 1
|
||||||
|
|
||||||
# Process the XHTML tag if it needs to be converted to an FB2 tag.
|
# Process the XHTML tag and styles, converting them to FB2 tags.
|
||||||
|
# Use separate if statements rather than if/elif. There can be
|
||||||
|
# only one XHTML tag but it can have multiple styles.
|
||||||
if tag == 'img':
|
if tag == 'img':
|
||||||
if elem_tree.attrib.get('src', None):
|
if elem_tree.attrib.get('src', None):
|
||||||
# Only write the image tag if it is in the manifest.
|
# Only write the image tag if it is in the manifest.
|
||||||
@ -381,7 +397,11 @@ class FB2MLizer(object):
|
|||||||
fb2_out += p_txt
|
fb2_out += p_txt
|
||||||
tags += p_tag
|
tags += p_tag
|
||||||
fb2_out.append('<image xlink:href="#%s" />' % self.image_hrefs[page.abshref(elem_tree.attrib['src'])])
|
fb2_out.append('<image xlink:href="#%s" />' % self.image_hrefs[page.abshref(elem_tree.attrib['src'])])
|
||||||
elif tag == 'br':
|
if tag in ('br', 'hr') or ems:
|
||||||
|
if not ems:
|
||||||
|
multiplier = 1
|
||||||
|
else:
|
||||||
|
multiplier = ems
|
||||||
if self.in_p:
|
if self.in_p:
|
||||||
closed_tags = []
|
closed_tags = []
|
||||||
open_tags = tag_stack+tags
|
open_tags = tag_stack+tags
|
||||||
@ -391,52 +411,38 @@ class FB2MLizer(object):
|
|||||||
closed_tags.append(t)
|
closed_tags.append(t)
|
||||||
if t == 'p':
|
if t == 'p':
|
||||||
break
|
break
|
||||||
fb2_out.append('<empty-line />')
|
fb2_out.append('<empty-line />' * multiplier)
|
||||||
closed_tags.reverse()
|
closed_tags.reverse()
|
||||||
for t in closed_tags:
|
for t in closed_tags:
|
||||||
fb2_out.append('<%s>' % t)
|
fb2_out.append('<%s>' % t)
|
||||||
else:
|
else:
|
||||||
fb2_out.append('<empty-line />')
|
fb2_out.append('<empty-line />' * multiplier)
|
||||||
elif tag in ('div', 'li', 'p'):
|
if tag in ('div', 'li', 'p'):
|
||||||
p_text, added_p = self.close_open_p(tag_stack+tags)
|
p_text, added_p = self.close_open_p(tag_stack+tags)
|
||||||
fb2_out += p_text
|
fb2_out += p_text
|
||||||
if added_p:
|
if added_p:
|
||||||
tags.append('p')
|
tags.append('p')
|
||||||
elif tag == 'b':
|
if tag == 'b' or style['font-weight'] in ('bold', 'bolder'):
|
||||||
s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
|
s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
|
||||||
fb2_out += s_out
|
fb2_out += s_out
|
||||||
tags += s_tags
|
tags += s_tags
|
||||||
elif tag == 'i':
|
if tag == 'i' or style['font-style'] == 'italic':
|
||||||
s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
|
s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
|
||||||
fb2_out += s_out
|
fb2_out += s_out
|
||||||
tags += s_tags
|
tags += s_tags
|
||||||
elif tag in ('del', 'strike'):
|
if tag in ('del', 'strike') or style['text-decoration'] == 'line-through':
|
||||||
s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
|
s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
|
||||||
fb2_out += s_out
|
fb2_out += s_out
|
||||||
tags += s_tags
|
tags += s_tags
|
||||||
elif tag == 'sub':
|
if tag == 'sub':
|
||||||
s_out, s_tags = self.handle_simple_tag('sub', tag_stack+tags)
|
s_out, s_tags = self.handle_simple_tag('sub', tag_stack+tags)
|
||||||
fb2_out += s_out
|
fb2_out += s_out
|
||||||
tags += s_tags
|
tags += s_tags
|
||||||
elif tag == 'sup':
|
if tag == 'sup':
|
||||||
s_out, s_tags = self.handle_simple_tag('sup', tag_stack+tags)
|
s_out, s_tags = self.handle_simple_tag('sup', tag_stack+tags)
|
||||||
fb2_out += s_out
|
fb2_out += s_out
|
||||||
tags += s_tags
|
tags += s_tags
|
||||||
|
|
||||||
# Processes style information.
|
|
||||||
if style['font-style'] == 'italic':
|
|
||||||
s_out, s_tags = self.handle_simple_tag('emphasis', tag_stack+tags)
|
|
||||||
fb2_out += s_out
|
|
||||||
tags += s_tags
|
|
||||||
elif style['font-weight'] in ('bold', 'bolder'):
|
|
||||||
s_out, s_tags = self.handle_simple_tag('strong', tag_stack+tags)
|
|
||||||
fb2_out += s_out
|
|
||||||
tags += s_tags
|
|
||||||
elif style['text-decoration'] == 'line-through':
|
|
||||||
s_out, s_tags = self.handle_simple_tag('strikethrough', tag_stack+tags)
|
|
||||||
fb2_out += s_out
|
|
||||||
tags += s_tags
|
|
||||||
|
|
||||||
# Process element text.
|
# Process element text.
|
||||||
if hasattr(elem_tree, 'text') and elem_tree.text:
|
if hasattr(elem_tree, 'text') and elem_tree.text:
|
||||||
if not self.in_p:
|
if not self.in_p:
|
||||||
|
@ -633,7 +633,7 @@ class Style(object):
|
|||||||
def lineHeight(self):
|
def lineHeight(self):
|
||||||
if self._lineHeight is None:
|
if self._lineHeight is None:
|
||||||
result = None
|
result = None
|
||||||
parent = self._getparent()
|
parent = self._get_parent()
|
||||||
if 'line-height' in self._style:
|
if 'line-height' in self._style:
|
||||||
lineh = self._style['line-height']
|
lineh = self._style['line-height']
|
||||||
if lineh == 'normal':
|
if lineh == 'normal':
|
||||||
|
@ -67,10 +67,11 @@ class TXTMLizer(object):
|
|||||||
output.append(self.get_toc())
|
output.append(self.get_toc())
|
||||||
for item in self.oeb_book.spine:
|
for item in self.oeb_book.spine:
|
||||||
self.log.debug('Converting %s to TXT...' % item.href)
|
self.log.debug('Converting %s to TXT...' % item.href)
|
||||||
stylizer = Stylizer(item.data, item.href, self.oeb_book, self.opts, self.opts.output_profile)
|
content = unicode(etree.tostring(item.data, encoding=unicode))
|
||||||
content = unicode(etree.tostring(item.data.find(XHTML('body')), encoding=unicode))
|
|
||||||
content = self.remove_newlines(content)
|
content = self.remove_newlines(content)
|
||||||
output += self.dump_text(etree.fromstring(content), stylizer, item)
|
content = etree.fromstring(content)
|
||||||
|
stylizer = Stylizer(content, item.href, self.oeb_book, self.opts, self.opts.output_profile)
|
||||||
|
output += self.dump_text(content.find(XHTML('body')), stylizer, item)
|
||||||
output += '\n\n\n\n\n\n'
|
output += '\n\n\n\n\n\n'
|
||||||
output = u''.join(output)
|
output = u''.join(output)
|
||||||
output = u'\n'.join(l.rstrip() for l in output.splitlines())
|
output = u'\n'.join(l.rstrip() for l in output.splitlines())
|
||||||
@ -219,11 +220,16 @@ class TXTMLizer(object):
|
|||||||
if tag in SPACE_TAGS:
|
if tag in SPACE_TAGS:
|
||||||
text.append(u' ')
|
text.append(u' ')
|
||||||
|
|
||||||
# Scene breaks.
|
# Hard scene breaks.
|
||||||
if tag == 'hr':
|
if tag == 'hr':
|
||||||
text.append('\n\n* * *\n\n')
|
text.append('\n\n* * *\n\n')
|
||||||
elif style['margin-top']:
|
# Soft scene breaks.
|
||||||
text.append('\n\n' + '\n' * round(style['margin-top']))
|
try:
|
||||||
|
ems = int(round((float(style.marginTop) / style.fontSize) - 1))
|
||||||
|
if ems:
|
||||||
|
text.append('\n' * ems)
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
# Process tags that contain text.
|
# Process tags that contain text.
|
||||||
if hasattr(elem, 'text') and elem.text:
|
if hasattr(elem, 'text') and elem.text:
|
||||||
|
@ -492,8 +492,7 @@ title and author are swapped before the title case is set</string>
|
|||||||
<item>
|
<item>
|
||||||
<widget class="QCheckBox" name="update_title_sort">
|
<widget class="QCheckBox" name="update_title_sort">
|
||||||
<property name="toolTip">
|
<property name="toolTip">
|
||||||
<string>Recompute the title sort value and store it in title sort.
|
<string>Update title sort based on the current title. This will be applied only after other changes to title.</string>
|
||||||
This will happen after any title case changes</string>
|
|
||||||
</property>
|
</property>
|
||||||
<property name="text">
|
<property name="text">
|
||||||
<string>Update &title sort</string>
|
<string>Update &title sort</string>
|
||||||
|
@ -420,7 +420,8 @@ class ResultCache(SearchQueryParser): # {{{
|
|||||||
return candidates - res
|
return candidates - res
|
||||||
return res
|
return res
|
||||||
|
|
||||||
def get_matches(self, location, query, allow_recursion=True, candidates=None):
|
def get_matches(self, location, query, candidates=None,
|
||||||
|
allow_recursion=True):
|
||||||
matches = set([])
|
matches = set([])
|
||||||
if candidates is None:
|
if candidates is None:
|
||||||
candidates = self.universal_set()
|
candidates = self.universal_set()
|
||||||
@ -434,8 +435,8 @@ class ResultCache(SearchQueryParser): # {{{
|
|||||||
if isinstance(location, list):
|
if isinstance(location, list):
|
||||||
if allow_recursion:
|
if allow_recursion:
|
||||||
for loc in location:
|
for loc in location:
|
||||||
matches |= self.get_matches(loc, query, candidates,
|
matches |= self.get_matches(loc, query,
|
||||||
allow_recursion=False)
|
candidates=candidates, allow_recursion=False)
|
||||||
return matches
|
return matches
|
||||||
raise ParseException(query, len(query), 'Recursive query group detected', self)
|
raise ParseException(query, len(query), 'Recursive query group detected', self)
|
||||||
|
|
||||||
|
@ -1841,8 +1841,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
body.insert(btc,pTag)
|
body.insert(btc,pTag)
|
||||||
btc += 1
|
btc += 1
|
||||||
|
|
||||||
# <p class="letter_index">
|
|
||||||
# <p class="book_title">
|
|
||||||
divTag = Tag(soup, "div")
|
divTag = Tag(soup, "div")
|
||||||
dtc = 0
|
dtc = 0
|
||||||
current_letter = ""
|
current_letter = ""
|
||||||
@ -1870,11 +1868,12 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
divTag.insert(dtc, divRunningTag)
|
divTag.insert(dtc, divRunningTag)
|
||||||
dtc += 1
|
dtc += 1
|
||||||
divRunningTag = Tag(soup, 'div')
|
divRunningTag = Tag(soup, 'div')
|
||||||
divRunningTag['class'] = "logical_group"
|
if dtc > 0:
|
||||||
|
divRunningTag['class'] = "initial_letter"
|
||||||
drtc = 0
|
drtc = 0
|
||||||
current_letter = self.letter_or_symbol(book['title_sort'][0])
|
current_letter = self.letter_or_symbol(book['title_sort'][0])
|
||||||
pIndexTag = Tag(soup, "p")
|
pIndexTag = Tag(soup, "p")
|
||||||
pIndexTag['class'] = "letter_index"
|
pIndexTag['class'] = "author_title_letter_index"
|
||||||
aTag = Tag(soup, "a")
|
aTag = Tag(soup, "a")
|
||||||
aTag['name'] = "%s" % self.letter_or_symbol(book['title_sort'][0])
|
aTag['name'] = "%s" % self.letter_or_symbol(book['title_sort'][0])
|
||||||
pIndexTag.insert(0,aTag)
|
pIndexTag.insert(0,aTag)
|
||||||
@ -1982,8 +1981,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
body.insert(btc, aTag)
|
body.insert(btc, aTag)
|
||||||
btc += 1
|
btc += 1
|
||||||
|
|
||||||
# <p class="letter_index">
|
|
||||||
# <p class="author_index">
|
|
||||||
divTag = Tag(soup, "div")
|
divTag = Tag(soup, "div")
|
||||||
dtc = 0
|
dtc = 0
|
||||||
divOpeningTag = None
|
divOpeningTag = None
|
||||||
@ -2017,10 +2014,11 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
current_letter = self.letter_or_symbol(book['author_sort'][0].upper())
|
current_letter = self.letter_or_symbol(book['author_sort'][0].upper())
|
||||||
author_count = 0
|
author_count = 0
|
||||||
divOpeningTag = Tag(soup, 'div')
|
divOpeningTag = Tag(soup, 'div')
|
||||||
divOpeningTag['class'] = "logical_group"
|
if dtc > 0:
|
||||||
|
divOpeningTag['class'] = "initial_letter"
|
||||||
dotc = 0
|
dotc = 0
|
||||||
pIndexTag = Tag(soup, "p")
|
pIndexTag = Tag(soup, "p")
|
||||||
pIndexTag['class'] = "letter_index"
|
pIndexTag['class'] = "author_title_letter_index"
|
||||||
aTag = Tag(soup, "a")
|
aTag = Tag(soup, "a")
|
||||||
aTag['name'] = "%sauthors" % self.letter_or_symbol(current_letter)
|
aTag['name'] = "%sauthors" % self.letter_or_symbol(current_letter)
|
||||||
pIndexTag.insert(0,aTag)
|
pIndexTag.insert(0,aTag)
|
||||||
@ -2032,16 +2030,21 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
# Start a new author
|
# Start a new author
|
||||||
current_author = book['author']
|
current_author = book['author']
|
||||||
author_count += 1
|
author_count += 1
|
||||||
if author_count == 2:
|
if author_count >= 2:
|
||||||
# Add divOpeningTag to divTag, kill divOpeningTag
|
# Add divOpeningTag to divTag, kill divOpeningTag
|
||||||
divTag.insert(dtc, divOpeningTag)
|
if divOpeningTag:
|
||||||
dtc += 1
|
divTag.insert(dtc, divOpeningTag)
|
||||||
divOpeningTag = None
|
dtc += 1
|
||||||
dotc = 0
|
divOpeningTag = None
|
||||||
|
dotc = 0
|
||||||
|
|
||||||
|
# Create a divRunningTag for the next author
|
||||||
|
if author_count > 2:
|
||||||
|
divTag.insert(dtc, divRunningTag)
|
||||||
|
dtc += 1
|
||||||
|
|
||||||
# Create a divRunningTag for the rest of the authors in this letter
|
|
||||||
divRunningTag = Tag(soup, 'div')
|
divRunningTag = Tag(soup, 'div')
|
||||||
divRunningTag['class'] = "logical_group"
|
divRunningTag['class'] = "author_logical_group"
|
||||||
drtc = 0
|
drtc = 0
|
||||||
|
|
||||||
non_series_books = 0
|
non_series_books = 0
|
||||||
@ -2373,8 +2376,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
body.insert(btc,pTag)
|
body.insert(btc,pTag)
|
||||||
btc += 1
|
btc += 1
|
||||||
|
|
||||||
# <p class="letter_index">
|
|
||||||
# <p class="author_index">
|
|
||||||
divTag = Tag(soup, "div")
|
divTag = Tag(soup, "div")
|
||||||
dtc = 0
|
dtc = 0
|
||||||
|
|
||||||
@ -2558,8 +2559,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
body.insert(btc, aTag)
|
body.insert(btc, aTag)
|
||||||
btc += 1
|
btc += 1
|
||||||
|
|
||||||
# <p class="letter_index">
|
|
||||||
# <p class="author_index">
|
|
||||||
divTag = Tag(soup, "div")
|
divTag = Tag(soup, "div")
|
||||||
dtc = 0
|
dtc = 0
|
||||||
|
|
||||||
@ -2661,8 +2660,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
body.insert(btc, aTag)
|
body.insert(btc, aTag)
|
||||||
btc += 1
|
btc += 1
|
||||||
|
|
||||||
# <p class="letter_index">
|
|
||||||
# <p class="author_index">
|
|
||||||
divTag = Tag(soup, "div")
|
divTag = Tag(soup, "div")
|
||||||
dtc = 0
|
dtc = 0
|
||||||
current_letter = ""
|
current_letter = ""
|
||||||
@ -2677,7 +2674,7 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
|
|||||||
# Start a new letter with Index letter
|
# Start a new letter with Index letter
|
||||||
current_letter = self.letter_or_symbol(sort_title[0].upper())
|
current_letter = self.letter_or_symbol(sort_title[0].upper())
|
||||||
pIndexTag = Tag(soup, "p")
|
pIndexTag = Tag(soup, "p")
|
||||||
pIndexTag['class'] = "letter_index"
|
pIndexTag['class'] = "series_letter_index"
|
||||||
aTag = Tag(soup, "a")
|
aTag = Tag(soup, "a")
|
||||||
aTag['name'] = "%s_series" % self.letter_or_symbol(current_letter)
|
aTag['name'] = "%s_series" % self.letter_or_symbol(current_letter)
|
||||||
pIndexTag.insert(0,aTag)
|
pIndexTag.insert(0,aTag)
|
||||||
|
@ -457,7 +457,7 @@ class CustomColumns(object):
|
|||||||
if num is not None:
|
if num is not None:
|
||||||
data = self.custom_column_num_map[num]
|
data = self.custom_column_num_map[num]
|
||||||
if data['datatype'] == 'composite':
|
if data['datatype'] == 'composite':
|
||||||
return set()
|
return set([])
|
||||||
if not data['editable']:
|
if not data['editable']:
|
||||||
raise ValueError('Column %r is not editable'%data['label'])
|
raise ValueError('Column %r is not editable'%data['label'])
|
||||||
table, lt = self.custom_table_names(data['num'])
|
table, lt = self.custom_table_names(data['num'])
|
||||||
@ -468,7 +468,7 @@ class CustomColumns(object):
|
|||||||
if data['datatype'] == 'series' and extra is None:
|
if data['datatype'] == 'series' and extra is None:
|
||||||
(val, extra) = self._get_series_values(val)
|
(val, extra) = self._get_series_values(val)
|
||||||
|
|
||||||
books_to_refresh = set()
|
books_to_refresh = set([])
|
||||||
if data['normalized']:
|
if data['normalized']:
|
||||||
if data['datatype'] == 'enumeration' and (
|
if data['datatype'] == 'enumeration' and (
|
||||||
val and val not in data['display']['enum_values']):
|
val and val not in data['display']['enum_values']):
|
||||||
@ -497,7 +497,7 @@ class CustomColumns(object):
|
|||||||
ex = existing[idx]
|
ex = existing[idx]
|
||||||
xid = self.conn.get(
|
xid = self.conn.get(
|
||||||
'SELECT id FROM %s WHERE value=?'%table, (ex,), all=False)
|
'SELECT id FROM %s WHERE value=?'%table, (ex,), all=False)
|
||||||
if ex != x:
|
if allow_case_change and ex != x:
|
||||||
case_change = True
|
case_change = True
|
||||||
self.conn.execute(
|
self.conn.execute(
|
||||||
'UPDATE %s SET value=? WHERE id=?'%table, (x, xid))
|
'UPDATE %s SET value=? WHERE id=?'%table, (x, xid))
|
||||||
|
@ -1636,7 +1636,8 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
if not authors:
|
if not authors:
|
||||||
authors = [_('Unknown')]
|
authors = [_('Unknown')]
|
||||||
self.conn.execute('DELETE FROM books_authors_link WHERE book=?',(id,))
|
self.conn.execute('DELETE FROM books_authors_link WHERE book=?',(id,))
|
||||||
books_to_refresh = set()
|
books_to_refresh = set([])
|
||||||
|
final_authors = []
|
||||||
for a in authors:
|
for a in authors:
|
||||||
case_change = False
|
case_change = False
|
||||||
if not a:
|
if not a:
|
||||||
@ -1648,13 +1649,17 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
if aus:
|
if aus:
|
||||||
aid, name = aus[0]
|
aid, name = aus[0]
|
||||||
# Handle change of case
|
# Handle change of case
|
||||||
if allow_case_change and name != a:
|
if name != a:
|
||||||
self.conn.execute('''UPDATE authors
|
if allow_case_change:
|
||||||
SET name=? WHERE id=?''', (a, aid))
|
self.conn.execute('''UPDATE authors
|
||||||
case_change = True
|
SET name=? WHERE id=?''', (a, aid))
|
||||||
|
case_change = True
|
||||||
|
else:
|
||||||
|
a = name
|
||||||
else:
|
else:
|
||||||
aid = self.conn.execute('''INSERT INTO authors(name)
|
aid = self.conn.execute('''INSERT INTO authors(name)
|
||||||
VALUES (?)''', (a,)).lastrowid
|
VALUES (?)''', (a,)).lastrowid
|
||||||
|
final_authors.append(a.replace('|', ','))
|
||||||
try:
|
try:
|
||||||
self.conn.execute('''INSERT INTO books_authors_link(book, author)
|
self.conn.execute('''INSERT INTO books_authors_link(book, author)
|
||||||
VALUES (?,?)''', (id, aid))
|
VALUES (?,?)''', (id, aid))
|
||||||
@ -1668,7 +1673,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
self.conn.execute('UPDATE books SET author_sort=? WHERE id=?',
|
self.conn.execute('UPDATE books SET author_sort=? WHERE id=?',
|
||||||
(ss, id))
|
(ss, id))
|
||||||
self.data.set(id, self.FIELD_MAP['authors'],
|
self.data.set(id, self.FIELD_MAP['authors'],
|
||||||
','.join([a.replace(',', '|') for a in authors]),
|
','.join([a.replace(',', '|') for a in final_authors]),
|
||||||
row_is_id=True)
|
row_is_id=True)
|
||||||
self.data.set(id, self.FIELD_MAP['author_sort'], ss, row_is_id=True)
|
self.data.set(id, self.FIELD_MAP['author_sort'], ss, row_is_id=True)
|
||||||
aum = self.authors_with_sort_strings(id, index_is_id=True)
|
aum = self.authors_with_sort_strings(id, index_is_id=True)
|
||||||
@ -1716,6 +1721,10 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
title = title.decode(preferred_encoding, 'replace')
|
title = title.decode(preferred_encoding, 'replace')
|
||||||
self.conn.execute('UPDATE books SET title=? WHERE id=?', (title, id))
|
self.conn.execute('UPDATE books SET title=? WHERE id=?', (title, id))
|
||||||
self.data.set(id, self.FIELD_MAP['title'], title, row_is_id=True)
|
self.data.set(id, self.FIELD_MAP['title'], title, row_is_id=True)
|
||||||
|
ts = self.conn.get('SELECT sort FROM books WHERE id=?', (id,),
|
||||||
|
all=False)
|
||||||
|
if ts:
|
||||||
|
self.data.set(id, self.FIELD_MAP['sort'], ts, row_is_id=True)
|
||||||
return True
|
return True
|
||||||
|
|
||||||
def set_title(self, id, title, notify=True, commit=True):
|
def set_title(self, id, title, notify=True, commit=True):
|
||||||
@ -1768,10 +1777,13 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
WHERE name=?''', (publisher,))
|
WHERE name=?''', (publisher,))
|
||||||
if pubx:
|
if pubx:
|
||||||
aid, cur_name = pubx[0]
|
aid, cur_name = pubx[0]
|
||||||
if allow_case_change and publisher != cur_name:
|
if publisher != cur_name:
|
||||||
self.conn.execute('''UPDATE publishers SET name=?
|
if allow_case_change:
|
||||||
|
self.conn.execute('''UPDATE publishers SET name=?
|
||||||
WHERE id=?''', (publisher, aid))
|
WHERE id=?''', (publisher, aid))
|
||||||
case_change = True
|
case_change = True
|
||||||
|
else:
|
||||||
|
publisher = cur_name
|
||||||
else:
|
else:
|
||||||
aid = self.conn.execute('''INSERT INTO publishers(name)
|
aid = self.conn.execute('''INSERT INTO publishers(name)
|
||||||
VALUES (?)''', (publisher,)).lastrowid
|
VALUES (?)''', (publisher,)).lastrowid
|
||||||
@ -2163,7 +2175,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
FROM books_tags_link WHERE tag=tags.id) < 1''')
|
FROM books_tags_link WHERE tag=tags.id) < 1''')
|
||||||
otags = self.get_tags(id)
|
otags = self.get_tags(id)
|
||||||
tags = self.cleanup_tags(tags)
|
tags = self.cleanup_tags(tags)
|
||||||
books_to_refresh = set()
|
books_to_refresh = set([])
|
||||||
for tag in (set(tags)-otags):
|
for tag in (set(tags)-otags):
|
||||||
case_changed = False
|
case_changed = False
|
||||||
tag = tag.strip()
|
tag = tag.strip()
|
||||||
@ -2258,7 +2270,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
WHERE (SELECT COUNT(id) FROM books_series_link
|
WHERE (SELECT COUNT(id) FROM books_series_link
|
||||||
WHERE series=series.id) < 1''')
|
WHERE series=series.id) < 1''')
|
||||||
(series, idx) = self._get_series_values(series)
|
(series, idx) = self._get_series_values(series)
|
||||||
books_to_refresh = set()
|
books_to_refresh = set([])
|
||||||
if series:
|
if series:
|
||||||
case_change = False
|
case_change = False
|
||||||
if not isinstance(series, unicode):
|
if not isinstance(series, unicode):
|
||||||
@ -2268,9 +2280,12 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
|
|||||||
sx = self.conn.get('SELECT id,name from series WHERE name=?', (series,))
|
sx = self.conn.get('SELECT id,name from series WHERE name=?', (series,))
|
||||||
if sx:
|
if sx:
|
||||||
aid, cur_name = sx[0]
|
aid, cur_name = sx[0]
|
||||||
if allow_case_change and cur_name != series:
|
if cur_name != series:
|
||||||
self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
|
if allow_case_change:
|
||||||
case_change = True
|
self.conn.execute('UPDATE series SET name=? WHERE id=?', (series, aid))
|
||||||
|
case_change = True
|
||||||
|
else:
|
||||||
|
series = cur_name
|
||||||
else:
|
else:
|
||||||
aid = self.conn.execute('INSERT INTO series(name) VALUES (?)', (series,)).lastrowid
|
aid = self.conn.execute('INSERT INTO series(name) VALUES (?)', (series,)).lastrowid
|
||||||
self.conn.execute('INSERT INTO books_series_link(book, series) VALUES (?,?)', (id, aid))
|
self.conn.execute('INSERT INTO books_series_link(book, series) VALUES (?,?)', (id, aid))
|
||||||
|