Mirror of https://github.com/kovidgoyal/calibre.git (synced 2025-10-30 10:12:25 -04:00)

About
-----

Text compression format that can be decompressed starting at any point.
Little-endian byte ordering is used.


Header
------

TCR files always start with the following 9-byte ASCII string:

!!8-Bit!!


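As a minimal sketch, a reader could test for this header as follows. The names TCR_MAGIC and is_tcr are illustrative only and are not taken from any existing implementation.

```python
# The 9 ASCII bytes that open every TCR file, as given above.
TCR_MAGIC = b"!!8-Bit!!"


def is_tcr(data: bytes) -> bool:
    """Return True if data begins with the TCR header."""
    return data.startswith(TCR_MAGIC)
```
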
Layout
------

A TCR file consists of three parts, in this order:

Header
256-key dictionary
Compressed text


Dictionary
----------

The dictionary maps each key to a replacement string. There are 256 keys in
total, 0-255, and their strings are stored in key order. Each string is
preceded by one byte that gives the length of the string, so a string can be
at most 255 bytes long.


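The dictionary layout can be sketched in Python as below. This is an illustration under stated assumptions, not a reference parser: the function name read_dictionary is hypothetical, and the default offset of 9 assumes the dictionary begins immediately after the 9-byte header.

```python
def read_dictionary(data: bytes, offset: int = 9):
    """Parse 256 length-prefixed strings starting at offset.

    Returns (entries, next_offset), where entries[key] is the
    replacement string for that key and next_offset is the position
    of the first byte after the dictionary.
    """
    entries = []
    pos = offset
    for key in range(256):
        if pos >= len(data):
            raise ValueError(f"dictionary truncated at key {key}")
        length = data[pos]  # one length byte, 0-255
        entry = data[pos + 1 : pos + 1 + length]
        if len(entry) != length:
            raise ValueError(f"string for key {key} is truncated")
        entries.append(entry)
        pos += 1 + length
    return entries, pos
```

The returned offset marks where the compressed text starts.
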
Compressed text
---------------

The compressed text is a series of byte values 0-255, each of which is a
dictionary key and thus stands for a string. To reassemble the original text,
replace each key in the compressed text with its corresponding string.


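Putting the header, dictionary and compressed text together, decompression might look like the following sketch. The name tcr_decompress is hypothetical, and the code assumes the three parts follow one another with no padding.

```python
def tcr_decompress(data: bytes) -> bytes:
    """Expand a complete TCR file held in memory (illustrative sketch)."""
    magic = b"!!8-Bit!!"
    if not data.startswith(magic):
        raise ValueError("not a TCR file")

    # Read the 256 length-prefixed dictionary strings.
    pos = len(magic)
    entries = []
    for _ in range(256):
        length = data[pos]
        entries.append(data[pos + 1 : pos + 1 + length])
        pos += 1 + length

    # Every remaining byte is a key; replace it with its string.
    return b"".join(entries[code] for code in data[pos:])
```

Because each key expands independently of its neighbours, the same lookup can be applied to any slice of the compressed text once the dictionary has been read, which is presumably what allows decompression to start at any point.
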
Compressor
----------

From Andrew Giddings' TCR.c (http://www.cix.co.uk/~gidds/Software/TCR.html):

The TCR compression format is easy to describe: after the fixed header is a
dictionary of 256 strings, each preceded by a length byte.  The rest of the
file is a list of codes from this dictionary.

The compressor works by starting with each code defined as itself.  While
there's an unused code, it finds the most common two-code combination, and
creates a new code for it, replacing all occurrences in the text with the
new code.

It also searches for codes that are always followed by another, which it can
merge, possibly freeing up some.

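The pair-replacement loop described above can be sketched roughly as follows. This is a simplified illustration, not Giddings' code. It assumes that an "unused" code is a byte value that never occurs in the input. It skips the merge step for codes that are always followed by another. It also ignores pairs whose combined string would not fit the one-byte length. The name tcr_compress is hypothetical.

```python
from collections import Counter


def tcr_compress(text: bytes) -> bytes:
    """Simplified TCR-style compressor (illustrative sketch)."""
    # Start with each code defined as itself.
    entries = [bytes([i]) for i in range(256)]
    codes = list(text)

    # Assumption: byte values absent from the input are the unused codes.
    present = set(codes)
    unused = [i for i in range(256) if i not in present]

    while unused:
        # Count every adjacent two-code combination.
        pairs = Counter(zip(codes, codes[1:]))
        candidates = [
            (count, pair)
            for pair, count in pairs.items()
            if count > 1 and len(entries[pair[0]]) + len(entries[pair[1]]) <= 255
        ]
        if not candidates:
            break
        _, (a, b) = max(candidates)  # most common pair

        # Create a new code for the pair.
        new = unused.pop()
        entries[new] = entries[a] + entries[b]

        # Replace occurrences of (a, b), scanning left to right.
        out = []
        i = 0
        while i < len(codes):
            if i + 1 < len(codes) and codes[i] == a and codes[i + 1] == b:
                out.append(new)
                i += 2
            else:
                out.append(codes[i])
                i += 1
        codes = out

    dictionary = b"".join(bytes([len(e)]) + e for e in entries)
    return b"!!8-Bit!!" + dictionary + bytes(codes)
```

Recounting all pairs on every pass keeps the sketch short at the cost of speed; a practical compressor would likely update the counts incrementally.
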