bulislaw 2011-01-23 22:07:17 +00:00
commit 6fe910c6f2
159 changed files with 129898 additions and 74828 deletions

View File

@ -4,6 +4,122 @@
# for important features/bug fixes. # for important features/bug fixes.
# Also, each release can have new and improved recipes. # Also, each release can have new and improved recipes.
- version: 0.7.42
date: 2011-01-21
new features:
- title: "0.7.42 is a re-release of 0.7.41, because conversion to MOBI was broken in 0.7.41"
- title: "Conversions: Replace the remove header/footer options with a more generic search and replace option that allows you to not only remove but also replace text"
- title: "Conversion: The preprocess html option has now become a new 'Heuristic Processing' option which allows you to control exactly which heuristics are used"
- title: "Conversion: Various improvements to Heuristic Processing (used to be preprocess HTML)"
- title: "When adding empty books to calibre, optionally set the author to the author of the currently selected book"
tickets: [7702]
- title: "Device drivers for the Archos 101, SmartQ T7 and Acer Lumiread"
- title: "Catalog generation: Make By Authors optional"
- title: "Allow bulk editing of Date and Published columns."
- title: "Add a little button to clear date and published values to the edit metadata dialogs"
- title: "When adding books by ISBN, allow the specification of special tags that will be added to the new book entries"
tickets: [8436]
- title: "Completion on multiple authors"
tickets: [8405]
- title: "Add AZW to default list of internally viewed formats, as I am tired of getting tickets about it"
- title: "Nicer error message when catalog generation fails"
- title: "Add capitalize option to context menus in the edit metadata dialog"
bug fixes:
- title: "RTF Input: Fix regression in 0.7.40 that broke conversion of some old style RTF files"
- title: "Fix Tag editor forgets position"
tickets: [8271]
- title: "When converting books in the calibre GUI, override metadata from the input document, even when empty."
description: >
"So if you have removed all the tags and comments for the book in the calibre GUI, but the actual file that is being converted still has tags and comments, they are ignored. This affects only conversions in the calibre GUI, not from the command line via ebook-convert."
tickets: [8390]
- title: "Fix memory leak when switching libraries"
- title: "RTF Output: Fix incorrect spacing between letters."
tickets: [8422]
- title: "Catalog generation: Add composite columns to Merge Comments eligible types"
- title: "Add a confirmation when closing the add a custom news source dialog."
tickets: [8460]
- title: "Another workaround for LibraryThing UA sniffing that was preventing series metadata download, sigh."
tickets: [8477]
- title: "PD Novel driver: Put books on the SD card into the eBooks folder"
- title: "When shortening filepaths to conform to windows path length limitations, remove text from the middle of each component instead of the ends."
tickets: [8451]
- title: "Make completion in most places case insensitive"
tickets: [8441]
- title: "Fix regression that caused the N key to stop working when editing a Yes/no column"
tickets: [8417]
- title: "Email: Fix bug when connecting to SMTP relays that use MD5 auth"
- title: "MOBI Output: Fix bug that could cause a link pointing to the start of a section to go to a point later in the section if the section contained an empty id attribute"
- title: "When auto converting books and the device is unplugged, do not raise an error."
tickets: [8426]
- title: "Ebook-viewer: Display cover when viewing FB2 files"
- title: "MOBI Input: Special case handling of empty div tags with a defined height used as paragraph separators."
tickets: [8391]
- title: "Fix sorting of author names into sub categories by first letter in the Tag Browser when the first letter has diacritics"
tickets: [8378]
- title: "Fix regression in 0.7.40 that caused commas in author names to become | when converting/saving to disk"
- title: "Fix view specific format on a book with no formats gives an error"
tickets: [8352]
improved recipes:
- Blic
- Las Vegas Review Journal
- La Vanguardia
- New York Times
- El Pais
- Seattle Times
- Ars Technica
- Dilbert
- Nature News
new recipes:
- title: "kath.net"
author: "Bobus"
- title: "iHNed"
author: "Karel Bilek"
- title: "Gulf News"
author: "Darko Miletic"
- title: "South Africa Mail and Guardian"
author: "77ja65"
- version: 0.7.40 - version: 0.7.40
date: 2011-01-14 date: 2011-01-14
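
The 0.7.42 entry for ticket 8451 describes shortening each path component by removing text from its middle rather than its end. A minimal sketch of that idea follows; the function names and the per-component length limit are hypothetical and are not taken from calibre's source, which may choose the cut point differently.

```python
def shorten_component(component, max_len):
    """Shorten one path component by cutting characters out of its middle.

    Hypothetical helper for illustration only; not calibre's implementation.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(component) <= max_len:
        return component
    head = (max_len + 1) // 2  # characters kept from the start
    tail = max_len - head      # characters kept from the end
    return component[:head] + (component[-tail:] if tail else "")


def shorten_path(path, max_component_len, sep="\\"):
    """Apply shorten_component to every component of a Windows-style path."""
    return sep.join(
        shorten_component(part, max_component_len) for part in path.split(sep)
    )


if __name__ == "__main__":
    print(shorten_path("C:\\Library\\A Very Long Author Name\\book.epub", 10))
```

Keeping both ends of a component tends to preserve the start of a title as well as a trailing file extension, which is presumably why the middle is the part removed.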

View File

@ -1,6 +1,4 @@
@echo OFF @echo OFF
REM CalibreRun.bat
REM ~~~~~~~~~~~~~~
REM Batch File to start a Calibre configuration on Windows REM Batch File to start a Calibre configuration on Windows
REM giving explicit control of the location of: REM giving explicit control of the location of:
REM - Calibre Program Files REM - Calibre Program Files
@ -24,7 +22,10 @@ REM -------------------------------------
REM Set up Calibre Config folder REM Set up Calibre Config folder
REM ------------------------------------- REM -------------------------------------
If EXIST CalibreConfig SET CALIBRE_CONFIG_DIRECTORY=%cd%\CalibreConfig IF EXIST CalibreConfig (
SET CALIBRE_CONFIG_DIRECTORY=%cd%\CalibreConfig
ECHO CONFIG=%cd%\CalibreConfig
)
REM -------------------------------------------------------------- REM --------------------------------------------------------------
@ -38,24 +39,53 @@ REM drive letter of the USB stick.
REM Comment out any of the following that are not to be used REM Comment out any of the following that are not to be used
REM -------------------------------------------------------------- REM --------------------------------------------------------------
SET CALIBRE_LIBRARY_DIRECTORY=U:\eBOOKS\CalibreLibrary IF EXIST U:\eBooks\CalibreLibrary (
IF EXIST CalibreLibrary SET CALIBRE_LIBRARY_DIRECTORY=%cd%\CalibreLibrary SET CALIBRE_LIBRARY_DIRECTORY=U:\eBOOKS\CalibreLibrary
IF EXIST CalibreBooks SET CALIBRE_LIBRARY_DIRECTORY=%cd%\CalibreBooks ECHO LIBRARY=U:\eBOOKS\CalibreLibrary
)
IF EXIST CalibreLibrary (
SET CALIBRE_LIBRARY_DIRECTORY=%cd%\CalibreLibrary
ECHO LIBRARY=%cd%\CalibreLibrary
)
IF EXIST CalibreBooks (
SET CALIBRE_LIBRARY_DIRECTORY=%cd%\CalibreBooks
ECHO LIBRARY=%cd%\CalibreBooks
)
REM -------------------------------------------------------------- REM --------------------------------------------------------------
REM Specify Location of metadata database (optional) REM Specify Location of metadata database (optional)
REM REM
REM Location where the metadata.db file is located. If not set REM Location where the metadata.db file is located. If not set
REM the same location as Books files will be assumed. This REM the same location as Books files will be assumed. This
REM option is used to get better performance when the Library is REM option is used to get better performance when the Library is
REM on a (slow) network drive. Putting the metadata.db file REM on a (slow) network drive. Putting the metadata.db file
REM locally gives a big performance improvement. REM locally gives a big performance improvement.
REM
REM NOTE. If you use this option, then the ability to switch
REM libraries within Calibre will be disabled. Therefore
REM you do not want to set it if the metadata.db file
REM is at the same location as the book files.
REM -------------------------------------------------------------- REM --------------------------------------------------------------
IF EXIST CalibreBooks SET SET CALIBRE_OVERRIDE_DATABASE_PATH=%cd%\CalibreBooks\metadata.db IF EXIST CalibreBooks (
IF EXIST CalibreMetadata SET CALIBRE_OVERRIDE_DATABASE_PATH=%cd%\CalibreMetadata\metadata.db IF NOT "%CALIBRE_LIBRARY_DIRECTORY%" == "%cd%\CalibreBooks" (
SET CALIBRE_OVERRIDE_DATABASE_PATH=%cd%\CalibreBooks\metadata.db
ECHO DATABASE=%cd%\CalibreBooks\metadata.db
ECHO '
ECHO ***CAUTION*** Library Switching will be disabled
ECHO '
)
)
IF EXIST CalibreMetadata (
IF NOT "%CALIBRE_LIBRARY_DIRECTORY%" == "%cd%\CalibreMetadata" (
SET CALIBRE_OVERRIDE_DATABASE_PATH=%cd%\CalibreMetadata\metadata.db
ECHO DATABASE=%cd%\CalibreMetadata\metadata.db
ECHO '
ECHO ***CAUTION*** Library Switching will be disabled
ECHO '
)
)
REM -------------------------------------------------------------- REM --------------------------------------------------------------
REM Specify Location of source (optional) REM Specify Location of source (optional)
@ -63,13 +93,20 @@ REM
REM It is easy to run Calibre from source REM It is easy to run Calibre from source
REM Just set the environment variable to where the source is located REM Just set the environment variable to where the source is located
REM When running from source the GUI will have a '*' after the version. REM When running from source the GUI will have a '*' after the version.
REM number that is displayed at the bottom of the Calibre main screen.
REM -------------------------------------------------------------- REM --------------------------------------------------------------
IF EXIST Calibre\src SET CALIBRE_DEVELOP_FROM=%cd%\Calibre\src IF EXIST Calibre\src (
SET CALIBRE_DEVELOP_FROM=%cd%\Calibre\src
ECHO SOURCE=%cd%\Calibre\src
)
IF EXIST D:\Calibre\Calibre\src (
SET CALIBRE_DEVELOP_FROM=D:\Calibre\Calibre\src
ECHO SOURCE=D:\Calibre\Calibre\src
)
REM -------------------------------------------------------------- REM --------------------------------------------------------------
REM Specify Location of calibre binaries (optinal) REM Specify Location of calibre binaries (optional)
REM REM
REM To avoid needing Calibre to be set in the search path, ensure REM To avoid needing Calibre to be set in the search path, ensure
REM that Calibre Program Files is current directory when starting. REM that Calibre Program Files is current directory when starting.
@ -78,21 +115,15 @@ REM This folder can be populated by copying the Calibre2 folder from
REM an existing installation or by installing direct to here. REM an existing installation or by installing direct to here.
REM -------------------------------------------------------------- REM --------------------------------------------------------------
IF EXIST Calibre2 CD Calibre2 IF EXIST Calibre2 (
Calibre2 CD Calibre2
ECHO PROGRAMS=%cd%
REM -------------------------------------------- )
REM Display settings that will be used
REM --------------------------------------------
echo PROGRAMS=%cd%
echo SOURCE=%CALIBRE_DEVELOP_FROM%
echo CONFIG=%CALIBRE_CONFIG_DIRECTORY%
echo LIBRARY=%CALIBRE_LIBRARY_DIRECTORY%
echo DATABASE=%CALIBRE_OVERRIDE_DATABASE_PATH%
REM ----------------------------------------------------------
REM The following gives a chance to check the settings before REM The following gives a chance to check the settings before
REM starting Calibre. It can be commented out if not wanted. REM starting Calibre. It can be commented out if not wanted.
REM ----------------------------------------------------------
echo "Press CTRL-C if you do not want to continue" echo "Press CTRL-C if you do not want to continue"
pause pause
@ -111,4 +142,4 @@ REM Use with /WAIT to wait until Calibre completes to run a task on exit
REM -------------------------------------------------------- REM --------------------------------------------------------
echo "Starting up Calibre" echo "Starting up Calibre"
START /belownormal Calibre --with-library %CALIBRE_LIBRARY_DIRECTORY% START /belownormal Calibre --with-library "%CALIBRE_LIBRARY_DIRECTORY%"
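
The new launcher logic sets CALIBRE_OVERRIDE_DATABASE_PATH only when the candidate folder exists and differs from the chosen library folder, so that library switching is not disabled needlessly. The decision can be sketched in Python as below. The function name and the `existing_dirs` parameter are assumptions made for illustration; the batch file itself tests the file system with IF EXIST and compares the expanded strings.

```python
import ntpath


def resolve_database_override(base_dir, library_dir, existing_dirs):
    """Return the metadata.db override path, or None to leave it unset.

    Hypothetical model of the batch file's checks: each candidate folder is
    tried in the same order as in the script, and a later match overrides an
    earlier one, just as a later SET would.
    """
    chosen = None
    for folder in ("CalibreBooks", "CalibreMetadata"):
        candidate = ntpath.join(base_dir, folder)
        # Skip the override when the database would sit in the library itself.
        if folder in existing_dirs and candidate != library_dir:
            chosen = ntpath.join(candidate, "metadata.db")
    return chosen


if __name__ == "__main__":
    print(resolve_database_override(
        "U:\\stick", "U:\\stick\\CalibreLibrary", {"CalibreMetadata"}))
```

`ntpath` is used so the Windows-style joins behave the same on any platform where the sketch is run.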

Binary file not shown.

Size: 9.3 KiB

View File

@ -1,6 +1,6 @@
__license__ = 'GPL v3' __license__ = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>' __copyright__ = '2008-2011, Darko Miletic <darko.miletic at gmail.com>'
''' '''
blic.rs blic.rs
''' '''
@ -21,21 +21,53 @@ class Blic(BasicNewsRecipe):
masthead_url = 'http://www.blic.rs/resources/images/header/header_back.png' masthead_url = 'http://www.blic.rs/resources/images/header/header_back.png'
language = 'sr' language = 'sr'
publication_type = 'newspaper' publication_type = 'newspaper'
extra_css = '@font-face {font-family: "serif1";src:url(res:///opt/sony/ebook/FONT/tt0011m_.ttf)} @font-face {font-family: "sans1";src:url(res:///opt/sony/ebook/FONT/tt0003m_.ttf)} body{font-family: Georgia, serif1, serif} .article_description{font-family: Arial, sans1, sans-serif} .img_full{float: none} img{margin-bottom: 0.8em} ' extra_css = """
@font-face {font-family: "serif1";src:url(res:///opt/sony/ebook/FONT/tt0011m_.ttf)}
@font-face {font-family: "sans1";src:url(res:///opt/sony/ebook/FONT/tt0003m_.ttf)}
body{font-family: Georgia, serif1, serif}
.articledescription,#nadnaslov,.article_info{font-family: Arial, sans1, sans-serif}
.img_full{float: none}
#nadnaslov{font-size: small}
#article_lead{font-size: 1.5em}
h1{color: red}
.potpis{font-size: x-small; color: gray}
.article_info{font-size: small}
img{margin-bottom: 0.8em; margin-top: 0.8em; display: block}
"""
conversion_options = { conversion_options = {
'comment' : description 'comment' : description
, 'tags' : category , 'tags' : category
, 'publisher': publisher , 'publisher': publisher
, 'language' : language , 'language' : language
, 'linearize_tables' : True
} }
preprocess_regexps = [(re.compile(u'\u0110'), lambda match: u'\u00D0')] preprocess_regexps = [(re.compile(u'\u0110'), lambda match: u'\u00D0')]
remove_tags_before = dict(name='div', attrs={'id':'article_info'}) remove_tags_before = dict(name='div', attrs={'id':'article_info'})
remove_tags = [dict(name=['object','link'])] remove_tags = [dict(name=['object','link','meta','base','embed'])]
remove_attributes = ['width','height'] remove_attributes = ['width','height','m_id','m_ext','mlg_id','poll_id','v_id']
feeds = [(u'Danasnje Vesti', u'http://www.blic.rs/rss/danasnje-vesti')] feeds = [
(u'Politika' , u'http://www.blic.rs/rss/Vesti/Politika')
,(u'Tema Dana' , u'http://www.blic.rs/rss/Vesti/Tema-Dana')
,(u'Svet' , u'http://www.blic.rs/rss/Vesti/Svet')
,(u'Drustvo' , u'http://www.blic.rs/rss/Vesti/Drustvo')
,(u'Ekonomija' , u'http://www.blic.rs/rss/Vesti/Ekonomija')
,(u'Hronika' , u'http://www.blic.rs/rss/Vesti/Hronika')
,(u'Beograd' , u'http://www.blic.rs/rss/Vesti/Beograd')
,(u'Srbija' , u'http://www.blic.rs/rss/Vesti/Srbija')
,(u'Vojvodina' , u'http://www.blic.rs/rss/Vesti/Vojvodina')
,(u'Republika Srpska' , u'http://www.blic.rs/rss/Vesti/Republika-Srpska')
,(u'Reportaza' , u'http://www.blic.rs/rss/Vesti/Reportaza')
,(u'Dodatak' , u'http://www.blic.rs/rss/Vesti/Dodatak')
,(u'Zabava' , u'http://www.blic.rs/rss/Zabava')
,(u'Kultura' , u'http://www.blic.rs/rss/Kultura')
,(u'Slobodno Vreme' , u'http://www.blic.rs/rss/Slobodno-vreme')
,(u'IT' , u'http://www.blic.rs/rss/IT')
,(u'Komentar' , u'http://www.blic.rs/rss/Komentar')
,(u'Intervju' , u'http://www.blic.rs/rss/Intervju')
]
def print_version(self, url): def print_version(self, url):
@ -44,4 +76,4 @@ class Blic(BasicNewsRecipe):
def preprocess_html(self, soup): def preprocess_html(self, soup):
for item in soup.findAll(style=True): for item in soup.findAll(style=True):
del item['style'] del item['style']
return self.adeify_images(soup) return soup

View File

@ -7,22 +7,29 @@ class DallasNews(BasicNewsRecipe):
max_articles_per_feed = 25 max_articles_per_feed = 25
no_stylesheets = True no_stylesheets = True
remove_tags_before = dict(name='h2', attrs={'class':'vitstoryheadline'}) use_embedded_content = False
remove_tags_after = dict(name='div', attrs={'style':'width: 100%; clear: right'}) remove_tags_before = dict(name='h1')
remove_tags_after = dict(name='div', attrs={'id':'article_tools_bottom'}) keep_only_tags = {'class':lambda x: x and 'article' in x}
remove_tags = [ remove_tags = [
dict(name='iframe'), {'class':['DMNSocialTools', 'article ', 'article first ', 'article premium']},
dict(name='div', attrs={'class':'biblockmore'}),
dict(name='div', attrs={'style':'width: 100%; clear: right'}),
dict(name='div', attrs={'id':'article_tools_bottom'}),
#dict(name='ul', attrs={'class':'articleTools'}),
] ]
feeds = [ feeds = [
('Latest News', 'http://www.dallasnews.com/newskiosk/rss/dallasnewslatestnews.xml'), ('Local News',
('Local News', 'http://www.dallasnews.com/newskiosk/rss/dallasnewslocalnews.xml'), 'http://www.dallasnews.com/news/politics/local-politics/?rss'),
('Nation and World', 'http://www.dallasnews.com/newskiosk/rss/dallasnewsnationworld.xml'), ('National Politics',
('Politics', 'http://www.dallasnews.com/newskiosk/rss/dallasnewsnationalpolitics.xml'), 'http://www.dallasnews.com/news/politics/national-politic/?rss'),
('Science', 'http://www.dallasnews.com/newskiosk/rss/dallasnewsscience.xml'), ('State Politics',
'http://www.dallasnews.com/news/politics/state-politics/?rss'),
('Religion',
'http://www.dallasnews.com/news/religion/?rss'),
('Crime',
'http://www.dallasnews.com/news/crime/headlines/?rss'),
('Celebrity News',
'http://www.dallasnews.com/entertainment/celebrity-news/?rss&listname=TopStories'),
('Nation',
'http://www.dallasnews.com/news/nation-world/nation/?rss'),
('World',
'http://www.dallasnews.com/news/nation-world/world/?rss'),
] ]

View File

@ -0,0 +1,36 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1295088390(BasicNewsRecipe):
title = u'Everett Herald'
language = 'en'
__author__ = '77ja65'
oldest_article = 4
max_articles_per_feed = 50
no_stylesheets = True
masthead_url = 'http://heraldnet.com/images/hnet/jQueryComponents/jQueryNavigation/heraldnet_logo.png'
extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
feeds = [(u'Local News',
u'http://heraldnet.com/section/RSS02&mime=xml'),
(u'Sports', u'http://heraldnet.com/section/RSS04&mime=xml'),
(u'Entertainment',
u'http://heraldnet.com/section/RSS07&mime=xml'),
(u'Life', u'http://heraldnet.com/section/RSS03&mime=xml'),
(u'Breaking News',
u'http://heraldnet.com/section/RSS34&mime=xml'),
(u'Seahawks', u'http://heraldnet.com/section/RSS22&mime=xml'),
(u'HeraldNet', u'http://heraldnet.com/section/RSS01&mime=xml'),
(u'Inside Everett',
u'http://heraldnet.com/section/RSS26&mime=xml')
]
def print_version(self, url):
return url + "&template=PrinterFriendly"
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
'''

View File

@ -0,0 +1,64 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
gulfnews.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class GulfNews(BasicNewsRecipe):
title = 'Gulf News'
__author__ = 'Darko Miletic'
description = 'News from United Arab Emirates, Persian Gulf and rest of the world'
publisher = 'Al Nisr Publishing LLC'
category = 'news, politics, UAE, world'
oldest_article = 2
max_articles_per_feed = 200
no_stylesheets = True
encoding = 'utf8'
use_embedded_content = False
language = 'en'
remove_empty_feeds = True
publication_type = 'newsportal'
masthead_url = 'http://gulfnews.com/media/img/gulf_news_logo.jpg'
extra_css = """
body{font-family: Arial,Helvetica,sans-serif }
img{margin-bottom: 0.4em; display:block}
h1{font-family: Georgia, 'Times New Roman', Times, serif}
ol,ul{list-style: none}
.synopsis{font-size: small}
.details{font-size: x-small}
.image{font-size: xx-small}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [
dict(name=['meta','link','object','embed'])
,dict(attrs={'class':['quickLinks','ratings']})
,dict(attrs={'id':'imageSelector'})
]
remove_attributes=['lang']
keep_only_tags=[
dict(name='h1')
,dict(attrs={'class':['synopsis','details','image','article']})
]
feeds = [
(u'UAE News' , u'http://gulfnews.com/cmlink/1.446094')
,(u'Business' , u'http://gulfnews.com/cmlink/1.446098')
,(u'Entertainment' , u'http://gulfnews.com/cmlink/1.446095')
,(u'Sport' , u'http://gulfnews.com/cmlink/1.446096')
,(u'Life' , u'http://gulfnews.com/cmlink/1.446097')
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup

View File

@ -5,6 +5,7 @@ class AdvancedUserRecipe1293122276(BasicNewsRecipe):
__author__ = 'Jack Mason' __author__ = 'Jack Mason'
author = 'IBM Global Business Services' author = 'IBM Global Business Services'
publisher = 'IBM' publisher = 'IBM'
language = 'en'
category = 'news, technology, IT, internet of things, analytics' category = 'news, technology, IT, internet of things, analytics'
oldest_article = 7 oldest_article = 7
max_articles_per_feed = 30 max_articles_per_feed = 30

View File

@ -6,6 +6,7 @@ class KANewsRecipe(BasicNewsRecipe):
description = u'Nachrichten aus Karlsruhe, Deutschland und der Welt.' description = u'Nachrichten aus Karlsruhe, Deutschland und der Welt.'
__author__ = 'tfeld' __author__ = 'tfeld'
lang='de' lang='de'
language = 'de'
no_stylesheets = True no_stylesheets = True
oldest_article = 7 oldest_article = 7

View File

@ -4,6 +4,7 @@ class AdvancedUserRecipe1295262156(BasicNewsRecipe):
title = u'kath.net' title = u'kath.net'
__author__ = 'Bobus' __author__ = 'Bobus'
oldest_article = 7 oldest_article = 7
language = 'en'
max_articles_per_feed = 100 max_articles_per_feed = 100
feeds = [(u'kath.net', u'http://www.kath.net/2005/xml/index.xml')] feeds = [(u'kath.net', u'http://www.kath.net/2005/xml/index.xml')]

View File

@ -3,12 +3,17 @@ from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1274742400(BasicNewsRecipe): class AdvancedUserRecipe1274742400(BasicNewsRecipe):
title = u'Las Vegas Review Journal' title = u'Las Vegas Review Journal'
__author__ = 'Joel' __author__ = 'Kovid Goyal'
language = 'en' language = 'en'
oldest_article = 7 oldest_article = 7
max_articles_per_feed = 100 max_articles_per_feed = 100
keep_only_tags = [dict(id='content-main')]
remove_tags = [dict(id=['right-col-content', 'trending-topics']),
{'class':['ppy-outer']}
]
no_stylesheets = True
feeds = [ feeds = [
(u'News', u'http://www.lvrj.com/news.rss'), (u'News', u'http://www.lvrj.com/news.rss'),

View File

@ -20,8 +20,8 @@ class LaVanguardia(BasicNewsRecipe):
max_articles_per_feed = 100 max_articles_per_feed = 100
no_stylesheets = True no_stylesheets = True
use_embedded_content = False use_embedded_content = False
delay = 1 delay = 5
encoding = 'cp1252' # encoding = 'cp1252'
language = 'es' language = 'es'
direction = 'ltr' direction = 'ltr'
@ -35,8 +35,8 @@ class LaVanguardia(BasicNewsRecipe):
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"' html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
feeds = [ feeds = [
(u'Ciudadanos' , u'http://feeds.feedburner.com/lavanguardia/ciudadanos' ) (u'Portada' , u'http://feeds.feedburner.com/lavanguardia/home' )
,(u'Cultura' , u'http://feeds.feedburner.com/lavanguardia/cultura' ) ,(u'Cultura' , u'http://feeds.feedburner.com/lavanguardia/cultura' )
,(u'Deportes' , u'http://feeds.feedburner.com/lavanguardia/deportes' ) ,(u'Deportes' , u'http://feeds.feedburner.com/lavanguardia/deportes' )
,(u'Economia' , u'http://feeds.feedburner.com/lavanguardia/economia' ) ,(u'Economia' , u'http://feeds.feedburner.com/lavanguardia/economia' )
,(u'El lector opina' , u'http://feeds.feedburner.com/lavanguardia/lectoropina' ) ,(u'El lector opina' , u'http://feeds.feedburner.com/lavanguardia/lectoropina' )
@ -45,17 +45,17 @@ class LaVanguardia(BasicNewsRecipe):
,(u'Internet y tecnologia', u'http://feeds.feedburner.com/lavanguardia/internet' ) ,(u'Internet y tecnologia', u'http://feeds.feedburner.com/lavanguardia/internet' )
,(u'Motor' , u'http://feeds.feedburner.com/lavanguardia/motor' ) ,(u'Motor' , u'http://feeds.feedburner.com/lavanguardia/motor' )
,(u'Politica' , u'http://feeds.feedburner.com/lavanguardia/politica' ) ,(u'Politica' , u'http://feeds.feedburner.com/lavanguardia/politica' )
,(u'Sucessos' , u'http://feeds.feedburner.com/lavanguardia/sucesos' ) ,(u'Sucesos' , u'http://feeds.feedburner.com/lavanguardia/sucesos' )
] ]
keep_only_tags = [ keep_only_tags = [
dict(name='div', attrs={'class':'element1_3'}) dict(name='div', attrs={'class':'detalle noticia'})
] ]
remove_tags = [ remove_tags = [
dict(name=['object','link','script']) dict(name=['object','link','script'])
,dict(name='div', attrs={'class':['colC','peu']}) ,dict(name='div', attrs={'class':['colC','peu','jstoolbar']})
] ]
remove_tags_after = [dict(name='div', attrs={'class':'text'})] remove_tags_after = [dict(name='div', attrs={'class':'text'})]
@ -67,4 +67,3 @@ class LaVanguardia(BasicNewsRecipe):
for item in soup.findAll(style=True): for item in soup.findAll(style=True):
del item['style'] del item['style']
return soup return soup

View File

@ -10,6 +10,7 @@ import re
class NationalGeographicNews(BasicNewsRecipe): class NationalGeographicNews(BasicNewsRecipe):
title = u'National Geographic News' title = u'National Geographic News'
oldest_article = 7 oldest_article = 7
language = 'en'
max_articles_per_feed = 100 max_articles_per_feed = 100
remove_javascript = True remove_javascript = True
no_stylesheets = True no_stylesheets = True

View File

@ -1,5 +1,5 @@
__license__ = 'GPL v3' __license__ = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>' __copyright__ = '2010-2011, Darko Miletic <darko.miletic at gmail.com>'
''' '''
nrc.nl nrc.nl
''' '''
@ -15,13 +15,18 @@ class Pagina12(BasicNewsRecipe):
oldest_article = 2 oldest_article = 2
max_articles_per_feed = 200 max_articles_per_feed = 200
no_stylesheets = True no_stylesheets = True
encoding = 'cp1252' encoding = 'utf8'
use_embedded_content = False use_embedded_content = False
language = 'nl' language = 'nl'
country = 'NL' country = 'NL'
remove_empty_feeds = True remove_empty_feeds = True
masthead_url = 'http://www.nrc.nl/nrc.nl/images/logo_nrc.png' masthead_url = 'http://www.nrc.nl/nrc.nl/images/logo_nrc.png'
extra_css = ' body{font-family: Verdana,Arial,Helvetica,sans-serif } img{margin-bottom: 0.4em} h1,h2,h3{text-align:left} ' extra_css = """
body{font-family: Georgia,serif }
img{margin-bottom: 0.4em; display: block}
.bijschrift,.sectie{font-size: x-small}
.sectie{color: gray}
"""
conversion_options = { conversion_options = {
'comment' : description 'comment' : description
@ -30,21 +35,42 @@ class Pagina12(BasicNewsRecipe):
, 'language' : language , 'language' : language
} }
keep_only_tags = [dict(name='div',attrs={'class':'article clearfix'})] keep_only_tags = [dict(attrs={'class':'uitstekendekeus'})]
remove_tags = [
dict(name=['meta','base','link','object','embed'])
,dict(attrs={'class':['reclamespace','tags-and-sharing']})
]
remove_attributes=['lang']
feeds = [ feeds = [
(u'Voorpagina' , u'http://feeds.feedburner.com/NRCHandelsbladVoorpagina' ) (u'Voor nieuws', u'http://www.nrc.nl/nieuws/categorie/nieuws/rss.php' )
,(u'Binnenland' , u'http://feeds.feedburner.com/NRCHandelsbladBinnenland' ) ,(u'Binnenland' , u'http://www.nrc.nl/nieuws/categorie/binnenland/rss.php' )
,(u'Buitenland' , u'http://feeds.feedburner.com/NRCHandelsbladBuitenland' ) ,(u'Buitenland' , u'http://www.nrc.nl/nieuws/categorie/buitenland/rss.php' )
,(u'Economie' , u'http://feeds.feedburner.com/NRCHandelsbladEconomie' ) ,(u'Economie' , u'http://www.nrc.nl/nieuws/categorie/economie/rss.php' )
,(u'Kunst & Film' , u'http://feeds.feedburner.com/nrc/NRCHandelsbladKunstEnFilm') ,(u'Cultuur' , u'http://www.nrc.nl/nieuws/categorie/cultuur/rss.php' )
,(u'Sport' , u'http://feeds.feedburner.com/NRCHandelsbladSport' ) ,(u'Sport' , u'http://www.nrc.nl/nieuws/categorie/sport/rss.php' )
,(u'Wetenschap ' , u'http://www.nrc.nl/rss/wetenschap' ) ,(u'Wetenschap ', u'http://www.nrc.nl/nieuws/categorie/wetenschap-nieuws/rss.php')
] ]
def print_version(self, url):
return url + '?service=Print'
def preprocess_html(self, soup): def preprocess_html(self, soup):
return self.adeify_images(soup) for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('a'):
limg = item.find('img')
if item.string is not None:
str = item.string
item.replaceWith(str)
else:
if limg:
item.name = 'div'
atritems =['href','target','rel']
for atit in atritems:
if item.has_key(atit):
del item[atit]
else:
str = self.tag_to_string(item)
item.replaceWith(str)
for item in soup.findAll('img'):
if not item.has_key('alt'):
item['alt'] = 'image'
return soup

View File

@ -159,6 +159,11 @@ class NYTimes(BasicNewsRecipe):
'relatedSearchesModule', 'relatedSearchesModule',
'side_tool', 'side_tool',
'singleAd', 'singleAd',
'entry entry-utility', #added for DealBook
'entry-tags', #added for DealBook
'footer promos clearfix', #added for DealBook
'footer links clearfix', #added for DealBook
'inlineImage module', #added for DealBook
re.compile('^subNavigation'), re.compile('^subNavigation'),
re.compile('^leaderboard'), re.compile('^leaderboard'),
re.compile('^module'), re.compile('^module'),
@ -192,6 +197,9 @@ class NYTimes(BasicNewsRecipe):
'side_index', 'side_index',
'side_tool', 'side_tool',
'toolsRight', 'toolsRight',
'skybox', #added for DealBook
'TopAd', #added for DealBook
'related-content', #added for DealBook
]), ]),
dict(name=['script', 'noscript', 'style','form','hr'])] dict(name=['script', 'noscript', 'style','form','hr'])]
no_stylesheets = True no_stylesheets = True
@ -246,7 +254,7 @@ class NYTimes(BasicNewsRecipe):
def exclude_url(self,url): def exclude_url(self,url):
if not url.startswith("http"): if not url.startswith("http"):
return True return True
if not url.endswith(".html"): if not url.endswith(".html") and 'dealbook.nytimes.com' not in url: #added for DealBook
return True return True
if 'nytimes.com' not in url: if 'nytimes.com' not in url:
return True return True
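
The DealBook change above relaxes the `.html` suffix test so that DealBook blog URLs, which do not end in `.html`, are no longer excluded. A standalone sketch of just the three conditions visible in this hunk is given below; the recipe's real method takes `self` and may apply further checks that are not shown here, and the sample URLs are made up for illustration.

```python
def exclude_url(url):
    """Return True when the URL should be skipped.

    Covers only the conditions visible in the hunk above; any additional
    filtering done by the recipe is outside this sketch.
    """
    if not url.startswith("http"):
        return True
    # DealBook articles are allowed through even without a .html suffix.
    if not url.endswith(".html") and "dealbook.nytimes.com" not in url:
        return True
    if "nytimes.com" not in url:
        return True
    return False


if __name__ == "__main__":
    for sample in (
        "http://dealbook.nytimes.com/2011/01/21/example-post/",
        "http://www.nytimes.com/pages/world/",
    ):
        print(sample, exclude_url(sample))
```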
@ -569,7 +577,6 @@ class NYTimes(BasicNewsRecipe):
def preprocess_html(self, soup): def preprocess_html(self, soup):
if self.webEdition & (self.oldest_article>0): if self.webEdition & (self.oldest_article>0):
date_tag = soup.find(True,attrs={'class': ['dateline','date']}) date_tag = soup.find(True,attrs={'class': ['dateline','date']})
if date_tag: if date_tag:
@ -592,128 +599,168 @@ class NYTimes(BasicNewsRecipe):
img_div = soup.find('div','inlineImage module') img_div = soup.find('div','inlineImage module')
if img_div: if img_div:
img_div.extract() img_div.extract()
return self.strip_anchors(soup) return self.strip_anchors(soup)
def postprocess_html(self,soup, True): def postprocess_html(self,soup, True):
try:
if self.one_picture_per_article:
# Remove all images after first
largeImg = soup.find(True, {'class':'articleSpanImage'})
inlineImgs = soup.findAll(True, {'class':'inlineImage module'})
if largeImg:
for inlineImg in inlineImgs:
inlineImg.extract()
else:
if inlineImgs:
firstImg = inlineImgs[0]
for inlineImg in inlineImgs[1:]:
inlineImg.extract()
# Move firstImg before article body
cgFirst = soup.find(True, {'class':re.compile('columnGroup *first')})
if cgFirst:
# Strip all sibling NavigableStrings: noise
navstrings = cgFirst.findAll(text=True, recursive=False)
[ns.extract() for ns in navstrings]
headline_found = False
tag = cgFirst.find(True)
insertLoc = 0
while True:
insertLoc += 1
if hasattr(tag,'class') and tag['class'] == 'articleHeadline':
headline_found = True
break
tag = tag.nextSibling
if not tag:
headline_found = False
break
if headline_found:
cgFirst.insert(insertLoc,firstImg)
else:
self.log(">>> No class:'columnGroup first' found <<<")
except:
self.log("ERROR: One picture per article in postprocess_html")
try: try:
# Change captions to italic if self.one_picture_per_article:
for caption in soup.findAll(True, {'class':'caption'}) : # Remove all images after first
if caption and len(caption) > 0: largeImg = soup.find(True, {'class':'articleSpanImage'})
cTag = Tag(soup, "p", [("class", "caption")]) inlineImgs = soup.findAll(True, {'class':'inlineImage module'})
c = self.fixChars(self.tag_to_string(caption,use_alt=False)).strip() if largeImg:
mp_off = c.find("More Photos") for inlineImg in inlineImgs:
if mp_off >= 0: inlineImg.extract()
c = c[:mp_off] else:
cTag.insert(0, c) if inlineImgs:
caption.replaceWith(cTag) firstImg = inlineImgs[0]
except: for inlineImg in inlineImgs[1:]:
self.log("ERROR: Problem in change captions to italic") inlineImg.extract()
# Move firstImg before article body
cgFirst = soup.find(True, {'class':re.compile('columnGroup *first')})
if cgFirst:
# Strip all sibling NavigableStrings: noise
navstrings = cgFirst.findAll(text=True, recursive=False)
[ns.extract() for ns in navstrings]
headline_found = False
tag = cgFirst.find(True)
insertLoc = 0
while True:
insertLoc += 1
if hasattr(tag,'class') and tag['class'] == 'articleHeadline':
headline_found = True
break
tag = tag.nextSibling
if not tag:
headline_found = False
break
if headline_found:
cgFirst.insert(insertLoc,firstImg)
else:
self.log(">>> No class:'columnGroup first' found <<<")
except:
self.log("ERROR: One picture per article in postprocess_html")
try: try:
# Change <nyt_headline> to <h2> # Change captions to italic
h1 = soup.find('h1') for caption in soup.findAll(True, {'class':'caption'}) :
if h1: if caption and len(caption) > 0:
headline = h1.find("nyt_headline") cTag = Tag(soup, "p", [("class", "caption")])
if headline: c = self.fixChars(self.tag_to_string(caption,use_alt=False)).strip()
tag = Tag(soup, "h2") mp_off = c.find("More Photos")
tag['class'] = "headline" if mp_off >= 0:
tag.insert(0, self.fixChars(headline.contents[0])) c = c[:mp_off]
h1.replaceWith(tag) cTag.insert(0, c)
else: caption.replaceWith(cTag)
# Blog entry - replace headline, remove <hr> tags except:
headline = soup.find('title') self.log("ERROR: Problem in change captions to italic")
if headline:
tag = Tag(soup, "h2")
tag['class'] = "headline"
tag.insert(0, self.fixChars(headline.contents[0]))
soup.insert(0, tag)
hrs = soup.findAll('hr')
for hr in hrs:
hr.extract()
except:
self.log("ERROR: Problem in Change <nyt_headline> to <h2>")
try: try:
# Change <h1> to <h3> - used in editorial blogs # Change <nyt_headline> to <h2>
masthead = soup.find("h1") h1 = soup.find('h1')
if masthead: blogheadline = str(h1) #added for dealbook
# Nuke the href if h1:
if masthead.a: headline = h1.find("nyt_headline")
del(masthead.a['href']) if headline:
tag = Tag(soup, "h3") tag = Tag(soup, "h2")
tag.insert(0, self.fixChars(masthead.contents[0])) tag['class'] = "headline"
masthead.replaceWith(tag) tag.insert(0, self.fixChars(headline.contents[0]))
except: h1.replaceWith(tag)
self.log("ERROR: Problem in Change <h1> to <h3> - used in editorial blogs") elif 'entry-title' in blogheadline:#added for dealbook
tag = Tag(soup, "h2")#added for dealbook
tag['class'] = "headline"#added for dealbook
tag.insert(0, self.fixChars(h1.contents[0]))#added for dealbook
h1.replaceWith(tag)#added for dealbook
try: else:
# Change <span class="bold"> to <b> # Blog entry - replace headline, remove <hr> tags - BCC I think this is no longer functional 1-18-2011
for subhead in soup.findAll(True, {'class':'bold'}) : headline = soup.find('title')
if subhead.contents: if headline:
bTag = Tag(soup, "b") tag = Tag(soup, "h2")
bTag.insert(0, subhead.contents[0]) tag['class'] = "headline"
subhead.replaceWith(bTag) tag.insert(0, self.fixChars(headline.renderContents()))
except: soup.insert(0, tag)
self.log("ERROR: Problem in Change <span class=bold> to <b>") hrs = soup.findAll('hr')
for hr in hrs:
hr.extract()
except:
self.log("ERROR: Problem in Change <nyt_headline> to <h2>")
try: try:
divTag = soup.find('div',attrs={'id':'articleBody'}) #if this is from a blog (dealbook), fix the byline format
if divTag: bylineauthor = soup.find('address',attrs={'class':'byline author vcard'})
divTag['class'] = divTag['id'] if bylineauthor:
except: tag = Tag(soup, "h6")
self.log("ERROR: Problem in soup.find(div,attrs={id:articleBody})") tag['class'] = "byline"
tag.insert(0, self.fixChars(bylineauthor.renderContents()))
bylineauthor.replaceWith(tag)
except:
self.log("ERROR: fixing byline author format")
try: try:
# Add class="authorId" to <div> so we can format with CSS #if this is a blog (dealbook) fix the credit style for the pictures
divTag = soup.find('div',attrs={'id':'authorId'}) blogcredit = soup.find('div',attrs={'class':'credit'})
if divTag and divTag.contents[0]: if blogcredit:
tag = Tag(soup, "p") tag = Tag(soup, "h6")
tag['class'] = "authorId" tag['class'] = "credit"
tag.insert(0, self.fixChars(self.tag_to_string(divTag.contents[0], tag.insert(0, self.fixChars(blogcredit.renderContents()))
use_alt=False))) blogcredit.replaceWith(tag)
divTag.replaceWith(tag) except:
except: self.log("ERROR: fixing credit format")
self.log("ERROR: Problem in Add class=authorId to <div> so we can format with CSS")
return soup
try:
# Change <h1> to <h3> - used in editorial blogs
masthead = soup.find("h1")
if masthead:
# Nuke the href
if masthead.a:
del(masthead.a['href'])
tag = Tag(soup, "h3")
tag.insert(0, self.fixChars(masthead.contents[0]))
masthead.replaceWith(tag)
except:
self.log("ERROR: Problem in Change <h1> to <h3> - used in editorial blogs")
try:
# Change <span class="bold"> to <b>
for subhead in soup.findAll(True, {'class':'bold'}) :
if subhead.contents:
bTag = Tag(soup, "b")
bTag.insert(0, subhead.contents[0])
subhead.replaceWith(bTag)
except:
self.log("ERROR: Problem in Change <span class=bold> to <b>")
try:
#remove the <strong> update tag
blogupdated = soup.find('span', {'class':'update'})
if blogupdated:
blogupdated.replaceWith("")
except:
self.log("ERROR: Removing strong tag")
try:
divTag = soup.find('div',attrs={'id':'articleBody'})
if divTag:
divTag['class'] = divTag['id']
except:
self.log("ERROR: Problem in soup.find(div,attrs={id:articleBody})")
try:
# Add class="authorId" to <div> so we can format with CSS
divTag = soup.find('div',attrs={'id':'authorId'})
if divTag and divTag.contents[0]:
tag = Tag(soup, "p")
tag['class'] = "authorId"
tag.insert(0, self.fixChars(self.tag_to_string(divTag.contents[0],
use_alt=False)))
divTag.replaceWith(tag)
except:
self.log("ERROR: Problem in Add class=authorId to <div> so we can format with CSS")
return soup
def populate_article_metadata(self, article, soup, first): def populate_article_metadata(self, article, soup, first):
shortparagraph = "" shortparagraph = ""
try: try:

@ -0,0 +1,120 @@
import re
import urllib2
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, SoupStrainer
class Ebert(BasicNewsRecipe):
title = 'Roger Ebert'
__author__ = 'Shane Erstad'
description = 'Roger Ebert Movie Reviews'
publisher = 'Chicago Sun Times'
category = 'movies'
oldest_article = 8
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
encoding = 'utf-8'
masthead_url = 'http://rogerebert.suntimes.com/graphics/global/roger.jpg'
language = 'en'
remove_empty_feeds = False
PREFIX = 'http://rogerebert.suntimes.com'
patternReviews = r'<span class="*?movietitle"*?>(.*?)</span>.*?<div class="*?headline"*?>(.*?)</div>(.*?)</div>'
patternCommentary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?COMMENTARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
patternPeople = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?PEOPLE.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
patternGlossary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?GLOSSARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
, 'linearize_tables' : True
}
feeds = [
(u'Reviews' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=reviews' )
,(u'Commentary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=COMMENTARY')
,(u'Great Movies' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=REVIEWS08')
,(u'People' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=PEOPLE')
,(u'Glossary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=GLOSSARY')
]
preprocess_regexps = [
(re.compile(r'<font.*?>.*?This is a printer friendly.*?</font>.*?<hr>', re.DOTALL|re.IGNORECASE),
lambda m: '')
]
def print_version(self, url):
return url + '&template=printart'
def parse_index(self):
totalfeeds = []
lfeeds = self.get_feeds()
for feedobj in lfeeds:
feedtitle, feedurl = feedobj
self.log('\tFeedurl: ', feedurl)
self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
articles = []
page = urllib2.urlopen(feedurl).read()
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
pattern = self.patternReviews
elif feedtitle == 'Commentary':
pattern = self.patternCommentary
elif feedtitle == 'People':
pattern = self.patternPeople
elif feedtitle == 'Glossary':
pattern = self.patternGlossary
regex = re.compile(pattern, re.IGNORECASE|re.DOTALL)
for match in regex.finditer(page):
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
movietitle = match.group(1)
thislink = match.group(2)
description = match.group(3)
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary':
thislink = match.group(1)
description = match.group(2)
self.log(thislink)
for link in BeautifulSoup(thislink, parseOnlyThese=SoupStrainer('a')):
thisurl = self.PREFIX + link['href']
thislinktext = self.tag_to_string(link)
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
thistitle = movietitle
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary':
thistitle = thislinktext
if thistitle == '':
thistitle = 'Ebert Journal Post'
"""
pattern2 = r'AID=\/(.*?)\/'
reg2 = re.compile(pattern2, re.IGNORECASE|re.DOTALL)
match2 = reg2.search(thisurl)
date = match2.group(1)
c = time.strptime(match2.group(1),"%Y%m%d")
date=time.strftime("%a, %b %d, %Y", c)
self.log(date)
"""
articles.append({
'title' :thistitle
,'date' :''
,'url' :thisurl
,'description':description
})
totalfeeds.append((feedtitle, articles))
return totalfeeds
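
Review note: the index parsing in `parse_index` can be exercised offline. The sketch below reuses the `patternReviews` expression from the recipe and runs it over a made-up page fragment. `parse_reviews`, `SAMPLE_PAGE`, the `HREF` helper pattern and the article ID in the sample are illustrative assumptions, not part of the recipe, and the standard `re` module stands in for the recipe's `BeautifulSoup` link extraction.

```python
import re

PREFIX = 'http://rogerebert.suntimes.com'

# Same expression as patternReviews in the recipe: title, link markup, blurb.
PATTERN_REVIEWS = re.compile(
    r'<span class="*?movietitle"*?>(.*?)</span>.*?'
    r'<div class="*?headline"*?>(.*?)</div>(.*?)</div>',
    re.IGNORECASE | re.DOTALL)

# Assumption: a plain regex is enough to pull the href out of the link markup.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)

# Made-up fragment shaped like the section pages the recipe expects.
SAMPLE_PAGE = '''
<span class="movietitle">The Way Back</span>
<div class="headline"><a href="/apps/pbcs.dll/article?AID=/20110119/REVIEWS/110119981">Read the review</a></div>
A long walk.</div>
'''


def parse_reviews(page):
    """Return article dicts in the shape parse_index appends to a feed."""
    articles = []
    for match in PATTERN_REVIEWS.finditer(page):
        title, link_html, description = match.groups()
        href = HREF.search(link_html)
        if href is None:
            # No usable link in this entry, so skip it.
            continue
        articles.append({
            'title': title.strip() or 'Ebert Journal Post',
            'date': '',
            'url': PREFIX + href.group(1),
            'description': description.strip(),
        })
    return articles
```

Running a captured page through a helper like this is a cheap way to notice when the site markup drifts away from the regular expressions.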

@ -43,8 +43,9 @@ class Stage3(Command):
description = 'Stage 3 of the publish process' description = 'Stage 3 of the publish process'
sub_commands = ['upload_user_manual', 'upload_demo', 'sdist', sub_commands = ['upload_user_manual', 'upload_demo', 'sdist',
'upload_to_google_code', 'tag_release', 'upload_to_server', 'upload_to_google_code', 'upload_to_sourceforge',
'upload_to_sourceforge', 'upload_to_mobileread', 'tag_release', 'upload_to_server',
'upload_to_mobileread',
] ]
class Stage4(Command): class Stage4(Command):

@ -241,7 +241,7 @@ def get_parsed_proxy(typ='http', debug=True):
return ans return ans
def browser(honor_time=True, max_time=2, mobile_browser=False): def browser(honor_time=True, max_time=2, mobile_browser=False, user_agent=None):
''' '''
Create a mechanize browser for web scraping. The browser handles cookies, Create a mechanize browser for web scraping. The browser handles cookies,
refresh requests and ignores robots.txt. Also uses proxy if available. refresh requests and ignores robots.txt. Also uses proxy if available.
@ -253,8 +253,10 @@ def browser(honor_time=True, max_time=2, mobile_browser=False):
opener = Browser() opener = Browser()
opener.set_handle_refresh(True, max_time=max_time, honor_time=honor_time) opener.set_handle_refresh(True, max_time=max_time, honor_time=honor_time)
opener.set_handle_robots(False) opener.set_handle_robots(False)
opener.addheaders = [('User-agent', ' Mozilla/5.0 (Windows; U; Windows CE 5.1; rv:1.8.1a3) Gecko/20060610 Minimo/0.016' if mobile_browser else \ if user_agent is None:
'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101210 Gentoo Firefox/3.6.13')] user_agent = ' Mozilla/5.0 (Windows; U; Windows CE 5.1; rv:1.8.1a3) Gecko/20060610 Minimo/0.016' if mobile_browser else \
'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101210 Gentoo Firefox/3.6.13'
opener.addheaders = [('User-agent', user_agent)]
http_proxy = get_proxies().get('http', None) http_proxy = get_proxies().get('http', None)
if http_proxy: if http_proxy:
opener.set_proxies({'http':http_proxy}) opener.set_proxies({'http':http_proxy})
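
Review note: the new `user_agent` parameter only falls back to the built-in strings when the caller passes nothing. A minimal sketch of that selection logic, separated from mechanize so it can be checked in isolation, might look as follows. `pick_headers` and the two constant names are hypothetical, and the strings are copied from the hunk above without the leading space of the mobile one.

```python
MOBILE_UA = ('Mozilla/5.0 (Windows; U; Windows CE 5.1; rv:1.8.1a3) '
             'Gecko/20060610 Minimo/0.016')
DESKTOP_UA = ('Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) '
              'Gecko/20101210 Gentoo Firefox/3.6.13')


def pick_headers(mobile_browser=False, user_agent=None):
    """Build the header list; an explicit user_agent always wins."""
    if user_agent is None:
        user_agent = MOBILE_UA if mobile_browser else DESKTOP_UA
    return [('User-agent', user_agent)]
```

An explicit `user_agent` therefore overrides `mobile_browser`, which callers that pass both should keep in mind.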

@ -2,7 +2,7 @@ __license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net' __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en' __docformat__ = 'restructuredtext en'
__appname__ = 'calibre' __appname__ = 'calibre'
__version__ = '0.7.40' __version__ = '0.7.42'
__author__ = "Kovid Goyal <kovid@kovidgoyal.net>" __author__ = "Kovid Goyal <kovid@kovidgoyal.net>"
import re import re

@ -160,18 +160,6 @@ class InputFormatPlugin(Plugin):
''' '''
raise NotImplementedError() raise NotImplementedError()
def preprocess_html(self, opts, html):
'''
This method is called by the conversion pipeline on all HTML before it
is parsed. It is meant to be used to do any required preprocessing on
the HTML, like removing hard line breaks, etc.
:param html: A unicode string
:return: A unicode string
'''
return html
def convert(self, stream, options, file_ext, log, accelerators): def convert(self, stream, options, file_ext, log, accelerators):
''' '''
This method must be implemented in sub-classes. It must return This method must be implemented in sub-classes. It must return

View File

@ -21,7 +21,7 @@ class ANDROID(USBMS):
# HTC # HTC
0x0bb4 : { 0x0c02 : [0x100, 0x0227, 0x0226], 0x0c01 : [0x100, 0x0227], 0x0ff9 0x0bb4 : { 0x0c02 : [0x100, 0x0227, 0x0226], 0x0c01 : [0x100, 0x0227], 0x0ff9
: [0x0100, 0x0227, 0x0226], 0x0c87: [0x0100, 0x0227, 0x0226], : [0x0100, 0x0227, 0x0226], 0x0c87: [0x0100, 0x0227, 0x0226],
0xc92 : [0x100], 0xc97: [0x226]}, 0xc92 : [0x100], 0xc97: [0x226], 0xc99 : [0x0100]},
# Eken # Eken
0x040d : { 0x8510 : [0x0001], 0x0851 : [0x1] }, 0x040d : { 0x8510 : [0x0001], 0x0851 : [0x1] },
@ -54,7 +54,7 @@ class ANDROID(USBMS):
0x1004 : { 0x61cc : [0x100] }, 0x1004 : { 0x61cc : [0x100] },
# Archos # Archos
0x0e79 : { 0x1420 : [0x0216]}, 0x0e79 : { 0x1419: [0x0216], 0x1420 : [0x0216]},
} }
EBOOK_DIR_MAIN = ['eBooks/import', 'wordplayer/calibretransfer', 'Books'] EBOOK_DIR_MAIN = ['eBooks/import', 'wordplayer/calibretransfer', 'Books']
@ -70,10 +70,10 @@ class ANDROID(USBMS):
'__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897', '__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID',
'SCH-I500_CARD', 'SPH-D700_CARD', 'MB810', 'GT-P1000', 'DESIRE', 'SCH-I500_CARD', 'SPH-D700_CARD', 'MB810', 'GT-P1000', 'DESIRE',
'SGH-T849', '_MB300', 'A70S', 'S_ANDROID'] 'SGH-T849', '_MB300', 'A70S', 'S_ANDROID', 'A101IT']
WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897', WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
'A70S'] 'A70S', 'A101IT']
OSX_MAIN_MEM = 'Android Device Main Memory' OSX_MAIN_MEM = 'Android Device Main Memory'

View File

@ -106,7 +106,7 @@ class PDNOVEL(USBMS):
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = '__UMS_COMPOSITE' WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = '__UMS_COMPOSITE'
THUMBNAIL_HEIGHT = 130 THUMBNAIL_HEIGHT = 130
EBOOK_DIR_MAIN = 'eBooks' EBOOK_DIR_MAIN = EBOOK_DIR_CARD_A = 'eBooks'
SUPPORTS_SUB_DIRS = False SUPPORTS_SUB_DIRS = False
DELETE_EXTS = ['.jpg', '.jpeg', '.png'] DELETE_EXTS = ['.jpg', '.jpeg', '.png']

@ -98,6 +98,9 @@ class PRS505(USBMS):
THUMBNAIL_HEIGHT = 200 THUMBNAIL_HEIGHT = 200
MAX_PATH_LEN = 201 # 250 - (max(len(CACHE_THUMBNAIL), len(MEDIA_THUMBNAIL)) +
# len('main_thumbnail.jpg') + 1)
def windows_filter_pnp_id(self, pnp_id): def windows_filter_pnp_id(self, pnp_id):
return '_LAUNCHER' in pnp_id return '_LAUNCHER' in pnp_id
@ -201,10 +204,13 @@ class PRS505(USBMS):
self._card_b_prefix if idx == 2 \ self._card_b_prefix if idx == 2 \
else self._main_prefix else self._main_prefix
for book in bl: for book in bl:
p = os.path.join(prefix, book.lpath) try:
self._upload_cover(os.path.dirname(p), p = os.path.join(prefix, book.lpath)
os.path.splitext(os.path.basename(p))[0], self._upload_cover(os.path.dirname(p),
book, p) os.path.splitext(os.path.basename(p))[0],
book, p)
except:
debug_print('FAILED to upload cover', p)
else: else:
debug_print('PRS505: NOT uploading covers in sync_booklists') debug_print('PRS505: NOT uploading covers in sync_booklists')
@ -229,7 +235,10 @@ class PRS505(USBMS):
debug_print('PRS505: not uploading cover') debug_print('PRS505: not uploading cover')
return return
debug_print('PRS505: uploading cover') debug_print('PRS505: uploading cover')
self._upload_cover(path, filename, metadata, filepath) try:
self._upload_cover(path, filename, metadata, filepath)
except:
debug_print('FAILED to upload cover', filepath)
def _upload_cover(self, path, filename, metadata, filepath): def _upload_cover(self, path, filename, metadata, filepath):
if metadata.thumbnail and metadata.thumbnail[-1]: if metadata.thumbnail and metadata.thumbnail[-1]:

@ -98,6 +98,9 @@ class Device(DeviceConfig, DevicePlugin):
# copy these back to the library # copy these back to the library
BACKLOADING_ERROR_MESSAGE = None BACKLOADING_ERROR_MESSAGE = None
#: The maximum length of paths created on the device
MAX_PATH_LEN = 250
def reset(self, key='-1', log_packets=False, report_progress=None, def reset(self, key='-1', log_packets=False, report_progress=None,
detected_device=None): detected_device=None):
self._main_prefix = self._card_a_prefix = self._card_b_prefix = None self._main_prefix = self._card_a_prefix = self._card_b_prefix = None
@ -875,7 +878,7 @@ class Device(DeviceConfig, DevicePlugin):
def create_upload_path(self, path, mdata, fname, create_dirs=True): def create_upload_path(self, path, mdata, fname, create_dirs=True):
path = os.path.abspath(path) path = os.path.abspath(path)
extra_components = [] maxlen = self.MAX_PATH_LEN
special_tag = None special_tag = None
if mdata.tags: if mdata.tags:
@ -902,7 +905,7 @@ class Device(DeviceConfig, DevicePlugin):
app_id = str(getattr(mdata, 'application_id', '')) app_id = str(getattr(mdata, 'application_id', ''))
# The db id will be in the created filename # The db id will be in the created filename
extra_components = get_components(template, mdata, fname, extra_components = get_components(template, mdata, fname,
timefmt=opts.send_timefmt, length=250-len(app_id)-1) timefmt=opts.send_timefmt, length=maxlen-len(app_id)-1)
if not extra_components: if not extra_components:
extra_components.append(sanitize(self.filename_callback(fname, extra_components.append(sanitize(self.filename_callback(fname,
mdata))) mdata)))
@ -937,12 +940,11 @@ class Device(DeviceConfig, DevicePlugin):
return ans return ans
extra_components = list(map(remove_trailing_periods, extra_components)) extra_components = list(map(remove_trailing_periods, extra_components))
components = shorten_components_to(250 - len(path), extra_components) components = shorten_components_to(maxlen - len(path), extra_components)
components = self.sanitize_path_components(components) components = self.sanitize_path_components(components)
filepath = os.path.join(path, *components) filepath = os.path.join(path, *components)
filedir = os.path.dirname(filepath) filedir = os.path.dirname(filepath)
if create_dirs and not os.path.exists(filedir): if create_dirs and not os.path.exists(filedir):
os.makedirs(filedir) os.makedirs(filedir)
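
Review note: replacing the hard-coded `250` with `self.MAX_PATH_LEN` lets a driver shrink the path budget by overriding one class attribute, as the PRS505 hunk does with `201`. The sketch below illustrates only that override pattern. The class names mirror the diff, but `path_budget` and the mount point are hypothetical stand-ins for the arithmetic inside `create_upload_path`.

```python
class Device(object):
    #: The maximum length of paths created on the device
    MAX_PATH_LEN = 250

    def path_budget(self, path):
        """Characters left for the template-generated components."""
        return self.MAX_PATH_LEN - len(path)


class PRS505(Device):
    # Leaves room for the thumbnail directory and file name (see the diff).
    MAX_PATH_LEN = 201


if __name__ == '__main__':
    mount_point = '/media/reader/'  # hypothetical device prefix
    print(Device().path_budget(mount_point))
    print(PRS505().path_budget(mount_point))
```

Because the lookup goes through `self`, drivers that do not define the attribute keep the previous 250-character behaviour.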

@ -75,7 +75,7 @@ class CHMInput(InputFormatPlugin):
def _create_oebbook(self, hhcpath, basedir, opts, log, mi): def _create_oebbook(self, hhcpath, basedir, opts, log, mi):
from calibre.ebooks.conversion.plumber import create_oebbook from calibre.ebooks.conversion.plumber import create_oebbook
from calibre.ebooks.oeb.base import DirContainer from calibre.ebooks.oeb.base import DirContainer
oeb = create_oebbook(log, None, opts, self, oeb = create_oebbook(log, None, opts,
encoding=opts.input_encoding, populate=False) encoding=opts.input_encoding, populate=False)
self.oeb = oeb self.oeb = oeb

@ -42,6 +42,12 @@ option.
For full documentation of the conversion system see For full documentation of the conversion system see
''') + 'http://calibre-ebook.com/user_manual/conversion.html' ''') + 'http://calibre-ebook.com/user_manual/conversion.html'
HEURISTIC_OPTIONS = ['markup_chapter_headings',
'italicize_common_cases', 'fix_indents',
'html_unwrap_factor', 'unwrap_lines',
'delete_blank_paragraphs', 'format_scene_breaks',
'dehyphenate', 'renumber_headings']
def print_help(parser, log): def print_help(parser, log):
help = parser.format_help().encode(preferred_encoding, 'replace') help = parser.format_help().encode(preferred_encoding, 'replace')
log(help) log(help)
@ -83,6 +89,8 @@ def option_recommendation_to_cli_option(add_option, rec):
if opt.long_switch == 'verbose': if opt.long_switch == 'verbose':
attrs['action'] = 'count' attrs['action'] = 'count'
attrs.pop('type', '') attrs.pop('type', '')
if opt.name in HEURISTIC_OPTIONS and rec.recommended_value is True:
switches = ['--disable-'+opt.long_switch]
add_option(Option(*switches, **attrs)) add_option(Option(*switches, **attrs))
def add_input_output_options(parser, plumber): def add_input_output_options(parser, plumber):
@ -126,18 +134,33 @@ def add_pipeline_options(parser, plumber):
'margin_top', 'margin_left', 'margin_right', 'margin_top', 'margin_left', 'margin_right',
'margin_bottom', 'change_justification', 'margin_bottom', 'change_justification',
'insert_blank_line', 'remove_paragraph_spacing','remove_paragraph_spacing_indent_size', 'insert_blank_line', 'remove_paragraph_spacing','remove_paragraph_spacing_indent_size',
'asciiize', 'remove_header', 'header_regex', 'asciiize',
'remove_footer', 'footer_regex',
] ]
), ),
'HEURISTIC PROCESSING' : (
_('Modify the document text and structure using common'
' patterns. Disabled by default. Use %s to enable. '
' Individual actions can be disabled with the %s options.')
% ('--enable-heuristics', '--disable-*'),
['enable_heuristics'] + HEURISTIC_OPTIONS
),
'SEARCH AND REPLACE' : (
_('Modify the document text and structure using user defined patterns.'),
[
'sr1_search', 'sr1_replace',
'sr2_search', 'sr2_replace',
'sr3_search', 'sr3_replace',
]
),
'STRUCTURE DETECTION' : ( 'STRUCTURE DETECTION' : (
_('Control auto-detection of document structure.'), _('Control auto-detection of document structure.'),
[ [
'chapter', 'chapter_mark', 'chapter', 'chapter_mark',
'prefer_metadata_cover', 'remove_first_image', 'prefer_metadata_cover', 'remove_first_image',
'insert_metadata', 'page_breaks_before', 'insert_metadata', 'page_breaks_before',
'preprocess_html', 'html_unwrap_factor',
] ]
), ),
@ -164,7 +187,8 @@ def add_pipeline_options(parser, plumber):
} }
group_order = ['', 'LOOK AND FEEL', 'STRUCTURE DETECTION', group_order = ['', 'LOOK AND FEEL', 'HEURISTIC PROCESSING',
'SEARCH AND REPLACE', 'STRUCTURE DETECTION',
'TABLE OF CONTENTS', 'METADATA', 'DEBUG'] 'TABLE OF CONTENTS', 'METADATA', 'DEBUG']
for group in group_order: for group in group_order:
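
Review note: heuristic options whose recommended value is `True` are exposed as `--disable-*` switches, while `html_unwrap_factor` keeps an ordinary value switch because its default is a float. The following is a rough standalone sketch of that rule using `optparse`. `build_parser` and `HEURISTIC_DEFAULTS` are hypothetical names, the defaults are taken from the `OptionRecommendation` entries in this commit, and the real code goes through `option_recommendation_to_cli_option` rather than building options this way.

```python
from optparse import OptionParser

# Defaults as declared by the OptionRecommendation entries in this commit.
HEURISTIC_DEFAULTS = {
    'markup_chapter_headings': True,
    'italicize_common_cases': True,
    'fix_indents': True,
    'html_unwrap_factor': 0.40,
    'unwrap_lines': True,
    'delete_blank_paragraphs': True,
    'format_scene_breaks': True,
    'dehyphenate': True,
    'renumber_headings': True,
}


def build_parser():
    parser = OptionParser()
    parser.add_option('--enable-heuristics', dest='enable_heuristics',
                      action='store_true', default=False)
    for name, default in HEURISTIC_DEFAULTS.items():
        long_switch = name.replace('_', '-')
        if default is True:
            # On by default, so the command line can only turn it off.
            parser.add_option('--disable-' + long_switch, dest=name,
                              action='store_false', default=True)
        else:
            # Non-boolean option: keep a regular value switch.
            parser.add_option('--' + long_switch, dest=name,
                              type='float', default=default)
    return parser
```

With this shape, `--enable-heuristics --disable-dehyphenate` turns on every heuristic except dehyphenation.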

@ -72,7 +72,8 @@ class Plumber(object):
] ]
def __init__(self, input, output, log, report_progress=DummyReporter(), def __init__(self, input, output, log, report_progress=DummyReporter(),
dummy=False, merge_plugin_recs=True, abort_after_input_dump=False): dummy=False, merge_plugin_recs=True, abort_after_input_dump=False,
override_input_metadata=False):
''' '''
:param input: Path to input file. :param input: Path to input file.
:param output: Path to output file/directory :param output: Path to output file/directory
@ -87,6 +88,7 @@ class Plumber(object):
self.log = log self.log = log
self.ui_reporter = report_progress self.ui_reporter = report_progress
self.abort_after_input_dump = abort_after_input_dump self.abort_after_input_dump = abort_after_input_dump
self.override_input_metadata = override_input_metadata
# Pipeline options {{{ # Pipeline options {{{
# Initialize the conversion options that are independent of input and # Initialize the conversion options that are independent of input and
@ -376,23 +378,6 @@ OptionRecommendation(name='insert_metadata',
) )
), ),
OptionRecommendation(name='preprocess_html',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Attempt to detect and correct hard line breaks and other '
'problems in the source file. This may make things worse, so use '
'with care.'
)
),
OptionRecommendation(name='html_unwrap_factor',
recommended_value=0.40, level=OptionRecommendation.LOW,
help=_('Scale used to determine the length at which a line should '
'be unwrapped if preprocess is enabled. Valid values are a decimal between 0 and 1. The '
'default is 0.40, just below the median line length. This will unwrap typical books '
' with hard line breaks, but should be reduced if the line length is variable.'
)
),
OptionRecommendation(name='smarten_punctuation', OptionRecommendation(name='smarten_punctuation',
recommended_value=False, level=OptionRecommendation.LOW, recommended_value=False, level=OptionRecommendation.LOW,
help=_('Convert plain quotes, dashes and ellipsis to their ' help=_('Convert plain quotes, dashes and ellipsis to their '
@ -401,32 +386,6 @@ OptionRecommendation(name='smarten_punctuation',
) )
), ),
OptionRecommendation(name='remove_header',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Use a regular expression to try and remove the header.'
)
),
OptionRecommendation(name='header_regex',
recommended_value='(?i)(?<=<hr>)((\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?\d+<br>\s*.*?\s*)|(\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?.*?<br>\s*\d+))(?=<br>)',
level=OptionRecommendation.LOW,
help=_('The regular expression to use to remove the header.'
)
),
OptionRecommendation(name='remove_footer',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Use a regular expression to try and remove the footer.'
)
),
OptionRecommendation(name='footer_regex',
recommended_value='(?i)(?<=<hr>)((\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?\d+<br>\s*.*?\s*)|(\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?.*?<br>\s*\d+))(?=<br>)',
level=OptionRecommendation.LOW,
help=_('The regular expression to use to remove the footer.'
)
),
OptionRecommendation(name='read_metadata_from_opf', OptionRecommendation(name='read_metadata_from_opf',
recommended_value=None, level=OptionRecommendation.LOW, recommended_value=None, level=OptionRecommendation.LOW,
short_switch='m', short_switch='m',
@ -527,6 +486,89 @@ OptionRecommendation(name='timestamp',
recommended_value=None, level=OptionRecommendation.LOW, recommended_value=None, level=OptionRecommendation.LOW,
help=_('Set the book timestamp (used by the date column in calibre).')), help=_('Set the book timestamp (used by the date column in calibre).')),
OptionRecommendation(name='enable_heuristics',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Enable heuristic processing. This option must be set for any '
'heuristic processing to take place.')),
OptionRecommendation(name='markup_chapter_headings',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Detect unformatted chapter headings and sub headings. Change '
'them to h2 and h3 tags. This setting will not create a TOC, '
'but can be used in conjunction with structure detection to create '
'one.')),
OptionRecommendation(name='italicize_common_cases',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Look for common words and patterns that denote '
'italics and italicize them.')),
OptionRecommendation(name='fix_indents',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Turn indentation created from multiple non-breaking space entities '
'into CSS indents.')),
OptionRecommendation(name='html_unwrap_factor',
recommended_value=0.40, level=OptionRecommendation.LOW,
help=_('Scale used to determine the length at which a line should '
'be unwrapped. Valid values are a decimal between 0 and 1. The '
'default is 0.4, just below the median line length. If only a '
'few lines in the document require unwrapping this value should '
'be reduced')),
OptionRecommendation(name='unwrap_lines',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Unwrap lines using punctuation and other formatting clues.')),
OptionRecommendation(name='delete_blank_paragraphs',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Remove empty paragraphs from the document when they exist between '
'every other paragraph')),
OptionRecommendation(name='format_scene_breaks',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Left aligned scene break markers are center aligned. '
'Replace soft scene breaks that use multiple blank lines with'
'horizontal rules.')),
OptionRecommendation(name='dehyphenate',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Analyze hyphenated words throughout the document. The '
'document itself is used as a dictionary to determine whether hyphens '
'should be retained or removed.')),
OptionRecommendation(name='renumber_headings',
recommended_value=True, level=OptionRecommendation.LOW,
help=_('Looks for occurrences of sequential <h1> or <h2> tags. '
'The tags are renumbered to prevent splitting in the middle '
'of chapter headings.')),
OptionRecommendation(name='sr1_search',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Search pattern (regular expression) to be replaced with '
'sr1-replace.')),
OptionRecommendation(name='sr1_replace',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Replacement to replace the text found with sr1-search.')),
OptionRecommendation(name='sr2_search',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Search pattern (regular expression) to be replaced with '
'sr2-replace.')),
OptionRecommendation(name='sr2_replace',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Replacement to replace the text found with sr2-search.')),
OptionRecommendation(name='sr3_search',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Search pattern (regular expression) to be replaced with '
'sr3-replace.')),
OptionRecommendation(name='sr3_replace',
recommended_value='', level=OptionRecommendation.LOW,
help=_('Replacement to replace the text found with sr3-search.')),
] ]
# }}} # }}}
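
Review note: the three `srN_search` / `srN_replace` pairs replace the old header and footer removal options. This chunk does not show where the pairs are applied, so the sketch below is only an assumption about the general idea: each non-empty search pattern is applied in order as a regular expression. `apply_search_replace` and the sample markup are hypothetical.

```python
import re


def apply_search_replace(html, rules):
    """Apply (search, replace) pairs in order, skipping empty searches."""
    for search, replace in rules:
        if not search:
            # An empty pattern means the pair was left unset.
            continue
        html = re.sub(search, replace, html)
    return html


if __name__ == '__main__':
    rules = [
        (r'<p>Page \d+</p>', ''),   # sr1: strip a running page header
        (r'Text', 'Body'),          # sr2: plain replacement
        ('', 'ignored'),            # sr3: unset
    ]
    print(apply_search_replace('<p>Page 12</p><p>Text</p>', rules))
```

An empty replacement reproduces the old "remove header/footer" behaviour, while a non-empty one covers the new replace use case.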
@ -861,7 +903,6 @@ OptionRecommendation(name='timestamp',
self.opts_to_mi(self.user_metadata) self.opts_to_mi(self.user_metadata)
if not hasattr(self.oeb, 'manifest'): if not hasattr(self.oeb, 'manifest'):
self.oeb = create_oebbook(self.log, self.oeb, self.opts, self.oeb = create_oebbook(self.log, self.oeb, self.opts,
self.input_plugin,
encoding=self.input_plugin.output_encoding) encoding=self.input_plugin.output_encoding)
self.input_plugin.postprocess_book(self.oeb, self.opts, self.log) self.input_plugin.postprocess_book(self.oeb, self.opts, self.log)
self.opts.is_image_collection = self.input_plugin.is_image_collection self.opts.is_image_collection = self.input_plugin.is_image_collection
@ -885,7 +926,8 @@ OptionRecommendation(name='timestamp',
self.opts.dest = self.opts.output_profile self.opts.dest = self.opts.output_profile
from calibre.ebooks.oeb.transforms.metadata import MergeMetadata from calibre.ebooks.oeb.transforms.metadata import MergeMetadata
MergeMetadata()(self.oeb, self.user_metadata, self.opts) MergeMetadata()(self.oeb, self.user_metadata, self.opts,
override_input_metadata=self.override_input_metadata)
pr(0.2) pr(0.2)
self.flush() self.flush()
@ -971,14 +1013,13 @@ OptionRecommendation(name='timestamp',
self.log(self.output_fmt.upper(), 'output written to', self.output) self.log(self.output_fmt.upper(), 'output written to', self.output)
self.flush() self.flush()
def create_oebbook(log, path_or_stream, opts, input_plugin, reader=None, def create_oebbook(log, path_or_stream, opts, reader=None,
encoding='utf-8', populate=True): encoding='utf-8', populate=True):
''' '''
Create an OEBBook. Create an OEBBook.
''' '''
from calibre.ebooks.oeb.base import OEBBook from calibre.ebooks.oeb.base import OEBBook
html_preprocessor = HTMLPreProcessor(input_plugin.preprocess_html, html_preprocessor = HTMLPreProcessor(log, opts)
opts.preprocess_html, opts)
if not encoding: if not encoding:
encoding = None encoding = None
oeb = OEBBook(log, html_preprocessor, oeb = OEBBook(log, html_preprocessor,


@ -7,7 +7,7 @@ __docformat__ = 'restructuredtext en'
import functools, re import functools, re
from calibre import entity_to_unicode from calibre import entity_to_unicode, as_unicode
XMLDECL_RE = re.compile(r'^\s*<[?]xml.*?[?]>') XMLDECL_RE = re.compile(r'^\s*<[?]xml.*?[?]>')
SVG_NS = 'http://www.w3.org/2000/svg' SVG_NS = 'http://www.w3.org/2000/svg'
@ -174,13 +174,19 @@ class Dehyphenator(object):
retain hyphens. retain hyphens.
''' '''
def __init__(self): def __init__(self, verbose=0, log=None):
self.log = log
self.verbose = verbose
# Add common suffixes to the regex below to increase the likelihood of a match - # Add common suffixes to the regex below to increase the likelihood of a match -
# don't add suffixes which are also complete words, such as 'able' or 'sex' # don't add suffixes which are also complete words, such as 'able' or 'sex'
self.removesuffixes = re.compile(r"((ed)?ly|('e)?s|a?(t|s)?ion(s|al(ly)?)?|ings?|er|(i)?ous|(i|a)ty|(it)?ies|ive|gence|istic(ally)?|(e|a)nce|m?ents?|ism|ated|(e|u)ct(ed)?|ed|(i|ed)?ness|(e|a)ncy|ble|ier|al|ex|ian)$", re.IGNORECASE) # only remove if it's not already the point of hyphenation
self.suffix_string = "((ed)?ly|'?e?s||a?(t|s)?ion(s|al(ly)?)?|ings?|er|(i)?ous|(i|a)ty|(it)?ies|ive|gence|istic(ally)?|(e|a)nce|m?ents?|ism|ated|(e|u)ct(ed)?|ed|(i|ed)?ness|(e|a)ncy|ble|ier|al|ex|ian)$"
self.suffixes = re.compile(r"^%s" % self.suffix_string, re.IGNORECASE)
self.removesuffixes = re.compile(r"%s" % self.suffix_string, re.IGNORECASE)
# remove prefixes if the prefix was not already the point of hyphenation # remove prefixes if the prefix was not already the point of hyphenation
self.prefixes = re.compile(r'^(dis|re|un|in|ex)$', re.IGNORECASE) self.prefix_string = '^(dis|re|un|in|ex)'
self.removeprefix = re.compile(r'^(dis|re|un|in|ex)', re.IGNORECASE) self.prefixes = re.compile(r'%s$' % self.prefix_string, re.IGNORECASE)
self.removeprefix = re.compile(r'%s' % self.prefix_string, re.IGNORECASE)
def dehyphenate(self, match): def dehyphenate(self, match):
firsthalf = match.group('firstpart') firsthalf = match.group('firstpart')
@ -191,31 +197,48 @@ class Dehyphenator(object):
wraptags = '' wraptags = ''
hyphenated = unicode(firsthalf) + "-" + unicode(secondhalf) hyphenated = unicode(firsthalf) + "-" + unicode(secondhalf)
dehyphenated = unicode(firsthalf) + unicode(secondhalf) dehyphenated = unicode(firsthalf) + unicode(secondhalf)
lookupword = self.removesuffixes.sub('', dehyphenated) if self.suffixes.match(secondhalf) is None:
if self.prefixes.match(firsthalf) is None: lookupword = self.removesuffixes.sub('', dehyphenated)
else:
lookupword = dehyphenated
if len(firsthalf) > 4 and self.prefixes.match(firsthalf) is None:
lookupword = self.removeprefix.sub('', lookupword) lookupword = self.removeprefix.sub('', lookupword)
#print "lookup word is: "+str(lookupword)+", orig is: " + str(hyphenated) if self.verbose > 2:
self.log("lookup word is: "+str(lookupword)+", orig is: " + str(hyphenated))
try: try:
searchresult = self.html.find(lookupword.lower()) searchresult = self.html.find(lookupword.lower())
except: except:
return hyphenated return hyphenated
if self.format == 'html_cleanup' or self.format == 'txt_cleanup': if self.format == 'html_cleanup' or self.format == 'txt_cleanup':
if self.html.find(lookupword) != -1 or searchresult != -1: if self.html.find(lookupword) != -1 or searchresult != -1:
#print "Cleanup:returned dehyphenated word: " + str(dehyphenated) if self.verbose > 2:
self.log(" Cleanup:returned dehyphenated word: " + str(dehyphenated))
return dehyphenated return dehyphenated
elif self.html.find(hyphenated) != -1: elif self.html.find(hyphenated) != -1:
#print "Cleanup:returned hyphenated word: " + str(hyphenated) if self.verbose > 2:
self.log(" Cleanup:returned hyphenated word: " + str(hyphenated))
return hyphenated return hyphenated
else: else:
#print "Cleanup:returning original text "+str(firsthalf)+" + linefeed "+str(secondhalf) if self.verbose > 2:
self.log(" Cleanup:returning original text "+str(firsthalf)+" + linefeed "+str(secondhalf))
return firsthalf+u'\u2014'+wraptags+secondhalf return firsthalf+u'\u2014'+wraptags+secondhalf
else: else:
if self.format == 'individual_words' and len(firsthalf) + len(secondhalf) <= 6:
if self.verbose > 2:
self.log("too short, returned hyphenated word: " + str(hyphenated))
return hyphenated
if len(firsthalf) <= 2 and len(secondhalf) <= 2:
if self.verbose > 2:
self.log("too short, returned hyphenated word: " + str(hyphenated))
return hyphenated
if self.html.find(lookupword) != -1 or searchresult != -1: if self.html.find(lookupword) != -1 or searchresult != -1:
#print "returned dehyphenated word: " + str(dehyphenated) if self.verbose > 2:
self.log(" returned dehyphenated word: " + str(dehyphenated))
return dehyphenated return dehyphenated
else: else:
#print " returned hyphenated word: " + str(hyphenated) if self.verbose > 2:
self.log(" returned hyphenated word: " + str(hyphenated))
return hyphenated return hyphenated
def __call__(self, html, format, length=1): def __call__(self, html, format, length=1):
@ -228,7 +251,7 @@ class Dehyphenator(object):
elif format == 'txt': elif format == 'txt':
intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\[\]\\\^\$\.\|\?\*\+\(\)“"\s>]+)(-|)(\u0020|\u0009)*(?P<wraptags>(\n(\u0020|\u0009)*)+)(?P<secondpart>[\w\d]+)'% length) intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\[\]\\\^\$\.\|\?\*\+\(\)“"\s>]+)(-|)(\u0020|\u0009)*(?P<wraptags>(\n(\u0020|\u0009)*)+)(?P<secondpart>[\w\d]+)'% length)
elif format == 'individual_words': elif format == 'individual_words':
intextmatch = re.compile(u'>[^<]*\b(?P<firstpart>[^\[\]\\\^\$\.\|\?\*\+\(\)"\s>]+)(-|)\u0020*(?P<secondpart>\w+)\b[^<]*<') # for later, not called anywhere yet intextmatch = re.compile(u'(?!<)(?P<firstpart>\w+)(-|)\s*(?P<secondpart>\w+)(?![^<]*?>)')
elif format == 'html_cleanup': elif format == 'html_cleanup':
intextmatch = re.compile(u'(?P<firstpart>[^\[\]\\\^\$\.\|\?\*\+\(\)“"\s>]+)(-|)\s*(?=<)(?P<wraptags></span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?P<secondpart>[\w\d]+)') intextmatch = re.compile(u'(?P<firstpart>[^\[\]\\\^\$\.\|\?\*\+\(\)“"\s>]+)(-|)\s*(?=<)(?P<wraptags></span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?P<secondpart>[\w\d]+)')
elif format == 'txt_cleanup': elif format == 'txt_cleanup':
@ -397,10 +420,8 @@ class HTMLPreProcessor(object):
(re.compile('<span[^><]*?id=subtitle[^><]*?>(.*?)</span>', re.IGNORECASE|re.DOTALL), (re.compile('<span[^><]*?id=subtitle[^><]*?>(.*?)</span>', re.IGNORECASE|re.DOTALL),
lambda match : '<h3 class="subtitle">%s</h3>'%(match.group(1),)), lambda match : '<h3 class="subtitle">%s</h3>'%(match.group(1),)),
] ]
def __init__(self, input_plugin_preprocess, plugin_preprocess, def __init__(self, log=None, extra_opts=None):
extra_opts=None): self.log = log
self.input_plugin_preprocess = input_plugin_preprocess
self.plugin_preprocess = plugin_preprocess
self.extra_opts = extra_opts self.extra_opts = extra_opts
def is_baen(self, src): def is_baen(self, src):
@ -436,27 +457,20 @@ class HTMLPreProcessor(object):
if not getattr(self.extra_opts, 'keep_ligatures', False): if not getattr(self.extra_opts, 'keep_ligatures', False):
html = _ligpat.sub(lambda m:LIGATURES[m.group()], html) html = _ligpat.sub(lambda m:LIGATURES[m.group()], html)
for search, replace in [['sr3_search', 'sr3_replace'], ['sr2_search', 'sr2_replace'], ['sr1_search', 'sr1_replace']]:
search_pattern = getattr(self.extra_opts, search, '')
if search_pattern:
try:
search_re = re.compile(search_pattern)
replace_txt = getattr(self.extra_opts, replace, '')
if not replace_txt:
replace_txt = ''
rules.insert(0, (search_re, replace_txt))
except Exception as e:
self.log.error('Failed to parse %r regexp because %s' %
(search, as_unicode(e)))
end_rules = [] end_rules = []
if getattr(self.extra_opts, 'remove_header', None):
try:
rules.insert(0,
(re.compile(self.extra_opts.header_regex), lambda match : '')
)
except:
import traceback
print 'Failed to parse remove_header regexp'
traceback.print_exc()
if getattr(self.extra_opts, 'remove_footer', None):
try:
rules.insert(0,
(re.compile(self.extra_opts.footer_regex), lambda match : '')
)
except:
import traceback
print 'Failed to parse remove_footer regexp'
traceback.print_exc()
# delete soft hyphens - moved here so it's executed after header/footer removal # delete soft hyphens - moved here so it's executed after header/footer removal
if is_pdftohtml: if is_pdftohtml:
# unwrap/delete soft hyphens # unwrap/delete soft hyphens
@ -464,12 +478,6 @@ class HTMLPreProcessor(object):
# unwrap/delete soft hyphens with formatting # unwrap/delete soft hyphens with formatting
end_rules.append((re.compile(u'[­]\s*(</(i|u|b)>)+(</p>\s*<p>\s*)+\s*(<(i|u|b)>)+\s*(?=[[a-z\d])'), lambda match: '')) end_rules.append((re.compile(u'[­]\s*(</(i|u|b)>)+(</p>\s*<p>\s*)+\s*(<(i|u|b)>)+\s*(?=[[a-z\d])'), lambda match: ''))
# Make the more aggressive chapter marking regex optional with the preprocess option to
# reduce false positives and move after header/footer removal
if getattr(self.extra_opts, 'preprocess_html', None):
if is_pdftohtml:
end_rules.append((re.compile(r'<p>\s*(?P<chap>(<[ibu]>){0,2}\s*([A-Z \'"!]{3,})\s*([\dA-Z:]+\s){0,4}\s*(</[ibu]>){0,2})\s*<p>\s*(?P<title>(<[ibu]>){0,2}(\s*\w+){1,4}\s*(</[ibu]>){0,2}\s*<p>)?'), chap_head),)
length = -1 length = -1
if getattr(self.extra_opts, 'unwrap_factor', 0.0) > 0.01: if getattr(self.extra_opts, 'unwrap_factor', 0.0) > 0.01:
docanalysis = DocAnalysis('pdf', html) docanalysis = DocAnalysis('pdf', html)
@ -512,15 +520,14 @@ class HTMLPreProcessor(object):
if is_pdftohtml and length > -1: if is_pdftohtml and length > -1:
# Dehyphenate # Dehyphenate
dehyphenator = Dehyphenator() dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
html = dehyphenator(html,'html', length) html = dehyphenator(html,'html', length)
if is_pdftohtml: if is_pdftohtml:
from calibre.ebooks.conversion.utils import PreProcessor from calibre.ebooks.conversion.utils import HeuristicProcessor
pdf_markup = PreProcessor(self.extra_opts, None) pdf_markup = HeuristicProcessor(self.extra_opts, None)
totalwords = 0 totalwords = 0
totalwords = pdf_markup.get_word_count(html) if pdf_markup.get_word_count(html) > 7000:
if totalwords > 7000:
html = pdf_markup.markup_chapters(html, totalwords, True) html = pdf_markup.markup_chapters(html, totalwords, True)
#dump(html, 'post-preprocess') #dump(html, 'post-preprocess')
@ -540,8 +547,10 @@ class HTMLPreProcessor(object):
unidecoder = Unidecoder() unidecoder = Unidecoder()
html = unidecoder.decode(html) html = unidecoder.decode(html)
if self.plugin_preprocess: if getattr(self.extra_opts, 'enable_heuristics', False):
html = self.input_plugin_preprocess(self.extra_opts, html) from calibre.ebooks.conversion.utils import HeuristicProcessor
preprocessor = HeuristicProcessor(self.extra_opts, self.log)
html = preprocessor(html)
if getattr(self.extra_opts, 'smarten_punctuation', False): if getattr(self.extra_opts, 'smarten_punctuation', False):
html = self.smarten_punctuation(html) html = self.smarten_punctuation(html)


@ -11,13 +11,22 @@ from calibre.ebooks.conversion.preprocess import DocAnalysis, Dehyphenator
from calibre.utils.logging import default_log from calibre.utils.logging import default_log
from calibre.utils.wordcount import get_wordcount_obj from calibre.utils.wordcount import get_wordcount_obj
class PreProcessor(object): class HeuristicProcessor(object):
def __init__(self, extra_opts=None, log=None): def __init__(self, extra_opts=None, log=None):
self.log = default_log if log is None else log self.log = default_log if log is None else log
self.html_preprocess_sections = 0 self.html_preprocess_sections = 0
self.found_indents = 0 self.found_indents = 0
self.extra_opts = extra_opts self.extra_opts = extra_opts
self.deleted_nbsps = False
self.totalwords = 0
self.min_chapters = 1
self.chapters_no_title = 0
self.chapters_with_title = 0
self.blanks_deleted = False
self.linereg = re.compile('(?<=<p).*?(?=</p>)', re.IGNORECASE|re.DOTALL)
self.blankreg = re.compile(r'\s*(?P<openline><p(?!\sid=\"softbreak\")[^>]*>)\s*(?P<closeline></p>)', re.IGNORECASE)
self.multi_blank = re.compile(r'(\s*<p[^>]*>\s*</p>){2,}', re.IGNORECASE)
def is_pdftohtml(self, src): def is_pdftohtml(self, src):
return '<!-- created by calibre\'s pdftohtml -->' in src[:1000] return '<!-- created by calibre\'s pdftohtml -->' in src[:1000]
@ -27,12 +36,12 @@ class PreProcessor(object):
title = match.group('title') title = match.group('title')
if not title: if not title:
self.html_preprocess_sections = self.html_preprocess_sections + 1 self.html_preprocess_sections = self.html_preprocess_sections + 1
self.log("marked " + unicode(self.html_preprocess_sections) + self.log.debug("marked " + unicode(self.html_preprocess_sections) +
" chapters. - " + unicode(chap)) " chapters. - " + unicode(chap))
return '<h2>'+chap+'</h2>\n' return '<h2>'+chap+'</h2>\n'
else: else:
self.html_preprocess_sections = self.html_preprocess_sections + 1 self.html_preprocess_sections = self.html_preprocess_sections + 1
self.log("marked " + unicode(self.html_preprocess_sections) + self.log.debug("marked " + unicode(self.html_preprocess_sections) +
" chapters & titles. - " + unicode(chap) + ", " + unicode(title)) " chapters & titles. - " + unicode(chap) + ", " + unicode(title))
return '<h2>'+chap+'</h2>\n<h3>'+title+'</h3>\n' return '<h2>'+chap+'</h2>\n<h3>'+title+'</h3>\n'
@ -40,10 +49,18 @@ class PreProcessor(object):
chap = match.group('section') chap = match.group('section')
styles = match.group('styles') styles = match.group('styles')
self.html_preprocess_sections = self.html_preprocess_sections + 1 self.html_preprocess_sections = self.html_preprocess_sections + 1
self.log("marked " + unicode(self.html_preprocess_sections) + self.log.debug("marked " + unicode(self.html_preprocess_sections) +
" section markers based on punctuation. - " + unicode(chap)) " section markers based on punctuation. - " + unicode(chap))
return '<'+styles+' style="page-break-before:always">'+chap return '<'+styles+' style="page-break-before:always">'+chap
def analyze_title_matches(self, match):
#chap = match.group('chap')
title = match.group('title')
if not title:
self.chapters_no_title = self.chapters_no_title + 1
else:
self.chapters_with_title = self.chapters_with_title + 1
def insert_indent(self, match): def insert_indent(self, match):
pstyle = match.group('formatting') pstyle = match.group('formatting')
span = match.group('span') span = match.group('span')
@ -75,8 +92,8 @@ class PreProcessor(object):
line_end = line_end_ere.findall(raw) line_end = line_end_ere.findall(raw)
tot_htm_ends = len(htm_end) tot_htm_ends = len(htm_end)
tot_ln_fds = len(line_end) tot_ln_fds = len(line_end)
self.log("There are " + unicode(tot_ln_fds) + " total Line feeds, and " + #self.log.debug("There are " + unicode(tot_ln_fds) + " total Line feeds, and " +
unicode(tot_htm_ends) + " marked up endings") # unicode(tot_htm_ends) + " marked up endings")
if percent > 1: if percent > 1:
percent = 1 percent = 1
@ -84,9 +101,8 @@ class PreProcessor(object):
percent = 0 percent = 0
min_lns = tot_ln_fds * percent min_lns = tot_ln_fds * percent
self.log("There must be fewer than " + unicode(min_lns) + " unmarked lines to add markup") #self.log.debug("There must be fewer than " + unicode(min_lns) + " unmarked lines to add markup")
if min_lns > tot_htm_ends: return min_lns > tot_htm_ends
return True
def dump(self, raw, where): def dump(self, raw, where):
import os import os
@ -112,16 +128,55 @@ class PreProcessor(object):
wordcount = get_wordcount_obj(word_count_text) wordcount = get_wordcount_obj(word_count_text)
return wordcount.words return wordcount.words
def markup_italicis(self, html):
ITALICIZE_WORDS = [
'Etc.', 'etc.', 'viz.', 'ie.', 'i.e.', 'Ie.', 'I.e.', 'eg.',
'e.g.', 'Eg.', 'E.g.', 'et al.', 'et cetera', 'n.b.', 'N.b.',
'nota bene', 'Nota bene', 'Ste.', 'Mme.', 'Mdme.',
'Mlle.', 'Mons.', 'PS.', 'PPS.',
]
ITALICIZE_STYLE_PATS = [
r'(?msu)(?<=\s)_(?P<words>\S[^_]{0,40}?\S)?_(?=\s)',
r'(?msu)(?<=\s)/(?P<words>\S[^/]{0,40}?\S)?/(?=\s)',
r'(?msu)(?<=\s)~~(?P<words>\S[^~]{0,40}?\S)?~~(?=\s)',
r'(?msu)(?<=\s)\*(?P<words>\S[^\*]{0,40}?\S)?\*(?=\s)',
r'(?msu)(?<=\s)~(?P<words>\S[^~]{0,40}?\S)?~(?=\s)',
r'(?msu)(?<=\s)_/(?P<words>\S[^/_]{0,40}?\S)?/_(?=\s)',
r'(?msu)(?<=\s)_\*(?P<words>\S[^\*_]{0,40}?\S)?\*_(?=\s)',
r'(?msu)(?<=\s)\*/(?P<words>\S[^/\*]{0,40}?\S)?/\*(?=\s)',
r'(?msu)(?<=\s)_\*/(?P<words>\S[^\*_]{0,40}?\S)?/\*_(?=\s)',
r'(?msu)(?<=\s)/:(?P<words>\S[^:/]{0,40}?\S)?:/(?=\s)',
r'(?msu)(?<=\s)\|:(?P<words>\S[^:\|]{0,40}?\S)?:\|(?=\s)',
]
for word in ITALICIZE_WORDS:
html = html.replace(word, '<i>%s</i>' % word)
for pat in ITALICIZE_STYLE_PATS:
html = re.sub(pat, lambda mo: '<i>%s</i>' % mo.group('words'), html)
return html
def markup_chapters(self, html, wordcount, blanks_between_paragraphs): def markup_chapters(self, html, wordcount, blanks_between_paragraphs):
'''
Searches for common chapter headings throughout the document
attempts multiple patterns based on likelihood of a match
with minimum false positives. Exits after finding a successful pattern
'''
# Typical chapters are between 2000 and 7000 words, use the larger number to decide the # Typical chapters are between 2000 and 7000 words, use the larger number to decide the
# minimum of chapters to search for # minimum of chapters to search for. A max limit is calculated to prevent things like OCR
self.min_chapters = 1 # or pdf page numbers from being treated as TOC markers
max_chapters = 150
typical_chapters = 7000.
if wordcount > 7000: if wordcount > 7000:
self.min_chapters = int(ceil(wordcount / 7000.)) if wordcount > 200000:
#print "minimum chapters required are: "+str(self.min_chapters) typical_chapters = 15000.
self.min_chapters = int(ceil(wordcount / typical_chapters))
self.log.debug("minimum chapters required are: "+str(self.min_chapters))
heading = re.compile('<h[1-3][^>]*>', re.IGNORECASE) heading = re.compile('<h[1-3][^>]*>', re.IGNORECASE)
self.html_preprocess_sections = len(heading.findall(html)) self.html_preprocess_sections = len(heading.findall(html))
self.log("found " + unicode(self.html_preprocess_sections) + " pre-existing headings") self.log.debug("found " + unicode(self.html_preprocess_sections) + " pre-existing headings")
# Build the Regular Expressions in pieces # Build the Regular Expressions in pieces
init_lookahead = "(?=<(p|div))" init_lookahead = "(?=<(p|div))"
@ -151,103 +206,160 @@ class PreProcessor(object):
n_lookahead_open = "\s+(?!" n_lookahead_open = "\s+(?!"
n_lookahead_close = ")" n_lookahead_close = ")"
default_title = r"(<[ibu][^>]*>)?\s{0,3}([\w\:\'\"-]+\s{0,3}){1,5}?(</[ibu][^>]*>)?(?=<)" default_title = r"(<[ibu][^>]*>)?\s{0,3}(?!Chapter)([\w\:\'\"-]+\s{0,3}){1,5}?(</[ibu][^>]*>)?(?=<)"
simple_title = r"(<[ibu][^>]*>)?\s{0,3}(?!(Chapter|\s+<)).{0,65}?(</[ibu][^>]*>)?(?=<)"
analysis_result = []
chapter_types = [ chapter_types = [
[r"[^'\"]?(Introduction|Synopsis|Acknowledgements|Chapter|Kapitel|Epilogue|Volume\s|Prologue|Book\s|Part\s|Dedication|Preface)\s*([\d\w-]+\:?\'?\s*){0,5}", True, "Searching for common Chapter Headings"], [r"[^'\"]?(Introduction|Synopsis|Acknowledgements|Epilogue|CHAPTER|Kapitel|Volume\b|Prologue|Book\b|Part\b|Dedication|Preface)\s*([\d\w-]+\:?\'?\s*){0,5}", True, True, True, False, "Searching for common section headings", 'common'],
[r"([A-Z-]\s+){3,}\s*([\d\w-]+\s*){0,3}\s*", True, "Searching for letter spaced headings"], # Spaced Lettering [r"[^'\"]?(CHAPTER|Kapitel)\s*([\dA-Z\-\'\"\?!#,]+\s*){0,7}\s*", True, True, True, False, "Searching for most common chapter headings", 'chapter'], # Highest frequency headings which include titles
[r"<b[^>]*>\s*(<span[^>]*>)?\s*(?!([*#•]+\s*)+)(\s*(?=[\d.\w#\-*\s]+<)([\d.\w#-*]+\s*){1,5}\s*)(?!\.)(</span>)?\s*</b>", True, "Searching for emphasized lines"], # Emphasized lines [r"<b[^>]*>\s*(<span[^>]*>)?\s*(?!([*#•=]+\s*)+)(\s*(?=[\d.\w#\-*\s]+<)([\d.\w#-*]+\s*){1,5}\s*)(?!\.)(</span>)?\s*</b>", True, True, True, False, "Searching for emphasized lines", 'emphasized'], # Emphasized lines
[r"[^'\"]?(\d+(\.|:)|CHAPTER)\s*([\dA-Z\-\'\"#,]+\s*){0,7}\s*", True, "Searching for numeric chapter headings"], # Numeric Chapters [r"[^'\"]?(\d+(\.|:))\s*([\dA-Z\-\'\"#,]+\s*){0,7}\s*", True, True, True, False, "Searching for numeric chapter headings", 'numeric'], # Numeric Chapters
[r"[^'\"]?(\d+\.?\s+([\d\w-]+\:?\'?-?\s?){0,5})\s*", True, "Searching for numeric chapters with titles"], # Numeric Titles [r"([A-Z]\s+){3,}\s*([\d\w-]+\s*){0,3}\s*", True, True, True, False, "Searching for letter spaced headings", 'letter_spaced'], # Spaced Lettering
[r"[^'\"]?(\d+|CHAPTER)\s*([\dA-Z\-\'\"\?!#,]+\s*){0,7}\s*", True, "Searching for simple numeric chapter headings"], # Numeric Chapters, no dot or colon [r"[^'\"]?(\d+\.?\s+([\d\w-]+\:?\'?-?\s?){0,5})\s*", True, True, True, False, "Searching for numeric chapters with titles", 'numeric_title'], # Numeric Titles
[r"\s*[^'\"]?([A-Z#]+(\s|-){0,3}){1,5}\s*", False, "Searching for chapters with Uppercase Characters" ] # Uppercase Chapters [r"[^'\"]?(\d+)\s*([\dA-Z\-\'\"\?!#,]+\s*){0,7}\s*", True, True, True, False, "Searching for simple numeric headings", 'plain_number'], # Numeric Chapters, no dot or colon
[r"\s*[^'\"]?([A-Z#]+(\s|-){0,3}){1,5}\s*", False, True, False, False, "Searching for chapters with Uppercase Characters", 'uppercase' ] # Uppercase Chapters
] ]
# Start with most typical chapter headings, get more aggressive until one works def recurse_patterns(html, analyze):
for [chapter_type, lookahead_ignorecase, log_message] in chapter_types: # Start with most typical chapter headings, get more aggressive until one works
if self.html_preprocess_sections >= self.min_chapters: for [chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name] in chapter_types:
break n_lookahead = ''
full_chapter_line = chapter_line_open+chapter_header_open+chapter_type+chapter_header_close+chapter_line_close hits = 0
n_lookahead = re.sub("(ou|in|cha)", "lookahead_", full_chapter_line) self.chapters_no_title = 0
self.log("Marked " + unicode(self.html_preprocess_sections) + " headings, " + log_message) self.chapters_with_title = 0
if lookahead_ignorecase:
chapter_marker = init_lookahead+full_chapter_line+blank_lines+n_lookahead_open+n_lookahead+n_lookahead_close+opt_title_open+title_line_open+title_header_open+default_title+title_header_close+title_line_close+opt_title_close if n_lookahead_req:
chapdetect = re.compile(r'%s' % chapter_marker, re.IGNORECASE) lp_n_lookahead_open = n_lookahead_open
else: lp_n_lookahead_close = n_lookahead_close
chapter_marker = init_lookahead+full_chapter_line+blank_lines+opt_title_open+title_line_open+title_header_open+default_title+title_header_close+title_line_close+opt_title_close+n_lookahead_open+n_lookahead+n_lookahead_close else:
chapdetect = re.compile(r'%s' % chapter_marker, re.UNICODE) lp_n_lookahead_open = ''
html = chapdetect.sub(self.chapter_head, html) lp_n_lookahead_close = ''
if strict_title:
lp_title = default_title
else:
lp_title = simple_title
if ignorecase:
arg_ignorecase = r'(?i)'
else:
arg_ignorecase = ''
if title_req:
lp_opt_title_open = ''
lp_opt_title_close = ''
else:
lp_opt_title_open = opt_title_open
lp_opt_title_close = opt_title_close
if self.html_preprocess_sections >= self.min_chapters:
break
full_chapter_line = chapter_line_open+chapter_header_open+chapter_type+chapter_header_close+chapter_line_close
if n_lookahead_req:
n_lookahead = re.sub("(ou|in|cha)", "lookahead_", full_chapter_line)
if not analyze:
self.log.debug("Marked " + unicode(self.html_preprocess_sections) + " headings, " + log_message)
chapter_marker = arg_ignorecase+init_lookahead+full_chapter_line+blank_lines+lp_n_lookahead_open+n_lookahead+lp_n_lookahead_close+lp_opt_title_open+title_line_open+title_header_open+lp_title+title_header_close+title_line_close+lp_opt_title_close
chapdetect = re.compile(r'%s' % chapter_marker)
if analyze:
hits = len(chapdetect.findall(html))
if hits:
chapdetect.sub(self.analyze_title_matches, html)
if float(self.chapters_with_title) / float(hits) > .5:
title_req = True
strict_title = False
self.log.debug(unicode(type_name)+" had "+unicode(hits)+" hits - "+unicode(self.chapters_no_title)+" chapters with no title, "+unicode(self.chapters_with_title)+" chapters with titles, "+unicode(float(self.chapters_with_title) / float(hits))+" percent. ")
if type_name == 'common':
analysis_result.append([chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name])
elif self.min_chapters <= hits < max_chapters:
analysis_result.append([chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name])
break
else:
html = chapdetect.sub(self.chapter_head, html)
return html
recurse_patterns(html, True)
chapter_types = analysis_result
html = recurse_patterns(html, False)
words_per_chptr = wordcount words_per_chptr = wordcount
if words_per_chptr > 0 and self.html_preprocess_sections > 0: if words_per_chptr > 0 and self.html_preprocess_sections > 0:
words_per_chptr = wordcount / self.html_preprocess_sections words_per_chptr = wordcount / self.html_preprocess_sections
self.log("Total wordcount is: "+ str(wordcount)+", Average words per section is: "+str(words_per_chptr)+", Marked up "+str(self.html_preprocess_sections)+" chapters") self.log.debug("Total wordcount is: "+ str(wordcount)+", Average words per section is: "+str(words_per_chptr)+", Marked up "+str(self.html_preprocess_sections)+" chapters")
return html return html
def punctuation_unwrap(self, length, content, format): def punctuation_unwrap(self, length, content, format):
'''
Unwraps lines based on line length and punctuation
supports a range of html markup and text files
'''
# define the pieces of the regex # define the pieces of the regex
lookahead = "(?<=.{"+str(length)+"}([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężıãõñæøþðßě,:)\IA\u00DF]|(?<!\&\w{4});))" # (?<!\&\w{4});) is a semicolon not part of an entity lookahead = "(?<=.{"+str(length)+u"}([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężıãõñæøþðßě,:)\IA\u00DF]|(?<!\&\w{4});))" # (?<!\&\w{4});) is a semicolon not part of an entity
line_ending = "\s*</(span|p|div)>\s*(</(p|span|div)>)?" em_en_lookahead = "(?<=.{"+str(length)+u"}[\u2013\u2014])"
soft_hyphen = u"\xad"
line_ending = "\s*</(span|[iubp]|div)>\s*(</(span|[iubp]|div)>)?"
blanklines = "\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*" blanklines = "\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*"
line_opening = "<(span|div|p)[^>]*>\s*(<(span|div|p)[^>]*>)?\s*" line_opening = "<(span|[iubp]|div)[^>]*>\s*(<(span|[iubp]|div)[^>]*>)?\s*"
txt_line_wrap = u"((\u0020|\u0009)*\n){1,4}" txt_line_wrap = u"((\u0020|\u0009)*\n){1,4}"
unwrap_regex = lookahead+line_ending+blanklines+line_opening unwrap_regex = lookahead+line_ending+blanklines+line_opening
em_en_unwrap_regex = em_en_lookahead+line_ending+blanklines+line_opening
shy_unwrap_regex = soft_hyphen+line_ending+blanklines+line_opening
if format == 'txt': if format == 'txt':
unwrap_regex = lookahead+txt_line_wrap unwrap_regex = lookahead+txt_line_wrap
em_en_unwrap_regex = em_en_lookahead+txt_line_wrap
shy_unwrap_regex = soft_hyphen+txt_line_wrap
unwrap = re.compile(u"%s" % unwrap_regex, re.UNICODE) unwrap = re.compile(u"%s" % unwrap_regex, re.UNICODE)
em_en_unwrap = re.compile(u"%s" % em_en_unwrap_regex, re.UNICODE)
shy_unwrap = re.compile(u"%s" % shy_unwrap_regex, re.UNICODE)
content = unwrap.sub(' ', content) content = unwrap.sub(' ', content)
content = em_en_unwrap.sub('', content)
content = shy_unwrap.sub('', content)
return content return content
def txt_process(self, match):
from calibre.ebooks.txt.processor import convert_basic, preserve_spaces, \
separate_paragraphs_single_line
content = match.group('text')
content = separate_paragraphs_single_line(content)
content = preserve_spaces(content)
content = convert_basic(content, epub_split_size_kb=0)
return content
def __call__(self, html): def markup_pre(self, html):
self.log("********* Preprocessing HTML *********") pre = re.compile(r'<pre>', re.IGNORECASE)
if len(pre.findall(html)) >= 1:
self.log.debug("Running Text Processing")
outerhtml = re.compile(r'.*?(?<=<pre>)(?P<text>.*?)</pre>', re.IGNORECASE|re.DOTALL)
html = outerhtml.sub(self.txt_process, html)
else:
# Add markup naively
# TODO - find out if there are cases where there are more than one <pre> tag or
# other types of unmarked html and handle them in some better fashion
add_markup = re.compile('(?<!>)(\n)')
html = add_markup.sub('</p>\n<p>', html)
return html
# Count the words in the document to estimate how many chapters to look for and whether def arrange_htm_line_endings(self, html):
# other types of processing are attempted
totalwords = 0
totalwords = self.get_word_count(html)
if totalwords < 50:
self.log("not enough text, not preprocessing")
return html
# Arrange line feeds and </p> tags so the line_length and no_markup functions work correctly
html = re.sub(r"\s*</(?P<tag>p|div)>", "</"+"\g<tag>"+">\n", html) html = re.sub(r"\s*</(?P<tag>p|div)>", "</"+"\g<tag>"+">\n", html)
html = re.sub(r"\s*<(?P<tag>p|div)(?P<style>[^>]*)>\s*", "\n<"+"\g<tag>"+"\g<style>"+">", html) html = re.sub(r"\s*<(?P<tag>p|div)(?P<style>[^>]*)>\s*", "\n<"+"\g<tag>"+"\g<style>"+">", html)
return html
###### Check Markup ###### def fix_nbsp_indents(self, html):
#
# some lit files don't have any <p> tags or equivalent (generally just plain text between
# <pre> tags), check and mark up line endings if required before proceeding
if self.no_markup(html, 0.1):
self.log("not enough paragraph markers, adding now")
# check if content is in pre tags, use txt processor to mark up if so
pre = re.compile(r'<pre>', re.IGNORECASE)
if len(pre.findall(html)) == 1:
self.log("Running Text Processing")
from calibre.ebooks.txt.processor import convert_basic, preserve_spaces, \
separate_paragraphs_single_line
outerhtml = re.compile(r'.*?(?<=<pre>)(?P<text>.*)(?=</pre>).*', re.IGNORECASE|re.DOTALL)
html = outerhtml.sub('\g<text>', html)
html = separate_paragraphs_single_line(html)
html = preserve_spaces(html)
html = convert_basic(html, epub_split_size_kb=0)
else:
# Add markup naively
# TODO - find out if there are cases where there are more than one <pre> tag or
# other types of unmarked html and handle them in some better fashion
add_markup = re.compile('(?<!>)(\n)')
html = add_markup.sub('</p>\n<p>', html)
###### Mark Indents/Cleanup ######
#
# Replace series of non-breaking spaces with text-indent
txtindent = re.compile(ur'<p(?P<formatting>[^>]*)>\s*(?P<span>(<span[^>]*>\s*)+)?\s*(\u00a0){2,}', re.IGNORECASE) txtindent = re.compile(ur'<p(?P<formatting>[^>]*)>\s*(?P<span>(<span[^>]*>\s*)+)?\s*(\u00a0){2,}', re.IGNORECASE)
html = txtindent.sub(self.insert_indent, html) html = txtindent.sub(self.insert_indent, html)
if self.found_indents > 1: if self.found_indents > 1:
self.log("replaced "+unicode(self.found_indents)+ " nbsp indents with inline styles") self.log.debug("replaced "+unicode(self.found_indents)+ " nbsp indents with inline styles")
return html
def cleanup_markup(self, html):
# remove remaining non-breaking spaces # remove remaining non-breaking spaces
html = re.sub(ur'\u00a0', ' ', html) html = re.sub(ur'\u00a0', ' ', html)
# Get rid of various common microsoft specific tags which can cause issues later # Get rid of various common microsoft specific tags which can cause issues later
@ -255,108 +367,166 @@ class PreProcessor(object):
html = re.sub(ur'\s*<o:p>\s*</o:p>', ' ', html) html = re.sub(ur'\s*<o:p>\s*</o:p>', ' ', html)
# Delete microsoft 'smart' tags # Delete microsoft 'smart' tags
html = re.sub('(?i)</?st1:\w+>', '', html) html = re.sub('(?i)</?st1:\w+>', '', html)
# Get rid of empty span, bold, & italics tags # Get rid of empty span, bold, font, em, & italics tags
html = re.sub(r"\s*<span[^>]*>\s*(<span[^>]*>\s*</span>){0,2}\s*</span>\s*", " ", html) html = re.sub(r"\s*<span[^>]*>\s*(<span[^>]*>\s*</span>){0,2}\s*</span>\s*", " ", html)
html = re.sub(r"\s*<[ibu][^>]*>\s*(<[ibu][^>]*>\s*</[ibu]>\s*){0,2}\s*</[ibu]>", " ", html) html = re.sub(r"\s*<(font|[ibu]|em)[^>]*>\s*(<(font|[ibu]|em)[^>]*>\s*</(font|[ibu]|em)>\s*){0,2}\s*</(font|[ibu]|em)>", " ", html)
html = re.sub(r"\s*<span[^>]*>\s*(<span[^>]>\s*</span>){0,2}\s*</span>\s*", " ", html) html = re.sub(r"\s*<span[^>]*>\s*(<span[^>]>\s*</span>){0,2}\s*</span>\s*", " ", html)
# ADE doesn't render <br />, change to empty paragraphs html = re.sub(r"\s*<(font|[ibu]|em)[^>]*>\s*(<(font|[ibu]|em)[^>]*>\s*</(font|[ibu]|em)>\s*){0,2}\s*</(font|[ibu]|em)>", " ", html)
#html = re.sub('<br[^>]*>', u'<p>\u00a0</p>', html) self.deleted_nbsps = True
return html
# If more than 40% of the lines are empty paragraphs and the user has enabled remove def analyze_line_endings(self, html):
# paragraph spacing then delete blank lines to clean up spacing '''
linereg = re.compile('(?<=<p).*?(?=</p>)', re.IGNORECASE|re.DOTALL) determines the type of html line ending used most commonly in a document
blankreg = re.compile(r'\s*(?P<openline><p[^>]*>)\s*(?P<closeline></p>)', re.IGNORECASE) use before calling docanalysis functions
#multi_blank = re.compile(r'(\s*<p[^>]*>\s*(<(b|i|u)>)?\s*(</(b|i|u)>)?\s*</p>){2,}', re.IGNORECASE) '''
blanklines = blankreg.findall(html)
lines = linereg.findall(html)
blanks_between_paragraphs = False
if len(lines) > 1:
self.log("There are " + unicode(len(blanklines)) + " blank lines. " +
unicode(float(len(blanklines)) / float(len(lines))) + " percent blank")
if float(len(blanklines)) / float(len(lines)) > 0.40 and getattr(self.extra_opts,
'remove_paragraph_spacing', False):
self.log("deleting blank lines")
html = blankreg.sub('', html)
elif float(len(blanklines)) / float(len(lines)) > 0.40:
blanks_between_paragraphs = True
#print "blanks between paragraphs is marked True"
else:
blanks_between_paragraphs = False
#self.dump(html, 'before_chapter_markup')
# detect chapters/sections to match xpath or splitting logic
#
html = self.markup_chapters(html, totalwords, blanks_between_paragraphs)
###### Unwrap lines ######
#
# Some OCR sourced files have line breaks in the html using a combination of span & p tags
# span are used for hard line breaks, p for new paragraphs. Determine which is used so
# that lines can be un-wrapped across page boundaries
paras_reg = re.compile('<p[^>]*>', re.IGNORECASE) paras_reg = re.compile('<p[^>]*>', re.IGNORECASE)
spans_reg = re.compile('<span[^>]*>', re.IGNORECASE) spans_reg = re.compile('<span[^>]*>', re.IGNORECASE)
paras = len(paras_reg.findall(html)) paras = len(paras_reg.findall(html))
spans = len(spans_reg.findall(html)) spans = len(spans_reg.findall(html))
if spans > 1: if spans > 1:
if float(paras) / float(spans) < 0.75: if float(paras) / float(spans) < 0.75:
format = 'spanned_html' return 'spanned_html'
else: else:
format = 'html' return 'html'
else: else:
format = 'html' return 'html'
def analyze_blanks(self, html):
blanklines = self.blankreg.findall(html)
lines = self.linereg.findall(html)
if len(lines) > 1:
self.log.debug("There are " + unicode(len(blanklines)) + " blank lines. " +
unicode(float(len(blanklines)) / float(len(lines))) + " percent blank")
if float(len(blanklines)) / float(len(lines)) > 0.40:
return True
else:
return False
def cleanup_required(self):
for option in ['unwrap_lines', 'markup_chapter_headings', 'format_scene_breaks', 'delete_blank_paragraphs']:
if getattr(self.extra_opts, option, False):
return True
return False
def __call__(self, html):
self.log.debug("********* Heuristic processing HTML *********")
# Count the words in the document to estimate how many chapters to look for and whether
# other types of processing are attempted
try:
self.totalwords = self.get_word_count(html)
except:
self.log.warn("Can't get wordcount")
if self.totalwords < 50:
self.log.warn("flow is too short, not running heuristics")
return html
# Arrange line feeds and </p> tags so the line_length and no_markup functions work correctly
html = self.arrange_htm_line_endings(html)
if self.cleanup_required():
###### Check Markup ######
#
# some lit files don't have any <p> tags or equivalent (generally just plain text between
# <pre> tags), check and mark up line endings if required before proceeding
# fix indents must run after this step
if self.no_markup(html, 0.1):
self.log.debug("not enough paragraph markers, adding now")
# markup using text processing
html = self.markup_pre(html)
# Replace series of non-breaking spaces with text-indent
if getattr(self.extra_opts, 'fix_indents', False):
html = self.fix_nbsp_indents(html)
if self.cleanup_required():
# fix indents must run before this step, as it removes non-breaking spaces
html = self.cleanup_markup(html)
# ADE doesn't render <br />, change to empty paragraphs
#html = re.sub('<br[^>]*>', u'<p>\u00a0</p>', html)
# Determine whether the document uses interleaved blank lines
blanks_between_paragraphs = self.analyze_blanks(html)
#self.dump(html, 'before_chapter_markup')
# detect chapters/sections to match xpath or splitting logic
if getattr(self.extra_opts, 'markup_chapter_headings', False):
html = self.markup_chapters(html, self.totalwords, blanks_between_paragraphs)
if getattr(self.extra_opts, 'italicize_common_cases', False):
html = self.markup_italicis(html)
# If more than 40% of the lines are empty paragraphs and the user has enabled delete
# blank paragraphs then delete blank lines to clean up spacing
if blanks_between_paragraphs and getattr(self.extra_opts, 'delete_blank_paragraphs', False):
self.log.debug("deleting blank lines")
self.blanks_deleted = True
html = self.multi_blank.sub('\n<p id="softbreak" style="margin-top:1.5em; margin-bottom:1.5em"> </p>', html)
html = self.blankreg.sub('', html)
# Determine line ending type
# Some OCR sourced files have line breaks in the html using a combination of span & p tags
# span are used for hard line breaks, p for new paragraphs. Determine which is used so
# that lines can be un-wrapped across page boundaries
format = self.analyze_line_endings(html)
# Check Line histogram to determine if the document uses hard line breaks, If 50% or # Check Line histogram to determine if the document uses hard line breaks, If 50% or
# more of the lines break in the same region of the document then unwrapping is required # more of the lines break in the same region of the document then unwrapping is required
docanalysis = DocAnalysis(format, html) docanalysis = DocAnalysis(format, html)
hardbreaks = docanalysis.line_histogram(.50) hardbreaks = docanalysis.line_histogram(.50)
self.log("Hard line breaks check returned "+unicode(hardbreaks)) self.log.debug("Hard line breaks check returned "+unicode(hardbreaks))
# Calculate Length # Calculate Length
unwrap_factor = getattr(self.extra_opts, 'html_unwrap_factor', 0.4) unwrap_factor = getattr(self.extra_opts, 'html_unwrap_factor', 0.4)
length = docanalysis.line_length(unwrap_factor) length = docanalysis.line_length(unwrap_factor)
self.log("Median line length is " + unicode(length) + ", calculated with " + format + " format") self.log.debug("Median line length is " + unicode(length) + ", calculated with " + format + " format")
# only go through unwrapping code if the histogram shows unwrapping is required or if the user decreased the default unwrap_factor
if hardbreaks or unwrap_factor < 0.4:
self.log("Unwrapping required, unwrapping Lines")
# Unwrap em/en dashes
html = re.sub(u'(?<=.{%i}[\u2013\u2014])\s*(?=<)(</span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?=[[a-z\d])' % length, '', html)
# Dehyphenate
self.log("Unwrapping/Removing hyphens")
dehyphenator = Dehyphenator()
html = dehyphenator(html,'html', length)
self.log("Done dehyphenating")
# Unwrap lines using punctuation and line length
#unwrap_quotes = re.compile(u"(?<=.{%i}\"')\s*</(span|p|div)>\s*(</(p|span|div)>)?\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*<(span|div|p)[^>]*>\s*(<(span|div|p)[^>]*>)?\s*(?=[a-z])" % length, re.UNICODE)
html = self.punctuation_unwrap(length, html, 'html')
#check any remaining hyphens, but only unwrap if there is a match
dehyphenator = Dehyphenator()
html = dehyphenator(html,'html_cleanup', length)
else:
# dehyphenate in cleanup mode to fix anything previous conversions/editing missed
self.log("Cleaning up hyphenation")
dehyphenator = Dehyphenator()
html = dehyphenator(html,'html_cleanup', length)
self.log("Done dehyphenating")
# delete soft hyphens ###### Unwrap lines ######
html = re.sub(u'\xad\s*(</span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*', '', html) if getattr(self.extra_opts, 'unwrap_lines', False):
# only go through unwrapping code if the histogram shows unwrapping is required or if the user decreased the default unwrap_factor
if hardbreaks or unwrap_factor < 0.4:
self.log.debug("Unwrapping required, unwrapping Lines")
# Dehyphenate with line length limiters
dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
html = dehyphenator(html,'html', length)
html = self.punctuation_unwrap(length, html, 'html')
if getattr(self.extra_opts, 'dehyphenate', False):
# dehyphenate in cleanup mode to fix anything previous conversions/editing missed
self.log.debug("Fixing hyphenated content")
dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
html = dehyphenator(html,'html_cleanup', length)
html = dehyphenator(html, 'individual_words', length)
# If still no sections after unwrapping mark split points on lines with no punctuation # If still no sections after unwrapping mark split points on lines with no punctuation
if self.html_preprocess_sections < self.min_chapters: if self.html_preprocess_sections < self.min_chapters and getattr(self.extra_opts, 'markup_chapter_headings', False):
self.log("Looking for more split points based on punctuation," self.log.debug("Looking for more split points based on punctuation,"
" currently have " + unicode(self.html_preprocess_sections)) " currently have " + unicode(self.html_preprocess_sections))
chapdetect3 = re.compile(r'<(?P<styles>(p|div)[^>]*)>\s*(?P<section>(<span[^>]*>)?\s*(?!([*#•]+\s*)+)(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*.?(?=[a-z#\-*\s]+<)([a-z#-*]+\s*){1,5}\s*\s*(</span>)?(</[ibu]>){0,2}\s*(</span>)?\s*(</[ibu]>){0,2}\s*(</span>)?\s*</(p|div)>)', re.IGNORECASE) chapdetect3 = re.compile(r'<(?P<styles>(p|div)[^>]*)>\s*(?P<section>(<span[^>]*>)?\s*(?!([*#•]+\s*)+)(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*.?(?=[a-z#\-*\s]+<)([a-z#-*]+\s*){1,5}\s*\s*(</span>)?(</[ibu]>){0,2}\s*(</span>)?\s*(</[ibu]>){0,2}\s*(</span>)?\s*</(p|div)>)', re.IGNORECASE)
html = chapdetect3.sub(self.chapter_break, html) html = chapdetect3.sub(self.chapter_break, html)
# search for places where a first or second level heading is immediately followed by another
# top level heading. demote the second heading to h3 to prevent splitting between chapter
# headings and titles, images, etc
doubleheading = re.compile(r'(?P<firsthead><h(1|2)[^>]*>.+?</h(1|2)>\s*(<(?!h\d)[^>]*>\s*)*)<h(1|2)(?P<secondhead>[^>]*>.+?)</h(1|2)>', re.IGNORECASE)
html = doubleheading.sub('\g<firsthead>'+'\n<h3'+'\g<secondhead>'+'</h3>', html)
# put back non-breaking spaces in empty paragraphs to preserve original formatting if getattr(self.extra_opts, 'renumber_headings', False):
html = blankreg.sub('\n'+r'\g<openline>'+u'\u00a0'+r'\g<closeline>', html) # search for places where a first or second level heading is immediately followed by another
# top level heading. demote the second heading to h3 to prevent splitting between chapter
# headings and titles, images, etc
doubleheading = re.compile(r'(?P<firsthead><h(1|2)[^>]*>.+?</h(1|2)>\s*(<(?!h\d)[^>]*>\s*)*)<h(1|2)(?P<secondhead>[^>]*>.+?)</h(1|2)>', re.IGNORECASE)
html = doubleheading.sub('\g<firsthead>'+'\n<h3'+'\g<secondhead>'+'</h3>', html)
# Center separator lines if getattr(self.extra_opts, 'format_scene_breaks', False):
html = re.sub(u'<(?P<outer>p|div)[^>]*>\s*(<(?P<inner1>font|span|[ibu])[^>]*>)?\s*(<(?P<inner2>font|span|[ibu])[^>]*>)?\s*(<(?P<inner3>font|span|[ibu])[^>]*>)?\s*(?P<break>([*#•=✦]+\s*)+)\s*(</(?P=inner3)>)?\s*(</(?P=inner2)>)?\s*(</(?P=inner1)>)?\s*</(?P=outer)>', '<p style="text-align:center">' + '\g<break>' + '</p>', html) # Center separator lines
html = re.sub(u'<(?P<outer>p|div)[^>]*>\s*(<(?P<inner1>font|span|[ibu])[^>]*>)?\s*(<(?P<inner2>font|span|[ibu])[^>]*>)?\s*(<(?P<inner3>font|span|[ibu])[^>]*>)?\s*(?P<break>([*#•=✦]+\s*)+)\s*(</(?P=inner3)>)?\s*(</(?P=inner2)>)?\s*(</(?P=inner1)>)?\s*</(?P=outer)>', '<p style="text-align:center; margin-top:1.25em; margin-bottom:1.25em">' + '\g<break>' + '</p>', html)
if not self.blanks_deleted:
html = self.multi_blank.sub('\n<p id="softbreak" style="margin-top:1.5em; margin-bottom:1.5em"> </p>', html)
html = re.sub('<p\s+id="softbreak"[^>]*>\s*</p>', '<div id="softbreak" style="margin-left: 45%; margin-right: 45%; margin-top:1.5em; margin-bottom:1.5em"><hr style="height: 3px; background:#505050" /></div>', html)
if self.deleted_nbsps:
# put back non-breaking spaces in empty paragraphs to preserve original formatting
html = self.blankreg.sub('\n'+r'\g<openline>'+u'\u00a0'+r'\g<closeline>', html)
return html return html
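
The blank-paragraph check that this change moves into `analyze_blanks` can be tried in isolation. What follows is a minimal sketch under stated assumptions, not calibre's implementation: `blank_ratio` and `has_interleaved_blanks` are hypothetical names, the two patterns are copied from the `linereg` and `blankreg` expressions in the removed inline code, and 0.40 is the threshold used in the diff. Unlike the method in the diff, the sketch returns `False` rather than `None` when the document has at most one paragraph.

```python
import re

# Patterns copied from the removed inline code in this diff.
LINEREG = re.compile(r'(?<=<p).*?(?=</p>)', re.IGNORECASE | re.DOTALL)
BLANKREG = re.compile(r'\s*(?P<openline><p[^>]*>)\s*(?P<closeline></p>)',
                      re.IGNORECASE)


def blank_ratio(html):
    """Fraction of paragraphs that contain nothing but whitespace."""
    lines = LINEREG.findall(html)
    blanks = BLANKREG.findall(html)
    if len(lines) <= 1:
        # Too few paragraphs to say anything useful (assumption of this sketch).
        return 0.0
    return len(blanks) / len(lines)


def has_interleaved_blanks(html, threshold=0.40):
    """True when more than `threshold` of the paragraphs are empty."""
    return blank_ratio(html) > threshold


sample = '<p>one</p>\n<p></p>\n<p>two</p>\n<p> </p>\n'
print(has_interleaved_blanks(sample))
```

In the sample, two of the four paragraphs are empty, so the ratio is above the threshold.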


@ -21,10 +21,9 @@ from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.chardet import xml_to_unicode from calibre.ebooks.chardet import xml_to_unicode
from calibre.customize.conversion import OptionRecommendation from calibre.customize.conversion import OptionRecommendation
from calibre.constants import islinux, isfreebsd, iswindows from calibre.constants import islinux, isfreebsd, iswindows
from calibre import unicode_path from calibre import unicode_path, as_unicode
from calibre.utils.localization import get_lang from calibre.utils.localization import get_lang
from calibre.utils.filenames import ascii_filename from calibre.utils.filenames import ascii_filename
from calibre.ebooks.conversion.utils import PreProcessor
class Link(object): class Link(object):
''' '''
@ -112,7 +111,7 @@ class HTMLFile(object):
with open(self.path, 'rb') as f: with open(self.path, 'rb') as f:
src = f.read() src = f.read()
except IOError, err: except IOError, err:
msg = 'Could not read from file: %s with error: %s'%(self.path, unicode(err)) msg = 'Could not read from file: %s with error: %s'%(self.path, as_unicode(err))
if level == 0: if level == 0:
raise IOError(msg) raise IOError(msg)
raise IgnoreFile(msg, err.errno) raise IgnoreFile(msg, err.errno)
@ -296,7 +295,7 @@ class HTMLInput(InputFormatPlugin):
return oeb return oeb
from calibre.ebooks.conversion.plumber import create_oebbook from calibre.ebooks.conversion.plumber import create_oebbook
return create_oebbook(log, stream.name, opts, self, return create_oebbook(log, stream.name, opts,
encoding=opts.input_encoding) encoding=opts.input_encoding)
def is_case_sensitive(self, path): def is_case_sensitive(self, path):
@ -485,9 +484,3 @@ class HTMLInput(InputFormatPlugin):
self.log.exception('Failed to read CSS file: %r'%link) self.log.exception('Failed to read CSS file: %r'%link)
return (None, None) return (None, None)
return (None, raw) return (None, raw)
def preprocess_html(self, options, html):
self.options = options
preprocessor = PreProcessor(self.options, log=getattr(self, 'log', None))
return preprocessor(html)


@ -7,8 +7,6 @@ __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en' __docformat__ = 'restructuredtext en'
from calibre.customize.conversion import InputFormatPlugin from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.conversion.utils import PreProcessor
class LITInput(InputFormatPlugin): class LITInput(InputFormatPlugin):
@ -22,7 +20,7 @@ class LITInput(InputFormatPlugin):
from calibre.ebooks.lit.reader import LitReader from calibre.ebooks.lit.reader import LitReader
from calibre.ebooks.conversion.plumber import create_oebbook from calibre.ebooks.conversion.plumber import create_oebbook
self.log = log self.log = log
return create_oebbook(log, stream, options, self, reader=LitReader) return create_oebbook(log, stream, options, reader=LitReader)
def postprocess_book(self, oeb, opts, log): def postprocess_book(self, oeb, opts, log):
from calibre.ebooks.oeb.base import XHTML_NS, XPath, XHTML from calibre.ebooks.oeb.base import XHTML_NS, XPath, XHTML
@ -39,10 +37,13 @@ class LITInput(InputFormatPlugin):
body = body[0] body = body[0]
if len(body) == 1 and body[0].tag == XHTML('pre'): if len(body) == 1 and body[0].tag == XHTML('pre'):
pre = body[0] pre = body[0]
from calibre.ebooks.txt.processor import convert_basic from calibre.ebooks.txt.processor import convert_basic, preserve_spaces, \
separate_paragraphs_single_line
from lxml import etree from lxml import etree
import copy import copy
html = convert_basic(pre.text).replace('<html>', html = separate_paragraphs_single_line(pre.text)
html = preserve_spaces(html)
html = convert_basic(html).replace('<html>',
'<html xmlns="%s">'%XHTML_NS) '<html xmlns="%s">'%XHTML_NS)
root = etree.fromstring(html) root = etree.fromstring(html)
body = XPath('//h:body')(root) body = XPath('//h:body')(root)
@ -51,10 +52,3 @@ class LITInput(InputFormatPlugin):
for elem in body: for elem in body:
ne = copy.deepcopy(elem) ne = copy.deepcopy(elem)
pre.append(ne) pre.append(ne)
def preprocess_html(self, options, html):
self.options = options
preprocessor = PreProcessor(self.options, log=getattr(self, 'log', None))
return preprocessor(html)


@ -12,7 +12,6 @@ from copy import deepcopy
from lxml import etree from lxml import etree
from calibre.customize.conversion import InputFormatPlugin from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.conversion.utils import PreProcessor
from calibre import guess_type from calibre import guess_type
class Canvas(etree.XSLTExtension): class Canvas(etree.XSLTExtension):
@ -419,11 +418,3 @@ class LRFInput(InputFormatPlugin):
f.write(result) f.write(result)
styles.write() styles.write()
return os.path.abspath('content.opf') return os.path.abspath('content.opf')
def preprocess_html(self, options, html):
self.options = options
preprocessor = PreProcessor(self.options, log=getattr(self, 'log', None))
return preprocessor(html)


@ -4,7 +4,7 @@ __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
Fetch cover from LibraryThing.com based on ISBN number. Fetch cover from LibraryThing.com based on ISBN number.
''' '''
import sys, socket, os, re import sys, socket, os, re, random
from lxml import html from lxml import html
import mechanize import mechanize
@ -16,13 +16,26 @@ from calibre.ebooks.chardet import strip_encoding_declarations
OPENLIBRARY = 'http://covers.openlibrary.org/b/isbn/%s-L.jpg?default=false' OPENLIBRARY = 'http://covers.openlibrary.org/b/isbn/%s-L.jpg?default=false'
def get_ua():
choices = [
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)',
'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16',
'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11',
]
return choices[random.randint(0, len(choices)-1)]
class HeadRequest(mechanize.Request): class HeadRequest(mechanize.Request):
def get_method(self): def get_method(self):
return 'HEAD' return 'HEAD'
def check_for_cover(isbn, timeout=5.): def check_for_cover(isbn, timeout=5.):
br = browser() br = browser(user_agent=get_ua())
br.set_handle_redirect(False) br.set_handle_redirect(False)
try: try:
br.open_novisit(HeadRequest(OPENLIBRARY%isbn), timeout=timeout) br.open_novisit(HeadRequest(OPENLIBRARY%isbn), timeout=timeout)
@ -51,7 +64,7 @@ def login(br, username, password, force=True):
def cover_from_isbn(isbn, timeout=5., username=None, password=None): def cover_from_isbn(isbn, timeout=5., username=None, password=None):
src = None src = None
br = browser() br = browser(user_agent=get_ua())
try: try:
return br.open(OPENLIBRARY%isbn, timeout=timeout).read(), 'jpg' return br.open(OPENLIBRARY%isbn, timeout=timeout).read(), 'jpg'
except: except:
@ -100,7 +113,7 @@ def get_social_metadata(title, authors, publisher, isbn, username=None,
from calibre.ebooks.metadata import MetaInformation from calibre.ebooks.metadata import MetaInformation
mi = MetaInformation(title, authors) mi = MetaInformation(title, authors)
if isbn: if isbn:
br = browser() br = browser(user_agent=get_ua())
if username and password: if username and password:
try: try:
login(br, username, password, force=False) login(br, username, password, force=False)
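
The `get_ua()` helper added in this file picks a user agent at random for each `browser()` call. The idea can be sketched on its own as follows; `USER_AGENTS`, `JOINED` and `pick_user_agent` are hypothetical names for illustration, and the strings are a subset of those in the diff. One thing to check when writing such a list: Python concatenates adjacent string literals that have no comma between them, so a list written that way holds a single long string rather than several choices.

```python
import random

# Sample list for illustration; the strings are taken from the diff.
USER_AGENTS = [
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)',
]

# The same kind of literals WITHOUT commas: Python joins them into one string.
JOINED = [
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
]


def pick_user_agent(choices=USER_AGENTS, rng=random):
    """Return one user agent string chosen uniformly at random."""
    return rng.choice(choices)


print(len(USER_AGENTS), len(JOINED))
print(pick_user_agent())
```

`random.choice` is equivalent to indexing with `random.randint(0, len(choices)-1)` and is a little harder to get wrong.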


@ -3,7 +3,6 @@ __license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>' __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en' __docformat__ = 'restructuredtext en'
import re
from calibre.customize.conversion import InputFormatPlugin from calibre.customize.conversion import InputFormatPlugin
class MOBIInput(InputFormatPlugin): class MOBIInput(InputFormatPlugin):
@ -39,11 +38,3 @@ class MOBIInput(InputFormatPlugin):
accelerators['pagebreaks'] = '//h:div[@class="mbp_pagebreak"]' accelerators['pagebreaks'] = '//h:div[@class="mbp_pagebreak"]'
return mr.created_opf_path return mr.created_opf_path
def preprocess_html(self, options, html):
# search for places where a first or second level heading is immediately followed by another
# top level heading. demote the second heading to h3 to prevent splitting between chapter
# headings and titles, images, etc
doubleheading = re.compile(r'(?P<firsthead><h(1|2)[^>]*>.+?</h(1|2)>\s*(<(?!h\d)[^>]*>\s*)*)<h(1|2)(?P<secondhead>[^>]*>.+?)</h(1|2)>', re.IGNORECASE)
html = doubleheading.sub('\g<firsthead>'+'\n<h3'+'\g<secondhead>'+'</h3>', html)
return html
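
The heading demotion removed from this plugin (and now gated behind `renumber_headings` in the preprocessor) can be exercised standalone. This is a minimal sketch: the pattern is the `doubleheading` expression from the diff, while the wrapper name `demote_second_heading` is hypothetical. Raw strings are used for the replacement template so that `\g<...>` is not treated as a string escape.

```python
import re

# Pattern copied from the diff: an h1/h2, optional non-heading tags,
# then a second h1/h2 immediately after.
DOUBLEHEADING = re.compile(
    r'(?P<firsthead><h(1|2)[^>]*>.+?</h(1|2)>\s*(<(?!h\d)[^>]*>\s*)*)'
    r'<h(1|2)(?P<secondhead>[^>]*>.+?)</h(1|2)>',
    re.IGNORECASE)


def demote_second_heading(html):
    """Rewrite the second of two adjacent top-level headings as an h3."""
    template = r'\g<firsthead>' + '\n<h3' + r'\g<secondhead>' + '</h3>'
    return DOUBLEHEADING.sub(template, html)


before = '<h1>Chapter 1</h1>\n<h2>The Beginning</h2><p>text</p>'
print(demote_second_heading(before))
```

Headings separated by body text are left alone, which keeps chapter splitting from landing between a chapter number and its title.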


@ -1541,7 +1541,10 @@ class MobiWriter(object):
exth.write(data) exth.write(data)
nrecs += 1 nrecs += 1
if term == 'rights' : if term == 'rights' :
rights = unicode(oeb.metadata.rights[0]).encode('utf-8') try:
rights = unicode(oeb.metadata.rights[0]).encode('utf-8')
except:
rights = 'Unknown'
exth.write(pack('>II', EXTH_CODES['rights'], len(rights) + 8)) exth.write(pack('>II', EXTH_CODES['rights'], len(rights) + 8))
exth.write(rights) exth.write(rights)


@ -221,7 +221,10 @@ def rewrite_links(root, link_repl_func, resolve_base_href=False):
el.text): el.text):
stylesheet = parseString(el.text) stylesheet = parseString(el.text)
replaceUrls(stylesheet, link_repl_func) replaceUrls(stylesheet, link_repl_func)
el.text = '\n'+stylesheet.cssText + '\n' repl = stylesheet.cssText
if isbytestring(repl):
repl = repl.decode('utf-8')
el.text = '\n'+ repl + '\n'
if 'style' in el.attrib: if 'style' in el.attrib:
text = el.attrib['style'] text = el.attrib['style']
@ -234,8 +237,11 @@ def rewrite_links(root, link_repl_func, resolve_base_href=False):
set_property(item) set_property(item)
elif v.CSS_PRIMITIVE_VALUE == v.cssValueType: elif v.CSS_PRIMITIVE_VALUE == v.cssValueType:
set_property(v) set_property(v)
el.attrib['style'] = stext.cssText.replace('\n', ' ').replace('\r', repl = stext.cssText.replace('\n', ' ').replace('\r',
' ') ' ')
if isbytestring(repl):
repl = repl.decode('utf-8')
el.attrib['style'] = repl
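
The two hunks above guard against `cssText` coming back as a byte string before it is assigned to an lxml element. A minimal sketch of that guard, written for Python 3 rather than the Python 2 of the diff: `ensure_text` is a hypothetical helper standing in for the `isbytestring(...)` check plus `decode('utf-8')`, and the UTF-8 default is the encoding used in the diff.

```python
def ensure_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both inputs end up as the same text, safe to concatenate with '\n'.
print('\n' + ensure_text(b'p { color: red }') + '\n')
print('\n' + ensure_text('p { color: red }') + '\n')
```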


@ -199,8 +199,8 @@ class EbookIterator(object):
not hasattr(self.pathtoopf, 'manifest'): not hasattr(self.pathtoopf, 'manifest'):
if hasattr(self.pathtoopf, 'manifest'): if hasattr(self.pathtoopf, 'manifest'):
self.pathtoopf = write_oebbook(self.pathtoopf, self.base) self.pathtoopf = write_oebbook(self.pathtoopf, self.base)
self.pathtoopf = create_oebbook(self.log, self.pathtoopf, plumber.opts, self.pathtoopf = create_oebbook(self.log, self.pathtoopf,
plumber.input_plugin) plumber.opts)
if hasattr(self.pathtoopf, 'manifest'): if hasattr(self.pathtoopf, 'manifest'):
self.pathtoopf = write_oebbook(self.pathtoopf, self.base) self.pathtoopf = write_oebbook(self.pathtoopf, self.base)


@ -10,7 +10,7 @@ import os
from calibre.utils.date import isoformat, now from calibre.utils.date import isoformat, now
from calibre import guess_type from calibre import guess_type
def meta_info_to_oeb_metadata(mi, m, log): def meta_info_to_oeb_metadata(mi, m, log, override_input_metadata=False):
from calibre.ebooks.oeb.base import OPF from calibre.ebooks.oeb.base import OPF
if not mi.is_null('title'): if not mi.is_null('title'):
m.clear('title') m.clear('title')
@ -29,15 +29,23 @@ def meta_info_to_oeb_metadata(mi, m, log):
if not mi.is_null('book_producer'): if not mi.is_null('book_producer'):
m.filter('contributor', lambda x : x.role.lower() == 'bkp') m.filter('contributor', lambda x : x.role.lower() == 'bkp')
m.add('contributor', mi.book_producer, role='bkp') m.add('contributor', mi.book_producer, role='bkp')
elif override_input_metadata:
m.filter('contributor', lambda x : x.role.lower() == 'bkp')
if not mi.is_null('comments'): if not mi.is_null('comments'):
m.clear('description') m.clear('description')
m.add('description', mi.comments) m.add('description', mi.comments)
elif override_input_metadata:
m.clear('description')
if not mi.is_null('publisher'): if not mi.is_null('publisher'):
m.clear('publisher') m.clear('publisher')
m.add('publisher', mi.publisher) m.add('publisher', mi.publisher)
elif override_input_metadata:
m.clear('publisher')
if not mi.is_null('series'): if not mi.is_null('series'):
m.clear('series') m.clear('series')
m.add('series', mi.series) m.add('series', mi.series)
elif override_input_metadata:
m.clear('series')
if not mi.is_null('isbn'): if not mi.is_null('isbn'):
has = False has = False
for x in m.identifier: for x in m.identifier:
@ -46,19 +54,27 @@ def meta_info_to_oeb_metadata(mi, m, log):
has = True has = True
if not has: if not has:
m.add('identifier', mi.isbn, scheme='ISBN') m.add('identifier', mi.isbn, scheme='ISBN')
elif override_input_metadata:
m.filter('identifier', lambda x: x.scheme.lower() == 'isbn')
if not mi.is_null('language'): if not mi.is_null('language'):
m.clear('language') m.clear('language')
m.add('language', mi.language) m.add('language', mi.language)
if not mi.is_null('series_index'): if not mi.is_null('series_index'):
m.clear('series_index') m.clear('series_index')
m.add('series_index', mi.format_series_index()) m.add('series_index', mi.format_series_index())
elif override_input_metadata:
m.clear('series_index')
if not mi.is_null('rating'): if not mi.is_null('rating'):
m.clear('rating') m.clear('rating')
m.add('rating', '%.2f'%mi.rating) m.add('rating', '%.2f'%mi.rating)
elif override_input_metadata:
m.clear('rating')
if not mi.is_null('tags'): if not mi.is_null('tags'):
m.clear('subject') m.clear('subject')
for t in mi.tags: for t in mi.tags:
m.add('subject', t) m.add('subject', t)
elif override_input_metadata:
m.clear('subject')
if not mi.is_null('pubdate'): if not mi.is_null('pubdate'):
m.clear('date') m.clear('date')
m.add('date', isoformat(mi.pubdate)) m.add('date', isoformat(mi.pubdate))
@ -71,6 +87,7 @@ def meta_info_to_oeb_metadata(mi, m, log):
if not mi.is_null('publication_type'): if not mi.is_null('publication_type'):
m.clear('publication_type') m.clear('publication_type')
m.add('publication_type', mi.publication_type) m.add('publication_type', mi.publication_type)
if not m.timestamp: if not m.timestamp:
m.add('timestamp', isoformat(now())) m.add('timestamp', isoformat(now()))
@ -78,11 +95,12 @@ def meta_info_to_oeb_metadata(mi, m, log):
class MergeMetadata(object): class MergeMetadata(object):
'Merge in user metadata, including cover' 'Merge in user metadata, including cover'
def __call__(self, oeb, mi, opts): def __call__(self, oeb, mi, opts, override_input_metadata=False):
self.oeb, self.log = oeb, oeb.log self.oeb, self.log = oeb, oeb.log
m = self.oeb.metadata m = self.oeb.metadata
self.log('Merging user specified metadata...') self.log('Merging user specified metadata...')
meta_info_to_oeb_metadata(mi, m, oeb.log) meta_info_to_oeb_metadata(mi, m, oeb.log,
override_input_metadata=override_input_metadata)
cover_id = self.set_cover(mi, opts.prefer_metadata_cover) cover_id = self.set_cover(mi, opts.prefer_metadata_cover)
m.clear('cover') m.clear('cover')
if cover_id is not None: if cover_id is not None:
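
The new `override_input_metadata` flag changes what happens when a user-supplied field is empty: without it the value from the input document survives, with it the value is cleared. A minimal sketch of that rule on a plain dictionary, where `merge_field` is a hypothetical stand-in for one `if not mi.is_null(...) / elif override_input_metadata` pair in `meta_info_to_oeb_metadata`, and `None` plays the role of a null field:

```python
def merge_field(target, name, value, override_input_metadata=False):
    """Merge one user-supplied field into the target metadata mapping."""
    if value is not None:
        # The user supplied a value: it always replaces the input document's.
        target[name] = value
    elif override_input_metadata:
        # The user's field is empty and overriding is on: drop the old value.
        target.pop(name, None)
    return target


from_input = {'publisher': 'Old Press'}
print(merge_field(dict(from_input), 'publisher', 'New Press'))
print(merge_field(dict(from_input), 'publisher', None))
print(merge_field(dict(from_input), 'publisher', None,
                  override_input_metadata=True))
```

Note that in the diff not every field gets an override branch; `language`, for example, is left untouched when empty.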


@ -9,7 +9,6 @@ import os
from calibre.customize.conversion import InputFormatPlugin from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.pdb.header import PdbHeaderReader from calibre.ebooks.pdb.header import PdbHeaderReader
from calibre.ebooks.pdb import PDBError, IDENTITY_TO_NAME, get_reader from calibre.ebooks.pdb import PDBError, IDENTITY_TO_NAME, get_reader
from calibre.ebooks.conversion.utils import PreProcessor
class PDBInput(InputFormatPlugin): class PDBInput(InputFormatPlugin):
@ -32,8 +31,3 @@ class PDBInput(InputFormatPlugin):
opf = reader.extract_content(os.getcwd()) opf = reader.extract_content(os.getcwd())
return opf return opf
def preprocess_html(self, options, html):
self.options = options
preprocessor = PreProcessor(self.options, log=getattr(self, 'log', None))
return preprocessor(html)


@ -7,7 +7,6 @@ import os, glob, re, textwrap
from lxml import etree from lxml import etree
from calibre.customize.conversion import InputFormatPlugin from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.conversion.utils import PreProcessor
border_style_map = { border_style_map = {
'single' : 'solid', 'single' : 'solid',
@ -319,13 +318,9 @@ class RTFInput(InputFormatPlugin):
res = transform.tostring(result) res = transform.tostring(result)
res = res[:100].replace('xmlns:html', 'xmlns') + res[100:] res = res[:100].replace('xmlns:html', 'xmlns') + res[100:]
# Replace newlines inserted by the 'empty_paragraphs' option in rtf2xml with html blank lines # Replace newlines inserted by the 'empty_paragraphs' option in rtf2xml with html blank lines
if not getattr(self.opts, 'remove_paragraph_spacing', False): res = re.sub('\s*<body>', '<body>', res)
res = re.sub('\s*<body>', '<body>', res) res = re.sub('(?<=\n)\n{2}',
res = re.sub('(?<=\n)\n{2}', u'<p>\u00a0</p>\n'.encode('utf-8'), res)
u'<p>\u00a0</p>\n'.encode('utf-8'), res)
if self.opts.preprocess_html:
preprocessor = PreProcessor(self.opts, log=getattr(self, 'log', None))
res = preprocessor(res.decode('utf-8')).encode('utf-8')
f.write(res) f.write(res)
self.write_inline_css(inline_class, border_styles) self.write_inline_css(inline_class, border_styles)
stream.seek(0) stream.seek(0)


@ -262,7 +262,7 @@ class RTFMLizer(object):
if hasattr(elem, 'tail') and elem.tail != None and elem.tail.strip() != '': if hasattr(elem, 'tail') and elem.tail != None and elem.tail.strip() != '':
if 'block' in tag_stack: if 'block' in tag_stack:
text += '%s ' % txt2rtf(elem.tail) text += '%s' % txt2rtf(elem.tail)
else: else:
text += '{\\par \\pard \\hyphpar %s}' % txt2rtf(elem.tail) text += '{\\par \\pard \\hyphpar %s}' % txt2rtf(elem.tail)


@ -41,7 +41,7 @@ class SNBInput(InputFormatPlugin):
raise ValueError("Invalid SNB file") raise ValueError("Invalid SNB file")
log.debug("Handle meta data ...") log.debug("Handle meta data ...")
from calibre.ebooks.conversion.plumber import create_oebbook from calibre.ebooks.conversion.plumber import create_oebbook
oeb = create_oebbook(log, None, options, self, oeb = create_oebbook(log, None, options,
encoding=options.input_encoding, populate=False) encoding=options.input_encoding, populate=False)
meta = snbFile.GetFileStream('snbf/book.snbf') meta = snbFile.GetFileStream('snbf/book.snbf')
if meta != None: if meta != None:


@ -1,58 +0,0 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
import re
from calibre import prepare_string_for_xml
class TXTHeuristicProcessor(object):
def __init__(self):
self.ITALICIZE_WORDS = [
'Etc.', 'etc.', 'viz.', 'ie.', 'i.e.', 'Ie.', 'I.e.', 'eg.',
'e.g.', 'Eg.', 'E.g.', 'et al.', 'et cetra', 'n.b.', 'N.b.',
'nota bene', 'Nota bene', 'Ste.', 'Mme.', 'Mdme.',
'Mlle.', 'Mons.', 'PS.', 'PPS.',
]
self.ITALICIZE_STYLE_PATS = [
r'(?msu)_(?P<words>.+?)_',
r'(?msu)/(?P<words>[^<>]+?)/',
r'(?msu)~~(?P<words>.+?)~~',
r'(?msu)\*(?P<words>.+?)\*',
r'(?msu)~(?P<words>.+?)~',
r'(?msu)_/(?P<words>[^<>]+?)/_',
r'(?msu)_\*(?P<words>.+?)\*_',
r'(?msu)\*/(?P<words>[^<>]+?)/\*',
r'(?msu)_\*/(?P<words>[^<>]+?)/\*_',
r'(?msu)/:(?P<words>[^<>]+?):/',
r'(?msu)\|:(?P<words>.+?):\|',
]
def process_paragraph(self, paragraph):
for word in self.ITALICIZE_WORDS:
paragraph = paragraph.replace(word, '<i>%s</i>' % word)
for pat in self.ITALICIZE_STYLE_PATS:
paragraph = re.sub(pat, lambda mo: '<i>%s</i>' % mo.group('words'), paragraph)
return paragraph
def convert(self, txt, title='', epub_split_size_kb=0):
from calibre.ebooks.txt.processor import clean_txt, split_txt, HTML_TEMPLATE
txt = clean_txt(txt)
txt = split_txt(txt, epub_split_size_kb)
processed = []
for line in txt.split('\n\n'):
processed.append(u'<p>%s</p>' % self.process_paragraph(prepare_string_for_xml(line.replace('\n', ' '))))
txt = u'\n'.join(processed)
txt = re.sub('[ ]{2,}', ' ', txt)
html = HTML_TEMPLATE % (title, txt)
from calibre.ebooks.conversion.utils import PreProcessor
pp = PreProcessor()
html = pp.markup_chapters(html, pp.get_word_count(html), False)
return html
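
The removed `process_paragraph` method wraps each pattern match in `<i>` tags. A minimal standalone sketch of that step follows, using only two of the patterns from `ITALICIZE_STYLE_PATS`. The helper name `italicize` and the reduced `PATTERNS` list are assumptions made for illustration; this is not the calibre implementation.

```python
import re

# Two of the patterns listed in ITALICIZE_STYLE_PATS above; the others
# follow the same shape (a delimiter pair around a named group "words").
PATTERNS = [
    r'(?msu)_(?P<words>.+?)_',
    r'(?msu)\*(?P<words>.+?)\*',
]


def italicize(paragraph):
    """Wrap text delimited by _..._ or *...* in <i> tags (hypothetical helper)."""
    for pat in PATTERNS:
        # The replacement keeps only the captured words, dropping the delimiters.
        paragraph = re.sub(pat, lambda mo: '<i>%s</i>' % mo.group('words'), paragraph)
    return paragraph
```

Because the patterns are applied one after another, the order of the list matters when delimiters overlap, which is presumably why the original list mixes single and combined delimiters.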


@ -12,7 +12,7 @@ from calibre.ebooks.chardet import detect
from calibre.ebooks.txt.processor import convert_basic, convert_markdown, \ from calibre.ebooks.txt.processor import convert_basic, convert_markdown, \
separate_paragraphs_single_line, separate_paragraphs_print_formatted, \ separate_paragraphs_single_line, separate_paragraphs_print_formatted, \
preserve_spaces, detect_paragraph_type, detect_formatting_type, \ preserve_spaces, detect_paragraph_type, detect_formatting_type, \
convert_heuristic, normalize_line_endings, convert_textile normalize_line_endings, convert_textile
from calibre import _ent_pat, xml_entity_to_unicode from calibre import _ent_pat, xml_entity_to_unicode
class TXTInput(InputFormatPlugin): class TXTInput(InputFormatPlugin):
@ -53,6 +53,7 @@ class TXTInput(InputFormatPlugin):
def convert(self, stream, options, file_ext, log, def convert(self, stream, options, file_ext, log,
accelerators): accelerators):
self.log = log
log.debug('Reading text from file...') log.debug('Reading text from file...')
txt = stream.read() txt = stream.read()
@ -70,21 +71,41 @@ class TXTInput(InputFormatPlugin):
txt = txt.decode(ienc, 'replace') txt = txt.decode(ienc, 'replace')
txt = _ent_pat.sub(xml_entity_to_unicode, txt) txt = _ent_pat.sub(xml_entity_to_unicode, txt)
# Normalize line endings
txt = normalize_line_endings(txt)
if options.formatting_type == 'auto':
options.formatting_type = detect_formatting_type(txt)
if options.formatting_type == 'heuristic':
setattr(options, 'enable_heuristics', True)
setattr(options, 'markup_chapter_headings', True)
setattr(options, 'italicize_common_cases', True)
setattr(options, 'fix_indents', True)
setattr(options, 'preserve_spaces', True)
setattr(options, 'delete_blank_paragraphs', True)
setattr(options, 'format_scene_breaks', True)
setattr(options, 'dehyphenate', True)
# Determine the paragraph type of the document.
if options.paragraph_type == 'auto':
options.paragraph_type = detect_paragraph_type(txt)
if options.paragraph_type == 'unknown':
log.debug('Could not reliably determine paragraph type using block')
options.paragraph_type = 'block'
else:
log.debug('Auto detected paragraph type as %s' % options.paragraph_type)
# Preserve spaces will replace multiple spaces to a space # Preserve spaces will replace multiple spaces to a space
# followed by the &nbsp; entity. # followed by the &nbsp; entity.
if options.preserve_spaces: if options.preserve_spaces:
txt = preserve_spaces(txt) txt = preserve_spaces(txt)
# Normalize line endings
txt = normalize_line_endings(txt)
# Get length for hyphen removal and punctuation unwrap # Get length for hyphen removal and punctuation unwrap
docanalysis = DocAnalysis('txt', txt) docanalysis = DocAnalysis('txt', txt)
length = docanalysis.line_length(.5) length = docanalysis.line_length(.5)
if options.formatting_type == 'auto':
options.formatting_type = detect_formatting_type(txt)
if options.formatting_type == 'markdown': if options.formatting_type == 'markdown':
log.debug('Running text though markdown conversion...') log.debug('Running text though markdown conversion...')
try: try:
@ -95,18 +116,10 @@ class TXTInput(InputFormatPlugin):
elif options.formatting_type == 'textile': elif options.formatting_type == 'textile':
log.debug('Running text though textile conversion...') log.debug('Running text though textile conversion...')
html = convert_textile(txt) html = convert_textile(txt)
else:
# Determine the paragraph type of the document.
if options.paragraph_type == 'auto':
options.paragraph_type = detect_paragraph_type(txt)
if options.paragraph_type == 'unknown':
log.debug('Could not reliably determine paragraph type using block')
options.paragraph_type = 'block'
else:
log.debug('Auto detected paragraph type as %s' % options.paragraph_type)
else:
# Dehyphenate # Dehyphenate
dehyphenator = Dehyphenator() dehyphenator = Dehyphenator(options.verbose, log=self.log)
txt = dehyphenator(txt,'txt', length) txt = dehyphenator(txt,'txt', length)
# We don't check for block because the processor assumes block. # We don't check for block because the processor assumes block.
@ -118,24 +131,15 @@ class TXTInput(InputFormatPlugin):
txt = separate_paragraphs_print_formatted(txt) txt = separate_paragraphs_print_formatted(txt)
if options.paragraph_type == 'unformatted': if options.paragraph_type == 'unformatted':
from calibre.ebooks.conversion.utils import PreProcessor from calibre.ebooks.conversion.utils import HeuristicProcessor
# get length # get length
# unwrap lines based on punctuation # unwrap lines based on punctuation
preprocessor = PreProcessor(options, log=getattr(self, 'log', None)) preprocessor = HeuristicProcessor(options, log=getattr(self, 'log', None))
txt = preprocessor.punctuation_unwrap(length, txt, 'txt') txt = preprocessor.punctuation_unwrap(length, txt, 'txt')
flow_size = getattr(options, 'flow_size', 0) flow_size = getattr(options, 'flow_size', 0)
html = convert_basic(txt, epub_split_size_kb=flow_size)
if options.formatting_type == 'heuristic':
html = convert_heuristic(txt, epub_split_size_kb=flow_size)
else:
html = convert_basic(txt, epub_split_size_kb=flow_size)
# Dehyphenate in cleanup mode for missed txt and markdown conversion
dehyphenator = Dehyphenator()
html = dehyphenator(html,'txt_cleanup', length)
html = dehyphenator(html,'html_cleanup', length)
from calibre.customize.ui import plugin_for_input_format from calibre.customize.ui import plugin_for_input_format
html_input = plugin_for_input_format('html') html_input = plugin_for_input_format('html')


@ -51,12 +51,12 @@ class TXTOutput(OutputFormatPlugin):
recommended_value=False, level=OptionRecommendation.LOW, recommended_value=False, level=OptionRecommendation.LOW,
help=_('Do not remove links within the document. This is only ' \ help=_('Do not remove links within the document. This is only ' \
'useful when paired with the markdown-format option because' \ 'useful when paired with the markdown-format option because' \
'links are always removed with plain text output.')), ' links are always removed with plain text output.')),
OptionRecommendation(name='keep_image_references', OptionRecommendation(name='keep_image_references',
recommended_value=False, level=OptionRecommendation.LOW, recommended_value=False, level=OptionRecommendation.LOW,
help=_('Do not remove image references within the document. This is only ' \ help=_('Do not remove image references within the document. This is only ' \
'useful when paired with the markdown-format option because' \ 'useful when paired with the markdown-format option because' \
'image references are always removed with plain text output.')), ' image references are always removed with plain text output.')),
]) ])
def convert(self, oeb_book, output_path, input_plugin, opts, log): def convert(self, oeb_book, output_path, input_plugin, opts, log):


@ -12,7 +12,6 @@ import os, re
from calibre import prepare_string_for_xml, isbytestring from calibre import prepare_string_for_xml, isbytestring
from calibre.ebooks.metadata.opf2 import OPFCreator from calibre.ebooks.metadata.opf2 import OPFCreator
from calibre.ebooks.txt.heuristicprocessor import TXTHeuristicProcessor
from calibre.ebooks.conversion.preprocess import DocAnalysis from calibre.ebooks.conversion.preprocess import DocAnalysis
from calibre.utils.cleantext import clean_ascii_chars from calibre.utils.cleantext import clean_ascii_chars
@ -67,10 +66,6 @@ def convert_basic(txt, title='', epub_split_size_kb=0):
return HTML_TEMPLATE % (title, u'\n'.join(lines)) return HTML_TEMPLATE % (title, u'\n'.join(lines))
def convert_heuristic(txt, title='', epub_split_size_kb=0):
tp = TXTHeuristicProcessor()
return tp.convert(txt, title, epub_split_size_kb)
def convert_markdown(txt, title='', disable_toc=False): def convert_markdown(txt, title='', disable_toc=False):
from calibre.ebooks.markdown import markdown from calibre.ebooks.markdown import markdown
md = markdown.Markdown( md = markdown.Markdown(
@ -180,9 +175,9 @@ def detect_formatting_type(txt):
# Block quote. # Block quote.
textile_count += len(re.findall(r'(?mu)^bq\.', txt)) textile_count += len(re.findall(r'(?mu)^bq\.', txt))
# Images # Images
textile_count += len(re.findall(r'\![^\s]+(:[^\s]+)*', txt)) textile_count += len(re.findall(r'\![^\s]+(?=.*?/)(:[^\s]+)*', txt))
# Links # Links
textile_count += len(re.findall(r'"(\(.+?\))*[^\(]+?(\(.+?\))*":[^\s]+', txt)) textile_count += len(re.findall(r'"(?=".*?\()(\(.+?\))*[^\(]+?(\(.+?\))*":[^\s]+', txt))
if markdown_count > 5 or textile_count > 5: if markdown_count > 5 or textile_count > 5:
if markdown_count > textile_count: if markdown_count > textile_count:


@ -8,11 +8,12 @@ __docformat__ = 'restructuredtext en'
import os import os
from functools import partial from functools import partial
from PyQt4.Qt import QInputDialog, QPixmap, QMenu from PyQt4.Qt import QPixmap, QMenu
from calibre.gui2 import error_dialog, choose_files, \ from calibre.gui2 import error_dialog, choose_files, \
choose_dir, warning_dialog, info_dialog choose_dir, warning_dialog, info_dialog
from calibre.gui2.dialogs.add_empty_book import AddEmptyBookDialog
from calibre.gui2.widgets import IMAGE_EXTENSIONS from calibre.gui2.widgets import IMAGE_EXTENSIONS
from calibre.ebooks import BOOK_EXTENSIONS from calibre.ebooks import BOOK_EXTENSIONS
from calibre.utils.filenames import ascii_filename from calibre.utils.filenames import ascii_filename
@ -42,7 +43,7 @@ class AddAction(InterfaceAction):
'ebook file is a different book)'), self.add_recursive_multiple) 'ebook file is a different book)'), self.add_recursive_multiple)
self.add_menu.addSeparator() self.add_menu.addSeparator()
self.add_menu.addAction(_('Add Empty book. (Book entry with no ' self.add_menu.addAction(_('Add Empty book. (Book entry with no '
'formats)'), self.add_empty) 'formats)'), self.add_empty, _('Shift+Ctrl+E'))
self.add_menu.addAction(_('Add from ISBN'), self.add_from_isbn) self.add_menu.addAction(_('Add from ISBN'), self.add_from_isbn)
self.qaction.setMenu(self.add_menu) self.qaction.setMenu(self.add_menu)
self.qaction.triggered.connect(self.add_books) self.qaction.triggered.connect(self.add_books)
@ -83,12 +84,21 @@ class AddAction(InterfaceAction):
Add an empty book item to the library. This does not import any formats Add an empty book item to the library. This does not import any formats
from a book file. from a book file.
''' '''
num, ok = QInputDialog.getInt(self.gui, _('How many empty books?'), author = None
_('How many empty books should be added?'), 1, 1, 100) index = self.gui.library_view.currentIndex()
if ok: if index.isValid():
raw = index.model().db.authors(index.row())
if raw:
authors = [a.strip().replace('|', ',') for a in raw.split(',')]
if authors:
author = authors[0]
dlg = AddEmptyBookDialog(self.gui, self.gui.library_view.model().db, author)
if dlg.exec_() == dlg.Accepted:
num = dlg.qty_to_add
from calibre.ebooks.metadata import MetaInformation from calibre.ebooks.metadata import MetaInformation
for x in xrange(num): for x in xrange(num):
self.gui.library_view.model().db.import_book(MetaInformation(None), []) mi = MetaInformation(_('Unknown'), dlg.selected_authors)
self.gui.library_view.model().db.import_book(mi, [])
self.gui.library_view.model().books_added(num) self.gui.library_view.model().books_added(num)
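
The author lookup in the new `add_empty` can be read in isolation. The sketch below assumes, as the list comprehension above implies, that the stored author string is comma-separated and that `|` stands in for a literal comma inside a single name. The function name `first_author` is hypothetical.

```python
def first_author(raw):
    """Return the first author from a comma-separated string, or None.

    Assumption: '|' inside an entry represents a literal comma in the name.
    """
    if not raw:
        return None
    # Split on the separator first, then restore commas inside each name.
    authors = [a.strip().replace('|', ',') for a in raw.split(',')]
    return authors[0] if authors else None
```

For example, `first_author('Doe| John,Jane Roe')` yields `'Doe, John'`, and an empty or missing string yields `None`, matching the `author = None` default in the method.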
def add_isbns(self, books, add_tags=[]): def add_isbns(self, books, add_tags=[]):


@ -32,7 +32,7 @@ class LibraryUsageStats(object): # {{{
locs = list(self.stats.keys()) locs = list(self.stats.keys())
locs.sort(cmp=lambda x, y: cmp(self.stats[x], self.stats[y]), locs.sort(cmp=lambda x, y: cmp(self.stats[x], self.stats[y]),
reverse=True) reverse=True)
for key in locs[15:]: for key in locs[25:]:
self.stats.pop(key) self.stats.pop(key)
gprefs.set('library_usage_stats', self.stats) gprefs.set('library_usage_stats', self.stats)
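
The change from `locs[15:]` to `locs[25:]` raises the number of remembered libraries from 15 to 25. A rough sketch of the pruning step, assuming `stats` maps a location to a comparable usage count (the value type is not shown in this hunk). The name `prune_stats` is hypothetical, and `key=` is used in place of the Python 2 `cmp=` argument so the sketch also runs on Python 3.

```python
def prune_stats(stats, keep=25):
    """Drop all but the `keep` most-used entries from `stats`, in place."""
    # Most-used locations first, mirroring the reverse=True sort above.
    locs = sorted(stats, key=stats.get, reverse=True)
    for key in locs[keep:]:
        stats.pop(key)
    return stats
```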
@ -384,7 +384,28 @@ class ChooseLibraryAction(InterfaceAction):
return return
prefs['library_path'] = loc prefs['library_path'] = loc
#from calibre.utils.mem import memory
#import weakref
#from PyQt4.Qt import QTimer
#self.dbref = weakref.ref(self.gui.library_view.model().db)
#self.before_mem = memory()/1024**2
self.gui.library_moved(loc) self.gui.library_moved(loc)
#QTimer.singleShot(1000, self.debug_leak)
def debug_leak(self):
import gc
from calibre.utils.mem import memory
ref = self.dbref
for i in xrange(3): gc.collect()
if ref() is not None:
print 11111, ref()
for r in gc.get_referrers(ref())[:10]:
print r
print
print 'before:', self.before_mem
print 'after:', memory()/1024**2
self.dbref = self.before_mem = None
def qs_requested(self, idx, *args): def qs_requested(self, idx, *args):
self.switch_requested(self.qs_locations[idx]) self.switch_requested(self.qs_locations[idx])


@ -144,6 +144,9 @@ class PluginWidget(QWidget,Ui_Form):
# Hook changes to thumb_width # Hook changes to thumb_width
self.thumb_width.valueChanged.connect(self.thumb_width_changed) self.thumb_width.valueChanged.connect(self.thumb_width_changed)
# Hook changes to Description section
self.generate_descriptions.stateChanged.connect(self.generate_descriptions_changed)
def options(self): def options(self):
# Save/return the current options # Save/return the current options
# exclude_genre stores literally # exclude_genre stores literally
@ -265,7 +268,7 @@ class PluginWidget(QWidget,Ui_Form):
custom_fields = {} custom_fields = {}
for custom_field in all_custom_fields: for custom_field in all_custom_fields:
field_md = self.db.metadata_for_field(custom_field) field_md = self.db.metadata_for_field(custom_field)
if field_md['datatype'] in ['text','comments']: if field_md['datatype'] in ['text','comments','composite']:
custom_fields[field_md['name']] = {'field':custom_field, custom_fields[field_md['name']] = {'field':custom_field,
'datatype':field_md['datatype']} 'datatype':field_md['datatype']}
# Blank field first # Blank field first
@ -324,6 +327,28 @@ class PluginWidget(QWidget,Ui_Form):
else: else:
self.exclude_pattern.setEnabled(False) self.exclude_pattern.setEnabled(False)
def generate_descriptions_changed(self,new_state):
'''
Process changes to Descriptions section
0: unchecked
2: checked
'''
return
if new_state == 0:
# unchecked
self.merge_source_field.setEnabled(False)
self.merge_before.setEnabled(False)
self.merge_after.setEnabled(False)
self.include_hr.setEnabled(False)
elif new_state == 2:
# checked
self.merge_source_field.setEnabled(True)
self.merge_before.setEnabled(True)
self.merge_after.setEnabled(True)
self.include_hr.setEnabled(True)
def header_note_source_field_changed(self,new_index): def header_note_source_field_changed(self,new_index):
''' '''
Process changes in the header_note_source_field combo box Process changes in the header_note_source_field combo box


@ -35,7 +35,7 @@
</size> </size>
</property> </property>
<property name="toolTip"> <property name="toolTip">
<string>Sections to include in catalog. All catalogs include 'Books by Author'.</string> <string>Sections to include in catalog.</string>
</property> </property>
<property name="title"> <property name="title">
<string>Included sections</string> <string>Included sections</string>
@ -79,13 +79,13 @@
<item row="0" column="0"> <item row="0" column="0">
<widget class="QCheckBox" name="generate_authors"> <widget class="QCheckBox" name="generate_authors">
<property name="enabled"> <property name="enabled">
<bool>false</bool> <bool>true</bool>
</property> </property>
<property name="text"> <property name="text">
<string>Books by Author</string> <string>Books by Author</string>
</property> </property>
<property name="checked"> <property name="checked">
<bool>true</bool> <bool>false</bool>
</property> </property>
</widget> </widget>
</item> </item>


@ -12,7 +12,8 @@ from lxml.html import soupparser
from PyQt4.Qt import QApplication, QFontInfo, QSize, QWidget, QPlainTextEdit, \ from PyQt4.Qt import QApplication, QFontInfo, QSize, QWidget, QPlainTextEdit, \
QToolBar, QVBoxLayout, QAction, QIcon, Qt, QTabWidget, QUrl, \ QToolBar, QVBoxLayout, QAction, QIcon, Qt, QTabWidget, QUrl, \
QSyntaxHighlighter, QColor, QChar, QColorDialog, QMenu, QInputDialog QSyntaxHighlighter, QColor, QChar, QColorDialog, QMenu, QInputDialog, \
QHBoxLayout
from PyQt4.QtWebKit import QWebView, QWebPage from PyQt4.QtWebKit import QWebView, QWebPage
from calibre.ebooks.chardet import xml_to_unicode from calibre.ebooks.chardet import xml_to_unicode
@ -488,7 +489,7 @@ class Highlighter(QSyntaxHighlighter):
class Editor(QWidget): # {{{ class Editor(QWidget): # {{{
def __init__(self, parent=None): def __init__(self, parent=None, one_line_toolbar=False):
QWidget.__init__(self, parent) QWidget.__init__(self, parent)
self.toolbar1 = QToolBar(self) self.toolbar1 = QToolBar(self)
self.toolbar2 = QToolBar(self) self.toolbar2 = QToolBar(self)
@ -508,9 +509,14 @@ class Editor(QWidget): # {{{
self.wyswyg.layout = l = QVBoxLayout(self.wyswyg) self.wyswyg.layout = l = QVBoxLayout(self.wyswyg)
self.setLayout(self._layout) self.setLayout(self._layout)
l.setContentsMargins(0, 0, 0, 0) l.setContentsMargins(0, 0, 0, 0)
l.addWidget(self.toolbar1) if one_line_toolbar:
l.addWidget(self.toolbar2) tb = QHBoxLayout()
l.addWidget(self.toolbar3) l.addLayout(tb)
else:
tb = l
tb.addWidget(self.toolbar1)
tb.addWidget(self.toolbar2)
tb.addWidget(self.toolbar3)
l.addWidget(self.editor) l.addWidget(self.editor)
self._layout.addWidget(self.tabs) self._layout.addWidget(self.tabs)
self.tabs.addTab(self.wyswyg, _('Normal view')) self.tabs.addTab(self.wyswyg, _('Normal view'))


@ -11,6 +11,8 @@ from calibre.gui2.convert.single import Config, sort_formats_by_preference, \
from calibre.customize.ui import available_output_formats from calibre.customize.ui import available_output_formats
from calibre.gui2 import ResizableDialog from calibre.gui2 import ResizableDialog
from calibre.gui2.convert.look_and_feel import LookAndFeelWidget from calibre.gui2.convert.look_and_feel import LookAndFeelWidget
from calibre.gui2.convert.heuristics import HeuristicsWidget
from calibre.gui2.convert.search_and_replace import SearchAndReplaceWidget
from calibre.gui2.convert.page_setup import PageSetupWidget from calibre.gui2.convert.page_setup import PageSetupWidget
from calibre.gui2.convert.structure_detection import StructureDetectionWidget from calibre.gui2.convert.structure_detection import StructureDetectionWidget
from calibre.gui2.convert.toc import TOCWidget from calibre.gui2.convert.toc import TOCWidget
@ -69,6 +71,8 @@ class BulkConfig(Config):
self.setWindowTitle(_('Bulk Convert')) self.setWindowTitle(_('Bulk Convert'))
lf = widget_factory(LookAndFeelWidget) lf = widget_factory(LookAndFeelWidget)
hw = widget_factory(HeuristicsWidget)
sr = widget_factory(SearchAndReplaceWidget)
ps = widget_factory(PageSetupWidget) ps = widget_factory(PageSetupWidget)
sd = widget_factory(StructureDetectionWidget) sd = widget_factory(StructureDetectionWidget)
toc = widget_factory(TOCWidget) toc = widget_factory(TOCWidget)
@ -90,7 +94,7 @@ class BulkConfig(Config):
if not c: break if not c: break
self.stack.removeWidget(c) self.stack.removeWidget(c)
widgets = [lf, ps, sd, toc] widgets = [lf, hw, ps, sd, toc, sr]
if output_widget is not None: if output_widget is not None:
widgets.append(output_widget) widgets.append(output_widget)
for w in widgets: for w in widgets:


@ -12,17 +12,24 @@ from calibre.customize.ui import plugin_for_catalog_format
from calibre.utils.logging import Log from calibre.utils.logging import Log
def gui_convert(input, output, recommendations, notification=DummyReporter(), def gui_convert(input, output, recommendations, notification=DummyReporter(),
abort_after_input_dump=False, log=None): abort_after_input_dump=False, log=None, override_input_metadata=False):
recommendations = list(recommendations) recommendations = list(recommendations)
recommendations.append(('verbose', 2, OptionRecommendation.HIGH)) recommendations.append(('verbose', 2, OptionRecommendation.HIGH))
if log is None: if log is None:
log = Log() log = Log()
plumber = Plumber(input, output, log, report_progress=notification, plumber = Plumber(input, output, log, report_progress=notification,
abort_after_input_dump=abort_after_input_dump) abort_after_input_dump=abort_after_input_dump,
override_input_metadata=override_input_metadata)
plumber.merge_ui_recommendations(recommendations) plumber.merge_ui_recommendations(recommendations)
plumber.run() plumber.run()
def gui_convert_override(input, output, recommendations, notification=DummyReporter(),
abort_after_input_dump=False, log=None):
gui_convert(input, output, recommendations, notification=notification,
abort_after_input_dump=abort_after_input_dump, log=log,
override_input_metadata=True)
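
`gui_convert_override` only forwards its arguments to `gui_convert` with `override_input_metadata` forced on. The sketch below shows that forwarding pattern with a hypothetical stand-in, `convert`, and compares it with `functools.partial`. Why the commit uses a separately named function is not stated in the diff, so the comparison is illustrative only.

```python
from functools import partial


def convert(src, dst, override_input_metadata=False):
    # Hypothetical stand-in for gui_convert; it only reports its arguments.
    return (src, dst, override_input_metadata)


def convert_override(src, dst):
    # Same shape as gui_convert_override: forward everything, force the flag on.
    return convert(src, dst, override_input_metadata=True)


# functools.partial binds the same keyword without a second def.
convert_override_partial = partial(convert, override_input_metadata=True)
```

A named module-level function can be looked up by name, which a `partial` object cannot, so the explicit wrapper may be the safer choice where callers refer to the function by string.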
def gui_catalog(fmt, title, dbspec, ids, out_file_name, sync, fmt_options, connected_device, def gui_catalog(fmt, title, dbspec, ids, out_file_name, sync, fmt_options, connected_device,
notification=DummyReporter(), log=None): notification=DummyReporter(), log=None):
if log is None: if log is None:


@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
from PyQt4.Qt import Qt
from calibre.gui2.convert.heuristics_ui import Ui_Form
from calibre.gui2.convert import Widget
class HeuristicsWidget(Widget, Ui_Form):
TITLE = _('Heuristic\nProcessing')
HELP = _('Modify the document text and structure using common patterns.')
COMMIT_NAME = 'heuristics'
ICON = I('heuristics.png')
def __init__(self, parent, get_option, get_help, db=None, book_id=None):
Widget.__init__(self, parent,
['enable_heuristics', 'markup_chapter_headings',
'italicize_common_cases', 'fix_indents',
'html_unwrap_factor', 'unwrap_lines',
'delete_blank_paragraphs', 'format_scene_breaks',
'dehyphenate', 'renumber_headings']
)
self.db, self.book_id = db, book_id
self.initialize_options(get_option, get_help, db, book_id)
self.opt_enable_heuristics.stateChanged.connect(self.enable_heuristics)
self.opt_unwrap_lines.stateChanged.connect(self.enable_unwrap)
self.enable_heuristics(self.opt_enable_heuristics.checkState())
def break_cycles(self):
Widget.break_cycles(self)
try:
self.opt_enable_heuristics.stateChanged.disconnect()
self.opt_unwrap_lines.stateChanged.disconnect()
except:
pass
def set_value_handler(self, g, val):
if val is None and g is self.opt_html_unwrap_factor:
g.setValue(0.0)
return True
def enable_heuristics(self, state):
state = state == Qt.Checked
self.heuristic_options.setEnabled(state)
def enable_unwrap(self, state):
if state == Qt.Checked:
state = True
else:
state = False
self.opt_html_unwrap_factor.setEnabled(state)


@ -0,0 +1,227 @@
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>Form</class>
<widget class="QWidget" name="Form">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>724</width>
<height>470</height>
</rect>
</property>
<property name="windowTitle">
<string>Form</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="label">
<property name="text">
<string>&lt;b&gt;Heuristic processing&lt;/b&gt; means that calibre will scan your book for common patterns and fix them. As the name implies, this involves guesswork, which means that it could end up worsening the result of a conversion, if calibre guesses wrong. Therefore, it is disabled by default. Often, if a conversion does not turn out as you expect, turning on heuristics can improve matters. Read more about the various heuristic processing options in the &lt;a href=&quot;http://calibre-ebook.com/user_manual/conversion.html#heuristic-processing&quot;&gt;User Manual&lt;/a&gt;.</string>
</property>
<property name="wordWrap">
<bool>true</bool>
</property>
<property name="openExternalLinks">
<bool>true</bool>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer_2">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeType">
<enum>QSizePolicy::Fixed</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>15</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QCheckBox" name="opt_enable_heuristics">
<property name="text">
<string>Enable &amp;heuristic processing</string>
</property>
</widget>
</item>
<item>
<widget class="QGroupBox" name="heuristic_options">
<property name="title">
<string>Heuristic Processing</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout_2">
<item>
<widget class="QCheckBox" name="opt_unwrap_lines">
<property name="text">
<string>Unwrap lines</string>
</property>
</widget>
</item>
<item>
<layout class="QHBoxLayout" name="horizontalLayout">
<item>
<spacer name="horizontalSpacer">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeType">
<enum>QSizePolicy::Fixed</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>40</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QLabel" name="huf_label">
<property name="text">
<string>Line &amp;un-wrap factor :</string>
</property>
<property name="buddy">
<cstring>opt_html_unwrap_factor</cstring>
</property>
</widget>
</item>
<item>
<widget class="QDoubleSpinBox" name="opt_html_unwrap_factor">
<property name="toolTip">
<string/>
</property>
<property name="maximum">
<double>1.000000000000000</double>
</property>
<property name="singleStep">
<double>0.050000000000000</double>
</property>
<property name="value">
<double>0.400000000000000</double>
</property>
</widget>
</item>
<item>
<spacer name="horizontalSpacer_2">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>40</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
</layout>
</item>
<item>
<widget class="QCheckBox" name="opt_markup_chapter_headings">
<property name="text">
<string>Detect and markup unformatted chapter headings and sub headings</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_renumber_headings">
<property name="text">
<string>Renumber sequences of &lt;h1&gt; or &lt;h2&gt; tags to prevent splitting</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_delete_blank_paragraphs">
<property name="text">
<string>Delete blank lines between paragraphs</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_format_scene_breaks">
<property name="text">
<string>Ensure scene breaks are consistently formatted</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_dehyphenate">
<property name="text">
<string>Remove unnecessary hyphens</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_italicize_common_cases">
<property name="text">
<string>Italicize common words and patterns</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="opt_fix_indents">
<property name="text">
<string>Replace entity indents with CSS indents</string>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>131</width>
<height>35</height>
</size>
</property>
</spacer>
</item>
</layout>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections>
<connection>
<sender>opt_enable_heuristics</sender>
<signal>toggled(bool)</signal>
<receiver>opt_html_unwrap_factor</receiver>
<slot>setEnabled(bool)</slot>
<hints>
<hint type="sourcelabel">
<x>328</x>
<y>87</y>
</hint>
<hint type="destinationlabel">
<x>481</x>
<y>113</y>
</hint>
</hints>
</connection>
<connection>
<sender>opt_enable_heuristics</sender>
<signal>toggled(bool)</signal>
<receiver>huf_label</receiver>
<slot>setEnabled(bool)</slot>
<hints>
<hint type="sourcelabel">
<x>295</x>
<y>88</y>
</hint>
<hint type="destinationlabel">
<x>291</x>
<y>105</y>
</hint>
</hints>
</connection>
</connections>
</ui>


@ -6,8 +6,6 @@ __docformat__ = 'restructuredtext en'
from calibre.gui2.convert.pdb_output_ui import Ui_Form from calibre.gui2.convert.pdb_output_ui import Ui_Form
from calibre.gui2.convert import Widget from calibre.gui2.convert import Widget
from calibre.ebooks.pdb import FORMAT_WRITERS
from calibre.gui2.widgets import BasicComboModel
format_model = None format_model = None
@ -21,17 +19,8 @@ class PluginWidget(Widget, Ui_Form):
def __init__(self, parent, get_option, get_help, db=None, book_id=None): def __init__(self, parent, get_option, get_help, db=None, book_id=None):
Widget.__init__(self, parent, ['format', 'inline_toc', 'pdb_output_encoding']) Widget.__init__(self, parent, ['format', 'inline_toc', 'pdb_output_encoding'])
self.db, self.book_id = db, book_id self.db, self.book_id = db, book_id
for x in get_option('format').option.choices:
self.opt_format.addItem(x)
self.initialize_options(get_option, get_help, db, book_id) self.initialize_options(get_option, get_help, db, book_id)
default = self.opt_format.currentText()
global format_model
if format_model is None:
format_model = BasicComboModel(FORMAT_WRITERS.keys())
self.format_model = format_model
self.opt_format.setModel(self.format_model)
default_index = self.opt_format.findText(default)
format_index = self.opt_format.findText('doc')
self.opt_format.setCurrentIndex(default_index if default_index != -1 else format_index if format_index != -1 else 0)


@ -6,8 +6,6 @@ __docformat__ = 'restructuredtext en'
from calibre.gui2.convert.pdf_output_ui import Ui_Form from calibre.gui2.convert.pdf_output_ui import Ui_Form
from calibre.gui2.convert import Widget from calibre.gui2.convert import Widget
from calibre.ebooks.pdf.pageoptions import PAPER_SIZES, ORIENTATIONS
from calibre.gui2.widgets import BasicComboModel
paper_size_model = None paper_size_model = None
orientation_model = None orientation_model = None
@ -23,28 +21,11 @@ class PluginWidget(Widget, Ui_Form):
Widget.__init__(self, parent, ['paper_size', Widget.__init__(self, parent, ['paper_size',
'orientation', 'preserve_cover_aspect_ratio']) 'orientation', 'preserve_cover_aspect_ratio'])
self.db, self.book_id = db, book_id self.db, self.book_id = db, book_id
for x in get_option('paper_size').option.choices:
self.opt_paper_size.addItem(x)
for x in get_option('orientation').option.choices:
self.opt_orientation.addItem(x)
self.initialize_options(get_option, get_help, db, book_id) self.initialize_options(get_option, get_help, db, book_id)
default_paper_size = self.opt_paper_size.currentText()
default_orientation = self.opt_orientation.currentText()
global paper_size_model
if paper_size_model is None:
paper_size_model = BasicComboModel(PAPER_SIZES.keys())
self.paper_size_model = paper_size_model
self.opt_paper_size.setModel(self.paper_size_model)
default_paper_size_index = self.opt_paper_size.findText(default_paper_size)
letter_index = self.opt_paper_size.findText('letter')
self.opt_paper_size.setCurrentIndex(default_paper_size_index if default_paper_size_index != -1 else letter_index if letter_index != -1 else 0)
global orientation_model
if orientation_model is None:
orientation_model = BasicComboModel(ORIENTATIONS.keys())
self.orientation_model = orientation_model
self.opt_orientation.setModel(self.orientation_model)
default_orientation_index = self.opt_orientation.findText(default_orientation)
orientation_index = self.opt_orientation.findText('portrait')
self.opt_orientation.setCurrentIndex(default_orientation_index if default_orientation_index != -1 else orientation_index if orientation_index != -1 else 0)


@ -35,6 +35,10 @@ class RegexBuilder(QDialog, Ui_RegexBuilder):
self.connect(self.button_box, SIGNAL('clicked(QAbstractButton*)'), self.button_clicked) self.connect(self.button_box, SIGNAL('clicked(QAbstractButton*)'), self.button_clicked)
self.connect(self.regex, SIGNAL('textChanged(QString)'), self.regex_valid) self.connect(self.regex, SIGNAL('textChanged(QString)'), self.regex_valid)
self.connect(self.test, SIGNAL('clicked()'), self.do_test) self.connect(self.test, SIGNAL('clicked()'), self.do_test)
self.connect(self.previous, SIGNAL('clicked()'), self.goto_previous)
self.connect(self.next, SIGNAL('clicked()'), self.goto_next)
self.match_locs = []
def regex_valid(self): def regex_valid(self):
regex = unicode(self.regex.text()) regex = unicode(self.regex.text())
@ -42,15 +46,23 @@ class RegexBuilder(QDialog, Ui_RegexBuilder):
try: try:
re.compile(regex) re.compile(regex)
self.regex.setStyleSheet('QLineEdit { color: black; background-color: rgba(0,255,0,20%); }') self.regex.setStyleSheet('QLineEdit { color: black; background-color: rgba(0,255,0,20%); }')
return True
except: except:
self.regex.setStyleSheet('QLineEdit { color: black; background-color: rgb(255,0,0,20%); }') self.regex.setStyleSheet('QLineEdit { color: black; background-color: rgb(255,0,0,20%); }')
return False
else: else:
self.regex.setStyleSheet('QLineEdit { color: black; background-color: white; }') self.regex.setStyleSheet('QLineEdit { color: black; background-color: white; }')
return True self.preview.setExtraSelections([])
self.match_locs = []
self.next.setEnabled(False)
self.previous.setEnabled(False)
self.occurrences.setText('0')
return False
def do_test(self): def do_test(self):
selections = [] selections = []
self.match_locs = []
if self.regex_valid(): if self.regex_valid():
text = unicode(self.preview.toPlainText()) text = unicode(self.preview.toPlainText())
regex = unicode(self.regex.text()) regex = unicode(self.regex.text())
@ -64,9 +76,43 @@ class RegexBuilder(QDialog, Ui_RegexBuilder):
es.cursor.setPosition(match.start(), QTextCursor.MoveAnchor) es.cursor.setPosition(match.start(), QTextCursor.MoveAnchor)
es.cursor.setPosition(match.end(), QTextCursor.KeepAnchor) es.cursor.setPosition(match.end(), QTextCursor.KeepAnchor)
selections.append(es) selections.append(es)
self.match_locs.append((match.start(), match.end()))
except: except:
pass pass
self.preview.setExtraSelections(selections) self.preview.setExtraSelections(selections)
if self.match_locs:
self.next.setEnabled(True)
self.previous.setEnabled(True)
self.occurrences.setText(str(len(self.match_locs)))
def goto_previous(self):
pos = self.preview.textCursor().position()
if self.match_locs:
match_loc = len(self.match_locs) - 1
for i in xrange(len(self.match_locs) - 1, -1, -1):
loc = self.match_locs[i][1]
if pos > loc:
match_loc = i
break
self.goto_loc(self.match_locs[match_loc][1], operation=QTextCursor.Left, n=self.match_locs[match_loc][1] - self.match_locs[match_loc][0])
def goto_next(self):
pos = self.preview.textCursor().position()
if self.match_locs:
match_loc = 0
for i in xrange(len(self.match_locs)):
loc = self.match_locs[i][0]
if pos < loc:
match_loc = i
break
self.goto_loc(self.match_locs[match_loc][0], n=self.match_locs[match_loc][1] - self.match_locs[match_loc][0])
def goto_loc(self, loc, operation=QTextCursor.Right, mode=QTextCursor.KeepAnchor, n=0):
cursor = QTextCursor(self.preview.document())
cursor.setPosition(loc)
if n:
cursor.movePosition(operation, mode, n)
self.preview.setTextCursor(cursor)
def select_format(self, db, book_id): def select_format(self, db, book_id):
format = None format = None
@ -125,6 +171,11 @@ class RegexEdit(QWidget, Ui_Edit):
if bld.exec_() == bld.Accepted: if bld.exec_() == bld.Accepted:
self.edit.setText(bld.regex.text()) self.edit.setText(bld.regex.text())
def setObjectName(self, *args):
QWidget.setObjectName(self, *args)
if hasattr(self, 'edit'):
self.edit.initialize('regex_edit_'+unicode(self.objectName()))
def set_msg(self, msg): def set_msg(self, msg):
self.msg.setText(msg) self.msg.setText(msg)
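
Review note: the new `goto_next` and `goto_previous` methods walk the `(start, end)` pairs stored in `match_locs` and wrap around when the cursor is past the last match or before the first. The sketch below reproduces only that index selection in plain Python, without Qt. The function names are hypothetical, and the use of `re.finditer` to collect the pairs is an assumption, since the matching loop itself falls outside the hunk shown.

```python
import re


def find_match_locs(pattern, text):
    """Collect (start, end) offsets for every match, in document order."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def next_match_index(match_locs, pos):
    """Index of the first match starting after pos; wraps to the first match."""
    if not match_locs:
        return None
    for i, (start, _end) in enumerate(match_locs):
        if pos < start:
            return i
    return 0


def previous_match_index(match_locs, pos):
    """Index of the last match ending before pos; wraps to the last match."""
    if not match_locs:
        return None
    for i in range(len(match_locs) - 1, -1, -1):
        if pos > match_locs[i][1]:
            return i
    return len(match_locs) - 1


locs = find_match_locs(r'\d+', 'ab12cd345ef')
print(locs)
print(next_match_index(locs, 4))      # cursor sits at the end of the first match
print(previous_match_index(locs, 0))  # nothing before the cursor, so wrap
```

In the dialog, `goto_loc` then places the cursor at the chosen offset and extends the selection by the match length, so a following click starts from the end (for Next) or the start (for Previous) of the current match.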


@ -6,15 +6,102 @@
<rect> <rect>
<x>0</x> <x>0</x>
<y>0</y> <y>0</y>
<width>662</width> <width>580</width>
<height>505</height> <height>503</height>
</rect> </rect>
</property> </property>
<property name="windowTitle"> <property name="windowTitle">
<string>Regex Builder</string> <string>Regex Builder</string>
</property> </property>
<layout class="QGridLayout" name="gridLayout"> <layout class="QVBoxLayout" name="verticalLayout_2">
<item row="1" column="0" colspan="5"> <item>
<layout class="QHBoxLayout" name="horizontalLayout_6">
<item>
<widget class="QLabel" name="label">
<property name="text">
<string>Regex:</string>
</property>
</widget>
</item>
<item>
<layout class="QHBoxLayout" name="horizontalLayout">
<item>
<widget class="QLineEdit" name="regex"/>
</item>
<item>
<widget class="QPushButton" name="test">
<property name="text">
<string>Test</string>
</property>
</widget>
</item>
</layout>
</item>
</layout>
</item>
<item>
<layout class="QHBoxLayout" name="horizontalLayout_5">
<item>
<widget class="QLabel" name="label_3">
<property name="text">
<string>Occurrences:</string>
</property>
</widget>
</item>
<item>
<widget class="QLabel" name="occurrences">
<property name="text">
<string>0</string>
</property>
</widget>
</item>
<item>
<spacer name="horizontalSpacer_2">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>298</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
<item>
<layout class="QHBoxLayout" name="horizontalLayout_3">
<item>
<widget class="QLabel" name="label_2">
<property name="text">
<string>Goto:</string>
</property>
</widget>
</item>
<item>
<widget class="QPushButton" name="previous">
<property name="enabled">
<bool>false</bool>
</property>
<property name="text">
<string>&amp;Previous</string>
</property>
</widget>
</item>
<item>
<widget class="QPushButton" name="next">
<property name="enabled">
<bool>false</bool>
</property>
<property name="text">
<string>&amp;Next</string>
</property>
</widget>
</item>
</layout>
</item>
</layout>
</item>
<item>
<widget class="QGroupBox" name="groupBox"> <widget class="QGroupBox" name="groupBox">
<property name="title"> <property name="title">
<string>Preview</string> <string>Preview</string>
@ -36,32 +123,28 @@
</layout> </layout>
</widget> </widget>
</item> </item>
<item row="2" column="4"> <item>
<widget class="QDialogButtonBox" name="button_box"> <layout class="QHBoxLayout" name="horizontalLayout_4">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
<item row="0" column="0">
<widget class="QLabel" name="label">
<property name="text">
<string>Regex:</string>
</property>
</widget>
</item>
<item row="0" column="1" colspan="4">
<layout class="QHBoxLayout" name="horizontalLayout">
<item> <item>
<widget class="QLineEdit" name="regex"/> <spacer name="horizontalSpacer">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>328</width>
<height>20</height>
</size>
</property>
</spacer>
</item> </item>
<item> <item>
<widget class="QPushButton" name="test"> <widget class="QDialogButtonBox" name="button_box">
<property name="text"> <property name="orientation">
<string>Test</string> <enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property> </property>
</widget> </widget>
</item> </item>


@ -0,0 +1,55 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
import re
from calibre.gui2.convert.search_and_replace_ui import Ui_Form
from calibre.gui2.convert import Widget
from calibre.gui2 import error_dialog
class SearchAndReplaceWidget(Widget, Ui_Form):
TITLE = _('Search\n&\nReplace')
HELP = _('Modify the document text and structure using user defined patterns.')
COMMIT_NAME = 'search_and_replace'
ICON = I('search.png')
def __init__(self, parent, get_option, get_help, db=None, book_id=None):
Widget.__init__(self, parent,
['sr1_search', 'sr1_replace',
'sr2_search', 'sr2_replace',
'sr3_search', 'sr3_replace']
)
self.db, self.book_id = db, book_id
self.initialize_options(get_option, get_help, db, book_id)
self.opt_sr1_search.set_msg(_('&Search Regular Expression'))
self.opt_sr1_search.set_book_id(book_id)
self.opt_sr1_search.set_db(db)
self.opt_sr2_search.set_msg(_('&Search Regular Expression'))
self.opt_sr2_search.set_book_id(book_id)
self.opt_sr2_search.set_db(db)
self.opt_sr3_search.set_msg(_('&Search Regular Expression'))
self.opt_sr3_search.set_book_id(book_id)
self.opt_sr3_search.set_db(db)
def break_cycles(self):
Widget.break_cycles(self)
self.opt_sr1_search.break_cycles()
self.opt_sr2_search.break_cycles()
self.opt_sr3_search.break_cycles()
def pre_commit_check(self):
for x in ('sr1_search', 'sr2_search', 'sr3_search'):
x = getattr(self, 'opt_'+x)
try:
pat = unicode(x.regex)
re.compile(pat)
except Exception, err:
error_dialog(self, _('Invalid regular expression'),
_('Invalid regular expression: %s')%err, show=True)
return False
return True
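
Review note: `pre_commit_check` above compiles each of the three search expressions and rejects the dialog on the first failure. The sketch below shows that check in isolation, plus one plausible way the three search/replace pairs could be applied in order. Both helpers are hypothetical: the source shows only the validation, so `apply_rules` and its rule of skipping empty searches are assumptions about how the conversion pipeline consumes `sr1` to `sr3`, not calibre's actual code. The sketch also catches `re.error` rather than the broad `Exception` used in the widget.

```python
import re


def first_invalid_pattern(patterns):
    """Return (pattern, message) for the first pattern that fails to compile."""
    for pat in patterns:
        try:
            re.compile(pat)
        except re.error as err:
            return pat, str(err)
    return None


def apply_rules(text, rules):
    """Apply (search, replace) pairs in order; pairs with an empty search are skipped."""
    for search, replace in rules:
        if search:
            text = re.sub(search, replace, text)
    return text


rules = [(r'Page \d+\n', ''), ('', 'unused'), ('Hello', 'Hi')]
problem = first_invalid_pattern([search for search, _ in rules])
if problem is None:
    print(apply_rules('Page 12\nHello', rules))
else:
    print('Invalid regular expression: %s' % problem[1])
```

An empty search string compiles without error, which is why the validation passes when a user leaves one of the three expressions blank.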


@ -0,0 +1,213 @@
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>Form</class>
<widget class="QWidget" name="Form">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>468</width>
<height>451</height>
</rect>
</property>
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="windowTitle">
<string>Form</string>
</property>
<layout class="QGridLayout" name="gridLayout_4">
<property name="sizeConstraint">
<enum>QLayout::SetDefaultConstraint</enum>
</property>
<item row="1" column="0">
<widget class="QGroupBox" name="groupBox">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="title">
<string>First expression</string>
</property>
<layout class="QGridLayout" name="gridLayout_2">
<property name="sizeConstraint">
<enum>QLayout::SetMinimumSize</enum>
</property>
<item row="0" column="0">
<widget class="RegexEdit" name="opt_sr1_search" native="true">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
<item row="1" column="0">
<widget class="QLabel" name="label_4">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="text">
<string>&amp;Replacement Text</string>
</property>
<property name="buddy">
<cstring>opt_sr1_replace</cstring>
</property>
</widget>
</item>
<item row="2" column="0">
<widget class="QLineEdit" name="opt_sr1_replace">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Fixed">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
</layout>
</widget>
</item>
<item row="2" column="0">
<widget class="QGroupBox" name="groupBox_2">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="title">
<string>Second Expression</string>
</property>
<layout class="QGridLayout" name="gridLayout">
<property name="sizeConstraint">
<enum>QLayout::SetMinimumSize</enum>
</property>
<item row="0" column="0">
<widget class="RegexEdit" name="opt_sr2_search" native="true">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
<item row="1" column="0">
<widget class="QLabel" name="label_5">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="text">
<string>&amp;Replacement Text</string>
</property>
<property name="buddy">
<cstring>opt_sr2_replace</cstring>
</property>
</widget>
</item>
<item row="2" column="0">
<widget class="QLineEdit" name="opt_sr2_replace">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Fixed">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
</layout>
</widget>
</item>
<item row="3" column="0">
<widget class="QGroupBox" name="groupBox_3">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="title">
<string>Third expression</string>
</property>
<layout class="QGridLayout" name="gridLayout_3">
<property name="sizeConstraint">
<enum>QLayout::SetMinimumSize</enum>
</property>
<item row="0" column="0">
<widget class="RegexEdit" name="opt_sr3_search" native="true">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
<item row="1" column="0">
<widget class="QLabel" name="label_6">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Preferred">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="text">
<string>&amp;Replacement Text</string>
</property>
<property name="buddy">
<cstring>opt_sr3_replace</cstring>
</property>
</widget>
</item>
<item row="2" column="0">
<widget class="QLineEdit" name="opt_sr3_replace">
<property name="sizePolicy">
<sizepolicy hsizetype="Minimum" vsizetype="Fixed">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
</widget>
</item>
</layout>
</widget>
</item>
<item row="0" column="0">
<widget class="QLabel" name="label">
<property name="text">
<string>&lt;p&gt;Search and replace uses &lt;i&gt;regular expressions&lt;/i&gt;. See the &lt;a href=&quot;http://calibre-ebook.com/user_manual/regexp.html&quot;&gt;regular expressions tutorial&lt;/a&gt; to get started with regular expressions. Also clicking the wizard buttons below will allow you to test your regular expression against the current input document.</string>
</property>
<property name="wordWrap">
<bool>true</bool>
</property>
<property name="openExternalLinks">
<bool>true</bool>
</property>
</widget>
</item>
</layout>
</widget>
<customwidgets>
<customwidget>
<class>RegexEdit</class>
<extends>QWidget</extends>
<header>regex_builder.h</header>
<container>1</container>
</customwidget>
</customwidgets>
<resources/>
<connections/>
</ui>


@ -16,6 +16,8 @@ from calibre.ebooks.conversion.config import GuiRecommendations, save_specifics,
from calibre.gui2.convert.single_ui import Ui_Dialog from calibre.gui2.convert.single_ui import Ui_Dialog
from calibre.gui2.convert.metadata import MetadataWidget from calibre.gui2.convert.metadata import MetadataWidget
from calibre.gui2.convert.look_and_feel import LookAndFeelWidget from calibre.gui2.convert.look_and_feel import LookAndFeelWidget
from calibre.gui2.convert.heuristics import HeuristicsWidget
from calibre.gui2.convert.search_and_replace import SearchAndReplaceWidget
from calibre.gui2.convert.page_setup import PageSetupWidget from calibre.gui2.convert.page_setup import PageSetupWidget
from calibre.gui2.convert.structure_detection import StructureDetectionWidget from calibre.gui2.convert.structure_detection import StructureDetectionWidget
from calibre.gui2.convert.toc import TOCWidget from calibre.gui2.convert.toc import TOCWidget
@ -170,6 +172,8 @@ class Config(ResizableDialog, Ui_Dialog):
self.mw = widget_factory(MetadataWidget) self.mw = widget_factory(MetadataWidget)
self.setWindowTitle(_('Convert')+ ' ' + unicode(self.mw.title.text())) self.setWindowTitle(_('Convert')+ ' ' + unicode(self.mw.title.text()))
lf = widget_factory(LookAndFeelWidget) lf = widget_factory(LookAndFeelWidget)
hw = widget_factory(HeuristicsWidget)
sr = widget_factory(SearchAndReplaceWidget)
ps = widget_factory(PageSetupWidget) ps = widget_factory(PageSetupWidget)
sd = widget_factory(StructureDetectionWidget) sd = widget_factory(StructureDetectionWidget)
toc = widget_factory(TOCWidget) toc = widget_factory(TOCWidget)
@ -203,7 +207,7 @@ class Config(ResizableDialog, Ui_Dialog):
if not c: break if not c: break
self.stack.removeWidget(c) self.stack.removeWidget(c)
widgets = [self.mw, lf, ps, sd, toc] widgets = [self.mw, lf, hw, ps, sd, toc, sr]
if input_widget is not None: if input_widget is not None:
widgets.append(input_widget) widgets.append(input_widget)
if output_widget is not None: if output_widget is not None:


@ -100,7 +100,7 @@
</size> </size>
</property> </property>
<property name="spacing"> <property name="spacing">
<number>20</number> <number>10</number>
</property> </property>
<property name="wordWrap"> <property name="wordWrap">
<bool>true</bool> <bool>true</bool>
@ -129,8 +129,8 @@
<rect> <rect>
<x>0</x> <x>0</x>
<y>0</y> <y>0</y>
<width>805</width> <width>810</width>
<height>484</height> <height>494</height>
</rect> </rect>
</property> </property>
<layout class="QVBoxLayout" name="verticalLayout_3"> <layout class="QVBoxLayout" name="verticalLayout_3">


@ -6,8 +6,6 @@ __license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>' __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en' __docformat__ = 'restructuredtext en'
import re
from calibre.gui2.convert.structure_detection_ui import Ui_Form from calibre.gui2.convert.structure_detection_ui import Ui_Form
from calibre.gui2.convert import Widget from calibre.gui2.convert import Widget
from calibre.gui2 import error_dialog from calibre.gui2 import error_dialog
@ -24,12 +22,8 @@ class StructureDetectionWidget(Widget, Ui_Form):
Widget.__init__(self, parent, Widget.__init__(self, parent,
['chapter', 'chapter_mark', ['chapter', 'chapter_mark',
'remove_first_image', 'remove_first_image',
'insert_metadata', 'page_breaks_before', 'insert_metadata', 'page_breaks_before']
'preprocess_html', 'remove_header', 'header_regex',
'remove_footer', 'footer_regex','html_unwrap_factor']
) )
self.opt_html_unwrap_factor.setEnabled(False)
self.huf_label.setEnabled(False)
self.db, self.book_id = db, book_id self.db, self.book_id = db, book_id
for x in ('pagebreak', 'rule', 'both', 'none'): for x in ('pagebreak', 'rule', 'both', 'none'):
self.opt_chapter_mark.addItem(x) self.opt_chapter_mark.addItem(x)
@ -37,28 +31,11 @@ class StructureDetectionWidget(Widget, Ui_Form):
self.opt_chapter.set_msg(_('Detect chapters at (XPath expression):')) self.opt_chapter.set_msg(_('Detect chapters at (XPath expression):'))
self.opt_page_breaks_before.set_msg(_('Insert page breaks before ' self.opt_page_breaks_before.set_msg(_('Insert page breaks before '
'(XPath expression):')) '(XPath expression):'))
self.opt_header_regex.set_msg(_('Header regular expression:'))
self.opt_header_regex.set_book_id(book_id)
self.opt_header_regex.set_db(db)
self.opt_footer_regex.set_msg(_('Footer regular expression:'))
self.opt_footer_regex.set_book_id(book_id)
self.opt_footer_regex.set_db(db)
def break_cycles(self): def break_cycles(self):
Widget.break_cycles(self) Widget.break_cycles(self)
self.opt_header_regex.break_cycles()
self.opt_footer_regex.break_cycles()
def pre_commit_check(self): def pre_commit_check(self):
for x in ('header_regex', 'footer_regex'):
x = getattr(self, 'opt_'+x)
try:
pat = unicode(x.regex)
re.compile(pat)
except Exception, err:
error_dialog(self, _('Invalid regular expression'),
_('Invalid regular expression: %s')%err).exec_()
return False
for x in ('chapter', 'page_breaks_before'): for x in ('chapter', 'page_breaks_before'):
x = getattr(self, 'opt_'+x) x = getattr(self, 'opt_'+x)
if not x.check(): if not x.check():
@ -66,8 +43,3 @@ class StructureDetectionWidget(Widget, Ui_Form):
_('The XPath expression %s is invalid.')%x.text).exec_() _('The XPath expression %s is invalid.')%x.text).exec_()
return False return False
return True return True
def set_value_handler(self, g, val):
if val is None and g is self.opt_html_unwrap_factor:
g.setValue(0.0)
return True


@ -14,10 +14,10 @@
<string>Form</string> <string>Form</string>
</property> </property>
<layout class="QGridLayout" name="gridLayout"> <layout class="QGridLayout" name="gridLayout">
<item row="0" column="1" colspan="2"> <item row="0" column="0" colspan="3">
<widget class="XPathEdit" name="opt_chapter" native="true"/> <widget class="XPathEdit" name="opt_chapter" native="true"/>
</item> </item>
<item row="1" column="0" colspan="2"> <item row="1" column="0">
<widget class="QLabel" name="label"> <widget class="QLabel" name="label">
<property name="text"> <property name="text">
<string>Chapter &amp;mark:</string> <string>Chapter &amp;mark:</string>
@ -27,7 +27,7 @@
</property> </property>
</widget> </widget>
</item> </item>
<item row="1" column="2"> <item row="1" column="1">
<widget class="QComboBox" name="opt_chapter_mark"> <widget class="QComboBox" name="opt_chapter_mark">
<property name="minimumContentsLength"> <property name="minimumContentsLength">
<number>20</number> <number>20</number>
@ -41,17 +41,17 @@
</property> </property>
</widget> </widget>
</item> </item>
<item row="5" column="0" colspan="2"> <item row="3" column="0" colspan="2">
<widget class="QCheckBox" name="opt_insert_metadata"> <widget class="QCheckBox" name="opt_insert_metadata">
<property name="text"> <property name="text">
<string>Insert &amp;metadata as page at start of book</string> <string>Insert &amp;metadata as page at start of book</string>
</property> </property>
</widget> </widget>
</item> </item>
<item row="11" column="0" colspan="3"> <item row="5" column="0" colspan="3">
<widget class="XPathEdit" name="opt_page_breaks_before" native="true"/> <widget class="XPathEdit" name="opt_page_breaks_before" native="true"/>
</item> </item>
<item row="12" column="0" colspan="3"> <item row="6" column="0" colspan="3">
<spacer name="verticalSpacer"> <spacer name="verticalSpacer">
<property name="orientation"> <property name="orientation">
<enum>Qt::Vertical</enum> <enum>Qt::Vertical</enum>
@ -64,53 +64,7 @@
</property> </property>
</spacer> </spacer>
</item> </item>
<item row="8" column="0" colspan="2"> <item row="1" column="2">
<widget class="QCheckBox" name="opt_remove_footer">
<property name="text">
<string>Remove F&amp;ooter</string>
</property>
</widget>
</item>
<item row="6" column="0" colspan="2">
<widget class="QCheckBox" name="opt_remove_header">
<property name="text">
<string>Remove H&amp;eader</string>
</property>
</widget>
</item>
<item row="7" column="0" colspan="3">
<widget class="RegexEdit" name="opt_header_regex" native="true"/>
</item>
<item row="9" column="0" colspan="3">
<widget class="RegexEdit" name="opt_footer_regex" native="true"/>
</item>
<item row="4" column="1">
<widget class="QLabel" name="huf_label">
<property name="text">
<string>Line &amp;un-wrap factor during preprocess:</string>
</property>
<property name="buddy">
<cstring>opt_html_unwrap_factor</cstring>
</property>
</widget>
</item>
<item row="4" column="2">
<widget class="QDoubleSpinBox" name="opt_html_unwrap_factor">
<property name="toolTip">
<string/>
</property>
<property name="maximum">
<double>1.000000000000000</double>
</property>
<property name="singleStep">
<double>0.050000000000000</double>
</property>
<property name="value">
<double>0.400000000000000</double>
</property>
</widget>
</item>
<item row="4" column="0">
<spacer name="horizontalSpacer"> <spacer name="horizontalSpacer">
<property name="orientation"> <property name="orientation">
<enum>Qt::Horizontal</enum> <enum>Qt::Horizontal</enum>
@ -123,13 +77,6 @@
</property> </property>
</spacer> </spacer>
</item> </item>
<item row="3" column="0" colspan="2">
<widget class="QCheckBox" name="opt_preprocess_html">
<property name="text">
<string>&amp;Preprocess input file to possibly improve structure detection</string>
</property>
</widget>
</item>
</layout> </layout>
</widget> </widget>
<customwidgets> <customwidgets>
@ -139,46 +86,7 @@
<header>convert/xpath_wizard.h</header> <header>convert/xpath_wizard.h</header>
<container>1</container> <container>1</container>
</customwidget> </customwidget>
<customwidget>
<class>RegexEdit</class>
<extends>QWidget</extends>
<header>regex_builder.h</header>
<container>1</container>
</customwidget>
</customwidgets> </customwidgets>
<resources/> <resources/>
<connections> <connections/>
<connection>
<sender>opt_preprocess_html</sender>
<signal>toggled(bool)</signal>
<receiver>opt_html_unwrap_factor</receiver>
<slot>setEnabled(bool)</slot>
<hints>
<hint type="sourcelabel">
<x>328</x>
<y>87</y>
</hint>
<hint type="destinationlabel">
<x>481</x>
<y>113</y>
</hint>
</hints>
</connection>
<connection>
<sender>opt_preprocess_html</sender>
<signal>toggled(bool)</signal>
<receiver>huf_label</receiver>
<slot>setEnabled(bool)</slot>
<hints>
<hint type="sourcelabel">
<x>295</x>
<y>88</y>
</hint>
<hint type="destinationlabel">
<x>291</x>
<y>105</y>
</hint>
</hints>
</connection>
</connections>
</ui> </ui>


@ -4,10 +4,10 @@ __license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>' __copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en' __docformat__ = 'restructuredtext en'
from PyQt4.Qt import Qt
from calibre.gui2.convert.txt_output_ui import Ui_Form from calibre.gui2.convert.txt_output_ui import Ui_Form
from calibre.gui2.convert import Widget from calibre.gui2.convert import Widget
from calibre.ebooks.txt.newlines import TxtNewlines
from calibre.gui2.widgets import BasicComboModel
newline_model = None newline_model = None
@ -24,16 +24,23 @@ class PluginWidget(Widget, Ui_Form):
'inline_toc', 'markdown_format', 'keep_links', 'keep_image_references', 'inline_toc', 'markdown_format', 'keep_links', 'keep_image_references',
'txt_output_encoding']) 'txt_output_encoding'])
self.db, self.book_id = db, book_id self.db, self.book_id = db, book_id
for x in get_option('newline').option.choices:
self.opt_newline.addItem(x)
self.initialize_options(get_option, get_help, db, book_id) self.initialize_options(get_option, get_help, db, book_id)
default = self.opt_newline.currentText() self.opt_markdown_format.stateChanged.connect(self.enable_markdown_format)
self.enable_markdown_format(self.opt_markdown_format.checkState())
global newline_model def break_cycles(self):
if newline_model is None: Widget.break_cycles(self)
newline_model = BasicComboModel(TxtNewlines.NEWLINE_TYPES.keys())
self.newline_model = newline_model try:
self.opt_newline.setModel(self.newline_model) self.opt_markdown_format.stateChanged.disconnect()
except:
pass
def enable_markdown_format(self, state):
state = state == Qt.Checked
self.opt_keep_links.setEnabled(state)
self.opt_keep_image_references.setEnabled(state)
default_index = self.opt_newline.findText(default)
system_index = self.opt_newline.findText('system')
self.opt_newline.setCurrentIndex(default_index if default_index != -1 else system_index if system_index != -1 else 0)


@ -6,8 +6,8 @@
<rect> <rect>
<x>0</x> <x>0</x>
<y>0</y> <y>0</y>
<width>422</width> <width>430</width>
<height>64</height> <height>74</height>
</rect> </rect>
</property> </property>
<property name="windowTitle"> <property name="windowTitle">
@ -53,7 +53,7 @@
<item row="0" column="1"> <item row="0" column="1">
<widget class="QToolButton" name="button"> <widget class="QToolButton" name="button">
<property name="toolTip"> <property name="toolTip">
<string>Use a wizard to help construct the XPath expression</string> <string>Use a wizard to help construct the Regular expression</string>
</property> </property>
<property name="text"> <property name="text">
<string>...</string> <string>...</string>
@ -70,19 +70,6 @@
</property> </property>
</widget> </widget>
</item> </item>
<item row="0" column="2">
<spacer name="horizontalSpacer">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
</layout> </layout>
</widget> </widget>
<customwidgets> <customwidgets>


@ -379,7 +379,8 @@ def populate_metadata_page(layout, db, book_id, bulk=False, two_column=False, pa
w = bulk_widgets[type](db, col, parent) w = bulk_widgets[type](db, col, parent)
else: else:
w = widgets[type](db, col, parent) w = widgets[type](db, col, parent)
w.initialize(book_id) if book_id is not None:
w.initialize(book_id)
return w return w
x = db.custom_column_num_map x = db.custom_column_num_map
cols = list(x) cols = list(x)
@ -599,7 +600,7 @@ class BulkEnumeration(BulkBase, Enumeration):
value = None value = None
ret_value = None ret_value = None
dialog_shown = False dialog_shown = False
for book_id in book_ids: for i,book_id in enumerate(book_ids):
val = self.db.get_custom(book_id, num=self.col_id, index_is_id=True) val = self.db.get_custom(book_id, num=self.col_id, index_is_id=True)
if val and val not in self.col_metadata['display']['enum_values']: if val and val not in self.col_metadata['display']['enum_values']:
if not dialog_shown: if not dialog_shown:
@ -610,7 +611,7 @@ class BulkEnumeration(BulkBase, Enumeration):
show=True, show_copy_button=False) show=True, show_copy_button=False)
dialog_shown = True dialog_shown = True
ret_value = ' nochange ' ret_value = ' nochange '
elif value is not None and value != val: elif (value is not None and value != val) or (val and i != 0):
ret_value = ' nochange ' ret_value = ' nochange '
value = val value = val
if ret_value is None: if ret_value is None:


@@ -13,13 +13,13 @@ from calibre.customize.ui import available_input_formats, available_output_forma
 device_plugins
 from calibre.devices.interface import DevicePlugin
 from calibre.devices.errors import UserFeedback, OpenFeedback
-from calibre.gui2.dialogs.choose_format import ChooseFormatDialog
+from calibre.gui2.dialogs.choose_format_device import ChooseFormatDeviceDialog
 from calibre.utils.ipc.job import BaseJob
 from calibre.devices.scanner import DeviceScanner
 from calibre.gui2 import config, error_dialog, Dispatcher, dynamic, \
 warning_dialog, info_dialog, choose_dir
 from calibre.ebooks.metadata import authors_to_string
-from calibre import preferred_encoding, prints, force_unicode
+from calibre import preferred_encoding, prints, force_unicode, as_unicode
 from calibre.utils.filenames import ascii_filename
 from calibre.devices.errors import FreeSpaceError
 from calibre.devices.apple.driver import ITUNES_ASYNC
@@ -68,13 +68,7 @@ class DeviceJob(BaseJob): # {{{
 if self._aborted:
 return
 self.failed = True
-try:
-ex = unicode(err)
-except:
-try:
-ex = str(err).decode(preferred_encoding, 'replace')
-except:
-ex = repr(err)
+ex = as_unicode(err)
 self._details = ex + '\n\n' + \
 traceback.format_exc()
 self.exception = err
@@ -832,8 +826,24 @@ class DeviceMixin(object): # {{{
 fmt = None
 if specific:
-d = ChooseFormatDialog(self, _('Choose format to send to device'),
-self.device_manager.device.settings().format_map)
+formats = []
+aval_out_formats = available_output_formats()
+format_count = {}
+for row in rows:
+fmts = self.library_view.model().db.formats(row.row())
+if fmts:
+for f in fmts.split(','):
+f = f.lower()
+if format_count.has_key(f):
+format_count[f] += 1
+else:
+format_count[f] = 1
+for f in self.device_manager.device.settings().format_map:
+if f in format_count.keys():
+formats.append((f, _('%i of %i Books' % (format_count[f], len(rows))), True if f in aval_out_formats else False))
+elif f in aval_out_formats:
+formats.append((f, _('0 of %i Books' % len(rows)), True))
+d = ChooseFormatDeviceDialog(self, _('Choose format to send to device'), formats)
 if d.exec_() != QDialog.Accepted:
 return
 if d.format():

@@ -0,0 +1,85 @@
#!/usr/bin/env python
__copyright__ = '2008, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
__license__ = 'GPL v3'
from PyQt4.Qt import QDialog, QGridLayout, QLabel, QDialogButtonBox, \
QApplication, QSpinBox, QToolButton, QIcon
from calibre.ebooks.metadata import authors_to_string, string_to_authors
from calibre.gui2.widgets import CompleteComboBox
from calibre.utils.icu import sort_key
class AddEmptyBookDialog(QDialog):
def __init__(self, parent, db, author):
QDialog.__init__(self, parent)
self.db = db
self.setWindowTitle(_('How many empty books?'))
self._layout = QGridLayout(self)
self.setLayout(self._layout)
self.qty_label = QLabel(_('How many empty books should be added?'))
self._layout.addWidget(self.qty_label, 0, 0, 1, 2)
self.qty_spinbox = QSpinBox(self)
self.qty_spinbox.setRange(1, 10000)
self.qty_spinbox.setValue(1)
self._layout.addWidget(self.qty_spinbox, 1, 0, 1, 2)
self.author_label = QLabel(_('Set the author of the new books to:'))
self._layout.addWidget(self.author_label, 2, 0, 1, 2)
self.authors_combo = CompleteComboBox(self)
self.authors_combo.setSizeAdjustPolicy(
self.authors_combo.AdjustToMinimumContentsLengthWithIcon)
self.authors_combo.setEditable(True)
self._layout.addWidget(self.authors_combo, 3, 0, 1, 1)
self.initialize_authors(db, author)
self.clear_button = QToolButton(self)
self.clear_button.setIcon(QIcon(I('trash.png')))
self.clear_button.setToolTip(_('Reset author to Unknown'))
self.clear_button.clicked.connect(self.reset_author)
self._layout.addWidget(self.clear_button, 3, 1, 1, 1)
button_box = QDialogButtonBox(QDialogButtonBox.Ok | QDialogButtonBox.Cancel)
button_box.accepted.connect(self.accept)
button_box.rejected.connect(self.reject)
self._layout.addWidget(button_box)
self.resize(self.sizeHint())
def reset_author(self, *args):
self.authors_combo.setEditText(_('Unknown'))
def initialize_authors(self, db, author):
all_authors = db.all_authors()
all_authors.sort(key=lambda x : sort_key(x[1]))
for i in all_authors:
id, name = i
name = [n.strip().replace('|', ',') for n in name.split(',')]
self.authors_combo.addItem(authors_to_string(name))
au = author
if not au:
au = _('Unknown')
self.authors_combo.setEditText(au.replace('|', ','))
self.authors_combo.set_separator('&')
self.authors_combo.set_space_before_sep(True)
self.authors_combo.update_items_cache(db.all_author_names())
@property
def qty_to_add(self):
return self.qty_spinbox.value()
@property
def selected_authors(self):
return string_to_authors(unicode(self.authors_combo.text()))
if __name__ == '__main__':
app = QApplication([])
d = AddEmptyBookDialog()
d.exec_()

@@ -0,0 +1,53 @@
__license__ = 'GPL v3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
from PyQt4.Qt import QDialog, QTreeWidgetItem, QIcon, SIGNAL
from calibre.gui2 import file_icon_provider
from calibre.gui2.dialogs.choose_format_device_ui import Ui_ChooseFormatDeviceDialog
class ChooseFormatDeviceDialog(QDialog, Ui_ChooseFormatDeviceDialog):
def __init__(self, window, msg, formats):
'''
formats is a list of tuples: [(format, exists, convertible)].
format: Lower case format identifier. E.G. mobi
exists: String representing the number of books that
exist in the format.
convertible: True if the format is a convertible format.
formats should be ordered in the device's preferred format ordering.
'''
QDialog.__init__(self, window)
Ui_ChooseFormatDeviceDialog.__init__(self)
self.setupUi(self)
self.connect(self.formats, SIGNAL('activated(QModelIndex)'),
self.activated_slot)
self.msg.setText(msg)
for i, (format, exists, convertible) in enumerate(formats):
t_item = QTreeWidgetItem()
t_item.setIcon(0, file_icon_provider().icon_from_ext(format.lower()))
t_item.setText(0, format.upper())
t_item.setText(1, exists)
if convertible:
t_item.setIcon(2, QIcon(I('ok.png')))
self.formats.addTopLevelItem(t_item)
if i == 0:
self.formats.setCurrentItem(t_item)
t_item.setSelected(True)
self.formats.resizeColumnToContents(2)
self.formats.resizeColumnToContents(1)
self.formats.resizeColumnToContents(0)
self.formats.header().resizeSection(0, self.formats.header().sectionSize(0) * 2)
self._format = None
def activated_slot(self, *args):
self.accept()
def format(self):
return self._format
def accept(self):
self._format = unicode(self.formats.currentItem().text(0))
return QDialog.accept(self)

@@ -0,0 +1,111 @@
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>ChooseFormatDeviceDialog</class>
<widget class="QDialog" name="ChooseFormatDeviceDialog">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>507</width>
<height>377</height>
</rect>
</property>
<property name="windowTitle">
<string>Choose Format</string>
</property>
<property name="windowIcon">
<iconset resource="../../../../resources/images.qrc">
<normaloff>:/images/mimetypes/unknown.png</normaloff>:/images/mimetypes/unknown.png</iconset>
</property>
<layout class="QVBoxLayout">
<item>
<widget class="QLabel" name="msg">
<property name="text">
<string/>
</property>
</widget>
</item>
<item>
<widget class="QTreeWidget" name="formats">
<property name="alternatingRowColors">
<bool>true</bool>
</property>
<property name="iconSize">
<size>
<width>64</width>
<height>64</height>
</size>
</property>
<property name="allColumnsShowFocus">
<bool>true</bool>
</property>
<column>
<property name="text">
<string>Format</string>
</property>
</column>
<column>
<property name="text">
<string>Existing</string>
</property>
<property name="textAlignment">
<set>AlignLeft|AlignVCenter</set>
</property>
</column>
<column>
<property name="text">
<string>Convertible</string>
</property>
</column>
</widget>
</item>
<item>
<widget class="QDialogButtonBox" name="buttonBox">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
</layout>
</widget>
<resources>
<include location="../../../../resources/images.qrc"/>
</resources>
<connections>
<connection>
<sender>buttonBox</sender>
<signal>accepted()</signal>
<receiver>ChooseFormatDeviceDialog</receiver>
<slot>accept()</slot>
<hints>
<hint type="sourcelabel">
<x>248</x>
<y>254</y>
</hint>
<hint type="destinationlabel">
<x>157</x>
<y>274</y>
</hint>
</hints>
</connection>
<connection>
<sender>buttonBox</sender>
<signal>rejected()</signal>
<receiver>ChooseFormatDeviceDialog</receiver>
<slot>reject()</slot>
<hints>
<hint type="sourcelabel">
<x>316</x>
<y>260</y>
</hint>
<hint type="destinationlabel">
<x>286</x>
<y>274</y>
</hint>
</hints>
</connection>
</connections>
</ui>

@@ -7,8 +7,7 @@ import re, os
 from PyQt4.Qt import Qt, QDialog, QGridLayout, QVBoxLayout, QFont, QLabel, \
 pyqtSignal, QDialogButtonBox, QInputDialog, QLineEdit, \
-QMessageBox
-from PyQt4 import QtGui
+QMessageBox, QDate, QLineEdit
 from calibre.gui2.dialogs.metadata_bulk_ui import Ui_MetadataBulkDialog
 from calibre.gui2.dialogs.tag_editor import TagEditor
@@ -303,6 +302,7 @@ class MetadataBulkDialog(ResizableDialog, Ui_MetadataBulkDialog):
 self.pubdate.setSpecialValueText(_('Undefined'))
 self.clear_pubdate_button.clicked.connect(self.clear_pubdate)
 self.pubdate.dateChanged.connect(self.do_apply_pubdate)
+self.adddate.setDate(QDate.currentDate())
 self.adddate.setMinimumDate(UNDEFINED_QDATE)
 self.adddate.setSpecialValueText(_('Undefined'))
 self.clear_adddate_button.clicked.connect(self.clear_adddate)
@@ -366,16 +366,16 @@ class MetadataBulkDialog(ResizableDialog, Ui_MetadataBulkDialog):
 offset = 10
 self.s_r_number_of_books = min(10, len(self.ids))
 for i in range(1,self.s_r_number_of_books+1):
-w = QtGui.QLabel(self.tabWidgetPage3)
+w = QLabel(self.tabWidgetPage3)
 w.setText(_('Book %d:')%i)
 self.testgrid.addWidget(w, i+offset, 0, 1, 1)
-w = QtGui.QLineEdit(self.tabWidgetPage3)
+w = QLineEdit(self.tabWidgetPage3)
 w.setReadOnly(True)
 name = 'book_%d_text'%i
 setattr(self, name, w)
 self.book_1_text.setObjectName(name)
 self.testgrid.addWidget(w, i+offset, 1, 1, 1)
-w = QtGui.QLineEdit(self.tabWidgetPage3)
+w = QLineEdit(self.tabWidgetPage3)
 w.setReadOnly(True)
 name = 'book_%d_result'%i
 setattr(self, name, w)

@@ -775,7 +775,7 @@ class MetadataSingleDialog(ResizableDialog, Ui_MetadataSingleDialog):
 self.original_tags = unicode(self.tags.text())
 else:
 self.tags.setText(self.original_tags)
-d = TagEditor(self, self.db, self.row)
+d = TagEditor(self, self.db, self.id)
 d.exec_()
 if d.result() == QDialog.Accepted:
 tag_string = ', '.join(d.tags)

@@ -10,13 +10,13 @@ from calibre.utils.icu import sort_key
 class TagEditor(QDialog, Ui_TagEditor):
-def __init__(self, window, db, index=None):
+def __init__(self, window, db, id_=None):
 QDialog.__init__(self, window)
 Ui_TagEditor.__init__(self)
 self.setupUi(self)
 self.db = db
-self.index = index
+self.index = db.row(id_)
 if self.index is not None:
 tags = self.db.tags(self.index)
 else:
@@ -79,6 +79,8 @@ class TagEditor(QDialog, Ui_TagEditor):
 def apply_tags(self, item=None):
 items = self.available_tags.selectedItems() if item is None else [item]
+rows = [self.available_tags.row(i) for i in items]
+row = max(rows)
 for item in items:
 tag = unicode(item.text())
 self.tags.append(tag)
@@ -89,6 +91,12 @@ class TagEditor(QDialog, Ui_TagEditor):
 for tag in self.tags:
 self.applied_tags.addItem(tag)
+if row >= self.available_tags.count():
+row = self.available_tags.count() - 1
+if row > 2:
+item = self.available_tags.item(row)
+self.available_tags.scrollToItem(item)
 def unapply_tags(self, item=None):

@@ -356,6 +356,13 @@ class %(classname)s(%(base_class)s):
 self.populate_options(AutomaticNewsRecipe)
 self.source_code.setText('')
+def reject(self):
+if question_dialog(self, _('Are you sure?'),
+_('You will lose any unsaved changes. To save your'
+' changes, click the Add/Update recipe button.'
+' Continue?'), show_copy_button=False):
+ResizableDialog.reject(self)
 if __name__ == '__main__':
 from calibre.gui2 import is_ok_to_use_qt
 is_ok_to_use_qt()

@@ -150,13 +150,13 @@ class GuiRunner(QObject):
 if DEBUG:
 prints('Starting up...')
-def start_gui(self):
+def start_gui(self, db):
 from calibre.gui2.ui import Main
 main = Main(self.opts, gui_debug=self.gui_debug)
 if self.splash_screen is not None:
 self.splash_screen.showMessage(_('Initializing user interface...'))
 self.splash_screen.finish(main)
-main.initialize(self.library_path, self.db, self.listener, self.actions)
+main.initialize(self.library_path, db, self.listener, self.actions)
 if DEBUG:
 prints('Started up in', time.time() - self.startup_time)
 add_filesystem_book = partial(main.iactions['Add Books'].add_filesystem_book, allow_device=False)
@@ -200,8 +200,7 @@ class GuiRunner(QObject):
 det_msg=traceback.format_exc(), show=True)
 self.initialization_failed()
-self.db = db
-self.start_gui()
+self.start_gui(db)
 def initialize_db(self):
 db = None

File diff suppressed because it is too large

@@ -0,0 +1,630 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import os
from functools import partial
from PyQt4.Qt import Qt, QVBoxLayout, QHBoxLayout, QWidget, QPushButton, \
QGridLayout, pyqtSignal, QDialogButtonBox, QScrollArea, QFont, \
QTabWidget, QIcon, QToolButton, QSplitter, QGroupBox, QSpacerItem, \
QSizePolicy, QPalette, QFrame, QSize
from calibre.ebooks.metadata import authors_to_string, string_to_authors
from calibre.gui2 import ResizableDialog, error_dialog, gprefs
from calibre.gui2.metadata.basic_widgets import TitleEdit, AuthorsEdit, \
AuthorSortEdit, TitleSortEdit, SeriesEdit, SeriesIndexEdit, ISBNEdit, \
RatingEdit, PublisherEdit, TagsEdit, FormatsManager, Cover, CommentsEdit, \
BuddyLabel, DateEdit, PubdateEdit
from calibre.gui2.custom_column_widgets import populate_metadata_page
from calibre.utils.config import tweaks
class MetadataSingleDialogBase(ResizableDialog):
view_format = pyqtSignal(object)
cc_two_column = tweaks['metadata_single_use_2_cols_for_custom_fields']
one_line_comments_toolbar = False
def __init__(self, db, parent=None):
self.db = db
self.changed = set([])
ResizableDialog.__init__(self, parent)
def setupUi(self, *args): # {{{
self.resize(990, 650)
self.button_box = QDialogButtonBox(
QDialogButtonBox.Ok|QDialogButtonBox.Cancel, Qt.Horizontal,
self)
self.button_box.accepted.connect(self.accept)
self.button_box.rejected.connect(self.reject)
self.next_button = QPushButton(QIcon(I('forward.png')), _('Next'),
self)
self.next_button.clicked.connect(partial(self.do_one, delta=1))
self.prev_button = QPushButton(QIcon(I('back.png')), _('Previous'),
self)
self.button_box.addButton(self.prev_button, self.button_box.ActionRole)
self.button_box.addButton(self.next_button, self.button_box.ActionRole)
self.prev_button.clicked.connect(partial(self.do_one, delta=-1))
self.scroll_area = QScrollArea(self)
self.scroll_area.setFrameShape(QScrollArea.NoFrame)
self.scroll_area.setWidgetResizable(True)
self.central_widget = QTabWidget(self)
self.scroll_area.setWidget(self.central_widget)
self.l = QVBoxLayout(self)
self.setLayout(self.l)
self.l.setMargin(0)
self.l.addWidget(self.scroll_area)
self.l.addWidget(self.button_box)
self.setWindowIcon(QIcon(I('edit_input.png')))
self.setWindowTitle(_('Edit Metadata'))
self.create_basic_metadata_widgets()
if len(self.db.custom_column_label_map):
self.create_custom_metadata_widgets()
self.do_layout()
geom = gprefs.get('metasingle_window_geometry3', None)
if geom is not None:
self.restoreGeometry(bytes(geom))
# }}}
def create_basic_metadata_widgets(self): # {{{
self.basic_metadata_widgets = []
self.title = TitleEdit(self)
self.title.textChanged.connect(self.update_window_title)
self.deduce_title_sort_button = QToolButton(self)
self.deduce_title_sort_button.setToolTip(
_('Automatically create the title sort entry based on the current '
'title entry.\nUsing this button to create title sort will '
'change title sort from red to green.'))
self.deduce_title_sort_button.setWhatsThis(
self.deduce_title_sort_button.toolTip())
self.title_sort = TitleSortEdit(self, self.title,
self.deduce_title_sort_button)
self.basic_metadata_widgets.extend([self.title, self.title_sort])
self.authors = AuthorsEdit(self)
self.deduce_author_sort_button = QToolButton(self)
self.deduce_author_sort_button.setToolTip(_(
'Automatically create the author sort entry based on the current'
' author entry.\n'
'Using this button to create author sort will change author sort from'
' red to green.'))
self.author_sort = AuthorSortEdit(self, self.authors,
self.deduce_author_sort_button, self.db)
self.basic_metadata_widgets.extend([self.authors, self.author_sort])
self.swap_title_author_button = QToolButton(self)
self.swap_title_author_button.setIcon(QIcon(I('swap.png')))
self.swap_title_author_button.setToolTip(_(
'Swap the author and title'))
self.swap_title_author_button.clicked.connect(self.swap_title_author)
self.series = SeriesEdit(self)
self.remove_unused_series_button = QToolButton(self)
self.remove_unused_series_button.setToolTip(
_('Remove unused series (Series that have no books)') )
self.remove_unused_series_button.clicked.connect(self.remove_unused_series)
self.series_index = SeriesIndexEdit(self, self.series)
self.basic_metadata_widgets.extend([self.series, self.series_index])
self.formats_manager = FormatsManager(self)
self.basic_metadata_widgets.append(self.formats_manager)
self.formats_manager.metadata_from_format_button.clicked.connect(
self.metadata_from_format)
self.formats_manager.cover_from_format_button.clicked.connect(
self.cover_from_format)
self.cover = Cover(self)
self.basic_metadata_widgets.append(self.cover)
self.comments = CommentsEdit(self, self.one_line_comments_toolbar)
self.basic_metadata_widgets.append(self.comments)
self.rating = RatingEdit(self)
self.basic_metadata_widgets.append(self.rating)
self.tags = TagsEdit(self)
self.tags_editor_button = QToolButton(self)
self.tags_editor_button.setToolTip(_('Open Tag Editor'))
self.tags_editor_button.setIcon(QIcon(I('chapters.png')))
self.tags_editor_button.clicked.connect(self.tags_editor)
self.basic_metadata_widgets.append(self.tags)
self.isbn = ISBNEdit(self)
self.basic_metadata_widgets.append(self.isbn)
self.publisher = PublisherEdit(self)
self.basic_metadata_widgets.append(self.publisher)
self.timestamp = DateEdit(self)
self.pubdate = PubdateEdit(self)
self.basic_metadata_widgets.extend([self.timestamp, self.pubdate])
self.fetch_metadata_button = QPushButton(
_('&Fetch metadata from server'), self)
self.fetch_metadata_button.clicked.connect(self.fetch_metadata)
font = self.fmb_font = QFont()
font.setBold(True)
self.fetch_metadata_button.setFont(font)
# }}}
def create_custom_metadata_widgets(self): # {{{
self.custom_metadata_widgets_parent = w = QWidget(self)
layout = QGridLayout()
w.setLayout(layout)
self.custom_metadata_widgets, self.__cc_spacers = \
populate_metadata_page(layout, self.db, None, parent=w, bulk=False,
two_column=self.cc_two_column)
self.__custom_col_layouts = [layout]
# }}}
def set_custom_metadata_tab_order(self, before=None, after=None): # {{{
sto = QWidget.setTabOrder
if getattr(self, 'custom_metadata_widgets', []):
ans = self.custom_metadata_widgets
for i in range(len(ans)-1):
if before is not None and i == 0:
pass # Do something
if len(ans[i+1].widgets) == 2:
sto(ans[i].widgets[-1], ans[i+1].widgets[1])
else:
sto(ans[i].widgets[-1], ans[i+1].widgets[0])
for c in range(2, len(ans[i].widgets), 2):
sto(ans[i].widgets[c-1], ans[i].widgets[c+1])
if after is not None:
pass # Do something
# }}}
def do_layout(self):
raise NotImplementedError()
def __call__(self, id_):
self.book_id = id_
for widget in self.basic_metadata_widgets:
widget.initialize(self.db, id_)
for widget in self.custom_metadata_widgets:
widget.initialize(id_)
# Commented out as it doesn't play nice with Next, Prev buttons
#self.fetch_metadata_button.setFocus(Qt.OtherFocusReason)
# Miscellaneous interaction methods {{{
def update_window_title(self, *args):
title = self.title.current_val
if len(title) > 50:
title = title[:50] + u'\u2026'
self.setWindowTitle(_('Edit Metadata') + ' - ' +
title)
def swap_title_author(self, *args):
title = self.title.current_val
self.title.current_val = authors_to_string(self.authors.current_val)
self.authors.current_val = string_to_authors(title)
self.title_sort.auto_generate()
self.author_sort.auto_generate()
def remove_unused_series(self, *args):
self.db.remove_unused_series()
idx = self.series.current_val
self.series.clear()
self.series.initialize(self.db, self.book_id)
if idx:
for i in range(self.series.count()):
if unicode(self.series.itemText(i)) == idx:
self.series.setCurrentIndex(i)
break
def tags_editor(self, *args):
self.tags.edit(self.db, self.book_id)
def metadata_from_format(self, *args):
mi, ext = self.formats_manager.get_selected_format_metadata(self.db,
self.book_id)
if mi is not None:
self.update_from_mi(mi)
def cover_from_format(self, *args):
mi, ext = self.formats_manager.get_selected_format_metadata(self.db,
self.book_id)
if mi is None:
return
cdata = None
if mi.cover and os.access(mi.cover, os.R_OK):
cdata = open(mi.cover, 'rb').read()
elif mi.cover_data[1] is not None:
cdata = mi.cover_data[1]
if cdata is None:
error_dialog(self, _('Could not read cover'),
_('Could not read cover from %s format')%ext).exec_()
return
orig = self.cover.current_val
self.cover.current_val = cdata
if self.cover.current_val is None:
self.cover.current_val = orig
return error_dialog(self, _('Could not read cover'),
_('The cover in the %s format is invalid')%ext,
show=True)
return
def update_from_mi(self, mi):
if not mi.is_null('title'):
self.title.current_val = mi.title
if not mi.is_null('authors'):
self.authors.current_val = mi.authors
if not mi.is_null('author_sort'):
self.author_sort.current_val = mi.author_sort
if not mi.is_null('rating'):
try:
self.rating.current_val = mi.rating
except:
pass
if not mi.is_null('publisher'):
self.publisher.current_val = mi.publisher
if not mi.is_null('tags'):
self.tags.current_val = mi.tags
if not mi.is_null('isbn'):
self.isbn.current_val = mi.isbn
if not mi.is_null('pubdate'):
self.pubdate.current_val = mi.pubdate
if not mi.is_null('series') and mi.series.strip():
self.series.current_val = mi.series
if mi.series_index is not None:
self.series_index.current_val = float(mi.series_index)
if mi.comments and mi.comments.strip():
self.comments.current_val = mi.comments
def fetch_metadata(self, *args):
pass # TODO: fetch metadata
# }}}
def apply_changes(self):
self.changed.add(self.book_id)
for widget in self.basic_metadata_widgets:
try:
if not widget.commit(self.db, self.book_id):
return False
except IOError, err:
if err.errno == 13: # Permission denied
import traceback
fname = err.filename if err.filename else 'file'
error_dialog(self, _('Permission denied'),
_('Could not open %s. Is it being used by another'
' program?')%fname, det_msg=traceback.format_exc(),
show=True)
return False
raise
for widget in getattr(self, 'custom_metadata_widgets', []):
widget.commit(self.book_id)
self.db.commit()
return True
def accept(self):
self.save_state()
if not self.apply_changes():
return
ResizableDialog.accept(self)
def reject(self):
self.save_state()
ResizableDialog.reject(self)
def save_state(self):
gprefs['metasingle_window_geometry3'] = bytearray(self.saveGeometry())
# Dialog use methods {{{
def start(self, row_list, current_row, view_slot=None):
self.row_list = row_list
self.current_row = current_row
if view_slot is not None:
self.view_format.connect(view_slot)
self.do_one()
ret = self.exec_()
self.break_cycles()
return ret
def do_one(self, delta=0):
self.current_row += delta
prev = next_ = None
if self.current_row > 0:
prev = self.db.title(self.row_list[self.current_row-1])
if self.current_row < len(self.row_list) - 1:
next_ = self.db.title(self.row_list[self.current_row+1])
if next_ is not None:
tip = _('Save changes and edit the metadata of %s')%next_
self.next_button.setToolTip(tip)
self.next_button.setVisible(next_ is not None)
if prev is not None:
tip = _('Save changes and edit the metadata of %s')%prev
self.prev_button.setToolTip(tip)
self.prev_button.setVisible(prev is not None)
self(self.db.id(self.row_list[self.current_row]))
def break_cycles(self):
# Break any reference cycles that could prevent python
# from garbage collecting this dialog
def disconnect(signal):
try:
signal.disconnect()
except:
pass # Fails if view format was never connected
disconnect(self.view_format)
for b in ('next_button', 'prev_button'):
x = getattr(self, b, None)
if x is not None:
disconnect(x.clicked)
# }}}
class MetadataSingleDialog(MetadataSingleDialogBase): # {{{
def do_layout(self):
if len(self.db.custom_column_label_map) == 0:
self.central_widget.tabBar().setVisible(False)
self.central_widget.clear()
self.tabs = []
self.labels = []
self.tabs.append(QWidget(self))
self.central_widget.addTab(self.tabs[0], _("&Basic metadata"))
self.tabs[0].l = l = QVBoxLayout()
self.tabs[0].tl = tl = QGridLayout()
self.tabs[0].setLayout(l)
w = getattr(self, 'custom_metadata_widgets_parent', None)
if w is not None:
self.tabs.append(w)
self.central_widget.addTab(w, _('&Custom metadata'))
l.addLayout(tl)
l.addItem(QSpacerItem(10, 15, QSizePolicy.Expanding,
QSizePolicy.Fixed))
sto = QWidget.setTabOrder
sto(self.button_box, self.fetch_metadata_button)
sto(self.fetch_metadata_button, self.title)
def create_row(row, one, two, three, col=1, icon='forward.png'):
ql = BuddyLabel(one)
tl.addWidget(ql, row, col+0, 1, 1)
self.labels.append(ql)
tl.addWidget(one, row, col+1, 1, 1)
if two is not None:
tl.addWidget(two, row, col+2, 1, 1)
two.setIcon(QIcon(I(icon)))
ql = BuddyLabel(three)
tl.addWidget(ql, row, col+3, 1, 1)
self.labels.append(ql)
tl.addWidget(three, row, col+4, 1, 1)
sto(one, two)
sto(two, three)
tl.addWidget(self.swap_title_author_button, 0, 0, 2, 1)
create_row(0, self.title, self.deduce_title_sort_button, self.title_sort)
sto(self.title_sort, self.authors)
create_row(1, self.authors, self.deduce_author_sort_button, self.author_sort)
sto(self.author_sort, self.series)
create_row(2, self.series, self.remove_unused_series_button,
self.series_index, icon='trash.png')
sto(self.series_index, self.swap_title_author_button)
tl.addWidget(self.formats_manager, 0, 6, 3, 1)
self.splitter = QSplitter(Qt.Horizontal, self)
self.splitter.addWidget(self.cover)
l.addWidget(self.splitter)
self.tabs[0].gb = gb = QGroupBox(_('Change cover'), self)
gb.l = l = QGridLayout()
gb.setLayout(l)
sto(self.swap_title_author_button, self.cover.buttons[0])
for i, b in enumerate(self.cover.buttons[:3]):
l.addWidget(b, 0, i, 1, 1)
sto(b, self.cover.buttons[i+1])
gb.hl = QHBoxLayout()
for b in self.cover.buttons[3:]:
gb.hl.addWidget(b)
sto(self.cover.buttons[-2], self.cover.buttons[-1])
l.addLayout(gb.hl, 1, 0, 1, 3)
self.tabs[0].middle = w = QWidget(self)
w.l = l = QGridLayout()
w.setLayout(w.l)
l.setMargin(0)
self.splitter.addWidget(w)
def create_row2(row, widget, button=None):
row += 1
ql = BuddyLabel(widget)
l.addWidget(ql, row, 0, 1, 1)
l.addWidget(widget, row, 1, 1, 2 if button is None else 1)
if button is not None:
l.addWidget(button, row, 2, 1, 1)
if button is not None:
sto(widget, button)
l.addWidget(gb, 0, 0, 1, 3)
self.tabs[0].spc_one = QSpacerItem(10, 10, QSizePolicy.Expanding,
QSizePolicy.Expanding)
l.addItem(self.tabs[0].spc_one, 1, 0, 1, 3)
sto(self.cover.buttons[-1], self.rating)
create_row2(1, self.rating)
sto(self.rating, self.tags)
create_row2(2, self.tags, self.tags_editor_button)
sto(self.tags_editor_button, self.isbn)
create_row2(3, self.isbn)
sto(self.isbn, self.timestamp)
create_row2(4, self.timestamp, self.timestamp.clear_button)
sto(self.timestamp.clear_button, self.pubdate)
create_row2(5, self.pubdate, self.pubdate.clear_button)
sto(self.pubdate.clear_button, self.publisher)
create_row2(6, self.publisher)
self.tabs[0].spc_two = QSpacerItem(10, 10, QSizePolicy.Expanding,
QSizePolicy.Expanding)
l.addItem(self.tabs[0].spc_two, 8, 0, 1, 3)
l.addWidget(self.fetch_metadata_button, 9, 0, 1, 3)
self.tabs[0].gb2 = gb = QGroupBox(_('Co&mments'), self)
gb.l = l = QVBoxLayout()
gb.setLayout(l)
l.addWidget(self.comments)
self.splitter.addWidget(gb)
self.set_custom_metadata_tab_order()
# }}}
class MetadataSingleDialogAlt(MetadataSingleDialogBase): # {{{
cc_two_column = False
one_line_comments_toolbar = True
def do_layout(self):
self.central_widget.clear()
self.tabs = []
self.labels = []
sto = QWidget.setTabOrder
self.tabs.append(QWidget(self))
self.central_widget.addTab(self.tabs[0], _("&Metadata"))
self.tabs[0].l = QGridLayout()
self.tabs[0].setLayout(self.tabs[0].l)
self.tabs.append(QWidget(self))
self.central_widget.addTab(self.tabs[1], _("&Cover and formats"))
self.tabs[1].l = QGridLayout()
self.tabs[1].setLayout(self.tabs[1].l)
# Tab 0
tab0 = self.tabs[0]
tl = QGridLayout()
gb = QGroupBox(_('&Basic metadata'), self.tabs[0])
self.tabs[0].l.addWidget(gb, 0, 0, 1, 1)
gb.setLayout(tl)
sto(self.button_box, self.title)
def create_row(row, widget, tab_to, button=None, icon=None, span=1):
ql = BuddyLabel(widget)
tl.addWidget(ql, row, 1, 1, 1)
tl.addWidget(widget, row, 2, 1, 1)
if button is not None:
tl.addWidget(button, row, 3, span, 1)
if icon is not None:
button.setIcon(QIcon(I(icon)))
if tab_to is not None:
if button is not None:
sto(widget, button)
sto(button, tab_to)
else:
sto(widget, tab_to)
tl.addWidget(self.swap_title_author_button, 0, 0, 2, 1)
create_row(0, self.title, self.title_sort,
button=self.deduce_title_sort_button, span=2,
icon='auto_author_sort.png')
create_row(1, self.title_sort, self.authors)
create_row(2, self.authors, self.author_sort,
button=self.deduce_author_sort_button,
span=2, icon='auto_author_sort.png')
create_row(3, self.author_sort, self.series)
create_row(4, self.series, self.series_index,
button=self.remove_unused_series_button, icon='trash.png')
create_row(5, self.series_index, self.tags)
create_row(6, self.tags, self.rating, button=self.tags_editor_button)
create_row(7, self.rating, self.pubdate)
create_row(8, self.pubdate, self.publisher,
button=self.pubdate.clear_button, icon='trash.png')
create_row(9, self.publisher, self.timestamp)
create_row(10, self.timestamp, self.isbn,
button=self.timestamp.clear_button, icon='trash.png')
create_row(11, self.isbn, self.comments)
tl.addItem(QSpacerItem(1, 1, QSizePolicy.Fixed, QSizePolicy.Expanding),
12, 1, 1, 1)
w = getattr(self, 'custom_metadata_widgets_parent', None)
if w is not None:
gb = QGroupBox(_('C&ustom metadata'), tab0)
gbl = QVBoxLayout()
gb.setLayout(gbl)
sr = QScrollArea(tab0)
sr.setWidgetResizable(True)
sr.setBackgroundRole(QPalette.Base)
sr.setFrameStyle(QFrame.NoFrame)
sr.setWidget(w)
gbl.addWidget(sr)
self.tabs[0].l.addWidget(gb, 0, 1, 1, 1)
sto(self.isbn, gb)
w = QGroupBox(_('&Comments'), tab0)
sp = QSizePolicy()
sp.setVerticalStretch(10)
sp.setHorizontalPolicy(QSizePolicy.Expanding)
sp.setVerticalPolicy(QSizePolicy.Expanding)
w.setSizePolicy(sp)
l = QHBoxLayout()
w.setLayout(l)
l.addWidget(self.comments)
tab0.l.addWidget(w, 1, 0, 1, 2)
# Tab 1
tab1 = self.tabs[1]
wsp = QWidget(tab1)
wgl = QVBoxLayout()
wsp.setLayout(wgl)
# right-hand side of splitter
gb = QGroupBox(_('Change cover'), tab1)
l = QGridLayout()
gb.setLayout(l)
sto(self.swap_title_author_button, self.cover.buttons[0])
for i, b in enumerate(self.cover.buttons[:3]):
l.addWidget(b, 0, i, 1, 1)
sto(b, self.cover.buttons[i+1])
hl = QHBoxLayout()
for b in self.cover.buttons[3:]:
hl.addWidget(b)
sto(self.cover.buttons[-2], self.cover.buttons[-1])
l.addLayout(hl, 1, 0, 1, 3)
wgl.addWidget(gb)
wgl.addItem(QSpacerItem(10, 10, QSizePolicy.Expanding,
QSizePolicy.Expanding))
wgl.addWidget(self.fetch_metadata_button)
wgl.addItem(QSpacerItem(10, 10, QSizePolicy.Expanding,
QSizePolicy.Expanding))
wgl.addWidget(self.formats_manager)
self.splitter = QSplitter(Qt.Horizontal, tab1)
tab1.l.addWidget(self.splitter)
self.splitter.addWidget(self.cover)
self.splitter.addWidget(wsp)
self.formats_manager.formats.setMaximumWidth(10000)
self.formats_manager.formats.setIconSize(QSize(64, 64))
# }}}
def edit_metadata(db, row_list, current_row, parent=None, view_slot=None):
d = MetadataSingleDialog(db, parent)
d.start(row_list, current_row, view_slot=view_slot)
return d.changed
if __name__ == '__main__':
from PyQt4.Qt import QApplication
app = QApplication([])
from calibre.library import db as db_
db = db_()
row_list = list(range(len(db.data)))
edit_metadata(db, row_list, 0)

@ -12,6 +12,8 @@ from calibre.ebooks.conversion.plumber import Plumber
from calibre.utils.logging import Log from calibre.utils.logging import Log
from calibre.gui2.preferences.conversion_ui import Ui_Form from calibre.gui2.preferences.conversion_ui import Ui_Form
from calibre.gui2.convert.look_and_feel import LookAndFeelWidget from calibre.gui2.convert.look_and_feel import LookAndFeelWidget
from calibre.gui2.convert.heuristics import HeuristicsWidget
from calibre.gui2.convert.search_and_replace import SearchAndReplaceWidget
from calibre.gui2.convert.page_setup import PageSetupWidget from calibre.gui2.convert.page_setup import PageSetupWidget
from calibre.gui2.convert.structure_detection import StructureDetectionWidget from calibre.gui2.convert.structure_detection import StructureDetectionWidget
from calibre.gui2.convert.toc import TOCWidget from calibre.gui2.convert.toc import TOCWidget
@ -82,8 +84,9 @@ class Base(ConfigWidgetBase, Ui_Form):
class CommonOptions(Base): class CommonOptions(Base):
def load_conversion_widgets(self): def load_conversion_widgets(self):
self.conversion_widgets = [LookAndFeelWidget, PageSetupWidget, self.conversion_widgets = [LookAndFeelWidget, HeuristicsWidget,
StructureDetectionWidget, TOCWidget] PageSetupWidget,
StructureDetectionWidget, TOCWidget, SearchAndReplaceWidget,]
class InputOptions(Base): class InputOptions(Base):

@ -114,6 +114,9 @@ class TagsView(QTreeView): # {{{
def set_database(self, db, tag_match, sort_by): def set_database(self, db, tag_match, sort_by):
self.hidden_categories = config['tag_browser_hidden_categories'] self.hidden_categories = config['tag_browser_hidden_categories']
old = getattr(self, '_model', None)
if old is not None:
old.break_cycles()
self._model = TagsModel(db, parent=self, self._model = TagsModel(db, parent=self,
hidden_categories=self.hidden_categories, hidden_categories=self.hidden_categories,
search_restriction=None, search_restriction=None,
@ -371,6 +374,9 @@ class TagsView(QTreeView): # {{{
# model. Reason: it is much easier than reconstructing the browser tree. # model. Reason: it is much easier than reconstructing the browser tree.
def set_new_model(self, filter_categories_by=None): def set_new_model(self, filter_categories_by=None):
try: try:
old = getattr(self, '_model', None)
if old is not None:
old.break_cycles()
self._model = TagsModel(self.db, parent=self, self._model = TagsModel(self.db, parent=self,
hidden_categories=self.hidden_categories, hidden_categories=self.hidden_categories,
search_restriction=self.search_restriction, search_restriction=self.search_restriction,
@ -509,8 +515,8 @@ class TagsModel(QAbstractItemModel): # {{{
QAbstractItemModel.__init__(self, parent) QAbstractItemModel.__init__(self, parent)
# must do this here because 'QPixmap: Must construct a QApplication # must do this here because 'QPixmap: Must construct a QApplication
# before a QPaintDevice'. The ':' in front avoids polluting either the # before a QPaintDevice'. The ':' at the end avoids polluting either of
# user-defined categories (':' at end) or columns namespaces (no ':'). # the other namespaces (alpha, '#', or '@')
iconmap = {} iconmap = {}
for key in category_icon_map: for key in category_icon_map:
iconmap[key] = QIcon(I(category_icon_map[key])) iconmap[key] = QIcon(I(category_icon_map[key]))
@ -544,6 +550,9 @@ class TagsModel(QAbstractItemModel): # {{{
tooltip=tt, category_key=r) tooltip=tt, category_key=r)
self.refresh(data=data) self.refresh(data=data)
def break_cycles(self):
self.db = self.root_item = None
def mimeTypes(self): def mimeTypes(self):
return ["application/calibre+from_library"] return ["application/calibre+from_library"]
@ -681,7 +690,7 @@ class TagsModel(QAbstractItemModel): # {{{
tb_cats = self.db.field_metadata tb_cats = self.db.field_metadata
for user_cat in sorted(self.db.prefs.get('user_categories', {}).keys(), for user_cat in sorted(self.db.prefs.get('user_categories', {}).keys(),
key=sort_key): key=sort_key):
cat_name = user_cat+':' # add the ':' to avoid name collision cat_name = '@' + user_cat # add the '@' to avoid name collision
tb_cats.add_user_category(label=cat_name, name=user_cat) tb_cats.add_user_category(label=cat_name, name=user_cat)
if len(saved_searches().names()): if len(saved_searches().names()):
tb_cats.add_search_category(label='search', name=_('Searches')) tb_cats.add_search_category(label='search', name=_('Searches'))
@ -988,7 +997,7 @@ class TagsModel(QAbstractItemModel): # {{{
if self.hidden_categories and self.categories[i] in self.hidden_categories: if self.hidden_categories and self.categories[i] in self.hidden_categories:
continue continue
row_index += 1 row_index += 1
if key.endswith(':'): if key.startswith('@'):
# User category, so skip it. The tag will be marked in its real category # User category, so skip it. The tag will be marked in its real category
continue continue
category_item = self.root_item.children[row_index] category_item = self.root_item.children[row_index]
@ -1007,7 +1016,7 @@ class TagsModel(QAbstractItemModel): # {{{
ans.append('%s%s:"=%s"'%(prefix, category, tag.name)) ans.append('%s%s:"=%s"'%(prefix, category, tag.name))
return ans return ans
def find_node(self, key, txt, start_path): def find_item_node(self, key, txt, start_path):
''' '''
Search for an item (a node) in the tags browser list that matches both Search for an item (a node) in the tags browser list that matches both
the key (exact case-insensitive match) and txt (contains case- the key (exact case-insensitive match) and txt (contains case-
@ -1061,6 +1070,22 @@ class TagsModel(QAbstractItemModel): # {{{
break break
return self.path_found return self.path_found
def find_category_node(self, key):
'''
Search for a category node (a top-level node) in the tags browser list
that matches the key (exact case-insensitive match). Returns the path to
the node. Paths are as in find_item_node.
'''
if not key:
return None
for i in xrange(self.rowCount(QModelIndex())):
idx = self.index(i, 0, QModelIndex())
ckey = idx.internalPointer().category_key
if strcmp(ckey, key) == 0:
return self.path_for_index(idx)
return None
def show_item_at_path(self, path, box=False): def show_item_at_path(self, path, box=False):
''' '''
Scroll the browser and open categories to show the item referenced by Scroll the browser and open categories to show the item referenced by
@ -1109,8 +1134,7 @@ class TagBrowserMixin(object): # {{{
def __init__(self, db): def __init__(self, db):
self.library_view.model().count_changed_signal.connect(self.tags_view.recount) self.library_view.model().count_changed_signal.connect(self.tags_view.recount)
self.tags_view.set_database(self.library_view.model().db, self.tags_view.set_database(db, self.tag_match, self.sort_by)
self.tag_match, self.sort_by)
self.tags_view.tags_marked.connect(self.search.set_search_string) self.tags_view.tags_marked.connect(self.search.set_search_string)
self.tags_view.tag_list_edit.connect(self.do_tags_list_edit) self.tags_view.tag_list_edit.connect(self.do_tags_list_edit)
self.tags_view.user_category_edit.connect(self.do_user_categories_edit) self.tags_view.user_category_edit.connect(self.do_user_categories_edit)
@ -1347,15 +1371,15 @@ class TagBrowserWidget(QWidget): # {{{
self.search_button.setFocus(True) self.search_button.setFocus(True)
self.item_search.lineEdit().blockSignals(False) self.item_search.lineEdit().blockSignals(False)
colon = txt.find(':')
key = None key = None
colon = txt.rfind(':') if len(txt) > 2 else 0
if colon > 0: if colon > 0:
key = self.parent.library_view.model().db.\ key = self.parent.library_view.model().db.\
field_metadata.search_term_to_field_key(txt[:colon]) field_metadata.search_term_to_field_key(txt[:colon])
txt = txt[colon+1:] txt = txt[colon+1:]
self.current_find_position = model.find_node(key, txt, self.current_find_position = \
self.current_find_position) model.find_item_node(key, txt, self.current_find_position)
if self.current_find_position: if self.current_find_position:
model.show_item_at_path(self.current_find_position, box=True) model.show_item_at_path(self.current_find_position, box=True)
elif self.item_search.text(): elif self.item_search.text():

@ -75,7 +75,7 @@ def convert_single_ebook(parent, db, book_ids, auto_conversion=False, out_format
temp_files.append(d.cover_file) temp_files.append(d.cover_file)
args = [in_file, out_file.name, recs] args = [in_file, out_file.name, recs]
temp_files.append(out_file) temp_files.append(out_file)
jobs.append(('gui_convert', args, desc, d.output_format.upper(), book_id, temp_files)) jobs.append(('gui_convert_override', args, desc, d.output_format.upper(), book_id, temp_files))
changed = True changed = True
d.break_cycles() d.break_cycles()
@ -185,7 +185,7 @@ class QueueBulk(QProgressDialog):
args = [in_file, out_file.name, lrecs] args = [in_file, out_file.name, lrecs]
temp_files.append(out_file) temp_files.append(out_file)
self.jobs.append(('gui_convert', args, desc, self.output_format.upper(), book_id, temp_files)) self.jobs.append(('gui_convert_override', args, desc, self.output_format.upper(), book_id, temp_files))
self.changed = True self.changed = True
self.setValue(self.i) self.setValue(self.i)

@ -16,7 +16,7 @@ from PyQt4.Qt import Qt, SIGNAL, QTimer, \
QPixmap, QMenu, QIcon, pyqtSignal, \ QPixmap, QMenu, QIcon, pyqtSignal, \
QDialog, \ QDialog, \
QSystemTrayIcon, QApplication, QKeySequence, \ QSystemTrayIcon, QApplication, QKeySequence, \
QMessageBox, QHelpEvent QMessageBox, QHelpEvent, QAction
from calibre import prints from calibre import prints
from calibre.constants import __appname__, isosx from calibre.constants import __appname__, isosx
@ -198,6 +198,10 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
self.system_tray_icon.activated.connect( self.system_tray_icon.activated.connect(
self.system_tray_icon_activated) self.system_tray_icon_activated)
self.esc_action = QAction(self)
self.addAction(self.esc_action)
self.esc_action.setShortcut(QKeySequence(Qt.Key_Escape))
self.esc_action.triggered.connect(self.esc)
####################### Start spare job server ######################## ####################### Start spare job server ########################
QTimer.singleShot(1000, self.add_spare_server) QTimer.singleShot(1000, self.add_spare_server)
@ -294,6 +298,8 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
'the file: %s<p>The ' 'the file: %s<p>The '
'log will be displayed automatically.')%self.gui_debug, show=True) 'log will be displayed automatically.')%self.gui_debug, show=True)
def esc(self, *args):
self.search.clear()
def start_content_server(self): def start_content_server(self):
from calibre.library.server.main import start_threaded_server from calibre.library.server.main import start_threaded_server
@ -305,7 +311,6 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
self.content_server.state_callback(True) self.content_server.state_callback(True)
self.test_server_timer = QTimer.singleShot(10000, self.test_server) self.test_server_timer = QTimer.singleShot(10000, self.test_server)
def resizeEvent(self, ev): def resizeEvent(self, ev):
MainWindow.resizeEvent(self, ev) MainWindow.resizeEvent(self, ev)
self.search.setMaximumWidth(self.width()-150) self.search.setMaximumWidth(self.width()-150)
@ -440,6 +445,7 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
except: except:
import traceback import traceback
traceback.print_exc() traceback.print_exc()
olddb.break_cycles()
if self.device_connected: if self.device_connected:
self.set_books_in_library(self.booklists(), reset=True) self.set_books_in_library(self.booklists(), reset=True)
self.refresh_ondevice() self.refresh_ondevice()
@ -449,7 +455,7 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
def set_window_title(self): def set_window_title(self):
self.setWindowTitle(__appname__ + u' - ||%s||'%self.iactions['Choose Library'].library_name()) self.setWindowTitle(__appname__ + u' - || %s ||'%self.iactions['Choose Library'].library_name())
def location_selected(self, location): def location_selected(self, location):
''' '''

@ -123,6 +123,8 @@ IMAGE_EXTENSIONS = ['jpg', 'jpeg', 'gif', 'png', 'bmp']
class FormatList(QListWidget): class FormatList(QListWidget):
DROPABBLE_EXTENSIONS = BOOK_EXTENSIONS DROPABBLE_EXTENSIONS = BOOK_EXTENSIONS
formats_dropped = pyqtSignal(object, object)
delete_format = pyqtSignal()
@classmethod @classmethod
def paths_from_event(cls, event): def paths_from_event(cls, event):
@ -146,15 +148,14 @@ class FormatList(QListWidget):
def dropEvent(self, event): def dropEvent(self, event):
paths = self.paths_from_event(event) paths = self.paths_from_event(event)
event.setDropAction(Qt.CopyAction) event.setDropAction(Qt.CopyAction)
self.emit(SIGNAL('formats_dropped(PyQt_PyObject,PyQt_PyObject)'), self.formats_dropped.emit(event, paths)
event, paths)
def dragMoveEvent(self, event): def dragMoveEvent(self, event):
event.acceptProposedAction() event.acceptProposedAction()
def keyPressEvent(self, event): def keyPressEvent(self, event):
if event.key() == Qt.Key_Delete: if event.key() == Qt.Key_Delete:
self.emit(SIGNAL('delete_format()')) self.delete_format.emit()
else: else:
return QListWidget.keyPressEvent(self, event) return QListWidget.keyPressEvent(self, event)
@ -162,6 +163,7 @@ class FormatList(QListWidget):
class ImageView(QWidget): class ImageView(QWidget):
BORDER_WIDTH = 1 BORDER_WIDTH = 1
cover_changed = pyqtSignal(object)
def __init__(self, parent=None): def __init__(self, parent=None):
QWidget.__init__(self, parent) QWidget.__init__(self, parent)
@ -201,8 +203,7 @@ class ImageView(QWidget):
if not pmap.isNull(): if not pmap.isNull():
self.setPixmap(pmap) self.setPixmap(pmap)
event.accept() event.accept()
self.emit(SIGNAL('cover_changed(PyQt_PyObject)'), open(path, self.cover_changed.emit(open(path, 'rb').read())
'rb').read())
break break
def dragMoveEvent(self, event): def dragMoveEvent(self, event):
@ -271,7 +272,7 @@ class ImageView(QWidget):
pmap = cb.pixmap(cb.Selection) pmap = cb.pixmap(cb.Selection)
if not pmap.isNull(): if not pmap.isNull():
self.setPixmap(pmap) self.setPixmap(pmap)
self.emit(SIGNAL('cover_changed(PyQt_PyObject)'), self.cover_changed.emit(
pixmap_to_data(pmap)) pixmap_to_data(pmap))
# }}} # }}}
@ -311,32 +312,6 @@ class FontFamilyModel(QAbstractListModel):
def index_of(self, family): def index_of(self, family):
return self.families.index(family.strip()) return self.families.index(family.strip())
class BasicComboModel(QAbstractListModel):
def __init__(self, items, *args):
QAbstractListModel.__init__(self, *args)
self.items = [i for i in items]
self.items.sort()
def rowCount(self, *args):
return len(self.items)
def data(self, index, role):
try:
item = self.items[index.row()]
except:
traceback.print_exc()
return NONE
if role == Qt.DisplayRole:
return QVariant(item)
if role == Qt.FontRole:
return QVariant(QFont(item))
return NONE
def index_of(self, item):
return self.items.index(item.strip())
class BasicListItem(QListWidgetItem): class BasicListItem(QListWidgetItem):
def __init__(self, text, user_data=None): def __init__(self, text, user_data=None):
@ -527,7 +502,7 @@ class EnComboBox(QComboBox):
def __init__(self, *args): def __init__(self, *args):
QComboBox.__init__(self, *args) QComboBox.__init__(self, *args)
self.setLineEdit(EnLineEdit(self)) self.setLineEdit(EnLineEdit(self))
self.setAutoCompletionCaseSensitivity(Qt.CaseSensitive) self.setAutoCompletionCaseSensitivity(Qt.CaseInsensitive)
self.setMinimumContentsLength(20) self.setMinimumContentsLength(20)
def text(self): def text(self):

@ -42,6 +42,9 @@ class MetadataBackup(Thread): # {{{
def stop(self): def stop(self):
self.keep_running = False self.keep_running = False
# Break cycles so that this object doesn't hold references to db
self.do_write = self.get_metadata_for_dump = self.clear_dirtied = \
self.set_dirtied = self.db = None
def run(self): def run(self):
while self.keep_running: while self.keep_running:
@ -185,6 +188,11 @@ class ResultCache(SearchQueryParser): # {{{
self.build_date_relop_dict() self.build_date_relop_dict()
self.build_numeric_relop_dict() self.build_numeric_relop_dict()
def break_cycles(self):
self._data = self.field_metadata = self.FIELD_MAP = \
self.numeric_search_relops = self.date_search_relops = \
self.all_search_locations = None
def __getitem__(self, row): def __getitem__(self, row):
return self._data[self._map_filtered[row]] return self._data[self._map_filtered[row]]

@ -29,7 +29,6 @@ FIELDS = ['all', 'author_sort', 'authors', 'comments',
'series_index', 'series', 'size', 'tags', 'timestamp', 'title', 'series_index', 'series', 'size', 'tags', 'timestamp', 'title',
'uuid'] 'uuid']
#Allowed fields for template #Allowed fields for template
TEMPLATE_ALLOWED_FIELDS = [ 'author_sort', 'authors', 'id', 'isbn', 'pubdate', TEMPLATE_ALLOWED_FIELDS = [ 'author_sort', 'authors', 'id', 'isbn', 'pubdate',
'publisher', 'series_index', 'series', 'tags', 'timestamp', 'title', 'uuid' ] 'publisher', 'series_index', 'series', 'tags', 'timestamp', 'title', 'uuid' ]
@ -581,7 +580,7 @@ class EPUB_MOBI(CatalogPlugin):
"pipeline to the specified " "pipeline to the specified "
"directory. Useful if you are unsure at which stage " "directory. Useful if you are unsure at which stage "
"of the conversion process a bug is occurring.\n" "of the conversion process a bug is occurring.\n"
"Default: '%default'None\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--exclude-book-marker', Option('--exclude-book-marker',
default=':', default=':',
@ -605,43 +604,42 @@ class EPUB_MOBI(CatalogPlugin):
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-authors', Option('--generate-authors',
default=True, default=False,
dest='generate_authors', dest='generate_authors',
action = 'store_true', action = 'store_true',
help=_("Include 'Authors' section in catalog." help=_("Include 'Authors' section in catalog.\n"
"This switch is ignored - Books By Author section is always generated."
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-descriptions', Option('--generate-descriptions',
default=True, default=False,
dest='generate_descriptions', dest='generate_descriptions',
action = 'store_true', action = 'store_true',
help=_("Include book descriptions in catalog.\n" help=_("Include 'Descriptions' section in catalog.\n"
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-genres', Option('--generate-genres',
default=True, default=False,
dest='generate_genres', dest='generate_genres',
action = 'store_true', action = 'store_true',
help=_("Include 'Genres' section in catalog.\n" help=_("Include 'Genres' section in catalog.\n"
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-titles', Option('--generate-titles',
default=True, default=False,
dest='generate_titles', dest='generate_titles',
action = 'store_true', action = 'store_true',
help=_("Include 'Titles' section in catalog.\n" help=_("Include 'Titles' section in catalog.\n"
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-series', Option('--generate-series',
default=True, default=False,
dest='generate_series', dest='generate_series',
action = 'store_true', action = 'store_true',
help=_("Include 'Series' section in catalog.\n" help=_("Include 'Series' section in catalog.\n"
"Default: '%default'\n" "Default: '%default'\n"
"Applies to: ePub, MOBI output formats")), "Applies to: ePub, MOBI output formats")),
Option('--generate-recently-added', Option('--generate-recently-added',
default=True, default=False,
dest='generate_recently_added', dest='generate_recently_added',
action = 'store_true', action = 'store_true',
help=_("Include 'Recently Added' section in catalog.\n" help=_("Include 'Recently Added' section in catalog.\n"
@ -976,7 +974,7 @@ class EPUB_MOBI(CatalogPlugin):
self.__thumbWidth = 0 self.__thumbWidth = 0
self.__thumbHeight = 0 self.__thumbHeight = 0
self.__title = opts.catalog_title self.__title = opts.catalog_title
self.__totalSteps = 8.0 self.__totalSteps = 6.0
self.__useSeriesPrefixInTitlesSection = False self.__useSeriesPrefixInTitlesSection = False
self.__verbose = opts.verbose self.__verbose = opts.verbose
@ -1014,17 +1012,21 @@ class EPUB_MOBI(CatalogPlugin):
(self.__archive_path, float(cached_thumb_width))) (self.__archive_path, float(cached_thumb_width)))
# Tweak build steps based on optional sections: 1 call for HTML, 1 for NCX # Tweak build steps based on optional sections: 1 call for HTML, 1 for NCX
incremental_jobs = 0
if self.opts.generate_authors:
incremental_jobs += 2
if self.opts.generate_titles: if self.opts.generate_titles:
self.__totalSteps += 2 incremental_jobs += 2
if self.opts.generate_recently_added: if self.opts.generate_recently_added:
self.__totalSteps += 2 incremental_jobs += 2
if self.generateRecentlyRead: if self.generateRecentlyRead:
self.__totalSteps += 2 incremental_jobs += 2
if self.opts.generate_series: if self.opts.generate_series:
self.__totalSteps += 2 incremental_jobs += 2
if self.opts.generate_descriptions: if self.opts.generate_descriptions:
# +1 thumbs # +1 thumbs
self.__totalSteps += 3 incremental_jobs += 3
self.__totalSteps += incremental_jobs
# Load section list templates # Load section list templates
templates = [] templates = []
@ -1358,13 +1360,23 @@ class EPUB_MOBI(CatalogPlugin):
if self.opts.generate_descriptions: if self.opts.generate_descriptions:
self.generateThumbnails() self.generateThumbnails()
self.generateHTMLDescriptions() self.generateHTMLDescriptions()
self.generateHTMLByAuthor() if self.opts.generate_authors:
self.generateHTMLByAuthor()
if self.opts.generate_titles: if self.opts.generate_titles:
self.generateHTMLByTitle() self.generateHTMLByTitle()
if self.opts.generate_series: if self.opts.generate_series:
self.generateHTMLBySeries() self.generateHTMLBySeries()
if self.opts.generate_genres: if self.opts.generate_genres:
self.generateHTMLByTags() self.generateHTMLByTags()
# If this is the only Section, and there are no genres, bail
if self.opts.section_list == ['Genres'] and not self.genres:
error_msg = _("No enabled genres found to catalog.\n")
if not self.opts.cli_environment:
error_msg += "Check 'Excluded genres'\nin E-book options.\n"
self.opts.log.error(error_msg)
self.error.append(_('No books available to catalog'))
self.error.append(error_msg)
return False
if self.opts.generate_recently_added: if self.opts.generate_recently_added:
self.generateHTMLByDateAdded() self.generateHTMLByDateAdded()
if self.generateRecentlyRead: if self.generateRecentlyRead:
@ -1372,7 +1384,8 @@ class EPUB_MOBI(CatalogPlugin):
self.generateOPF() self.generateOPF()
self.generateNCXHeader() self.generateNCXHeader()
self.generateNCXByAuthor("Authors") if self.opts.generate_authors:
self.generateNCXByAuthor("Authors")
if self.opts.generate_titles: if self.opts.generate_titles:
self.generateNCXByTitle("Titles") self.generateNCXByTitle("Titles")
if self.opts.generate_series: if self.opts.generate_series:
@ -1508,7 +1521,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
for tag in exclude_tags: for tag in exclude_tags:
search_terms.append("tag:=%s" % tag) search_terms.append("tag:=%s" % tag)
search_phrase = "not (%s)" % " or ".join(search_terms) search_phrase = "not (%s)" % " or ".join(search_terms)
# If a list of ids are provided, don't use search_text # If a list of ids are provided, don't use search_text
if self.opts.ids: if self.opts.ids:
self.opts.search_text = search_phrase self.opts.search_text = search_phrase
@ -1879,7 +1891,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Link to author # Link to author
emTag = Tag(soup, "em") emTag = Tag(soup, "em")
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(book['author'])) if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(book['author']))
aTag.insert(0, NavigableString(book['author'])) aTag.insert(0, NavigableString(book['author']))
emTag.insert(0,aTag) emTag.insert(0,aTag)
pBookTag.insert(ptc, emTag) pBookTag.insert(ptc, emTag)
@ -2149,7 +2162,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
pAuthorTag = Tag(soup, "p") pAuthorTag = Tag(soup, "p")
pAuthorTag['class'] = "author_index" pAuthorTag['class'] = "author_index"
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['name'] = "%s" % self.generateAuthorAnchor(current_author) if self.opts.generate_authors:
aTag['name'] = "%s" % self.generateAuthorAnchor(current_author)
aTag.insert(0,NavigableString(current_author)) aTag.insert(0,NavigableString(current_author))
pAuthorTag.insert(0,aTag) pAuthorTag.insert(0,aTag)
divTag.insert(dtc,pAuthorTag) divTag.insert(dtc,pAuthorTag)
@ -2276,7 +2290,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Link to author # Link to author
emTag = Tag(soup, "em") emTag = Tag(soup, "em")
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author'])) if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author']))
aTag.insert(0, NavigableString(new_entry['author'])) aTag.insert(0, NavigableString(new_entry['author']))
emTag.insert(0,aTag) emTag.insert(0,aTag)
pBookTag.insert(ptc, emTag) pBookTag.insert(ptc, emTag)
@ -2425,7 +2440,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Link to author # Link to author
emTag = Tag(soup, "em") emTag = Tag(soup, "em")
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author'])) if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author']))
aTag.insert(0, NavigableString(new_entry['author'])) aTag.insert(0, NavigableString(new_entry['author']))
emTag.insert(0,aTag) emTag.insert(0,aTag)
pBookTag.insert(ptc, emTag) pBookTag.insert(ptc, emTag)
@ -2473,7 +2489,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Link to author # Link to author
emTag = Tag(soup, "em") emTag = Tag(soup, "em")
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author'])) if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(new_entry['author']))
aTag.insert(0, NavigableString(new_entry['author'])) aTag.insert(0, NavigableString(new_entry['author']))
emTag.insert(0,aTag) emTag.insert(0,aTag)
pBookTag.insert(ptc, emTag) pBookTag.insert(ptc, emTag)
@ -2692,7 +2709,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Link to author # Link to author
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor",
self.generateAuthorAnchor(escape(' & '.join(book['authors'])))) self.generateAuthorAnchor(escape(' & '.join(book['authors']))))
aTag.insert(0, NavigableString(' &amp; '.join(book['authors']))) aTag.insert(0, NavigableString(' &amp; '.join(book['authors'])))
pBookTag.insert(ptc, aTag) pBookTag.insert(ptc, aTag)
@ -2776,14 +2794,16 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
genre_list.append(tag_list) genre_list.append(tag_list)
if self.opts.verbose: if self.opts.verbose:
self.opts.log.info(" Genre summary: %d active genre tags used in generating catalog with %d titles" % if len(genre_list):
self.opts.log.info(" Genre summary: %d active genre tags used in generating catalog with %d titles" %
(len(genre_list), len(self.booksByTitle))) (len(genre_list), len(self.booksByTitle)))
for genre in genre_list: for genre in genre_list:
for key in genre: for key in genre:
self.opts.log.info(" %s: %d %s" % (self.getFriendlyGenreTag(key), self.opts.log.info(" %s: %d %s" % (self.getFriendlyGenreTag(key),
len(genre[key]), len(genre[key]),
'titles' if len(genre[key]) > 1 else 'title')) 'titles' if len(genre[key]) > 1 else 'title'))
# Write the results # Write the results
# genre_list = [ {friendly_tag:[{book},{book}]}, {friendly_tag:[{book},{book}]}, ...] # genre_list = [ {friendly_tag:[{book},{book}]}, {friendly_tag:[{book},{book}]}, ...]
@ -3074,10 +3094,36 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
textTag.insert(0, NavigableString(self.title)) textTag.insert(0, NavigableString(self.title))
navLabelTag.insert(0, textTag) navLabelTag.insert(0, textTag)
navPointTag.insert(0, navLabelTag) navPointTag.insert(0, navLabelTag)
contentTag = Tag(soup, 'content')
#contentTag['src'] = "content/book_%d.html" % int(self.booksByTitle[0]['id']) if self.opts.generate_authors:
contentTag['src'] = "content/ByAlphaAuthor.html" contentTag = Tag(soup, 'content')
navPointTag.insert(1, contentTag) contentTag['src'] = "content/ByAlphaAuthor.html"
navPointTag.insert(1, contentTag)
elif self.opts.generate_titles:
contentTag = Tag(soup, 'content')
contentTag['src'] = "content/ByAlphaTitle.html"
navPointTag.insert(1, contentTag)
elif self.opts.generate_series:
contentTag = Tag(soup, 'content')
contentTag['src'] = "content/BySeries.html"
navPointTag.insert(1, contentTag)
elif self.opts.generate_genres:
contentTag = Tag(soup, 'content')
#contentTag['src'] = "content/ByGenres.html"
contentTag['src'] = "%s" % self.genres[0]['file']
navPointTag.insert(1, contentTag)
elif self.opts.generate_recently_added:
contentTag = Tag(soup, 'content')
contentTag['src'] = "content/ByDateAdded.html"
navPointTag.insert(1, contentTag)
else:
# Descriptions only
sort_descriptions_by = self.booksByAuthor if self.opts.sort_descriptions_by_author \
else self.booksByTitle
contentTag = Tag(soup, 'content')
contentTag['src'] = "content/book_%d.html" % int(sort_descriptions_by[0]['id'])
navPointTag.insert(1, contentTag)
cmiTag = Tag(soup, '%s' % 'calibre:meta-img') cmiTag = Tag(soup, '%s' % 'calibre:meta-img')
cmiTag['name'] = "mastheadImage" cmiTag['name'] = "mastheadImage"
cmiTag['src'] = "images/mastheadImage.gif" cmiTag['src'] = "images/mastheadImage.gif"
@ -3085,7 +3131,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
navMapTag.insert(0,navPointTag) navMapTag.insert(0,navPointTag)
ncx.insert(0,navMapTag) ncx.insert(0,navMapTag)
self.ncxSoup = soup self.ncxSoup = soup
def generateNCXDescriptions(self, tocTitle): def generateNCXDescriptions(self, tocTitle):
@ -3871,7 +3916,6 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Add this section to the body # Add this section to the body
body.insert(btc, navPointTag) body.insert(btc, navPointTag)
btc += 1 btc += 1
self.ncxSoup = ncx_soup self.ncxSoup = ncx_soup
def writeNCX(self): def writeNCX(self):
@ -4015,12 +4059,34 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Remove the special marker tags from the database's tag list, # Remove the special marker tags from the database's tag list,
# return sorted list of normalized genre tags # return sorted list of normalized genre tags
def format_tag_list(tags, indent=5, line_break=70, header='Tag list'):
def next_tag(sorted_tags):
for (i, tag) in enumerate(sorted_tags):
if i < len(tags) - 1:
yield tag + ", "
else:
yield tag
ans = '%s%d %s:\n' % (' ' * indent, len(tags), header)
ans += ' ' * (indent + 1)
out_str = ''
sorted_tags = sorted(tags)
for tag in next_tag(sorted_tags):
out_str += tag
if len(out_str) >= line_break:
ans += out_str + '\n'
out_str = ' ' * (indent + 1)
return ans + out_str
normalized_tags = [] normalized_tags = []
friendly_tags = [] friendly_tags = []
excluded_tags = []
for tag in tags: for tag in tags:
if tag[0] in self.markerTags: if tag in self.markerTags:
excluded_tags.append(tag)
continue continue
if re.search(self.opts.exclude_genre, tag): if re.search(self.opts.exclude_genre, tag):
excluded_tags.append(tag)
continue continue
if tag == ' ': if tag == ' ':
continue continue
@ -4039,32 +4105,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
if genre_tags_dict[key] == normalized: if genre_tags_dict[key] == normalized:
self.opts.log.warn(" %s" % key) self.opts.log.warn(" %s" % key)
if self.verbose: if self.verbose:
def next_tag(tags): self.opts.log.info('%s' % format_tag_list(genre_tags_dict, header="enabled genre tags in database"))
for (i, tag) in enumerate(tags): self.opts.log.info('%s' % format_tag_list(excluded_tags, header="excluded genre tags"))
if i < len(tags) - 1:
yield tag + ", "
else:
yield tag
self.opts.log.info(u' %d genre tags in database (excluding genres matching %s):' % \
(len(genre_tags_dict), self.opts.exclude_genre))
# Display friendly/normalized genres
# friendly => normalized
if False:
sorted_tags = ['%s => %s' % (key, genre_tags_dict[key]) for key in sorted(genre_tags_dict.keys())]
for tag in next_tag(sorted_tags):
self.opts.log(u' %s' % tag)
else:
sorted_tags = ['%s' % (key) for key in sorted(genre_tags_dict.keys())]
out_str = ''
line_break = 70
for tag in next_tag(sorted_tags):
out_str += tag
if len(out_str) >= line_break:
self.opts.log.info(' %s' % out_str)
out_str = ''
self.opts.log.info(' %s' % out_str)
return genre_tags_dict return genre_tags_dict
@ -4140,7 +4182,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
pAuthorTag = Tag(soup, "p") pAuthorTag = Tag(soup, "p")
pAuthorTag['class'] = "author_index" pAuthorTag['class'] = "author_index"
aTag = Tag(soup, "a") aTag = Tag(soup, "a")
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(book['author'])) if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", self.generateAuthorAnchor(book['author']))
aTag.insert(0, book['author']) aTag.insert(0, book['author'])
pAuthorTag.insert(0,aTag) pAuthorTag.insert(0,aTag)
divTag.insert(dtc,pAuthorTag) divTag.insert(dtc,pAuthorTag)
@ -4371,7 +4414,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Insert the author link (always) # Insert the author link (always)
aTag = body.find('a', attrs={'class':'author'}) aTag = body.find('a', attrs={'class':'author'})
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor", if self.opts.generate_authors:
aTag['href'] = "%s.html#%s" % ("ByAlphaAuthor",
self.generateAuthorAnchor(book['author'])) self.generateAuthorAnchor(book['author']))
if publisher == ' ': if publisher == ' ':
@ -4860,6 +4904,8 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
opts.basename = "Catalog" opts.basename = "Catalog"
opts.cli_environment = not hasattr(opts,'sync') opts.cli_environment = not hasattr(opts,'sync')
# Hard-wired to always sort descriptions by author, with series after non-series
opts.sort_descriptions_by_author = True opts.sort_descriptions_by_author = True
build_log = [] build_log = []
@ -4898,14 +4944,13 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
if opts_dict['ids']: if opts_dict['ids']:
build_log.append(" book count: %d" % len(opts_dict['ids'])) build_log.append(" book count: %d" % len(opts_dict['ids']))
'''
sections_list = [] sections_list = []
if opts.generate_authors: if opts.generate_authors:
sections_list.append('Authors') sections_list.append('Authors')
'''
sections_list = ['Authors']
if opts.generate_titles: if opts.generate_titles:
sections_list.append('Titles') sections_list.append('Titles')
if opts.generate_series:
sections_list.append('Series')
if opts.generate_genres: if opts.generate_genres:
sections_list.append('Genres') sections_list.append('Genres')
if opts.generate_recently_added: if opts.generate_recently_added:
@ -4913,7 +4958,27 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
if opts.generate_descriptions: if opts.generate_descriptions:
sections_list.append('Descriptions') sections_list.append('Descriptions')
build_log.append(u" Sections: %s" % ', '.join(sections_list)) if not sections_list:
if opts.cli_environment:
opts.log.warn('*** No Section switches specified, enabling all Sections ***')
opts.generate_authors = True
opts.generate_titles = True
opts.generate_series = True
opts.generate_genres = True
opts.generate_recently_added = True
opts.generate_descriptions = True
sections_list = ['Authors','Titles','Series','Genres','Recently Added','Descriptions']
else:
opts.log.warn('\n*** No enabled Sections, terminating catalog generation ***')
return ["No Included Sections","No enabled Sections.\nCheck E-book options tab\n'Included sections'\n"]
if opts.fmt == 'mobi' and sections_list == ['Descriptions']:
warning = _("\n*** Adding 'By Authors' Section required for MOBI output ***")
opts.log.warn(warning)
sections_list.insert(0,'Authors')
opts.generate_authors = True
opts.log(u" Sections: %s" % ', '.join(sections_list))
opts.section_list = sections_list
# Limit thumb_width to 1.0" - 2.0" # Limit thumb_width to 1.0" - 2.0"
try: try:
@ -4948,6 +5013,7 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
# Launch the Catalog builder # Launch the Catalog builder
catalog = self.CatalogBuilder(db, opts, self, report_progress=notification) catalog = self.CatalogBuilder(db, opts, self, report_progress=notification)
if opts.verbose: if opts.verbose:
log.info(" Begin catalog source generation") log.info(" Begin catalog source generation")
catalog.createDirectoryStructure() catalog.createDirectoryStructure()
@ -4959,7 +5025,7 @@ then rebuild the catalog.\n''').format(author[0],author[1],current_author[1])
if catalog_source_built: if catalog_source_built:
log.info(" Completed catalog source generation\n") log.info(" Completed catalog source generation\n")
else: else:
log.warn(" *** Errors during catalog generation, check log for details ***") log.error(" *** Terminated catalog generation, check log for details ***")
if catalog_source_built: if catalog_source_built:
recommendations = [] recommendations = []


@ -319,7 +319,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
self.field_metadata.remove_dynamic_categories() self.field_metadata.remove_dynamic_categories()
tb_cats = self.field_metadata tb_cats = self.field_metadata
for user_cat in sorted(self.prefs.get('user_categories', {}).keys(), key=sort_key): for user_cat in sorted(self.prefs.get('user_categories', {}).keys(), key=sort_key):
cat_name = user_cat+':' # add the ':' to avoid name collision cat_name = '@' + user_cat # add the '@' to avoid name collision
tb_cats.add_user_category(label=cat_name, name=user_cat) tb_cats.add_user_category(label=cat_name, name=user_cat)
if len(saved_searches().names()): if len(saved_searches().names()):
tb_cats.add_search_category(label='search', name=_('Searches')) tb_cats.add_search_category(label='search', name=_('Searches'))
@ -361,6 +361,10 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
self.refresh() self.refresh()
self.last_update_check = self.last_modified() self.last_update_check = self.last_modified()
def break_cycles(self):
self.data.break_cycles()
self.data = self.field_metadata = self.prefs = self.listeners = \
self.refresh_ondevice = None
def initialize_database(self): def initialize_database(self):
metadata_sqlite = open(P('metadata_sqlite.sql'), 'rb').read() metadata_sqlite = open(P('metadata_sqlite.sql'), 'rb').read()
@ -1239,7 +1243,7 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
if category in icon_map: if category in icon_map:
icon = icon_map[label] icon = icon_map[label]
else: else:
icon = icon_map[':custom'] icon = icon_map['custom:']
icon_map[category] = icon icon_map[category] = icon
datatype = cat['datatype'] datatype = cat['datatype']
@ -1335,20 +1339,19 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
if label in taglist and name in taglist[label]: if label in taglist and name in taglist[label]:
items.append(taglist[label][name]) items.append(taglist[label][name])
# else: do nothing, to not include nodes w zero counts # else: do nothing, to not include nodes w zero counts
if len(items): cat_name = '@' + user_cat # add the '@' to avoid name collision
cat_name = user_cat+':' # add the ':' to avoid name collision # Not a problem if we accumulate entries in the icon map
# Not a problem if we accumulate entries in the icon map if icon_map is not None:
if icon_map is not None: icon_map[cat_name] = icon_map['user:']
icon_map[cat_name] = icon_map[':user'] if sort == 'popularity':
if sort == 'popularity': categories[cat_name] = \
categories[cat_name] = \ sorted(items, key=lambda x: x.count, reverse=True)
sorted(items, key=lambda x: x.count, reverse=True) elif sort == 'name':
elif sort == 'name': categories[cat_name] = \
categories[cat_name] = \ sorted(items, key=lambda x: sort_key(x.sort))
sorted(items, key=lambda x: sort_key(x.sort)) else:
else: categories[cat_name] = \
categories[cat_name] = \ sorted(items, key=lambda x:x.avg_rating, reverse=True)
sorted(items, key=lambda x:x.avg_rating, reverse=True)
#### Finally, the saved searches category #### #### Finally, the saved searches category ####
items = [] items = []


@ -16,7 +16,7 @@ class TagsIcons(dict):
''' '''
category_icons = ['authors', 'series', 'formats', 'publisher', 'rating', category_icons = ['authors', 'series', 'formats', 'publisher', 'rating',
'news', 'tags', ':custom', ':user', 'search',] 'news', 'tags', 'custom:', 'user:', 'search',]
def __init__(self, icon_dict): def __init__(self, icon_dict):
for a in self.category_icons: for a in self.category_icons:
if a not in icon_dict: if a not in icon_dict:
@ -31,8 +31,8 @@ category_icon_map = {
'rating' : 'rating.png', 'rating' : 'rating.png',
'news' : 'news.png', 'news' : 'news.png',
'tags' : 'tags.png', 'tags' : 'tags.png',
':custom' : 'column.png', 'custom:' : 'column.png',
':user' : 'drawer.png', 'user:' : 'drawer.png',
'search' : 'search.png' 'search' : 'search.png'
} }


@ -255,6 +255,100 @@ you are producing are meant for a particular device type, choose the correspondi
The Output profile also controls the screen size. This will cause, for example, images to be auto-resized to be fit to the screen in some output formats. So choose a profile of a device that has a screen size similar to your device. The Output profile also controls the screen size. This will cause, for example, images to be auto-resized to be fit to the screen in some output formats. So choose a profile of a device that has a screen size similar to your device.
.. _heuristic-processing:
Heuristic Processing
---------------------
Heuristic Processing provides a variety of functions which can be used to try and detect and correct
common problems in poorly formatted input documents. Use these functions if your input document suffers
from poor formatting. Because these functions rely on common patterns, be aware that in some cases an
option may lead to worse results, so use with care. As an example, several of these options will
remove all non-breaking-space entities, or may include false positive matches relating to the function.
:guilabel:`Enable heuristic processing`
This option activates |app|'s Heuristic Processing stage of the conversion pipeline.
This must be enabled in order for the various sub-functions to be applied.
:guilabel:`Unwrap lines`
Enabling this option will cause |app| to attempt to detect and correct hard line breaks that exist
within a document using punctuation clues and line length. |app| will first attempt to detect whether
hard line breaks exist; if they do not appear to exist, |app| will not attempt to unwrap lines. The
line-unwrap factor can be reduced if you want to 'force' |app| to unwrap lines.
:guilabel:`Line-unwrap factor`
This option controls the algorithm |app| uses to remove hard line breaks. For example, if the value of this
option is 0.4, that means calibre will remove hard line breaks from the end of lines whose lengths are less
than the length of 40% of all lines in the document. If your document only has a few line breaks which need
correction, then this value should be reduced to somewhere between 0.1 and 0.2.
:guilabel:`Detect and markup unformatted chapter headings and sub headings`
If your document does not have chapter headings and titles formatted differently from the rest of the text,
|app| can use this option to attempt to detect them and surround them with heading tags. <h2> tags are used
for chapter headings; <h3> tags are used for any titles that are detected.
This function will not create a TOC, but in many cases it will cause |app|'s default chapter detection settings
to correctly detect chapters and build a TOC. Adjust the XPath under Structure Detection if a TOC is not automatically
created. If there are no other headings used in the document then setting "//h:h2" under Structure Detection would
be the easiest way to create a TOC for the document.
The inserted headings are not formatted. To apply formatting, use the :guilabel:`Extra CSS` option under
the Look and Feel conversion settings. For example, to center heading tags, use the following::
h2, h3 { text-align: center }
:guilabel:`Renumber sequences of <h1> or <h2> tags`
Some publishers format chapter headings using multiple <h1> or <h2> tags sequentially.
|app|'s default conversion settings will cause such titles to be split into two pieces. This option
will re-number the heading tags to prevent splitting.
:guilabel:`Delete blank lines between paragraphs`
This option will cause |app| to analyze blank lines included within the document. If every paragraph is interleaved
with a blank line, then |app| will remove all those blank paragraphs. Sequences of multiple blank lines will be
considered scene breaks and retained as a single paragraph. This option differs from the 'Remove Paragraph Spacing'
option under 'Look and Feel' in that it actually modifies the HTML content, while the other option modifies the document
styles. This option can also remove paragraphs which were inserted using |app|'s 'Insert blank line' option.
:guilabel:`Ensure scene breaks are consistently formatted`
With this option |app| will attempt to detect common scene-break markers and ensure that they are center aligned.
It also attempts to detect scene breaks defined by white space and replace them with a horizontal rule 15% of the
page width. Some readers may find this desirable as these 'soft' scene breaks often become page breaks on readers, and
thus become difficult to distinguish.
:guilabel:`Remove unnecessary hyphens`
|app| will analyze all hyphenated content in the document when this option is enabled. The document itself is used
as a dictionary for analysis. This allows |app| to accurately remove hyphens for any words in the document in any language,
along with made-up and obscure scientific words. The primary drawback is words appearing only a single time in the document
will not be changed. Analysis happens in two passes: the first pass analyzes line endings. Lines are only unwrapped if the
word exists with or without a hyphen in the document. The second pass analyzes all hyphenated words throughout the document,
hyphens are removed if the word exists elsewhere in the document without a match.
:guilabel:`Italicize common words and patterns`
When enabled, |app| will look for common words and patterns that denote italics and italicize them. Examples are common text
conventions such as ~word~ or phrases that should generally be italicized, e.g. Latin phrases like 'etc.' or 'et cetera'.
:guilabel:`Replace entity indents with CSS indents`
Some documents use a convention of defining text indents using non-breaking space entities. When this option is enabled |app| will
attempt to detect this sort of formatting and convert it to a 3% text indent using CSS.
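
The idea behind :guilabel:`Remove unnecessary hyphens`, using the document itself as the dictionary, can be illustrated with a minimal Python sketch. This is not |app|'s actual implementation: the function name ``remove_unnecessary_hyphens`` and the simple word pattern are assumptions made for illustration, and only the second pass (hyphenated words anywhere in the text) is shown.

```python
import re

# Hypothetical helper, not part of calibre: a word is any run of letters.
WORD = r"[^\W\d_]+"


def remove_unnecessary_hyphens(text):
    """Join 'left-right' when 'leftright' also occurs elsewhere in text."""
    # The document itself is the dictionary: every word it contains.
    known_words = set(w.lower() for w in re.findall(WORD, text))

    def join_if_known(match):
        joined = match.group(1) + match.group(2)
        # Drop the hyphen only if the joined form is attested in the document.
        if joined.lower() in known_words:
            return joined
        return match.group(0)

    return re.sub(r"\b(%s)-(%s)\b" % (WORD, WORD), join_if_known, text)


sample = "The cata-log lists books. A catalog is useful. A well-known author."
cleaned = remove_unnecessary_hyphens(sample)
```

In this sketch ``cata-log`` is joined because ``catalog`` appears elsewhere in the sample, while ``well-known`` is left alone because ``wellknown`` never appears. This mirrors the drawback described above: a hyphenated word that occurs only once cannot be confirmed and is not changed.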
.. _search-replace:
Search & Replace
---------------------
These options are useful primarily for conversion of PDF documents or OCR conversions, though they can
also be used to fix many document specific problems. As an example, some conversions can leave behind page
headers and footers in the text. These options use regular expressions to try and detect headers, footers,
or other arbitrary text and remove or replace them. Remember that they operate on the intermediate XHTML produced
by the conversion pipeline. There is a wizard to help you customize the regular expressions for
your document. Click the magic wand beside the expression box, and click the 'Test' button after composing
your search expression. Successful matches will be highlighted in yellow.
The search works by using a python regular expression. All matched text is simply removed from
the document or replaced using the replacement pattern. The replacement pattern is optional; if left blank
then text matching the search pattern will be deleted from the document. You can learn more about regular expressions
and their syntax at :ref:`regexptutorial`.
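
The behaviour described above can be tried outside |app| with Python's standard ``re`` module. The sketch below is only an approximation of what the conversion pipeline does: the helper name ``search_and_replace`` and the sample XHTML are made up for illustration, and |app| may apply the expression with different flags.

```python
import re


def search_and_replace(xhtml, search, replacement=""):
    """Replace every match of `search`; an empty replacement deletes it."""
    return re.sub(search, replacement, xhtml)


# Made-up intermediate XHTML containing a leftover page footer.
page = "<p>Some text</p><p>Page 12 of 300</p><p>More text</p>"
footer = r"<p>Page \d+ of \d+</p>"

# Blank replacement pattern: the matched footer is removed.
removed = search_and_replace(page, footer)

# Non-blank replacement pattern: the footer becomes a horizontal rule.
replaced = search_and_replace(page, footer, "<hr/>")
```

Testing an expression this way on a small sample of the intermediate XHTML is a quick check that it matches only the text you intend before using it in a conversion.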
.. _structure-detection: .. _structure-detection:
Structure Detection Structure Detection
@ -298,21 +392,6 @@ which means that |app| will insert page breaks before every `<h1>` and `<h2>` ta
The default expressions may change depending on the input format you are converting. The default expressions may change depending on the input format you are converting.
Removing headers and footers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These options are useful primarily for conversion of PDF documents. Often, the conversion leaves
behind page headers and footers in the text. These options use regular expressions to try and detect
the headers and footers and remove them. Remember that they operate on the intermediate XHTML produced
by the conversion pipeline. There is also a wizard to help you customize the regular expressions for
your document.
The header and footer regular expressions are used in conjunction with the remove header and footer options.
If the remove option is not enabled the regular expression will not be applied to remove the matched text.
The removal works by using a python regular expression. All matched text is simply removed from
the document. You can learn more about regular expressions and their syntax at
http://docs.python.org/library/re.html.
Miscellaneous Miscellaneous
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
@ -330,16 +409,6 @@ There are a few more options in this section.
two covers. This option will simply remove the first image from the source document, thereby two covers. This option will simply remove the first image from the source document, thereby
ensuring that the converted book has only one cover, the one specified in |app|. ensuring that the converted book has only one cover, the one specified in |app|.
:guilabel:`Preprocess input`
This option activates various algorithms that try to detect and correct common cases of
badly formatted input documents. Things like hard line breaks, large blocks of text with no formatting, etc.
Turn this option on if your input document suffers from bad formatting. But be aware that in
some cases, this option can lead to worse results, so use with care.
:guilabel:`Line-unwrap factor`
This option control the algorithm |app| uses to remove hard line breaks. For example, if the value of this
option is 0.4, that means calibre will remove hard line breaks from the end of lines whose lengths are less
than the length of 40% of all lines in the document.
Table of Contents Table of Contents
------------------ ------------------
@ -488,26 +557,33 @@ at `mobileread <http://www.mobileread.com/forums/showthread.php?t=28313>`_.
Convert TXT documents Convert TXT documents
~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
TXT documents have no well defined way to specify formatting like bold, italics, etc, or document structure like paragraphs, headings, sections and so on. TXT documents have no well defined way to specify formatting like bold, italics, etc, or document
Since TXT documents provide no way to explicitly mark parts of structure like paragraphs, headings, sections and so on, but there are a variety of conventions commonly
the text, by default |app| only groups lines in the input document into paragraphs. The default is to assume one or used. By default |app| attempts automatic detection of the correct formatting and markup based on those
more blank lines are a paragraph boundary:: conventions.
This is the first.
This is the
second paragraph.
TXT input supports a number of options to differentiate how paragraphs are detected. TXT input supports a number of options to differentiate how paragraphs are detected.
:guilabel:`Treat each line as a paragraph` :guilabel:`Paragraph Style: Auto`
Analyzes the text file and attempts to automatically determine how paragraphs are defined. This
option will generally work fine; if you get undesirable results, try one of the manual options.
:guilabel:`Paragraph Style: Block`
Assumes one or more blank lines are a paragraph boundary::
This is the first.
This is the
second paragraph.
:guilabel:`Paragraph Style: Single`
Assumes that every line is a paragraph:: Assumes that every line is a paragraph::
This is the first. This is the first.
This is the second. This is the second.
This is the third. This is the third.
:guilabel:`Assume print formatting` :guilabel:`Paragraph Style: Print`
Assumes that every paragraph starts with an indent (either a tab or 2+ spaces). Paragraphs end when Assumes that every paragraph starts with an indent (either a tab or 2+ spaces). Paragraphs end when
the next line that starts with an indent is reached:: the next line that starts with an indent is reached::
@ -518,13 +594,28 @@ TXT input supports a number of options to differentiate how paragraphs are detec
This is the This is the
third. third.
:guilabel:`Process using markdown` :guilabel:`Paragraph Style: Unformatted`
Assumes that the document has no formatting, but does use hard line breaks. Punctuation
and median line length are used to attempt to re-create paragraphs.
:guilabel:`Formatting Style: Auto`
Attempts to detect the type of formatting markup being used. If no markup is used then heuristic
formatting will be applied.
:guilabel:`Formatting Style: Heuristic`
Analyses the document for common chapter headings, scene breaks, and italicized words and applies the
appropriate html markup during conversion.
:guilabel:`Formatting Style: Markdown`
|app| also supports running TXT input through a transformation preprocessor known as markdown. Markdown |app| also supports running TXT input through a transformation preprocessor known as markdown. Markdown
allows for basic formatting to be added to TXT documents, such as bold, italics, section headings, tables, allows for basic formatting to be added to TXT documents, such as bold, italics, section headings, tables,
lists, a Table of Contents, etc. Marking chapter headings with a leading # and setting the chapter XPath detection lists, a Table of Contents, etc. Marking chapter headings with a leading # and setting the chapter XPath detection
expression to "//h:h1" is the easiest way to have a proper table of contents generated from a TXT document. expression to "//h:h1" is the easiest way to have a proper table of contents generated from a TXT document.
You can learn more about the markdown syntax at `daringfireball <http://daringfireball.net/projects/markdown/syntax>`_. You can learn more about the markdown syntax at `daringfireball <http://daringfireball.net/projects/markdown/syntax>`_.
:guilabel:`Formatting Style: None`
Applies no special formatting to the text; the document is converted to HTML with no other changes.
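
The Block, Single and Print paragraph styles described above differ only in what they treat as a paragraph boundary. The following Python sketch shows one way those three rules could be written; it is not |app|'s TXT input code, and the function name ``split_paragraphs`` and its ``style`` argument are assumptions for illustration.

```python
import re


def split_paragraphs(text, style="block"):
    """Split plain text into paragraphs using one of three simple rules."""
    lines = text.splitlines()
    if style == "single":
        # Every non-empty line is its own paragraph.
        return [ln.strip() for ln in lines if ln.strip()]
    if style == "block":
        # One or more blank lines separate paragraphs.
        chunks = re.split(r"\n\s*\n", text.strip())
        return [" ".join(c.split()) for c in chunks if c.strip()]
    if style == "print":
        # A line starting with a tab or 2+ spaces begins a new paragraph.
        paragraphs = []
        for ln in lines:
            if not ln.strip():
                continue
            indented = ln.startswith("\t") or ln.startswith("  ")
            if indented or not paragraphs:
                paragraphs.append(ln.strip())
            else:
                paragraphs[-1] += " " + ln.strip()
        return paragraphs
    raise ValueError("unknown paragraph style: %r" % style)


block = split_paragraphs("This is the first.\n\nThis is the\nsecond paragraph.")
printed = split_paragraphs("  This is the\nfirst.\n  This is the second.", "print")
```

Here ``block`` holds two paragraphs and ``printed`` holds two as well, with the wrapped lines rejoined. The Auto and Unformatted styles are not covered, since they depend on analysis of punctuation and line lengths that this sketch does not attempt.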
Convert PDF documents Convert PDF documents
~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~


@ -107,10 +107,10 @@ My device is not being detected by |app|?
Follow these steps to find the problem: Follow these steps to find the problem:
* Make sure that you are connecting only a single device to your computer at a time. Do not have another |app| supported device like an iPhone/iPad etc. at the same time. * Make sure that you are connecting only a single device to your computer at a time. Do not have another |app| supported device like an iPhone/iPad etc. at the same time.
* Make sure you are running the latest version of |app|. The latest version can always be downloaded from `http://calibre-ebook.com/download`_. * Make sure you are running the latest version of |app|. The latest version can always be downloaded from `the calibre website <http://calibre-ebook.com/download>`_.
* Ensure your operating system is seeing the device. That is, the device should be mounted as a disk that you can access using Windows explorer or whatever the file management program on your computer is * Ensure your operating system is seeing the device. That is, the device should be mounted as a disk that you can access using Windows explorer or whatever the file management program on your computer is.
* In calibre, go to Preferences->Plugins->Device Interface plugin and make sure the plugin for your device is enabled. * In calibre, go to Preferences->Plugins->Device Interface plugin and make sure the plugin for your device is enabled; the plugin icon next to it should be green when it is enabled.
* If all the above steps fail, go to Preferences->Miscellaneous and click debug device detection with your device attached and post the output as a ticket on `http://bugs.calibre-ebook.com`_. * If all the above steps fail, go to Preferences->Miscellaneous and click debug device detection with your device attached and post the output as a ticket on `the calibre bug tracker <http://bugs.calibre-ebook.com>`_.
How does |app| manage collections on my SONY reader? How does |app| manage collections on my SONY reader?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -441,7 +441,7 @@ menu, choose "Validate fonts".
I downloaded the installer, but it is not working? I downloaded the installer, but it is not working?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Downloading from the internet can sometimes result in a corrupted download. If the |app| installer you downloaded is not opening, try downloading it again. If re-downloading it does not work, download it from `an alternate location <http://sourceforge.net/projects/calibre/files/>`_. If the installer still doesn't work, then something on your computer is preventing it from running. Best place to ask for more help is in the `forums <http://www.mobileread.com/forums/usercp.php>`_. Downloading from the internet can sometimes result in a corrupted download. If the |app| installer you downloaded is not opening, try downloading it again. If re-downloading it does not work, download it from `an alternate location <http://sourceforge.net/projects/calibre/files/>`_. If the installer still doesn't work, then something on your computer is preventing it from running. Try rebooting your computer and running a registry cleaner like `Wise registry cleaner <http://www.wisecleaner.com>`_. Best place to ask for more help is in the `forums <http://www.mobileread.com/forums/usercp.php>`_.
My antivirus program claims |app| is a virus/trojan? My antivirus program claims |app| is a virus/trojan?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


@@ -478,6 +478,8 @@ Calibre has several keyboard shortcuts to save you time and mouse movement. Thes
   - Focus the search bar
 * - :kbd:`Shift+Ctrl+F`
   - Open the advanced search dialog
+* - :kbd:`Esc`
+  - Clear the current search
 * - :kbd:`N or F3`
   - Find the next book that matches the current search (only works if the highlight checkbox next to the search bar is checked)
 * - :kbd:`Shift+N or Shift+F3`
@@ -486,6 +488,8 @@ Calibre has several keyboard shortcuts to save you time and mouse movement. Thes
   - Download metadata and shortcuts
 * - :kbd:`Ctrl+R`
   - Restart calibre
+* - :kbd:`Shift+Ctrl+E`
+  - Add empty books to calibre
 * - :kbd:`Ctrl+Q`
   - Quit calibre

View File

@@ -21,7 +21,7 @@ This is, inevitably, going to be somewhat technical- after all, regular expressi
 Where in |app| can you use regular expressions?
 ---------------------------------------------------
-There are a few places |app| uses regular expressions. There's the header/footer removal in conversion options, metadata detection from filenames in the import settings and, since last version, there's the option to use regular expressions to search and replace in metadata of multiple books.
+There are a few places |app| uses regular expressions. There's the Search & Replace in conversion options, metadata detection from filenames in the import settings and Search & Replace when editing the metadata of books in bulk.
 What on earth *is* a regular expression?
 ------------------------------------------------
@@ -94,7 +94,7 @@ I think I'm beginning to understand these regular expressions now... how do I us
 Conversions
 ^^^^^^^^^^^^^^
-Let's begin with the conversion settings, which is really neat. In the structure detection part, you can input a regexp (short for regular expression) that describes the header or footer string that will be removed during the conversion. The neat part is the wizard. Click on the wizard staff and you get a preview of what |app| "sees" during the conversion process. Scroll down to the header or footer you want to remove, select and copy it, paste it into the regexp field on top of the window. If there are variable parts, like page numbers or so, use sets and quantifiers to cover those, and while you're at it, remember to escape special characters, if there are some. Hit the button labeled :guilabel:`Test` and |app| highlights the parts it would remove were you to use the regexp. Once you're satisfied, hit OK and convert. Be careful if your conversion source has tags like this example::
+Let's begin with the conversion settings, which is really neat. In the Search and Replace part, you can input a regexp (short for regular expression) that describes the string that will be replaced during the conversion. The neat part is the wizard. Click on the wizard staff and you get a preview of what |app| "sees" during the conversion process. Scroll down to the string you want to remove, select and copy it, paste it into the regexp field on top of the window. If there are variable parts, like page numbers or so, use sets and quantifiers to cover those, and while you're at it, remember to escape special characters, if there are some. Hit the button labeled :guilabel:`Test` and |app| highlights the parts it would replace were you to use the regexp. Once you're satisfied, hit OK and convert. Be careful if your conversion source has tags like this example::
     Maybe, but the cops feel like you do, Anita. What's one more dead vampire?
     New laws don't change that. </p>
@@ -104,7 +104,7 @@ Let's begin with the conversion settings, which is really neat. In the structure
     <p class="calibre4"> It had only been two years since Addison v. Clark.
     The court case gave us a revised version of what life was
-(shamelessly ripped out of `this thread <http://www.mobileread.com/forums/showthread.php?t=75594">`_). You'd have to remove some of the tags as well. In this example, I'd recommend beginning with the tag ``<b class="calibre2">``, now you have to end with the corresponding closing tag (opening tags are ``<tag>``, closing tags are ``</tag>``), which is simply the next ``</b>`` in this case. (Refer to a good HTML manual or ask in the forum if you are unclear on this point.) The opening tag can be described using ``<b.*?>``, the closing tag using ``</b>``, thus we could remove everything between those tags using ``<b.*?>.*?</b>``. But using this expression would be a bad idea, because it removes everything enclosed by <b>- tags (which, by the way, render the enclosed text in bold print), and it's a fair bet that we'll remove portions of the book in this way. Instead, include the beginning of the enclosed string as well, making the regular expression ``<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>`` The ``\s`` with quantifiers are included here instead of explicitly using the spaces as seen in the string to catch any variations of the string that might occur. Remember to check what |app| will remove to make sure you don't remove any portions you want to keep if you test a new expression. If you only check one occurrence, you might miss a mismatch somewhere else in the text. Also note that should you accidentally remove more or fewer tags than you actually wanted to, |app| tries to repair the damaged code after doing the header/footer removal.
+(shamelessly ripped out of `this thread <http://www.mobileread.com/forums/showthread.php?t=75594">`_). You'd have to remove some of the tags as well. In this example, I'd recommend beginning with the tag ``<b class="calibre2">``, now you have to end with the corresponding closing tag (opening tags are ``<tag>``, closing tags are ``</tag>``), which is simply the next ``</b>`` in this case.
+(Refer to a good HTML manual or ask in the forum if you are unclear on this point.) The opening tag can be described using ``<b.*?>``, the closing tag using ``</b>``, thus we could remove everything between those tags using ``<b.*?>.*?</b>``. But using this expression would be a bad idea, because it removes everything enclosed by <b>- tags (which, by the way, render the enclosed text in bold print), and it's a fair bet that we'll remove portions of the book in this way. Instead, include the beginning of the enclosed string as well, making the regular expression ``<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>`` The ``\s`` with quantifiers are included here instead of explicitly using the spaces as seen in the string to catch any variations of the string that might occur. Remember to check what |app| will remove to make sure you don't remove any portions you want to keep if you test a new expression. If you only check one occurrence, you might miss a mismatch somewhere else in the text. Also note that should you accidentally remove more or fewer tags than you actually wanted to, |app| tries to repair the damaged code after doing the removal.
 Adding books
 ^^^^^^^^^^^^^^^^
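
Reviewer note on the Conversions hunk above: the two expressions quoted in the manual text can be compared outside |app| with Python's standard ``re`` module. This is only a rough sketch. The sample markup, the ``example.invalid`` address and the helper name ``strip_matches`` are made up for illustration, and it does not reproduce how calibre itself applies Search & Replace or repairs the markup afterwards.

```python
import re

# The narrow expression recommended in the manual text: it only matches a
# <b> element whose text begins with the converter banner.
BANNER = re.compile(r'<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>')

# The broad expression the text warns against: it matches every <b> element.
TOO_BROAD = re.compile(r'<b.*?>.*?</b>')

# Hypothetical sample markup (not from a real book). Note the two spaces
# after "Generated", which the \s+ quantifier still covers.
SAMPLE = (
    '<p class="calibre4">New laws don\'t change that. </p>\n'
    '<p class="calibre4"><b class="calibre2">Generated  by ABC Amber LIT '
    'Converter, http://example.invalid</b></p>\n'
    '<p class="calibre4">It had <b>only</b> been two years.</p>\n'
)


def strip_matches(pattern, text):
    """Remove every match of pattern from text (illustrative helper)."""
    return pattern.sub('', text)


narrow = strip_matches(BANNER, SAMPLE)    # banner gone, bold word kept
broad = strip_matches(TOO_BROAD, SAMPLE)  # banner gone, bold word lost too
```

Printing ``narrow`` and ``broad`` side by side is a quick way to see why the manual asks you to check every match before converting, not just the first one.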

View File

@@ -104,12 +104,12 @@ class cmd_commit(_cmd_commit):
     def close_bug(self, bug, action, url, config):
         print 'Closing bug #%s'% bug
-        nick = config.get_nickname()
+        #nick = config.get_nickname()
         suffix = config.get_user_option('bug_close_comment')
         if suffix is None:
             suffix = 'The fix will be in the next release.'
         action = action+'ed'
-        msg = '%s in branch %s. %s'%(action, nick, suffix)
+        msg = '%s in branch %s. %s'%(action, 'lp:calibre', suffix)
         msg = msg.replace('Fixesed', 'Fixed')
         server = xmlrpclib.ServerProxy(url)
         server.ticket.update(int(bug), msg,
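
Reviewer note on the ``close_bug`` hunk above: the string handling can be tried on its own, without a ticket server. The sketch below is written for Python 3, and ``build_close_message`` is a hypothetical helper, not part of the plugin. The action word ``'Fixes'`` is an assumption inferred from the ``'Fixesed'`` replacement in the hunk.

```python
def build_close_message(action, suffix=None, branch='lp:calibre'):
    """Assemble the ticket comment the way the hunk does (illustrative only)."""
    if suffix is None:
        # Default used when no 'bug_close_comment' option is configured.
        suffix = 'The fix will be in the next release.'
    action = action + 'ed'
    msg = '%s in branch %s. %s' % (action, branch, suffix)
    # 'Fixes' + 'ed' gives 'Fixesed'; the replace turns that into 'Fixed'.
    return msg.replace('Fixesed', 'Fixed')


message = build_close_message('Fixes')
```

With the branch name hard-coded, the comment no longer depends on the local branch nickname, which appears to be the point of the change.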

File diff suppressed because it is too large Load Diff


Some files were not shown because too many files have changed in this diff