mirror of https://github.com/kovidgoyal/calibre.git, synced 2025-07-09 03:04:10 -04:00
recipes: lwn_weekly: improve table handling
The site uses table layout heavily, both for page formatting and within article text. Yet we strip all tags before and after the article text and linearize whatever tables are left in between, which also removes the useful tables often embedded within articles.

A better approach seems to be to keep only the parts we are actually interested in, PageHeadline (the article's title) and ArticleText, and not to linearize tables within the ArticleText tag, thus preserving useful tables.

Signed-off-by: Sergiy Kibrik <sakib@meta.ua>
parent 29fd4d5b2e
commit c98eb806f5
@@ -30,8 +30,7 @@ class WeeklyLWN(BasicNewsRecipe):
     # masthead_url = 'http://lwn.net/images/lcorner.png'
     publication_type = 'magazine'
 
-    remove_tags_before = dict(attrs={'class':'PageHeadline'})
-    remove_tags_after = dict(attrs={'class':'ArticleText'})
+    keep_only_tags = [dict(attrs={'class':['PageHeadline','ArticleText']})]
     remove_tags = [dict(name=['h2', 'form'])]
 
     preprocess_regexps = [
@@ -40,7 +39,6 @@ class WeeklyLWN(BasicNewsRecipe):
     ]
 
     conversion_options = {
-        'linearize_tables' : True,
         'no_inline_navbars': True,
     }
 
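The effect described in the commit message (keep only the PageHeadline and ArticleText elements, and leave tables nested inside the article alone) can be illustrated with a small standalone sketch. This is not calibre's implementation: the `PAGE` markup is a hypothetical, simplified stand-in for an LWN page, and `keep_only` is an illustrative helper built on the standard library's `xml.etree.ElementTree` rather than on the recipe machinery.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an LWN article page (not real markup).
PAGE = """
<html><body>
  <table class="Layout"><tr><td>site navigation</td></tr></table>
  <h1 class="PageHeadline">Article title</h1>
  <div class="ArticleText">
    <p>Intro paragraph.</p>
    <table><tr><td>benchmark</td><td>42</td></tr></table>
  </div>
  <table class="Layout"><tr><td>footer</td></tr></table>
</body></html>
""".strip()

KEEP_CLASSES = {"PageHeadline", "ArticleText"}


def keep_only(root, classes):
    """Return elements carrying one of `classes`, in document order.

    A kept element is returned with its whole subtree, so a table nested
    inside ArticleText survives intact, while layout tables outside the
    kept elements are simply never collected.
    """
    kept = []

    def walk(node):
        node_classes = set((node.get("class") or "").split())
        if node_classes & classes:
            kept.append(node)  # keep the subtree as-is, do not descend
            return
        for child in node:
            walk(child)

    walk(root)
    return kept


kept = keep_only(ET.fromstring(PAGE), KEEP_CLASSES)
kept_classes = [el.get("class") for el in kept]
tables_in_article = len(kept[1].findall(".//table"))
print(kept_classes, tables_in_article)
```

In this sketch the two layout tables are dropped because they sit outside the kept elements, while the table inside the article body is preserved, which is the behaviour the commit aims for by replacing remove_tags_before/remove_tags_after with keep_only_tags and dropping linearize_tables.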