The security section has no URLs in its article titles, so findNext() simply returns
whatever link it encounters next after the anchor. As a result, heavy CVE reports get
downloaded and included in the generated document, because links to them are usually
placed after the article title. Search under the anchor tag only instead; this way only
links to useful articles are picked up.
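
A minimal sketch of the difference, using Beautiful Soup outside of the recipe. The
HTML fragment, the SummaryHL class name, the /Articles/ URL pattern and the
article_link() helper are assumptions made for illustration, not the recipe's code:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment: a headline without a link, followed by a CVE report
# link, then a regular headline that carries its own article link.
HTML = """
<h3 class="SummaryHL">Security updates</h3>
<p><a href="/Articles/111/">Heavy CVE report</a></p>
<h3 class="SummaryHL"><a href="/Articles/222/">Kernel article</a></h3>
"""

ARTICLE_HREF = re.compile(r"^/Articles/")


def article_link(headline):
    """Return the article URL found inside the headline tag, or None."""
    link = headline.find("a", href=ARTICLE_HREF)
    return link["href"] if link is not None else None


soup = BeautifulSoup(HTML, "html.parser")
headlines = soup.find_all("h3", class_="SummaryHL")

# find_next() walks the rest of the document, so the first headline
# picks up the CVE report link that merely follows it.
greedy = [h.find_next("a", href=ARTICLE_HREF)["href"] for h in headlines]

# find() looks only at descendants of the headline tag itself.
scoped = [article_link(h) for h in headlines]
```

Headlines for which the scoped lookup yields None can then be skipped.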
Make the recipe more resilient when an article title contains characters
that are not HTML-encoded. E.g. a "<Programming>" string at the start of the
title is parsed by Beautiful Soup as an HTML tag, although it is not one;
this makes the title a NoneType and raises an exception on Article creation.
This is actually a bug in the news page itself, but it breaks
the recipe anyway; hence, use a fixed string if for any reason the
article title ends up as a NoneType.
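
The fallback could look roughly like this. FALLBACK_TITLE and safe_title() are
hypothetical names, and the actual fixed string used by the recipe is not shown
here:

```python
# Hypothetical placeholder; the recipe may use a different fixed string.
FALLBACK_TITLE = "Untitled article"


def safe_title(raw_title):
    """Return raw_title, or a fixed string when title extraction produced None."""
    if raw_title is None:
        return FALLBACK_TITLE
    return raw_title


titles = [safe_title("Kernel development"), safe_title(None)]
```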
The site uses table layout a lot, both for page formatting
and within article text. Currently we clean up all tags
before and after the article text and remove what is left
of the tables in between, which also removes the useful tables
often embedded within articles. A better way seems
to be keeping only the parts we are actually interested in,
PageHeadline (the article's title) and ArticleText, and
not linearizing tables within the ArticleText tag, thus
preserving useful tables.
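
As an illustration of the idea only, here is a standalone Beautiful Soup sketch. It
assumes PageHeadline and ArticleText are class names; the sample page, the other
class names in it and the keep_only() helper are made up:

```python
from bs4 import BeautifulSoup

# Assumed to be class names of the two parts worth keeping.
KEEP_CLASSES = ["PageHeadline", "ArticleText"]

# Made-up page: a layout table wrapping the headline, the article text
# (with an embedded data table) and a footer.
HTML = """
<table><tr><td class="LeftColumn">navigation</td>
<td><div class="PageHeadline"><h1>Title</h1></div>
<div class="ArticleText"><p>Body</p><table><tr><td>data</td></tr></table></div>
<div class="Footer">footer</div></td></tr></table>
"""


def keep_only(html, classes=KEEP_CLASSES):
    """Build a new document holding only the tags with the given classes."""
    soup = BeautifulSoup(html, "html.parser")
    out = BeautifulSoup("<html><body></body></html>", "html.parser")
    for tag in soup.find_all(class_=classes):
        # Move the tag with its whole subtree, so embedded tables survive.
        out.body.append(tag.extract())
    return out


result = keep_only(HTML)
```

The layout table and footer are dropped because they are never copied, while the
table inside the article text is left untouched.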
Signed-off-by: Sergiy Kibrik <sakib@meta.ua>
Fill author names in article headers with a background color
to separate them from the article body. Do the same for
quiz boxes, and also outline them with a border,
as they are placed directly within the article text.
Signed-off-by: Sergiy Kibrik <sakib@meta.ua>
Quotes and block quotes are widely used in LWN articles and are
distinguished from other text by quotation marks and color.
Grayscale displays of ebook readers cannot highlight them with
color, so change their text style to italic for a better reading experience.
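
In a calibre recipe such styling is usually supplied through the extra_css attribute.
A sketch of what the rule might look like follows; the blockquote and q selectors are
an assumption, since the markup LWN actually uses for quotes may rely on class names
instead:

```python
# Assumed selectors; adjust them to the markup actually used for quotes.
EXTRA_CSS = """
blockquote, q {
    font-style: italic;
}
"""
```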
Signed-off-by: Sergiy Kibrik <sakib@meta.ua>