Pre-process not post-process

Sophist 2017-07-08 12:39:51 +01:00 committed by GitHub
parent 6da24d32a0
commit ab24e12d80

@@ -236,7 +236,7 @@ It offers a unique blend of humour, social and political observations and invest
return self.page_index
-def postprocess_html(self, soup, first):
+def preprocess_html(self, soup):
for figure in soup.findAll('a', attrs = {'href' : lambda x: x and ('jpg' in x or 'png' in x or 'gif' in x)}):
# makes sure that the link points to the absolute web address
if figure['href'].startswith('/'):