[fix] yandex engine: capture captcha from header instead of url path (#5417)

The Yandex engine returns a parsing error instead of reporting that a CAPTCHA was found, which is confusing for admins and users (#5415).


This patch fixes an issue where the CAPTCHA response from Yandex was not detected, resulting in a `ParserError` when trying to parse the response into a DOM.

In this fix, I replaced the URL path condition with a check that the `x-yandex-captcha` header is set and equal to `captcha`.
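For illustration, here is a minimal, self-contained sketch of the header check. `CaptchaDetected` and `FakeResponse` are hypothetical stand-ins (for `SearxEngineCaptchaException` and the real response object) so the snippet runs outside SearXNG, and `is_captcha` is a helper added only for the demo.

```python
class CaptchaDetected(Exception):
    """Hypothetical stand-in for SearxEngineCaptchaException."""


class FakeResponse:
    """Hypothetical minimal response object exposing only `headers`."""

    def __init__(self, headers=None):
        # Lower-case the keys so lookups mimic a case-insensitive header map.
        self.headers = {k.lower(): v for k, v in (headers or {}).items()}


def catch_bad_response(resp):
    # Same condition as in the patch: header present and equal to 'captcha'.
    if resp.headers.get('x-yandex-captcha') == 'captcha':
        raise CaptchaDetected()


def is_captcha(resp):
    # Demo helper: turn the exception into a boolean.
    try:
        catch_bad_response(resp)
    except CaptchaDetected:
        return True
    return False
```

With these stand-ins, `is_captcha(FakeResponse({'X-Yandex-Captcha': 'captcha'}))` returns `True`, while a response without the header passes through untouched.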

Alternatively, something like `resp.headers.get('Location', '').startswith("https://yandex.com/showcaptcha")` could be used instead. Lastly, setting `params['allow_redirects'] = True` would also work, but it costs an extra request. Just let me know.
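A rough sketch of the `Location`-based alternative, assuming the redirect response is inspected without being followed. `redirects_to_captcha` and `CAPTCHA_PREFIX` are hypothetical names, and a plain `dict` stands in for the response headers (so the lookup here is case-sensitive, unlike a real header map).

```python
# Prefix taken from the alternative suggested above.
CAPTCHA_PREFIX = "https://yandex.com/showcaptcha"


def redirects_to_captcha(headers):
    """Return True if the Location header points at the CAPTCHA page.

    `headers` is a plain dict here; a missing Location header falls back
    to an empty string, so the check is simply False in that case.
    """
    return headers.get('Location', '').startswith(CAPTCHA_PREFIX)
```

The header check in the patch avoids depending on the exact redirect host, which is one reason it may be the more robust of the two options.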

Closes: https://github.com/searxng/searxng/issues/5415
Aadniz 2025-11-06 07:00:48 +01:00 committed by GitHub
parent 1be19f8b58
commit b1918dd121
GPG Key ID: B5690EEEBB952194

@@ -35,7 +35,7 @@ content_xpath = './/div[@class="b-serp-item__content"]//div[@class="b-serp-item_
 def catch_bad_response(resp):
-    if resp.url.path.startswith('/showcaptcha'):
+    if resp.headers.get('x-yandex-captcha') == 'captcha':
         raise SearxEngineCaptchaException()