brave web:
XPath selectors needed to be adjusted
brave images & videos:
The JS code containing the JS object was parsed incorrectly; not always, but
quite often, this raised exceptions when the Python data structure was created
from it.
BTW: A complete review was conducted, and the type definitions were corrected
or extended.
To test all Brave engines at once::
!br !brimg !brvid !brnews weather
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch is based on PR #2792 (old PR from 2023)
- js_obj_str_to_python handles more cases
- bring in tests from chompjs ..
- comment out tests that do not pass
The tests from chompjs give some overview of what is not implemented.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- The three Yandex engines should use the same network context.
- There is no reason to set these engines inactive
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The code injection and monkey patching examine the names in the engine's
module; if a variable there does not start with an underscore and has the value
None, it is treated as a variable that needs to be configured. This outdated
concept does not fit engines that may have multiple URLs, at least not as long
as the value of the base URL (list) is None.
The default is now an empty list instead of None.
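A minimal sketch of the check described above, assuming a simplified stand-in
for the real loader; `unconfigured_names` and the `example_engine` module are
hypothetical names used only for illustration:

```python
import types


def unconfigured_names(module):
    """Return public module attributes whose value is still None."""
    return sorted(
        name
        for name, value in vars(module).items()
        if not name.startswith("_") and value is None
    )


# Hypothetical engine module, built in memory for illustration only.
engine = types.ModuleType("example_engine")
engine.base_url = None   # old default: reported as "needs to be configured"
engine.paging = True
engine._private = None   # ignored, the name starts with an underscore

print(unconfigured_names(engine))   # ['base_url']

engine.base_url = []     # new default: an empty list is not reported
print(unconfigured_names(engine))   # []
```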
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- if loading an engine fails, set the engine to inactive
- don't load an engine when the config says it's inactive
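The two rules above can be sketched roughly as follows. This is not the actual
loader code; `load_engines`, `fake_loader` and the config dictionaries are
assumptions made up for this example:

```python
def load_engines(engine_configs, loader):
    """Load engines; skip inactive ones and deactivate those that fail."""
    loaded = {}
    for cfg in engine_configs:
        if cfg.get("inactive", False):
            # The config says the engine is inactive: do not even try.
            continue
        try:
            loaded[cfg["name"]] = loader(cfg)
        except Exception:
            # Loading failed: mark the engine inactive instead of crashing.
            cfg["inactive"] = True
    return loaded


def fake_loader(cfg):
    """Hypothetical loader that fails for the engine named 'broken'."""
    if cfg["name"] == "broken":
        raise ImportError("cannot load engine")
    return object()


configs = [
    {"name": "ok"},
    {"name": "off", "inactive": True},
    {"name": "broken"},
]
engines = load_engines(configs, fake_loader)
print(sorted(engines))          # ['ok']
print(configs[2]["inactive"])   # True
```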
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The requests changed here all ran outside of the network context timeout, which
prevented the engine's timeout from being applied (the effective timeout could
become longer than configured).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The encoding of the JS string is corrupted if all single quotes (followed by a
comma) are replaced with double quotes. The bug was introduced in PR #4573.
Here is a simple example in which the list gets corrupted::
>>> s = r"""[ 'foo\'', 'bar']"""
>>> print(s)
[ 'foo\'', 'bar']
>>> print(s.replace("',", "\","))
[ 'foo\'", 'bar']
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Presearch responds with a Cloudflare captcha on each request when using HTTP/2.
Using HTTP/1.1, everything seems to work fine.
- other engines with the same issue: pixabay, uxwing
- closes https://github.com/searxng/searxng/issues/5438
The results from the recoll engine were not displaying the usual
toggle for showing media previews. After the changes described below,
the toggle is displayed and works as expected.
In the JSON returned by recoll-webui, the field containing the
mimetype is actually `mtype`, not `mime`.
Furthermore, according to the documentation for the `File` class in
`searx/result_types/file.py`, `embedded` should contain the URL to the
media itself. The embedding of the media into the page for preview is
done in `searx/templates/simple/result_templates/file.html`.
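As a rough illustration of the two points above, assuming a simplified result
dictionary rather than the real `File` class: the `url` key, the helper name
`map_recoll_result` and the choice of previewable media types are assumptions
made for this sketch.

```python
def map_recoll_result(item):
    """Map one recoll-webui JSON item to a simplified result dict."""
    mimetype = item.get("mtype", "")   # the field is `mtype`, not `mime`
    url = item.get("url", "")          # assumed key, for illustration
    result = {"url": url, "mimetype": mimetype}
    # Assumption: only audio and video results get a preview toggle here.
    if mimetype.split("/")[0] in ("audio", "video"):
        # `embedded` holds the URL of the media itself; the template is
        # responsible for embedding it into the page.
        result["embedded"] = url
    return result


video = map_recoll_result({"url": "http://example.org/a.mp4", "mtype": "video/mp4"})
text = map_recoll_result({"url": "http://example.org/a.txt", "mtype": "text/plain"})
print(video["embedded"])      # http://example.org/a.mp4
print("embedded" in text)     # False
```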
- official website: https://devicon.dev/
- the engine provides many icons for popular software frameworks (e.g. pytest),
  which could be useful, for example, for visualizing a diagram of the tech stack used in an app
The Yandex engine returns a parsing error instead of reporting that a CAPTCHA was found. This is confusing for admins and users (#5415).
This patch fixes an issue where the CAPTCHA response from Yandex was not detected, resulting in a `ParserError` when trying to parse the response into a DOM.
The fix replaces the URL condition with a check that the `x-yandex-captcha` header is set and equal to `captcha`.
Alternatively, a check like `resp.headers.get('Location', '').startswith("https://yandex.com/showcaptcha")` could be used instead. Lastly, setting `params['allow_redirects'] = True` would also work, but it costs an extra request. Just let me know.
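A minimal sketch of the header check, assuming a plain dict in place of the real response headers (which are case-insensitive, unlike a dict); `CaptchaDetected` and `check_captcha` are hypothetical stand-ins, not names from the code base:

```python
class CaptchaDetected(Exception):
    """Hypothetical stand-in for the engine's CAPTCHA exception."""


def check_captcha(headers):
    """Raise if the response headers signal a Yandex CAPTCHA."""
    if headers.get("x-yandex-captcha") == "captcha":
        raise CaptchaDetected("Yandex answered with a CAPTCHA")


# A normal response passes silently.
check_captcha({"content-type": "text/html"})

# A CAPTCHA response is reported before any DOM parsing is attempted.
try:
    check_captcha({"x-yandex-captcha": "captcha"})
except CaptchaDetected as exc:
    print(exc)   # Yandex answered with a CAPTCHA
```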
Closes: https://github.com/searxng/searxng/issues/5415
SourceHut uses a FOSS bot protection tool called `go-away` (which I can
recommend BTW). It blocks common crawler user agents, including the standard
Firefox user agent. Hence, we now use our custom SearXNG user agent to
make clear that we're not a crawler.
Closes: https://github.com/searxng/searxng/issues/5270
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Fixes an issue where the Startpage engine would display a parsing error
(`json.decoder.JSONDecodeError`) when Startpage returned a CAPTCHA redirect page.
The fix checks whether the response has a `Location` header that starts
with `https://www.startpage.com/sp/captcha`; if so, it raises a CAPTCHA
exception before trying to parse the data.
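The order of operations can be sketched as follows, assuming a plain dict for
the headers and a string body; `CaptchaDetected` and `parse_response` are
hypothetical names, not the engine's actual functions:

```python
import json

CAPTCHA_PREFIX = "https://www.startpage.com/sp/captcha"


class CaptchaDetected(Exception):
    """Hypothetical stand-in for the engine's CAPTCHA exception."""


def parse_response(headers, body):
    """Check for a CAPTCHA redirect first, then parse the JSON body."""
    if headers.get("Location", "").startswith(CAPTCHA_PREFIX):
        raise CaptchaDetected("Startpage redirected to a CAPTCHA page")
    return json.loads(body)


print(parse_response({}, '{"results": []}'))   # {'results': []}

try:
    # The redirect body is not JSON; without the check this would raise
    # json.decoder.JSONDecodeError instead.
    parse_response({"Location": CAPTCHA_PREFIX + "?u=x"}, "<html></html>")
except CaptchaDetected as exc:
    print(exc)   # Startpage redirected to a CAPTCHA page
```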