The Yandex engine returns a parsing error instead of reporting that a CAPTCHA was found, which is confusing for admins and users (#5415).
This patch fixes an issue where the CAPTCHA response from Yandex wasn't detected, resulting in a `ParserError` when trying to parse the response into a DOM.
In this fix, I replaced the URL condition with a check that the `x-yandex-captcha` header is set and equal to `captcha`.
Alternatively, maybe something like `resp.headers.get('Location', '').startswith("https://yandex.com/showcaptcha")` could be done instead. Lastly, setting `params['allow_redirects'] = True` can also work, but this will waste an extra request. Just let me know.
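A minimal sketch of the header check described above. `CaptchaDetected` and `check_yandex_captcha` are hypothetical names used only for illustration (the real engine raises the framework's own CAPTCHA exception), and the plain `dict` of headers is an assumption that keeps the example self-contained:

```python
class CaptchaDetected(Exception):
    """Stand-in for the engine framework's CAPTCHA exception (assumption)."""


def check_yandex_captcha(headers: dict[str, str]) -> None:
    """Raise before any DOM parsing if the response is flagged as a CAPTCHA."""
    # Normalise header names so the lookup does not depend on their casing.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-yandex-captcha") == "captcha":
        raise CaptchaDetected("Yandex returned a CAPTCHA page")
```

Calling this first means the HTML parser never sees the CAPTCHA page, and no extra request is spent on following the redirect.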
Closes: https://github.com/searxng/searxng/issues/5415
SourceHut uses a FOSS bot protection tool called `go-away` (which I can
recommend BTW). It blocks common crawler user agents, such as the standard
Firefox user agent. Hence, we're now using our custom SearXNG user agent to
clarify we're not a crawler.
Closes: https://github.com/searxng/searxng/issues/5270
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Fixes an issue where the Startpage engine would display a parsing error
(`json.decoder.JSONDecodeError`) when Startpage returned a CAPTCHA redirect page.
The fix checks whether the response has a `Location` header that starts
with `https://www.startpage.com/sp/captcha` and, if so, raises a CAPTCHA exception
before trying to parse the data.
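The order of operations can be sketched as follows. This is an illustration under assumptions, not the engine's actual code: `CaptchaDetected` and `parse_response` are hypothetical names, and headers are passed as a plain `dict`:

```python
import json

CAPTCHA_PREFIX = "https://www.startpage.com/sp/captcha"


class CaptchaDetected(Exception):
    """Stand-in for the engine framework's CAPTCHA exception (assumption)."""


def parse_response(headers: dict[str, str], body: str) -> dict:
    """Check for the CAPTCHA redirect first, then decode the JSON body."""
    if headers.get("Location", "").startswith(CAPTCHA_PREFIX):
        raise CaptchaDetected("Startpage redirected to its CAPTCHA page")
    # Only reached for regular responses, so a redirect page is never
    # handed to the JSON decoder.
    return json.loads(body)
```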
The query component of URLs like:
- 'http://example.org?q=' --> query_str is 'q='
- 'http://example.org?/foo/bar' --> query_str is '/foo/bar'
is a *simple string* and not a key/value dict. This string may only be removed
from the URL if one of the patterns matches.
BTW: get_pretty_url() now keeps such a *simple string* in the path element.
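A rough sketch of the idea, using only the standard library. The patterns in `TRACKER_PATTERNS` and the function name `strip_trackers` are made up for this example and are not the plugin's real rules:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlparse

# Hypothetical tracker patterns, for illustration only.
TRACKER_PATTERNS = [re.compile(r"^utm_"), re.compile(r"^fbclid$")]


def strip_trackers(url: str) -> str:
    """Drop query arguments matching a tracker pattern, else return url as-is."""
    parsed = urlparse(url)
    kept = []
    changed = False
    for name, value in parse_qsl(parsed.query, keep_blank_values=True):
        if any(pattern.match(name) for pattern in TRACKER_PATTERNS):
            changed = True
        else:
            kept.append((name, value))
    if not changed:
        # No pattern matched: leave the URL (and any simple string) untouched.
        return url
    return parsed._replace(query=urlencode(kept)).geturl()
```

Returning the original URL whenever nothing matched avoids re-encoding a query that was never a key/value list in the first place.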
Closes: https://github.com/searxng/searxng/issues/5299
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This PR adds a new result type: File
Python class: searx/result_types/file.py
Jinja template: searx/templates/simple/result_templates/file.html
CSS (less): client/simple/src/less/result_types/file.less
Class 'File' (singular) replaces template 'files.html' (plural). It was renamed
because a result contains only one file (singular). Not to be
confused with the category 'files', in which multiple results can exist.
As mentioned in issue [1], the class '.category-files' was removed from the CSS
and its styles were moved to result_types/file.less (now based on the
template and no longer on the category).
[1] https://github.com/searxng/searxng/issues/5198
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The size of the full-size images from ``thumbnail.url`` is usually several
MB. By reducing the full-size image to 80 pixels, the data size for a thumb is
reduced from MB to a few KB.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
It doesn't matter if you're using Mullvad's VPN and a proper browser, you'll
still get blocked for specific searches [1] with a 403 or 429 HTTP status code.
Mullvad only blocks the search request and doesn't prevent you from doing more
searches.
The logic should handle the blocked requests (403, 429), but not put the engine
on a cooldown.
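One way to picture this, as a sketch only: `AccessDenied`, its `suspended_time` attribute, and `check_status` are stand-ins invented for this example, assuming the surrounding framework suspends an engine for the number of seconds carried by the exception:

```python
BLOCKED_STATUS_CODES = {403, 429}


class AccessDenied(Exception):
    """Stand-in for an access-denied engine exception (assumption)."""

    def __init__(self, message: str, suspended_time: int):
        super().__init__(message)
        # Seconds the engine should be suspended; 0 means no cooldown.
        self.suspended_time = suspended_time


def check_status(status_code: int) -> None:
    """Report a blocked search request without suspending the engine."""
    if status_code in BLOCKED_STATUS_CODES:
        raise AccessDenied(
            f"search request blocked (HTTP {status_code})", suspended_time=0
        )
```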
[1] https://leta.mullvad.net/search?q=site%3Afoo+bar&engine=brave
Closes: https://github.com/searxng/searxng/issues/5328
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Apparently, in China, Bing redirects from `www.bing.com` to `cn.bing.com`.
So, to make Bing work for Chinese users by default, we have to follow that redirect.
Related: https://github.com/searxng/searxng/issues/5243
Adds a new engine `searx/engines/azure.py` to search cloud resources on Azure.
Many enterprise users have to deal with Azure Public Cloud. This helps them
easily search for cloud resources without logging in to the Portal first.
How to test this PR locally?
Create an App Registration in Microsoft Entra ID with Reader access to
the resources you want to search for, and create a secret for the App
Registration. Then set the appropriate values in the
`settings.yml` file [1]::
    - name: azure
      engine: azure
      ...
      azure_tenant_id: "your_tenant_id"
      azure_client_id: "your_client_id"
      azure_client_secret: "your_client_secret"
      azure_token_expiration_seconds: 5000
[1] https://github.com/searxng/searxng/pull/5235#issuecomment-3397664928
Co-authored-by: Bnyro <bnyro@tutanota.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
The class method ``Compass.point`` is converted into an instance method to
circumvent the problem described in [1] (without understanding the cause).
[1] https://github.com/searxng/searxng/issues/5304#issuecomment-3394140820
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Previously, when using a search URL copied from the cookies tab, clicking
the settings icon at the top right showed the browser's preferences
and not the preferences that were set and used with the search URL.
Please see https://github.com/searxng/searxng/issues/5227 for more information.
To test:
- change some preferences
- copy the preferences search url in the settings' cookies tab
- reset the preferences or clear cookies
- paste the copied search url into the search bar to search for something
- press the settings icon
- you can now see/preview the actual settings that were used for the search
- by pressing 'save', you can keep these preferences
Closes #5227
The settings are currently an untyped key/value structure, whose types are
dynamically built at runtime. The construction process of this structure
is *hand-crafted*.
In the long term, we want a static typing of this structure, based on a standard
tool. The ``msgspec.Struct`` structures are suitable as a standard tool.
This patch makes a first step towards static typing and implements the "brand"
section using ``msgspec.Struct`` structures.
BTW: in searx/settings_defaults.py, ``git_url`` and ``git_branch`` are leftovers;
both were removed in aee613d256.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The name of the option is *disable_family_filter*, so we have to reverse the
meaning of the ascending safe-search filter level.
Closes: https://github.com/searxng/searxng/issues/5287
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Starting with Python 3.14 msgspec reports::
    File "/share/searxng/searx/weather.py", line 261, in <module>
      class Temperature(msgspec.Struct, kw_only=True):
      ...<60 lines>...
          return template.format(value=val_str, unit=unit)
    TypeError: Using a non-empty mutable collection (['°C', '°F', 'K']) \
      as a default value is unsafe.\
      Instead configure a `default_factory` for this field.
The problem is solved by introducing global constants for the units
(BTW: the singular/plural names of the type definitions are fixed as well):
- TEMPERATURE_UNITS
- PRESSURE_UNITS
- WIND_SPEED_UNITS
- RELATIVE_HUMIDITY_UNITS
- COMPASS_POINTS
- COMPASS_UNITS
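The general pattern can be sketched with the standard library alone. This is an assumption about the shape of the fix, not a copy of it: the `Literal` type, the way the constant is derived, and the plain `Temperature` class below (used here in place of a ``msgspec.Struct``) are illustrative:

```python
import typing

# Singular name for the type of one unit, plural name for the collection.
TemperatureUnit = typing.Literal["°C", "°F", "K"]

# Module-level, immutable constant instead of a mutable class-level default.
TEMPERATURE_UNITS: tuple[TemperatureUnit, ...] = typing.get_args(TemperatureUnit)


class Temperature:
    """Simplified stand-in for the real struct (assumption)."""

    def __init__(self, value: float, unit: TemperatureUnit):
        if unit not in TEMPERATURE_UNITS:
            raise ValueError(f"invalid unit: {unit}")
        self.value = value
        self.unit = unit
```

Because the units live in a module-level tuple, no field of the class needs a list as its default value.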
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the case of .. response, for example, an HTTP 302 is returned by Google
Scholar::
    Our systems have detected unusual traffic from your computer
    network. Please try again later.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
When installing SearXNG, e.g.::
    pip install --use-pep517 --no-build-isolation -e .
an import exception is raised::
    ModuleNotFoundError: No module named 'lxml'
The ``setup.py`` file imports ``searx``, which in turn triggers various other
imports. However, the name XPath is only needed for type checking.
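A common way to handle a name that is only needed for type checking is to guard its import with ``typing.TYPE_CHECKING``. The sketch below shows that pattern; the function `describe` is a hypothetical example and not part of the code base:

```python
import typing

if typing.TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime, so the
    # module can be imported even when lxml is not installed.
    from lxml.etree import XPath


def describe(spec: "XPath | str") -> str:
    """The quoted annotation is never evaluated at runtime."""
    return f"xpath spec of type {type(spec).__name__}"
```

With the import guarded like this, importing the module no longer requires lxml to be present.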
Closes: https://github.com/searxng/searxng/issues/5177
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>