The previous implementation for determining the description of an engine did not
take into account that the UI languages can also have a region tag and/or a
script tag:
el-GR: Ελληνικά, Ελλάδα (Greek, Greece)
fa-IR: فارسی, ایران (Persian, Iran)
nb-NO: Norsk bokmål, Norge (Norwegian bokmål, Norway)
nl-BE: Nederlands, België (Dutch, Belgium)
pt-BR: Português, Brasil (Portuguese, Brazil)
zh-HK: 中文, 中國香港特別行政區 (Chinese, Hong Kong SAR China)
zh-Hans-CN: 中文, 中国 (Chinese, China)
zh-Hant-TW: 中文, 台灣 (Chinese, Taiwan)
Closes: https://github.com/searxng/searxng/issues/4842
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Sometimes (e.g. when ddg does not have a result item) there is no content and
the engine will fail with an IndexError:
* Error: IndexError
* Percentage: 10
* Parameters: `()`
* File name: `searx/engines/duckduckgo.py:375`
* Function: `response`
* Code: `item["content"] = extract_text(eval_xpath(div_result, './/a[contains(@class, "result__snippet")]')[0])`
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Add trailing slash to base URL to prevent potential redirects
- Remove advanced search syntax filtering (no longer guarantees a CAPTCHA)
- Correct pagination offset calculation: Page 2 now starts at offset 10,
subsequent pages use 10 + (n-2)*15 formula instead of the previous
broken 20 + (n-2)*50 calculation that caused CAPTCHAs
- Restructure request parameter building to better match a real request
- "kt" cookie is no longer an empty string if the language/region is "all"
- Group related parameter assignments together
- Add header logging to debugging output
Related:
- https://github.com/searxng/searxng/issues/4824
There is currently no known z-library, and all known URLs are dead [1]. To avoid
interrupting automated updates, a connection error to a z-library is treated as
a *known error*, and the old properties of the z-library are retained.
[1] https://github.com/searxng/searxng/issues/3610
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the previous implementation, all databases were loaded into memory when
importing the searx.data package, regardless of whether they were ever needed.
Regardless of this, it is an antipattern to load entire databases into memory
when importing a package or module; databases should be loaded when needed.
Lazy loading is a first step toward improving memory usage and also improves
performance when setting up the runtime environment. Building on this,
subsequent PRs will be able to further optimize memory behavior, e.g., by using
a real database application such as the one already available via
searx.cache.ExpireCache
Related:
- https://github.com/searxng/searxng/discussions/1892
- https://github.com/searxng/searxng/pull/3458
- https://github.com/searxng/searxng/pull/4650
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Temporarily remove the -e flag from set to prevent entrypoint.sh from stopping execution if any command returns a non-zero status. This doesn't solve anything but relaxes the script checks.
Related https://github.com/searxng/searxng/issues/4818
- apparently the API now requires a `X-Pinterest-PWS-Handler` in order to
properly function (extracted from their web UI)
- the other `X-Pinterest` headers here are added in case they become mandatory
too
Closes: https://github.com/searxng/searxng/issues/4812
Icons category makes sense because it allows to quickly search for free SVG
icons to use for websites / other designs with a quick `!icons` query
Icons don't seem to fit into the normal images category that well because icons
are quite a special type of images
The global variable CACHE is not initialized when DDG images or DDG videos
import the get_vqd() function (please remember: the engine modules are imported
using the importlib method and not via the `import` keyword).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
That entrypoint is prone to screw things up, especially with permission handling. The new script handles initialization better and fixes some issues like delayed settings update via ENVs and timestamp overwriting, also adjusts what should be copied into the container.
Related https://github.com/searxng/searxng/pull/4721#issuecomment-2850272129
The wolfi-base metapackage includes busybox, ca-certificates-bundle and the package manager. The change is to make the use of base-builder image more flexible.
Instead of using Wolfi base images from cgr.dev and making that mess on the Dockerfile, why don't we build the base images ourselves from Wolfi repos with apko? The intention of this is to simplify the main Dockerfile and avoid having to patch the base image every time, it also simplifies some steps like image ownership management and provides extremely fast builds.
Wolfi OS images are specifically designed for container use. Using a specially designed base image for containers not only reduces maintenance burdens, but improves overall experience for developers (fewer packages we have to track) and end users (smaller images).
Discussion here: https://github.com/searxng/searxng/issues/4753
If the workflow is executed with the "workflow_dispatch" trigger, the user who executed the workflow becomes the author of the commit on the PR, this is not intended.
It also reverts the body param so that the default text of the action does not appear.
`checker.yml` and `integration.yml` are the only workflows that are currently safe to be executed simultaneously, the others present a risk that the order of completion may not be expected. The ones that are chained from `integration.yml` can be called as many times as `integration.yml` workflows are running at that moment, the same with the trigger "workflow_dispatch".
This can be fatal for workflows like `container.yml` that use a centralized cache to store and load the candidate images in a common tag called "searxng-<arch>".
* For example, a `container.yml` workflow is executed after being chained from `integration.yml` (called "~1"), and seconds later it may be triggered again because another PR merged some breaking changes (called "~2"). While "~1" has already passed the test job successfully and is about to start the release job, "~2" finishes building the container and overwrites the references on the common tag. When "~1" in the release job loads the images using the common tag, it will load the container of "~2" instead of "~1" having skipped the whole test job process.
The example is only set for the container workflow, but the other workflows might occur in a similar way.
Currently, we have 1100~ cache images uploaded to GHCR that weigh more than 300 MB each (most of them are layers from the second phase of the Dockerfile that were uploaded by mistake, read below). To avoid problems, I have set up a new job in a new workflow to be run weekly purging all images older than 1 week, but leaving always the 100 most recent ones.
Only the builder images should be uploaded to cache, the actual behaviour not only slows down the time for building the container, but also wastes lots of space by saving large and useless layers to GHCR that will never be used again.
When making the container rework, I unknowingly deleted the section where an env with the same name as the secret was defined on the job scope, making it look like it was originally defined as an organization env.
Since we can't validate the secrets in a condition directly, it's better to let docker/login-action take care of failing the entire job if the credentials are invalid.
Reported in: https://github.com/searxng/searxng/issues/4777
Chapter on the purpose of (git) commits
The commits and their messages are elementary for the traceability of changes
and are unfortunately still too often given too little attention.
It therefore seems necessary to dedicate a chapter to this topic in the context
of development.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
env.DOCKERHUB_USERNAME shouldn't be an empty string as it's defined and set (I think, I can't see this). Even if wasn't defined, GitHub Org/Repo wide envs/secrets should return an empty string (?)
container.yml will run after integration.yml COMPLETES successfully and in master branch.
Style changes, cleanup and improved integration with CI by leveraging the use of
shared cache between all workflows.
* Podman is now supported to build the container images (Docker also received a refactor, merging both build and buildx)
* Container images are being built by Buildah instead of Docker BuildKit.
* Container images are tested before release.
* Splitting "modern" (amd64 & arm64) and "legacy" (armv7) arches on different Dockerfiles allowing future optimizations.
l10n.yml will run after integration.yml finishes successfully (will defer anything depending on integration.yml until heavy loads like container building are moved to separate workflows) and in master branch.
* After every integration.yml workflow completes successfully, only the `update` job runs.
* Dispatch and Crontab triggers only the `pr` job.
Style changes, cleanup and improved integration with CI by leveraging the use of shared cache between all workflows (not functional until all workflows have been refactored).
Instead of executing the workflow after integration.yml completes correctly, let's run this workflow parallel to integration.yml restoring the original behaviour.
Starting with ionicons-8.0.8 the SVG already contains a class attribute and
instaed of using SVGO plugin ``addAttributesToSVGElement`` we habve to use
``addClassesToSVGElement`` to add out ``__jinja_class_placeholder__``.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>