mirror of https://github.com/immich-app/immich.git

commit 678cb8938a (parent 1785d0916b)
comparison with arm nn in docs
@@ -20,6 +20,7 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.
 - Only Linux and Windows (through WSL2) servers are supported.
 - ARM NN is only supported on devices with Mali GPUs. Other Arm devices are not supported.
 - Some models may not be compatible with certain backends. CUDA is the most reliable.
+- Search latency isn't improved by ARM NN due to model compatibility issues preventing its use. However, smart search jobs do make use of ARM NN.

 ## Prerequisites

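For context on the Mali-only limitation above, a quick way to confirm that a device actually exposes a Mali GPU is to look for the device node the Mali kernel driver creates. The `/dev/mali0` path below is an assumption (it is the node commonly used by Mali drivers); your driver may name it differently:

```bash
# Sketch: check for the Mali device node the ARM NN backend relies on.
# /dev/mali0 is assumed; the exact node name depends on the Mali driver.
ls -l /dev/mali0
```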
@@ -34,6 +35,7 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.
 - The `hwaccel.ml.yml` file assumes the path to it is `/usr/lib/libmali.so`, so update accordingly if it is elsewhere
 - The `hwaccel.ml.yml` file assumes an additional file `/lib/firmware/mali_csffw.bin`, so update accordingly if your device's driver does not require this file
 - Optional: Configure your `.env` file, see [environment variables](/docs/install/environment-variables) for ARM NN specific settings
+  - In particular, the `MACHINE_LEARNING_ANN_FP16_TURBO` setting can significantly improve performance at the cost of very slightly lower accuracy

 #### CUDA

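As a minimal sketch of the ARM NN prerequisite checks and the optional setting above (the paths are the `hwaccel.ml.yml` defaults already quoted; writing the boolean as `true` and appending to `.env` are assumptions about your setup):

```bash
# Verify the files hwaccel.ml.yml assumes, at their default paths.
ls -l /usr/lib/libmali.so /lib/firmware/mali_csffw.bin

# Optional: trade a sliver of accuracy for speed (appends to .env).
echo "MACHINE_LEARNING_ANN_FP16_TURBO=true" >> .env
```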
@@ -50,12 +52,13 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.

 #### RKNN

-- You must have a supported Rockchip SoC, only RK3566 and RK3588 are supported at this moment.
+- You must have a supported Rockchip SoC: only RK3566, RK3576 and RK3588 are supported at this moment.
 - Make sure you have the appropriate Linux kernel driver installed
   - This is usually pre-installed on the device vendor's Linux images
 - RKNPU driver V0.9.8 or later must be available on the host server
   - You may confirm this by running `cat /sys/kernel/debug/rknpu/version` to check the version
 - Optional: Configure your `.env` file, see [environment variables](/docs/install/environment-variables) for RKNN specific settings
+  - In particular, setting `MACHINE_LEARNING_RKNN_THREADS` to 2 or 3 can *dramatically* improve performance for RK3576 and RK3588 compared to the default of 1, at the expense of multiplying the amount of RAM each model uses by that amount.

 ## Setup

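A sketch of the RKNN checks above; `sudo` is assumed because `/sys/kernel/debug` is usually root-only, and the value `2` is illustrative (2 or 3 applies to RK3576 and RK3588):

```bash
# Confirm the RKNPU kernel driver is present and at least V0.9.8.
sudo cat /sys/kernel/debug/rknpu/version

# Optional: more NPU threads; multiplies per-model RAM usage accordingly.
echo "MACHINE_LEARNING_RKNN_THREADS=2" >> .env
```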
@@ -137,3 +140,12 @@ Note that you should increase job concurrencies to increase overall utilization
 - If you encounter an error when a model is running, try a different model to see if the issue is model-specific.
 - You may want to increase concurrency past the default for higher utilization. However, keep in mind that this will also increase VRAM consumption.
 - Larger models benefit more from hardware acceleration if you have the VRAM for them.
+- Compared to ARM NN, RKNPU has:
+  - Wider model support (including for search, which ARM NN does not accelerate)
+  - Less heat generation
+  - Very slightly lower accuracy (RKNPU always uses FP16, while ARM NN by default uses higher-precision FP32 unless `MACHINE_LEARNING_ANN_FP16_TURBO` is enabled)
+  - Varying speed:
+    - If `MACHINE_LEARNING_RKNN_THREADS` is at the default of 1, RKNPU will be substantially slower than ARM NN in most cases
+    - If `MACHINE_LEARNING_RKNN_THREADS` is set to 3, it will be somewhat faster than ARM NN at FP32, but somewhat slower than ARM NN if `MACHINE_LEARNING_ANN_FP16_TURBO` is enabled
+  - When other tasks also use the GPU (like transcoding), RKNPU has a significant advantage over ARM NN as it uses the otherwise idle NPU instead of competing for GPU usage
+  - Lower RAM usage if `MACHINE_LEARNING_RKNN_THREADS` is at the default of 1, but significantly higher if greater than 1 (which is necessary for it to fully utilize the NPU and hence be comparable in speed to ARM NN)
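To tie the comparison back to the Setup section these hunks reference: switching backends comes down to which `hwaccel.ml.yml` service the machine-learning container extends and which image-tag suffix it uses. A hedged compose sketch for RKNN (the service and tag names assume the current upstream compose files; swap `rknn` for `armnn` on Mali devices):

```yaml
# docker-compose.yml excerpt: machine-learning with the RKNN backend.
immich-machine-learning:
  container_name: immich_machine_learning
  # The tag suffix selects the hardware-accelerated image variant.
  image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-rknn
  extends:
    file: hwaccel.ml.yml
    service: rknn # or armnn, cuda, openvino
```

If you raise `MACHINE_LEARNING_RKNN_THREADS` above 1, budget RAM per the last bullet above.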