Hello,
I want to use TensorFlow 2.0, but after installing it the module does not work correctly. I followed the installation instructions here:
https://www.tensorflow.org/install/pip
Here is the result of the verification step:
(env) rouyrrerodolphe@adm1:~$ python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2020-11-28 11:02:22.971809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-28 11:02:23.043239: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-11-28 11:02:23.043280: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (adm1): /proc/driver/nvidia/version does not exist
2020-11-28 11:02:23.043776: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-28 11:02:23.050347: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2304000000 Hz
2020-11-28 11:02:23.050583: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f6468000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-28 11:02:23.050601: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
tf.Tensor(-1710.3865, shape=(), dtype=float32)
The error on the second line (failed call to cuInit: CUDA_ERROR_NO_DEVICE) makes me think that my NVIDIA/CUDA installation is broken.
The nvidia-smi command does not work either:
rouyrrerodolphe@adm1:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
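The TensorFlow log above reports that /proc/driver/nvidia/version does not exist. That check can be reproduced outside TensorFlow with a minimal sketch like the following. The helper name nvidia_kernel_driver_loaded is hypothetical, and the script only tests for the file named in the log; it does not inspect the driver itself.

```python
from pathlib import Path

# Path quoted in the TensorFlow log message above.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def nvidia_kernel_driver_loaded(version_file: Path = NVIDIA_VERSION_FILE) -> bool:
    """Return True if the version file exists (hypothetical helper).

    The file is normally present only while the NVIDIA kernel module is loaded.
    """
    return version_file.is_file()


if __name__ == "__main__":
    state = "loaded" if nvidia_kernel_driver_loaded() else "not loaded"
    print(f"NVIDIA kernel driver appears to be {state}")
```

On a machine in the state described here, this should report "not loaded", which matches the nvidia-smi failure.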
The problem most likely comes from the driver, because no NVIDIA driver shows up in the output of ubuntu-drivers devices:
rouyrrerodolphe@adm1:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0 ==
modalias : pci:v00008086d000024F3sv00008086sd00000050bc02sc80i00
vendor : Intel Corporation
model : Wireless 8260
manual_install: True
driver : backport-iwlwifi-dkms - distro free
Secure Boot is disabled, and I have already tried uninstalling and reinstalling CUDA and the NVIDIA driver. I could not install the proprietary driver from the official repository, nor from the PPA, so I believe I installed it from the NVIDIA website by following this guide:
https://www.tensorflow.org/install/gpu?hl=fr
Some useful output:
Graphics card:
rouyrrerodolphe@adm1:~$ lspci -vnn | grep -A 12 '\''[030[02]\]' | grep -Ei "vga|3d|display|kernel"
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06) (prog-if 00 [VGA controller])
Kernel driver in use: i915
Kernel modules: i915
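The lspci output can also be scanned for NVIDIA hardware by PCI vendor ID rather than by eye. This is a rough sketch, assuming the text comes from `lspci -nn` (or `-vnn`), which prints IDs as [vendor:device]. The function name nvidia_devices is hypothetical; 10de is NVIDIA's PCI vendor ID and 8086 is Intel's.

```python
import re
from typing import List

# 10de is NVIDIA's PCI vendor ID; 8086 (seen in the output above) is Intel's.
NVIDIA_VENDOR_ID = "10de"

# Matches the "[vendor:device]" pair printed by `lspci -nn`, e.g. "[8086:191b]".
# Class codes such as "[0300]" have no colon and are therefore ignored.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines whose PCI vendor ID is NVIDIA's."""
    return [
        line
        for line in lspci_output.splitlines()
        if any(vendor.lower() == NVIDIA_VENDOR_ID
               for vendor, _device in PCI_ID.findall(line))
    ]
```

Applied to the three lines above, it returns an empty list, since the only controller shown is the Intel one [8086:191b].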
CUDA version:
rouyrrerodolphe@adm1:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
NVIDIA version:
rouyrrerodolphe@adm1:~$ whereis nvidia
nvidia: /usr/lib/x86_64-linux-gnu/nvidia /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-455.45.01/nvidia
Linux kernel:
rouyrrerodolphe@adm1:~$ uname -a
Linux adm1 4.15.0-041500-generic #201802011154 SMP Thu Feb 1 11:55:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA packages:
rouyrrerodolphe@adm1:~$ dpkg -l | grep nvidia
ii libnvidia-cfg1-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-455 455.45.01-0ubuntu1 all Shared files used by the NVIDIA libraries
rc libnvidia-compute-440:amd64 440.100-0ubuntu0.18.04.1 amd64 NVIDIA libcompute package
rc libnvidia-compute-450:amd64 450.80.02-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-compute-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.3.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.0-1 amd64 NVIDIA container runtime library
ii libnvidia-decode-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-455:amd64 455.45.01-0ubuntu1 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-455:amd64 455.45.01-0ubuntu1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ifr1-455:amd64 455.45.01-0ubuntu1 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
rc nvidia-compute-utils-450 450.80.02-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-compute-utils-455 455.45.01-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-container-runtime 3.4.0-1 amd64 NVIDIA container runtime
ii nvidia-container-toolkit 1.3.0-1 amd64 NVIDIA container runtime hook
rc nvidia-dkms-450 450.80.02-0ubuntu1 amd64 NVIDIA DKMS package
ii nvidia-dkms-455 455.45.01-0ubuntu1 amd64 NVIDIA DKMS package
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver-455 455.45.01-0ubuntu1 amd64 NVIDIA driver metapackage
rc nvidia-kernel-common-450 450.80.02-0ubuntu1 amd64 Shared files used with the kernel module
ii nvidia-kernel-common-455 455.45.01-0ubuntu1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-455 455.45.01-0ubuntu1 amd64 NVIDIA kernel source package
ii nvidia-machine-learning-repo-ubuntu1804 1.0.0-1 amd64 nvidia-machine-learning repository configuration files
ii nvidia-modprobe 455.45.01-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-prime 0.8.8.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 455.45.01-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-455 455.45.01-0ubuntu1 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-455 455.45.01-0ubuntu1 amd64 NVIDIA binary Xorg driver
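In the listing above, the first column is the dpkg state: "ii" means the package is installed, and "rc" means it was removed but its configuration files remain. To separate the two groups at a glance, a small sketch such as the following could be used. The function name packages_by_state is hypothetical, and the input is assumed to be the text printed by `dpkg -l`.

```python
from typing import Dict, Iterable, List


def packages_by_state(dpkg_output: str,
                      states: Iterable[str] = ("ii", "rc")) -> Dict[str, List[str]]:
    """Group package names by the state column of `dpkg -l` output.

    Only the requested states are kept; header lines and other
    states are skipped.
    """
    groups: Dict[str, List[str]] = {state: [] for state in states}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # A data row looks like: "<state> <package> <version> <arch> <description...>"
        if len(fields) >= 2 and fields[0] in groups:
            groups[fields[0]].append(fields[1])
    return groups
```

For the output above, the "rc" group contains only leftovers from the 440 and 450 series, while every "ii" driver package is from the 455 series.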
Another surprising thing: the following command no longer works since I reinstalled the NVIDIA driver, although it worked fine before:
rouyrrerodolphe@adm1:~$ dpkg -l *nvidia* | grep ^ii
dpkg-query: aucun paquet ne correspond à nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
dpkg-query: aucun paquet ne correspond à nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb.1
rouyrrerodolphe@adm1:~$ sudo apt-get remove ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
Lecture des listes de paquets... Fait
Construction de l'arbre des dépendances
Lecture des informations d'état... Fait
Note : sélection de « nvidia-machine-learning-repo-ubuntu1804 » au lieu de « ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb »
nvidia-machine-learning-repo-ubuntu1804 est déjà la version la plus récente (1.0.0-1).
0 mis à jour, 0 nouvellement installés, 0 à enlever et 16 non mis à jour.
rouyrrerodolphe@adm1:~$ dpkg -l *nvidia* | grep ^ii
dpkg-query: aucun paquet ne correspond à nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
dpkg-query: aucun paquet ne correspond à nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb.1
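One possible explanation, offered as an assumption rather than a confirmed diagnosis: the two names in the error message look like files in the current directory, and an unquoted *nvidia* is expanded by the shell before dpkg ever sees it. The sketch below mimics that expansion in a temporary directory that stands in for the home directory. The helper name shell_expansion is hypothetical.

```python
import glob
import os
import tempfile
from typing import List


def shell_expansion(pattern: str, directory: str) -> List[str]:
    """Mimic what an unquoted glob pattern expands to in `directory`.

    Like a POSIX shell, fall back to the literal pattern when
    no file name matches.
    """
    full_pattern = os.path.join(glob.escape(directory), pattern)
    matches = sorted(glob.glob(full_pattern))
    return [os.path.basename(match) for match in matches] or [pattern]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        # Stand-ins for the two downloaded files named in the error message.
        for suffix in (".deb", ".deb.1"):
            name = "nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64" + suffix
            open(os.path.join(home, name), "w").close()
        # What dpkg would receive instead of the pattern itself.
        print(shell_expansion("*nvidia*", home))
```

If this is what happens, quoting the pattern, as in dpkg -l '*nvidia*' | grep ^ii, would pass it to dpkg unchanged.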