Recently my school gave me an account on a server for training neural networks. The server has ten Titan V GPUs. I installed TensorFlow 2.2 on it, with Python 3.6.2. After installing, I called .is_gpu_available() to check whether the GPU could be used, and it returned False.
TensorFlow does appear to have detected the GPUs, but it returned False because some libraries could not be opened. One line in the detailed error output shows which one.
The other library files are opened successfully, but the last one, a .7 file, cannot be opened: no such file or directory. I suspected a problem with CUDA. The server has CUDA 10.1 installed, which should match TensorFlow 2.2, yet the GPU still could not be used. So at first I wanted to reinstall CUDA over the server's existing installation. I downloaded the installer, but since I am not an administrator and have no root privileges, the installation always failed.
During the attempt I did learn that the system's CUDA is installed under /usr/local/cuda. The .7 file is a library file, so it should sit in the CUDA installation directory, specifically /usr/local/cuda/lib64. When I installed CUDA on my local Windows machine, I had to download cuDNN separately and copy its files into the corresponding CUDA directories. Since the Linux cuDNN package contains this .7 file, I suspected that the copy under lib64 was the problem.
So I looked in the lib64 directory for the .7 file and did not find it. Strangely, the CUDA 10.1 directory contained no cuDNN files at all, and I had no permission to modify /usr/local. Since the file was simply missing, I wondered whether I could put it in a directory I control and point TensorFlow there.
I first tried to download cuDNN with wget as an experiment, but that does not work from the command line because the cuDNN download requires a login. Instead I downloaded the Linux version of cuDNN on my local machine, sent it to the server with scp, and unpacked it; the .7 file was then under ~/cudnn/cuda/lib64. The next step is to set environment variables so that TensorFlow searches my directory as well as /usr/local/cuda/lib64. First add:
export PATH=$PATH:/usr/local/cuda-10.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-10.1/lib64
These lines add the system CUDA binaries and libraries to the search paths.
Then add:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/users/cudnn/cuda/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/home/users/cudnn/cuda/lib64
Replace /home/users/cudnn/cuda/lib64 with the directory that holds your own copy of the .7 file. Finally, reload the profile:
source /etc/profile
If TensorFlow is installed in an Anaconda virtual environment, running these commands drops you out of that environment, so remember to re-activate it:
source activate <environment_name>
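Before starting Python, it can help to confirm that the directories now on LD_LIBRARY_PATH really contain the file. The following is only a minimal sketch: the helper name dirs_containing and the "*.7" pattern are illustrative choices of mine, not part of TensorFlow or CUDA, so substitute the file name from your own error message.

```python
import glob
import os


def dirs_containing(pattern, path_value=None):
    """Return the directories on a search path that hold at least one
    file matching `pattern` (a shell-style glob such as "*.7")."""
    if path_value is None:
        # Default to the search path exported in the current shell.
        path_value = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    # os.pathsep is ":" on Linux, matching the export lines above.
    for directory in path_value.split(os.pathsep):
        # Empty entries appear when the variable starts or ends with ":".
        if not directory or directory in hits:
            continue
        if glob.glob(os.path.join(glob.escape(directory), pattern)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # "*.7" mirrors the ".7 file" from the error message; adjust as needed.
    for directory in dirs_containing("*.7"):
        print(directory)
```

If the directory you just added is not printed, the export lines did not take effect in this shell or the path is mistyped.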
Now start Python again, import tensorflow, and run .is_gpu_available().
This time the output showed that the .7 file had been opened successfully, which means TF found it in the directory I provided. With all the library files opened, the call returned True at the bottom.
Another method, .list_physical_devices('GPU'), also shows the currently available GPUs.
All ten GPUs are listed.
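For reference, the device check can be wrapped in a small script. This is a sketch that assumes the listing call above is tf.config.list_physical_devices('GPU'), which is how TensorFlow 2.1 and later expose it (the older tf.test.is_gpu_available() is marked deprecated in favour of it). The wrapper name visible_gpus is hypothetical, and it returns None when TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if
    TensorFlow itself is not importable in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with `name` and `device_type` fields.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    else:
        print(f"{len(gpus)} GPU(s) visible:", gpus)
```

An empty list means TensorFlow imported correctly but could not use any GPU, which is the situation described at the start of this post.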
Note that these commands only affect the current connection. If you disconnect from the server and reconnect, you need to enter them again, unless you add them to a shell startup file such as ~/.bashrc.
This method is only a reference. On TensorFlow 2.2 it happened to be just this one .7 file that could not be opened, so I tried the fix as an experiment and it worked. If the same symptom appears on another machine, this method may not solve it; it only offers an idea.
TensorFlow 2.1 had the same problem of not being able to use the GPU, but its error message listed several files that could not be opened, not only the .7 file. Their names mostly begin with lib. Check whether these files exist in CUDA's lib64 directory. If they do, the environment variables are probably set incorrectly, so try the commands from above:
export PATH=$PATH:/usr/local/cuda-10.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-10.1/lib64
If you cannot find them, you can try what I did: download these files to the machine, add their directory to the environment variables, and so direct TF to them. Of course, this is only a guess. TensorFlow 2.1 and 2.2 should both use CUDA 10.1, and I am not sure why 2.2 fails on only one file while 2.1 fails on several. Version 1.9 seems to use a different CUDA version than 10.1, so there are more possible causes of the error there, and the cases need to be told apart.
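The three cases described here (library already on the search path, present on disk but not on the search path, or missing altogether) can be sketched as a small helper. The function name classify_libraries, its arguments, and the status strings are illustrative assumptions of mine; feed it the file names from your own error message.

```python
import os


def _has_file(directories, name):
    """True if `name` is a regular file in any non-empty directory given."""
    return any(d and os.path.isfile(os.path.join(d, name)) for d in directories)


def classify_libraries(names, search_dirs, other_dirs):
    """Sort each library file name into one of three cases.

    search_dirs: directories the loader already searches
                 (for example the entries of LD_LIBRARY_PATH).
    other_dirs:  extra places to look, such as a CUDA lib64 directory
                 or a private cuDNN copy.
    """
    report = {}
    for name in names:
        if _has_file(search_dirs, name):
            report[name] = "found on search path"
        elif _has_file(other_dirs, name):
            # The file exists, so the environment variables need fixing.
            report[name] = "present but not on search path"
        else:
            # Nothing on disk: the file has to be downloaded first.
            report[name] = "missing"
    return report
```

A file reported as "present but not on search path" points to the export lines, while "missing" points to the download-and-copy route used in this post.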