====== Tensorflow install adventures ======

{{tag>devel deep_learning}}

So I've been reading the book **Fundamentals of Deep Learning (2017)** by **Nikhil Buduma et al.**, and it made me realize I definitely need to get started on Deep Learning, and on Machine Learning in general. I've already taken a few courses, and I was lucky enough to follow them from some of the fathers of machine learning, like **Andrew Ng** and **Geoffrey Hinton**, so I already have a decent understanding of the basic concepts at play. I also built some networks at some point, so I have some minimal hands-on experience too, but then I simply moved on to other stuff: it's high time I got back to it.

The idea here is thus to start playing with the MNIST dataset, to experiment with state-of-the-art deep learning models in TensorFlow.

===== Installing TensorFlow =====

Since **TensorFlow** is available as a Python library, it can easily be installed with **pip**:

<code>
pip install --upgrade tensorflow-gpu
</code>

Once the installation has succeeded, we run a minimal test to check that the library is properly installed:

<code python>
import tensorflow as tf

# TensorFlow 1.x API: build a constant op, then evaluate it in a session
deep_learning = tf.constant('Deep Learning')
with tf.Session() as session:
    print(session.run(deep_learning))
</code>

But of course, this gave me an error:

<code>
ImportError: DLL load failed: The specified module could not be found.
Failed to load the native TensorFlow runtime.
</code>

The error message didn't provide any clear detail, so I investigated a bit and found this [[https://github.com/tensorflow/tensorflow/issues/13715|bug report page]]. From that page I downloaded the [[https://gist.github.com/mrry/ee5dbcfdd045fa48a27d56664411d41c|tensorflow_self_check.py]] script and tried to run it:

<code>
$ nv_call_python tensorflow_self_check.py
ERROR: Failed to import the TensorFlow module.

WARNING! This script is no longer maintained!
=============================================
Since TensorFlow 1.4, the self-check has been integrated with TensorFlow itself,
and any missing DLLs will be reported when you execute the `import tensorflow`
statement. The error messages printed below refer to TensorFlow 1.3 and earlier,
and are inaccurate for later versions of TensorFlow.

- Python version is 3.6.
- TensorFlow is installed at: D:\Projects\NervSeed\tools\windows\python-3.6\bin\lib\site-packages\tensorflow
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow requires that
  this DLL be installed in a directory that is named in your %PATH% environment
  variable. Note that installing cuDNN is a separate step from installing CUDA,
  and it is often found in a different directory from the CUDA DLLs. You may
  install the necessary DLL by downloading cuDNN 5.1 from this URL:
  https://developer.nvidia.com/cudnn
- Could not find cuDNN.
</code>

So it would seem that I don't have **cuDNN** installed, which is absolutely true :-). But this script is also no longer maintained, so it might be a better idea to simply try the TensorFlow import directly, as the warning suggests:

<code>
$ nv_call_python -i
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "D:\Projects\NervSeed\tools\windows\python-3.6\bin\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "D:\Projects\NervSeed\tools\windows\python-3.6\bin\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
...
ImportError: DLL load failed: The specified module could not be found.

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
</code>
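Since this import test will have to be repeated after every install attempt, it can be handy to script it. A minimal sketch, assuming only the Python standard library, runs the import in a child interpreter so that a failure inside the native runtime is reported as text instead of ending the current session. The **try_import** and **last_error_line** helpers are hypothetical (they are not part of TensorFlow or of the self-check script), and the **python** parameter can point to a different interpreter path if needed.

```python
import subprocess
import sys
from typing import Tuple


def try_import(module: str, python: str = sys.executable) -> Tuple[bool, str]:
    """Run "import <module>" in a fresh interpreter; return (success, stderr).

    A crash inside a native extension (missing DLL, fatal check failure)
    only terminates the child process, not the calling interpreter.
    """
    proc = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (also works on Python 3.6)
    )
    return proc.returncode == 0, proc.stderr


def last_error_line(stderr: str) -> str:
    """Return the last stderr line mentioning 'Error', else the last non-empty line."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    matches = [line for line in lines if "Error" in line]
    if matches:
        return matches[-1]
    return lines[-1] if lines else "(no output on stderr)"


if __name__ == "__main__":
    ok, err = try_import("tensorflow")
    print("tensorflow: OK" if ok else "tensorflow: FAILED -> " + last_error_line(err))
```

The child process is the important design choice here: an ImportError would be catchable in-process anyway, but a hard abort of the native runtime would not be.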
=> This is the same message as with the initial test script above, and it is certainly not as clear as the self-check script output... So let's focus on the hint we got from the latter and install **cuDNN** before trying anything else.

cuDNN can be downloaded from https://developer.nvidia.com/rdp/cudnn-download (note that registration on the NVIDIA website is required here).

=> So I downloaded cuDNN **v7.4.2 for CUDA 10.0**, which means I also need to upgrade CUDA itself. CUDA 10.0 can be downloaded from https://developer.nvidia.com/cuda-downloads

Note that for the Linux installation we should run the shell command:

<code>
sudo sh cuda_10.0.130_410.48_linux.run
</code>

/*
Before installing CUDA on Windows, we uninstall the previous version:
- CUDA Samples 7.5
- CUDA Toolkit 7.5
- CUDA Visual Studio Integration 7.5

On Linux the CUDA toolkit will be installed in /usr/local/cuda-#.#

On Linux, we must stop the X server before trying to install CUDA.

We should not try to install the 410.48 driver in the process: it does not work.
*/

/* Linux CUDA installation summary:

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /usr/local/cuda-10.0/samples

Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_1899.log
*/

==== cuDNN installation on Linux ====

The cuDNN installation instructions can be found on this page: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

Following the instructions from the link above, we do:

<code>
$ tar xvzf /mnt/array1/softwares/Dev/cudnn-10.0-linux-x64-v7.4.2.24.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
</code>

Then we should ensure that the cuDNN library is properly installed, so we build and run the MNIST test sample, as suggested.

The cuDNN samples package can be downloaded as a .deb file for Ubuntu 14.04 from the download site mentioned above. But since we didn't use the .deb files for the library installation itself, installing this package produces an error and leaves the doc package unconfigured (which is not critical from my perspective). The samples still end up in /usr/src/cudnn_samples_v7, so we can build the test from there:

<code>
$ cp -r /usr/src/cudnn_samples_v7 /home/kenshin/build/
$ cd /home/kenshin/build/cudnn_samples_v7/mnistCUDNN/
$ make clean && make
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64/
$ ./mnistCUDNN
</code>

=> And we get the **test passed!** result, as expected.

The cuDNN installation on Windows is similar: we simply need to copy the files extracted from the zip archive into the corresponding CUDA bin, include and lib folders.

==== Testing TensorFlow again ====

On both Windows and Linux, the CUDA/cuDNN libraries must be findable by TensorFlow, so we need to update our paths accordingly:
  * On Windows, we add the CUDA bin folder to the PATH, for instance in my case: **D:\Apps\Cuda-10.0\bin**
  * On Linux, we update LD_LIBRARY_PATH: **export LD_LIBRARY_PATH=/usr/local/cuda/lib64/**

=> But... this gives me exactly the same error when executing the **import tensorflow** statement in a simple Python script :-(.
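At this point it would help to know which library is actually missing, the way the old self-check script reported **cudnn64_5.dll**. A minimal sketch using Python's standard **ctypes** module tries to load each library by its bare name, through the search path the OS loader uses (PATH on Windows, LD_LIBRARY_PATH and the ldconfig cache on Linux). The names in **CANDIDATES** are an assumption matching the CUDA 10.0 + cuDNN 7 install above; a given TensorFlow build looks for the versions it was compiled against, so they may need adjusting. The helper names are hypothetical.

```python
import ctypes
import sys
from typing import Dict, List

# Assumed library names for the CUDA 10.0 + cuDNN 7 install described above;
# adjust them to the versions your TensorFlow build was linked against.
if sys.platform == "win32":
    CANDIDATES = ["cudart64_100.dll", "cudnn64_7.dll"]
else:
    CANDIDATES = ["libcudart.so.10.0", "libcudnn.so.7"]


def load_library(name: str) -> ctypes.CDLL:
    """Load a shared library by bare name through the OS search path."""
    if sys.platform == "win32" and sys.version_info >= (3, 8):
        # Python 3.8+ no longer searches PATH by default when loading DLLs;
        # winmode=0 restores the classic LoadLibrary search order.
        return ctypes.CDLL(name, winmode=0)
    return ctypes.CDLL(name)


def check_libraries(names: List[str]) -> Dict[str, bool]:
    """Map each library name to True if it could be loaded, False otherwise."""
    results = {}
    for name in names:
        try:
            load_library(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in check_libraries(CANDIDATES).items():
        print("{:<20} {}".format(name, "found" if found else "NOT FOUND"))
```

It should be run from the same shell environment as the failing TensorFlow script: on Linux the dynamic loader reads LD_LIBRARY_PATH when the process starts, so changing it from inside Python has no effect.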
So it now rather seems that TensorFlow depends on specific versions of CUDA and cuDNN. In the process, I also tried to install the **tensorflow** package instead of **tensorflow-gpu**, only to get another error when trying to import the module:

<code>
$ nv_call_python -i
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2018-12-28 20:33:47.357524: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr
</code>

So it seems that training our little MNIST network will have to wait a little longer: I first have to figure out how to set up TensorFlow properly with CUDA 10.0 and cuDNN 7.4.2, which are the versions I currently have installed.

=> Actually, I found [[https://mc.ai/install-tensorflow-gpu-with-cuda-10-0-and-cudnn-7-4-for-python-on-windows-10/|this tutorial]] on how to build TensorFlow from sources on Windows: maybe I won't have a choice and will have to go this way?

/* Other link for building from sources: https://www.tensorflow.org/install/source_windows */

And of course, I have now just found [[https://www.tensorflow.org/install/source#tested_build_configurations|this page]], which lists the tested combinations of TensorFlow/CUDA/cuDNN versions: version 1.12.0 is meant for CUDA 9 + cuDNN 7. So maybe I should simply install CUDA 9 instead of going down the build-from-sources road...
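Since **import tensorflow** can itself crash, the installed version can't always be read from **tf.__version__**. A minimal sketch, assuming Python 3.8+ for **importlib.metadata** (on the Python 3.6 interpreter used here, **pip show tensorflow-gpu** reports the same version), reads the version from the package metadata only and looks it up in a small table. The table and helper names are hypothetical, and the table only contains the 1.12.0 combination quoted above: any other entry should be copied from the tested build configurations page.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Hypothetical table: TensorFlow version -> (CUDA, cuDNN) tested versions.
# Only the combination quoted in the text is filled in.
TESTED_CONFIGS: Dict[str, Tuple[str, str]] = {
    "1.12.0": ("9", "7"),
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent.

    This reads pip's package metadata only, so it works even when
    importing the package itself crashes.
    """
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def expected_cuda(tf_version: str) -> Optional[Tuple[str, str]]:
    """Look up the (CUDA, cuDNN) versions tested for a TensorFlow version."""
    return TESTED_CONFIGS.get(tf_version)


if __name__ == "__main__":
    for dist in ("tensorflow-gpu", "tensorflow"):
        v = installed_version(dist)
        if v is None:
            continue  # this distribution is not installed
        cfg = expected_cuda(v)
        if cfg is None:
            print("{} {}: not in the table, check the tested configurations page".format(dist, v))
        else:
            print("{} {}: tested with CUDA {} and cuDNN {}".format(dist, v, *cfg))
```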