
Setting Up Ubuntu 16.04 + tensorflow-gpu from the Official Documentation

Tags:

Python, Deep Learning

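Abstract: The official documentation is the best reference manual. Everything in this post follows the official documentation and was verified by installing it myself; I am recording it here in the hope that it helps you. Mistakes in some online guides cost me considerable repair work: removing dependency packages while installing CUDA (they do not need to be removed), a mouse and keyboard that stopped responding, and choosing safe mode during installation, which led to an Ubuntu login loop. I will not go into those here. Because operating systems, CUDA versions, and so on vary widely, do not cut corners when installing: when you hit an odd error or are unsure about a command (especially a delete command), check several sources repeatedly and run it only after you are sure it is correct. Otherwise a single wrong command can cost a day of repair. I have tried to avoid errors and misleading statements in writing this, but for your own platform please do not rely on this one document alone; rereading the official documentation is the better choice.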

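I. Prepare for CUDA Installation
Official documentation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html That guide targets CUDA 9.1.5, while we are installing CUDA 8.0, so the version numbers in the CUDA installation commands differ slightly; everything else can be followed as written.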

This section covers preparation: checking the operating system version, the GPU model, and so on.

1.1 Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

$ lspci | grep -i nvidia
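
A minimal sketch of the same check from Python, assuming the output of `lspci` has already been captured as text. The helper name `find_nvidia_devices` and the sample lines are illustrative assumptions, not part of the NVIDIA guide.

```python
def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive, like grep -i)."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


# Hypothetical sample; on a real machine pass in the text printed by `lspci`.
sample = (
    "00:02.0 VGA compatible controller: Intel Corporation Device 5912\n"
    "01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80\n"
)
devices = find_nvidia_devices(sample)
print(devices)
```

An empty list means no NVIDIA device was listed, in which case the remaining steps do not apply to that machine.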

GPU models and families currently supported by CUDA: https://developer.nvidia.com/cuda-gpus

1.2 Verify You Have a Supported Version of Linux
The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes. To determine which distribution and release number you're running, type the following at the command line:
$ uname -m && cat /etc/*release
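
The release files printed by that command use simple KEY=VALUE lines. As a rough sketch, assuming the /etc/os-release layout, the distribution name and version can be pulled out like this (the function name `parse_os_release` and the sample text are assumptions made for illustration):

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines (the /etc/os-release layout) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


# Hypothetical sample; on a real system read the text from /etc/os-release.
sample = 'NAME="Ubuntu"\nVERSION_ID="16.04"\nID=ubuntu\n'
release = parse_os_release(sample)
print(release["NAME"], release["VERSION_ID"])
```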

1.3 Verify the System Has gcc Installed (and check its version)

The gcc compiler is required for development using the CUDA Toolkit. gcc belongs to the GNU Compiler Collection (GCC), a suite of compilers that handles several languages, such as Java, Ada, C, and C++. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly. To verify the version of gcc installed on your system, type the following on the command line:

$ gcc --version

1.4 Verify the System Has the Correct Kernel Headers and Development Packages Installed (they only need to match the running kernel version)
The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

The version of the kernel your system is running can be found by running the following command:

Check the kernel version manually:

$ uname -r

The kernel headers and development packages for the currently running kernel can be installed with:

Install the headers and development packages that match the running kernel version:

$ sudo apt-get install linux-headers-$(uname -r)
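
The `$(uname -r)` substitution simply splices the running kernel release into the package name. A small sketch of that composition in Python follows; `headers_package` is a hypothetical helper, and the release string in the example is made up.

```python
import platform


def headers_package(kernel_release=None):
    """Return the apt package name that matches a kernel release string."""
    if kernel_release is None:
        # On Linux this is the same string that `uname -r` prints.
        kernel_release = platform.release()
    return "linux-headers-" + kernel_release


# Hypothetical release string, used only for illustration.
print(headers_package("4.4.0-104-generic"))
```

Calling `headers_package()` with no argument uses the kernel of the machine it runs on, which is the name apt-get receives in the command above.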

II. Download CUDA toolkit 8.0 and Installation

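(Note: at present tensorflow 1.3 supports only CUDA toolkit 8.0 + cuDNN 6.0.)

Before installing, check the CUDA and cuDNN versions currently listed as supported on the TensorFlow website. Otherwise you may install the newest version, find that TensorFlow does not support it, and have to uninstall and start over.

TensorFlow website: https://www.tensorflow.org/install/install_linux?hl=zh-cn#prepare_your_environment The supported versions are listed as follows; higher versions do not work: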

CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
The NVIDIA drivers associated with CUDA Toolkit 8.0.
cuDNN v6.0. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.

2.1 Download the CUDA toolkit (make sure to download CUDA 8.0)
https://developer.nvidia.com/cuda-80-ga2-download-archive

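Select Linux > x86_64 > Ubuntu > 16.04 > deb (local)

2.2 Install CUDA toolkit 8.0
In a terminal window, enter the following Installation Instructions in order.

Use cd to enter the folder containing the downloaded file, then run the following commands to install CUDA: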

`$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb`
`$ sudo apt-get update`
`$ sudo apt-get install cuda`

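******** If the commands above installed a newer version such as cuda-9-0 instead of cuda-8-0, the fix is as follows ********

I had previously installed the newer CUDA 9.1, found that TensorFlow did not support it, and then uninstalled and purged it. Reinstalling with the three commands above still ended up installing cuda-9.0 automatically rather than the cuda-8.0 I wanted.

Reference thread with the solution: https://devtalk.nvidia.com/default/topic/1024342/cuda-setup-and-installation/unable-to-uninstall-cuda-9-0-completely-and-install-8-0-instead/

In summary:

First uninstall the newer CUDA 9.1 that is already installed: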

$ sudo apt-get --purge remove cuda

$ sudo apt autoremove

Then clear the apt cache:

$ sudo apt-get clean

Finally reinstall, this time specifying the CUDA version:

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

$ sudo apt-get update

$ sudo apt-get install cuda-8-0

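Done!

2.3 Environment setup
Open the .bashrc file in your home directory (it is a hidden file, so in the file manager press Ctrl+H to show hidden files first), and append the following lines to the end of .bashrc: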

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note that the path here must match your NVIDIA driver version. To check the driver version, run $ cat /proc/driver/nvidia/version in a terminal.

export LPATH=/usr/lib/nvidia-387:$LPATH

export LIBRARY_PATH=/usr/lib/nvidia-387:$LIBRARY_PATH

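Note: apart from the space after export, the lines above must not contain any extra spaces; the shell is whitespace-sensitive here and will not parse them otherwise.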
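
The `${PATH:+:${PATH}}` form in the export lines expands to a colon followed by the old value only when the variable is already set and non-empty, so no stray separator appears when it is empty. A minimal sketch of that rule in Python, where `prepend_path` is a hypothetical name chosen for illustration:

```python
def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' plus the old value only if it is non-empty."""
    return new_dir + (":" + current if current else "")


# Existing value: the new directory goes in front, separated by a colon.
print(prepend_path("/usr/local/cuda-8.0/bin", "/usr/bin:/bin"))
# Empty value: no trailing colon is produced.
print(prepend_path("/usr/local/cuda-8.0/lib64", ""))
```

Avoiding the dangling colon matters because an empty entry in a search path is commonly treated as the current directory.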

2.4 Test whether CUDA installed successfully by checking the nvcc compiler version

$ nvcc -V
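
What matters in the nvcc output is the release number, which should read 8.0 for this setup. The sketch below extracts it from captured text; the function name `nvcc_release` is hypothetical, and the sample output is an illustrative assumption of what nvcc prints rather than a log from my machine.

```python
import re


def nvcc_release(nvcc_output):
    """Return the 'release X.Y' number found in nvcc -V output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Illustrative sample; on a real machine pass in the text printed by `nvcc -V`.
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 8.0, V8.0.61\n"
)
release = nvcc_release(sample)
print(release)
```

A result of None usually means nvcc was not found, which points back at the PATH line in section 2.3.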

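III. Install cuDNN (Deep Neural Network library)
Official documentation: http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

3.1 Download cuDNN (make sure to download cuDNN 6.0)
It is worth the small effort to register (join); after that the download is free. When downloading, choose the cuDNN 6.0 build that matches your Ubuntu version and CUDA version.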

https://developer.nvidia.com/rdp/form/cudnn-download-survey

3.2 install cudnn
Navigate to your <cudnnpath> directory containing the cuDNN Debian files, that is, cd into the directory where you downloaded the three files, then install them in order:

$ sudo dpkg -i libcudnn6_6.0.3.11-1+cuda8.0_amd64.deb
# Install the developer library, for example:
$ sudo dpkg -i libcudnn6-dev_6.0.3.11-1+cuda8.0_amd64.deb
# Install the code samples and the cuDNN Library User Guide, for example:
$ sudo dpkg -i libcudnn6-doc_6.0.3.11-1+cuda8.0_amd64.deb

In the sudo dpkg -i commands, use the 'libcudnn6-...' file names exactly as your downloaded files are named.
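
The file names themselves encode which CUDA version each cuDNN package was built for, so a mismatch can be spotted before installing. A rough sketch, assuming the naming pattern of the three files above; `parse_cudnn_deb` is a hypothetical helper and the regular expression is my own reading of that pattern.

```python
import re

# package _ cudnn-version - revision +cuda<cuda-version> _ architecture .deb
_PATTERN = re.compile(
    r"^(libcudnn\d+(?:-dev|-doc)?)_([\d.]+)-\d+\+cuda([\d.]+)_(\w+)\.deb$"
)


def parse_cudnn_deb(filename):
    """Split a cuDNN .deb file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    package, cudnn, cuda, arch = match.groups()
    return {"package": package, "cudnn": cudnn, "cuda": cuda, "arch": arch}


info = parse_cudnn_deb("libcudnn6-dev_6.0.3.11-1+cuda8.0_amd64.deb")
print(info)
```

If the "cuda" field is not 8.0, the download does not match the toolkit installed in section II.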

Summary: cuDNN is installed simply by dropping files onto your system; no environment variables need to be configured.

IV. install Tensorflow-gpu

Official documentation: https://www.tensorflow.org/install/install_linux?hl=zh-cn#prepare_your_environment

4.1 prepare
The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. To install this library, issue the following command:

$ sudo apt-get install libcupti-dev

4.2 Install tensorflow-gpu with native pip

$ sudo apt-get install python3-pip python3-dev # for Python 3.n
$ pip3 install tensorflow-gpu # Python 3.n; GPU support

(Optional.) If the step above, '$ pip3 install tensorflow-gpu', failed, install the latest version of TensorFlow by issuing a command of the following format:

$ sudo pip3 install --upgrade tfBinaryURL # Python 3.n

where tfBinaryURL identifies the URL of the TensorFlow Python package. The appropriate value of tfBinaryURL depends on the operating system, Python version, and GPU support; the TensorFlow installation guide lists the appropriate values. For example, to install TensorFlow for Linux, Python 3.4, and CPU-only support, issue the following command:

$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.0-cp34-cp34m-linux_x86_64.whl

4.3 As with the environment variables in section 2.3, append one more environment variable to .bashrc

Environment variable required by TensorFlow:

export CUDA_HOME=/usr/local/cuda-8.0

4.4 Test whether tensorflow-gpu is configured correctly by running a short snippet

$ python3
# now inside the Python interpreter
>>> import tensorflow as tf
>>> hello = tf.constant("hello, tensorflow")
>>> sess = tf.Session()
>>> print(sess.run(hello))

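If this prints "hello, tensorflow" (under Python 3 it appears as b'hello, tensorflow'), the installation works. Congratulations.

Appendix: errors I ran into and how I fixed them

Everything was installed, but at runtime I got an error saying the native TensorFlow runtime could not be loaded: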

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)

ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Cause: I installed CUDA 9.0, but I realized that tensorflow 1.3 does not yet support it.
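
One way to confirm this diagnosis before reinstalling is to look for libcudart.so.8.0 in the directories listed in LD_LIBRARY_PATH. The sketch below is a simplified check under that assumption: the real dynamic loader also consults the ldconfig cache and default system directories, which this hypothetical helper `find_library` ignores. The demo uses a temporary directory as a stand-in for /usr/local/cuda-8.0/lib64.

```python
import os
import tempfile


def find_library(filename, search_path):
    """Return the first directory in a colon-separated path that contains filename."""
    for directory in search_path.split(":"):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


# Demo: a throwaway directory plays the role of the CUDA lib64 directory.
with tempfile.TemporaryDirectory() as fake_lib64:
    open(os.path.join(fake_lib64, "libcudart.so.8.0"), "w").close()
    found = find_library("libcudart.so.8.0", "/nonexistent:" + fake_lib64)
    missing = find_library("libcudart.so.9.0", fake_lib64)
    found_ok = found == fake_lib64

print(found_ok, missing)
```

On a real system the call would be `find_library("libcudart.so.8.0", os.environ.get("LD_LIBRARY_PATH", ""))`; a result of None is consistent with the 8.0 runtime not being installed or not being on the library path.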

Fix:

# I did following steps to remove cuda 9.0
$ sudo apt-get --purge remove cuda
$ sudo apt autoremove

# Then clear apt-cache
$ sudo apt-get clean

# Then I tried following steps to reinstall the cuda 8.0
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda

The problem came back: I uninstalled cuda v9.0, but when I try to install v8.0, v9.0 keeps getting installed instead. How do I prevent this from happening and install 8.0?

NVIDIA's answer: uninstall once more, and when reinstalling, specify the CUDA version in the last of the three commands above:

$ sudo apt-get install cuda-8-0
