Configuring Keras + CNTK (GPU) + Tensorflow (GPU) on Windows 10

The software versions at the time of installation were:

  • Windows 10.0.15063
  • Visual Studio 2017 (VS15.2)
  • Anaconda3 4.3.25
  • CNTK 2.2
  • Tensorflow 1.4
  • Keras 2.0.6
  • NVIDIA Geforce Notebook Driver 376.54

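Setting up this environment sent my faith in Papa Microsoft up another big notch! So here is a record of the installation. The steps are so simple there is hardly anything to write about (lol), but on the principle of never falling into the same pit twice, I'm writing them down anyway~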

Installing CNTK

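I have to say I really like CNTK. Not only is it a Papa Microsoft product that runs on Microsoft's own child, Windows, its performance is also quite good (for performance comparisons, see a benchmark on GitHub, an older article, and a newer one). Most importantly, it was the only deep learning library I knew of that offered a pre-built installation!! Who knows how many people have been burned by NVIDIA drivers, CUDA, MKL and so on while compiling Caffe or Tensorflow... They eat disk space, and on top of that they keep throwing errors! Installing CNTK on Windows avoids all of this because it is already compiled. So when I learned that Keras had gained CNTK backend support a few months earlier, I immediately set out to install it on my own machine, to save myself from fighting over the server with everyone else every day = =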

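The installation can follow the official tutorial. The first step is downloading CNTK. Naturally I chose the pre-compiled package: just go to the CNTK Releases page and pick the version you need~ (I have to marvel once again at how complete the pre-built versions are... there is even one for UWP.) Of course, if you don't mind Linux-style tinkering = = you can download the source and compile it yourself.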

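Extract the archive after downloading. Note that the extraction location is where CNTK will run from afterwards, so from here on the directory you extracted to is called the "CNTK directory". Then cd into <CNTK directory>\Scripts\install\windows and run install.bat to install. To keep cmd from treating the path as two arguments, extract the archive to a path that contains no spaces.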

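If Anaconda is already installed, you can point the installer at it with the AnacondaBasePath parameter to avoid installing it a second time. If your Anaconda path contains spaces, though, the batch file fails for the same path reason, and the only option is to run the ps1 script directly from PowerShell. Open PowerShell as administrator and first add the ps1 module location to the environment variable: $Env:PSModulePath=$Env:PSModulePath+";<CNTK directory>\Scripts\install\windows\ps\Modules". Then run ps/install.ps1 -AnacondaBasePath "<Anaconda install directory>" to install. If you get the error "cannot be loaded because running scripts is disabled on this system", run Set-ExecutionPolicy Bypass (the command is case-insensitive), and once the script has finished run Set-ExecutionPolicy Restricted to set it back.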
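The two path rules above (no spaces in the CNTK directory, PowerShell when the Anaconda path has spaces) can be sketched as a small check. This is only an illustration: the function name and its return labels are made up here, and D:\cntk is a hypothetical extraction directory.

```python
def pick_install_route(cntk_dir, anaconda_dir=None):
    """Suggest how to run the CNTK installer for the given paths.

    Returns one of "move-cntk", "powershell", or "install.bat".
    These labels are hypothetical, chosen for this sketch only.
    """
    if " " in cntk_dir:
        # The post asks for the archive to be extracted to a space-free path.
        return "move-cntk"
    if anaconda_dir is not None and " " in anaconda_dir:
        # install.bat trips over this path, so install.ps1 is run directly.
        return "powershell"
    return "install.bat"


# D:\cntk is a made-up example; the Anaconda path is the one visible in the log below.
print(pick_install_route(r"D:\cntk", r"D:\Program Files\Anaconda3"))
```

With an Anaconda path like D:\Program Files\Anaconda3, the sketch lands on the PowerShell route described above.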

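You can also set the Python version of the conda environment with the PyVersion parameter; the default is 3.5. After installation, Anaconda will have a new environment named cntk-pyxx, where the last two digits are the Python version you specified. The rest of this post uses the default, cntk-py35.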

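After installation, run <CNTK directory>\Scripts\cntkpy35.bat to set the environment variables; this also drops you into the cntk-py35 environment. Then cd to <CNTK directory>\Tutorials\HelloWorld-LogisticRegression and run cntk configFile=lr_bs.cntk makeMode=false command=Train to verify the installation.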

Installing Keras

With CNTK installed, installing Keras is easy. First run activate cntk-py35 to enter the environment, then run pip install keras.
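To confirm that pip installed into the environment you meant, a rough sketch like the following can be run inside the activated environment. The helper name importable is made up for this example. It only asks Python whether each package can be located, without importing it.

```python
import importlib.util


def importable(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Run inside the activated environment; False means the package is missing there.
print(importable(["cntk", "keras"]))
```

Since nothing is actually imported, a True here does not rule out a failure at import time (a missing DLL, for example).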

Next, switch the Keras backend to CNTK. The Keras documentation covers this: just change the backend value in %USERPROFILE%\.keras\keras.json to cntk.
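If you would rather not edit the file by hand, the change can be scripted. The sketch below assumes keras.json is a plain JSON object with a "backend" key, as the Keras documentation describes. The helper name set_keras_backend is made up for this example.

```python
import json
import os


def set_keras_backend(backend, config_path=None):
    """Write `backend` into keras.json, keeping any other settings in the file."""
    if config_path is None:
        # Default location: .keras\keras.json under the user's home directory.
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = {}
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    config["backend"] = backend
    directory = os.path.dirname(config_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config


# Example (this modifies your real config file, so it is left commented out):
# set_keras_backend("cntk")
```

Keras also reads the KERAS_BACKEND environment variable, which overrides the file. That can be handier when switching back and forth between CNTK and Tensorflow.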

For a first check, it is enough to see whether import keras succeeds. For a more thorough test, run one of the Keras example scripts. The output looks like this:

(cntk-py35) C:\Users\Jacob>python C:\Users\Jacob\Downloads\mnist_mlp.py
Using CNTK backend
Selected GPU[0] GeForce GTX 850M as the process wide default device.
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 10) 5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
D:\Program Files\Anaconda3\envs\cntk-py35\lib\site-packages\cntk\core.py:361: UserWarning: your data is of type "float64", but your input variable (uid "Input75") expects "<class 'numpy.float32'>". Please convert your data beforehand to speed up training.
(sample.dtype, var.uid, str(var.dtype)))
60000/60000 [==============================] - 4s - loss: 0.2464 - acc: 0.9248 - val_loss: 0.1097 - val_acc: 0.9653
Epoch 2/20
60000/60000 [==============================] - 3s - loss: 0.1035 - acc: 0.9681 - val_loss: 0.0843 - val_acc: 0.9744
Epoch 3/20
60000/60000 [==============================] - 3s - loss: 0.0742 - acc: 0.9776 - val_loss: 0.0872 - val_acc: 0.9764
Epoch 4/20
60000/60000 [==============================] - 3s - loss: 0.0607 - acc: 0.9822 - val_loss: 0.0688 - val_acc: 0.9804
Epoch 5/20
60000/60000 [==============================] - 3s - loss: 0.0502 - acc: 0.9851 - val_loss: 0.0863 - val_acc: 0.9781
Epoch 6/20
60000/60000 [==============================] - 3s - loss: 0.0424 - acc: 0.9874 - val_loss: 0.0828 - val_acc: 0.9801
Epoch 7/20
60000/60000 [==============================] - 3s - loss: 0.0369 - acc: 0.9890 - val_loss: 0.0758 - val_acc: 0.9812
Epoch 8/20
60000/60000 [==============================] - 3s - loss: 0.0367 - acc: 0.9895 - val_loss: 0.0840 - val_acc: 0.9825
Epoch 9/20
60000/60000 [==============================] - 3s - loss: 0.0338 - acc: 0.9903 - val_loss: 0.1029 - val_acc: 0.9782
Epoch 10/20
60000/60000 [==============================] - 3s - loss: 0.0313 - acc: 0.9912 - val_loss: 0.0837 - val_acc: 0.9827
Epoch 11/20
60000/60000 [==============================] - 3s - loss: 0.0273 - acc: 0.9919 - val_loss: 0.1013 - val_acc: 0.9808
Epoch 12/20
60000/60000 [==============================] - 3s - loss: 0.0277 - acc: 0.9925 - val_loss: 0.0921 - val_acc: 0.9827
Epoch 13/20
60000/60000 [==============================] - 3s - loss: 0.0252 - acc: 0.9931 - val_loss: 0.0900 - val_acc: 0.9832
Epoch 14/20
60000/60000 [==============================] - 3s - loss: 0.0268 - acc: 0.9931 - val_loss: 0.1035 - val_acc: 0.9830
Epoch 15/20
60000/60000 [==============================] - 3s - loss: 0.0233 - acc: 0.9940 - val_loss: 0.1280 - val_acc: 0.9797
Epoch 16/20
60000/60000 [==============================] - 3s - loss: 0.0243 - acc: 0.9942 - val_loss: 0.0998 - val_acc: 0.9830
Epoch 17/20
60000/60000 [==============================] - 3s - loss: 0.0222 - acc: 0.9942 - val_loss: 0.1011 - val_acc: 0.9849
Epoch 18/20
60000/60000 [==============================] - 3s - loss: 0.0205 - acc: 0.9946 - val_loss: 0.1111 - val_acc: 0.9830
Epoch 19/20
60000/60000 [==============================] - 3s - loss: 0.0201 - acc: 0.9951 - val_loss: 0.1302 - val_acc: 0.9801
Epoch 20/20
60000/60000 [==============================] - 3s - loss: 0.0206 - acc: 0.9952 - val_loss: 0.1181 - val_acc: 0.9827
Test loss: 0.11807218486
Test accuracy: 0.9827

Installing Tensorflow

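Tensorflow does not offer a ready-to-use installer, but it can be installed through pip or conda, which is also very convenient: pip install tensorflow-gpu is all it takes. To keep CNTK and Tensorflow from interfering with each other, you can create a separate environment just for Tensorflow.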

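The GPU version of Tensorflow needs CUDA and cuDNN, which pip cannot install. Installing them separately means registering on the NVIDIA website and downloading them, which is a hassle. But since CNTK ships with both, you only need to point the path at CNTK, which is very convenient! Specifically, add <CNTK directory>\cntk to the PYTHONPATH environment variable; if there is no PYTHONPATH variable yet, create one.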
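To see whether the CUDA and cuDNN libraries are actually reachable from your environment variables, here is a rough sketch that scans the directories listed in PATH and PYTHONPATH. The file-name patterns are my assumption about how the DLLs are named on Windows (a version suffix after cudart64_ and cudnn64_). The helper find_libraries is made up for this example.

```python
import glob
import os


def find_libraries(patterns, directories):
    """Return {pattern: [matching file paths]} for the given directories."""
    found = {pattern: [] for pattern in patterns}
    for directory in directories:
        for pattern in patterns:
            # glob.escape keeps brackets in directory names from acting as wildcards.
            matches = glob.glob(os.path.join(glob.escape(directory), pattern))
            found[pattern].extend(sorted(matches))
    return found


# Assumed file-name patterns for the CUDA runtime and cuDNN DLLs.
PATTERNS = ["cudart64_*.dll", "cudnn64_*.dll"]

search_dirs = []
for variable in ("PATH", "PYTHONPATH"):
    search_dirs.extend(d for d in os.environ.get(variable, "").split(os.pathsep) if d)

for pattern, hits in find_libraries(PATTERNS, search_dirs).items():
    print(pattern, "->", hits or "not found")
```

If both patterns report a hit inside <CNTK directory>\cntk, the libraries bundled with CNTK are the ones being picked up.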

And with that, Tensorflow is installed too. After switching the Keras backend, a test run gives the following:

(tf-py35) C:\Users\Jacob>python C:\Users\Jacob\Downloads\mnist_mlp.py
Using TensorFlow backend.
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 10) 5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
2017-11-19 11:17:30.732744: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-11-19 11:17:31.701106: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 850M major: 5 minor: 0 memoryClockRate(GHz): 0.8625
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.65GiB
2017-11-19 11:17:31.701257: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 850M, pci bus id: 0000:01:00.0, compute capability: 5.0)
60000/60000 [==============================] - 122s 2ms/step - loss: 0.2460 - acc: 0.9233 - val_loss: 0.1317 - val_acc: 0.9590
Epoch 2/20
60000/60000 [==============================] - 6s 96us/step - loss: 0.1023 - acc: 0.9698 - val_loss: 0.0881 - val_acc: 0.9745
Epoch 3/20
60000/60000 [==============================] - 6s 95us/step - loss: 0.0762 - acc: 0.9768 - val_loss: 0.0823 - val_acc: 0.9744
Epoch 4/20
60000/60000 [==============================] - 5s 91us/step - loss: 0.0612 - acc: 0.9811 - val_loss: 0.0812 - val_acc: 0.9761
Epoch 5/20
60000/60000 [==============================] - 6s 98us/step - loss: 0.0514 - acc: 0.9845 - val_loss: 0.0734 - val_acc: 0.9813
Epoch 6/20
60000/60000 [==============================] - 6s 102us/step - loss: 0.0454 - acc: 0.9866 - val_loss: 0.0783 - val_acc: 0.9818
Epoch 7/20
60000/60000 [==============================] - 6s 93us/step - loss: 0.0388 - acc: 0.9884 - val_loss: 0.0871 - val_acc: 0.9797
Epoch 8/20
60000/60000 [==============================] - 6s 95us/step - loss: 0.0354 - acc: 0.9896 - val_loss: 0.0918 - val_acc: 0.9813
Epoch 9/20
60000/60000 [==============================] - 6s 96us/step - loss: 0.0330 - acc: 0.9906 - val_loss: 0.0859 - val_acc: 0.9800
Epoch 10/20
60000/60000 [==============================] - 6s 103us/step - loss: 0.0308 - acc: 0.9915 - val_loss: 0.0898 - val_acc: 0.9812
Epoch 11/20
60000/60000 [==============================] - 5s 89us/step - loss: 0.0261 - acc: 0.9924 - val_loss: 0.0975 - val_acc: 0.9824
Epoch 12/20
60000/60000 [==============================] - 6s 103us/step - loss: 0.0266 - acc: 0.9925 - val_loss: 0.0906 - val_acc: 0.9848
Epoch 13/20
60000/60000 [==============================] - 6s 102us/step - loss: 0.0248 - acc: 0.9934 - val_loss: 0.0907 - val_acc: 0.9834
Epoch 14/20
60000/60000 [==============================] - 6s 96us/step - loss: 0.0224 - acc: 0.9939 - val_loss: 0.1088 - val_acc: 0.9811
Epoch 15/20
60000/60000 [==============================] - 6s 97us/step - loss: 0.0217 - acc: 0.9938 - val_loss: 0.0973 - val_acc: 0.9818
Epoch 16/20
60000/60000 [==============================] - 6s 97us/step - loss: 0.0202 - acc: 0.9946 - val_loss: 0.1075 - val_acc: 0.9819
Epoch 17/20
60000/60000 [==============================] - 6s 99us/step - loss: 0.0197 - acc: 0.9947 - val_loss: 0.1218 - val_acc: 0.9815
Epoch 18/20
60000/60000 [==============================] - 6s 96us/step - loss: 0.0208 - acc: 0.9945 - val_loss: 0.1131 - val_acc: 0.9832
Epoch 19/20
60000/60000 [==============================] - 6s 93us/step - loss: 0.0190 - acc: 0.9949 - val_loss: 0.1285 - val_acc: 0.9805
Epoch 20/20
60000/60000 [==============================] - 6s 94us/step - loss: 0.0185 - acc: 0.9955 - val_loss: 0.1153 - val_acc: 0.9833
Test loss: 0.115294696707
Test accuracy: 0.9833

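(As the logs show, Tensorflow is considerably slower than CNTK in this test: roughly 6 s per epoch against 3 s, on top of a 122 s first epoch.)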
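To put numbers on the comparison, the per-epoch times from the two logs above can be added up. Keras only prints whole seconds here, so treat the result as a rough estimate rather than a benchmark. The first epoch is left out of the steady-state figure because Tensorflow's first epoch includes its startup cost.

```python
# Seconds per epoch as printed by Keras in the two logs above (whole seconds only).
cntk_seconds = [4] + [3] * 19
tf_seconds = [122, 6, 6, 5, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6]


def summarize(seconds):
    """Return (total seconds, mean seconds per epoch excluding the first epoch)."""
    steady = seconds[1:]
    return sum(seconds), sum(steady) / len(steady)


cntk_total, cntk_steady = summarize(cntk_seconds)
tf_total, tf_steady = summarize(tf_seconds)

print(f"CNTK: total {cntk_total} s, steady-state {cntk_steady:.2f} s/epoch")
print(f"Tensorflow: total {tf_total} s, steady-state {tf_steady:.2f} s/epoch")
print(f"steady-state ratio: {tf_steady / cntk_steady:.2f}x")
```

This gives totals of 61 s and 234 s, and a steady-state ratio of about 1.96x in favour of CNTK on this machine.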
