Compiling TensorFlow on CentOS 8
I’m training the GPT-2 1558M model on 2,300,000 tokens. I rented a server from Hetzner that has 12 threads and 256 GB of RAM.
Before I started playing with the batch number, each step took 90 seconds. I reached step 1400 and the average loss didn’t drop below 2.3. Since I’m using a CPU and not a GPU (memory usage is around 50 GB), I knew I had to improve the speed.
I was always getting this message: “Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA”. Since I have experience with SIMD compilation, I knew it meant I was leaving performance on the table.
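Before rebuilding, it is worth confirming which of these instruction sets the CPU really advertises. Here is a minimal sketch, assuming a Linux-style /proc/cpuinfo with a "flags" line; the helper name cpu_flags_present is my own, not part of any tool:

```shell
# Print which of the requested SIMD flags appear in a cpuinfo-style file.
# Usage: cpu_flags_present FILE FLAG...
cpu_flags_present() {
    cpuinfo_file="$1"
    shift
    for flag in "$@"; do
        # -w matches the flag as a whole word, so "avx" does not match "avx2"
        if grep -m1 '^flags' "$cpuinfo_file" | grep -qw -- "$flag"; then
            echo "$flag"
        fi
    done
}

# On the build machine: prints one line per flag the CPU supports.
if [ -r /proc/cpuinfo ]; then
    cpu_flags_present /proc/cpuinfo avx avx2 fma
fi
```

If avx2 and fma both show up, the warning is accurate and a native build has something to gain.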
I read that compiling TensorFlow for the current machine can improve performance by 300%, so I decided to give it a try. The only problem is that it’s complex, and the current guides do not work for CentOS 8.
So here is my guide. I used this guide as a reference and added the specific steps my build needed.
Install the environment
yum install python36 git gcc gcc-c++ unzip python3-devel
pip3 install --upgrade pip
pip3 install --upgrade setuptools
Install bazel
I first installed bazel from yum, but it was version 1.0 or later, and the version of TensorFlow that supports GPT-2 requires bazel 0.26.1 at most, so I had to use the 0.26.1 release installer from GitHub instead:
mkdir bazel
cd bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh
chmod +x bazel-0.26.1-installer-linux-x86_64.sh
./bazel-0.26.1-installer-linux-x86_64.sh
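Since the whole point of this step is to stay at or below 0.26.1, a quick check after installing can save a failed build later. This is a rough sketch, assuming GNU sort (for -V) and that "bazel version" prints a "Build label:" line; version_le and MAX_BAZEL are names I made up for illustration:

```shell
# Succeed if version $1 is less than or equal to version $2.
# Relies on GNU sort's -V (version sort).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

MAX_BAZEL="0.26.1"   # ceiling mentioned above for this TensorFlow release

if command -v bazel >/dev/null 2>&1; then
    # "bazel version" prints a line such as "Build label: 0.26.1"
    installed="$(bazel version 2>/dev/null | sed -n 's/^Build label: //p')"
    if [ -n "$installed" ] && version_le "$installed" "$MAX_BAZEL"; then
        echo "bazel $installed is OK"
    else
        echo "bazel '$installed' is not usable, need <= $MAX_BAZEL"
    fi
fi
```

If the yum package is still installed, it may come first on the PATH, so this is also a cheap way to see which bazel the build will pick up.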
Compile TensorFlow
I used version 1.15.3. There might be a newer version, but from my testing GPT-2 supports TensorFlow only up to the 1.x series (1.15 being the last of them).
cd ..
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.15.3
pip3 install 'numpy<1.19.0'
pip3 install six wheel mock 'future>=0.17.1'
pip3 install keras_applications==1.0.6 --no-deps
pip3 install keras_preprocessing==1.0.5 --no-deps
./configure
The numpy version limit is needed. Without it you’ll get this error:
C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
When ./configure asks its questions, answer no to all of them and point it to the correct Python path.
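If you are not sure what the correct Python path is, the interpreter can report it. A small sketch, assuming python3 is the interpreter you want to build against; python_paths is a hypothetical helper name, and the site-packages directory it prints may differ from the default that configure suggests:

```shell
# Print the interpreter path and site-packages directory for a Python command.
# Usage: python_paths [python-command]   (defaults to python3)
python_paths() {
    py="$(command -v "${1:-python3}")" || return 1
    echo "interpreter: $py"
    "$py" -c 'import sysconfig; print("site-packages:", sysconfig.get_paths()["purelib"])'
}

python_paths python3 || echo "python3 not found on PATH"
```

Once configure has finished, the build itself is the next command.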
bazel build -c opt --copt=-mavx --copt=-march=native //tensorflow/tools/pip_package:build_pip_package
Go drink coffee. It took my server 4000 seconds (a bit over an hour), and the compiler did use all the CPU threads nicely.
On some compilations I got an error that Python was not found:
/usr/bin/env: ‘python’: No such file or directory
In that case the solution was to create a symbolic link for python. Make sure you adjust the paths as needed:
ln -s /usr/bin/python3 /usr/bin/python
Then repeat the compilation.
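A slightly more careful variant of the symlink step is to create the link only when nothing is there already, so an existing python is not clobbered. This is a sketch under that assumption; link_if_missing is a name I picked for illustration:

```shell
# Create LINK pointing at TARGET only when LINK does not exist yet.
# Usage: link_if_missing TARGET LINK
link_if_missing() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "$2 already exists, leaving it alone"
    else
        ln -s "$1" "$2" && echo "linked $2 -> $1"
    fi
}

# On the build machine (as root), adjusting the paths as needed:
#   link_if_missing /usr/bin/python3 /usr/bin/python
```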
Build the Python package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install TensorFlow via pip
pip3 install /tmp/tensorflow_pkg/tensorflow-1.15.3-cp36-cp36m-linux_x86_64.whl
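The exact wheel file name depends on the TensorFlow and Python versions, so it can be easier to let the shell find it. A minimal sketch, assuming the wheel landed in /tmp/tensorflow_pkg as above; latest_wheel is a hypothetical helper:

```shell
# Print the newest TensorFlow wheel in a directory, or fail if there is none.
# Usage: latest_wheel DIR
latest_wheel() {
    # ls -t sorts newest first; errors are silenced when nothing matches
    wheel="$(ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1)"
    [ -n "$wheel" ] && echo "$wheel"
}

# On the build machine:
#   pip3 install "$(latest_wheel /tmp/tensorflow_pkg)"
#   cd /tmp   # importing from inside the source tree tends to fail
#   python3 -c 'import tensorflow as tf; print(tf.__version__)'
```

If the build worked, the import should print the version, and the AVX2 FMA message from the beginning should be gone.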
Summary
I got a 200% improvement, which is a lot for 2 hours of work, so I’m happy.