Anaconda+TensorFlow+CUDA

November 7, 2016

This post records the procedure for setting up a TensorFlow custom-op development environment.

Conda

First, we recommend using Conda to isolate the Python version and dependencies.

Conda configuration file ~/.condarc

cat > ~/.condarc <<EOF
## setup proxy if needed
#proxy_servers:
#  http: http://172.17.0.1:8123
#  https: https://172.17.0.1:8123

report_errors: false
ssl_verify: false
## do not enter default conda environment
auto_activate_base: false
#offline: True
EOF

Conda init script inside a shell rc file such as .bashrc

Run a command such as conda init zsh to set up Conda automatically. Alternatively, place the following snippet in a common shell rc file to support multiple shells:

cat <<'EOF'
SHELL_EXE=$(readlink /proc/$$/exe)
SHELL_TYPE=$(echo 'bash:bash,zsh:zsh,dash:sh,zsh5:zsh' | awk -F: -v RS=, -v s=$(basename ${SHELL_EXE:-zsh}) '$1==s{print $2}')

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' shell.${SHELL_TYPE} 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
EOF
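
The SHELL_TYPE line is the least obvious part of the snippet, so here is a minimal sketch of the lookup on its own. The function name shell_type_for is hypothetical and only wraps the same awk table. Records are split on commas, fields on colons, and the second field is printed when the first one matches.

```shell
# Hypothetical helper: map a shell executable name to the conda hook name.
# The "name:hook" table is the same one used in the rc snippet.
shell_type_for() {
    printf '%s' 'bash:bash,zsh:zsh,dash:sh,zsh5:zsh' |
        awk -F: -v RS=, -v s="$1" '$1 == s { print $2 }'
}

shell_type_for bash   # prints: bash
shell_type_for dash   # prints: sh
shell_type_for zsh5   # prints: zsh
shell_type_for fish   # prints nothing, fish is not in the table
```

For a shell that is not in the table the result is empty, so the conda hook call fails and the snippet falls back to conda.sh or to the plain PATH export.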

Get familiar with Conda commands

CONDA_ENV_NAME=tf22_py38

# create environment with specific package versions
CUDA_VERSION=$(nvcc --version | awk -F',' '/release/{split($(NF-1),v," "); print v[2]}')
conda create -p /opt/conda/envs/${CONDA_ENV_NAME} \
    python=3.8 'tensorflow<2.4' \
    cudatoolkit=${CUDA_VERSION:-10.1} cudnn

# enter Conda environment by name (or the full path)
conda activate ${CONDA_ENV_NAME}

# optionally save all dependencies so the environment can be rebuilt later
conda env export > conda_env.yaml

# one day we may destroy the env and rebuild
conda deactivate
conda env remove -n ${CONDA_ENV_NAME}

# restore the environment from the exported YAML file
# (a file written by conda env export is read by conda env create, not by conda create --file)
conda env create -f conda_env.yaml -n ${CONDA_ENV_NAME}
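
The CUDA_VERSION line above relies on the format of the release line printed by nvcc --version. As a rough sketch, the same awk program can be tried on a sample line without a GPU toolchain installed. The function name parse_cuda_release and the sample text are illustrative assumptions, not output captured from a real machine.

```shell
# Hypothetical helper: read nvcc --version text on stdin, print e.g. "10.1".
parse_cuda_release() {
    awk -F',' '/release/ { split($(NF-1), v, " "); print v[2] }'
}

# Sample line in the format nvcc prints.
sample='Cuda compilation tools, release 10.1, V10.1.243'
printf '%s\n' "$sample" | parse_cuda_release   # prints: 10.1

# Without a "release" line the result is empty, so the :-10.1 default applies.
CUDA_VERSION=$(printf 'no version information\n' | parse_cuda_release)
echo "cudatoolkit=${CUDA_VERSION:-10.1}"       # prints: cudatoolkit=10.1
```

The second-to-last comma-separated field is " release 10.1", and splitting it on whitespace leaves the version number in v[2].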

Conda with TF2.3 under CUDA 10

TF2.3 is the last version that supports CUDA 10; TF2.4 and later require upgrading the Nvidia driver. However, Conda does not provide TF2.3 out of the box because of some issues.

Though it is not the recommended way, we use pip to install TF2.3 and conda to install the required cudnn and cudatoolkit packages.

CONDA_ENV_PATH=/opt/conda/envs/tf23_py36
## Please use the following strict package versions, or strange things may happen.
## Higher versions of certifi introduce issues with pip.
conda create -p ${CONDA_ENV_PATH} \
	python=3.6 \
	cudnn=7.6.5 cudatoolkit=10.1 \
	numpy=1.18.5 scipy=1.4.1 requests=2.27.1 h5py=2.10.0 \
	certifi=2021.5.30

# then install tensorflow 2.3 with pip
conda activate ${CONDA_ENV_PATH}
python -m pip install tensorflow-gpu==2.3

# maybe install some data science packages
conda install -y matplotlib scikit-learn scikit-image pandas

The environment configurations for TF2.3 and for the additional data science packages were exported.

Tensorflow Custom-op

tfopgen is a scaffolding project that generates a starter codebase for a custom op.

The Tensorflow package root path can be found with TF_ROOT=$(python -c 'import pkgutil as p; from pathlib import Path; print(Path(p.get_loader("tensorflow").get_filename()).parent)').
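
pkgutil.get_loader is deprecated since Python 3.12, so a possible alternative is sketched here with importlib.util.find_spec, which locates a top-level package without importing it. The function name pkg_root is hypothetical, and the sketch assumes python3 is on PATH. It is demonstrated on the standard-library package json so that it runs without TensorFlow installed.

```shell
# Hypothetical helper: print the install directory of a Python package
# without importing it, using importlib.util.find_spec from the stdlib.
pkg_root() {
    python3 - "$1" <<'PY'
import sys
from importlib.util import find_spec
from pathlib import Path

spec = find_spec(sys.argv[1])
if spec is None or spec.origin is None:
    sys.exit(f"package not found: {sys.argv[1]}")
# spec.origin is the package's __init__.py; its parent is the package root.
print(Path(spec.origin).parent)
PY
}

# Demonstrated on "json"; inside the Conda environment one would run
# TF_ROOT=$(pkg_root tensorflow) instead.
pkg_root json
```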

Be careful with some inconsistency issues:

  • Missing header files under third_party. Create a link with ln -s "$(dirname $(command -v nvcc))/../targets/x86_64-linux/include" ${TF_ROOT}/include/third_party/gpus/cuda/include

  • The CUDA toolkit (e.g. nvcc) should live at a fixed path, otherwise many changes are needed. Create a link with ln -s "$(dirname $(command -v nvcc))/.." /usr/local/cuda

  • Match your GPU’s compute capability when specifying the nvcc option --gpu-architecture. Otherwise weird things can happen at runtime, such as the error no kernel image is available for execution on the device, followed by a segmentation fault that crashes the process. One can place GPU_ARCH := $(shell python3 -c "from tensorflow.python.client import device_lib; print(next(d.physical_device_desc for d in device_lib.list_local_devices() if d.device_type=='GPU'))" 2> /dev/null | awk -F: -v RS=, '/compute capability/{print "sm_"int($$2*10)}') inside the Makefile and then use --gpu-architecture=$(GPU_ARCH) in NVCCFLAGS. The awk field is written $$2 because make would otherwise expand $2 itself before the shell sees it.
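
The awk step of the GPU_ARCH rule can be checked in a plain shell before it goes into the Makefile. The sketch below feeds it a made-up device description in the format TensorFlow prints. The function name gpu_arch_from_desc and the sample string are assumptions for illustration. Unlike the one-liner above, it adds 0.5 before truncating, as a guard against floating-point rounding.

```shell
# Hypothetical helper: turn a TensorFlow physical_device_desc string (stdin)
# into an nvcc architecture name such as sm_75.
gpu_arch_from_desc() {
    awk -F: -v RS=, '/compute capability/ { printf "sm_%d\n", $2 * 10 + 0.5 }'
}

# Made-up description string; a real one comes from device_lib.list_local_devices().
desc='device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5'
printf '%s\n' "$desc" | gpu_arch_from_desc   # prints: sm_75
```

In a plain shell the field is written $2; only inside the Makefile does it need to be doubled to $$2.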