怎么和别人和谐共处的使用服务器上的GPU
师兄协调了一台服务器,给了我个Ubuntu账户,我应该怎么用而不打扰别人?
怎么只有一个$符号?
那是因为你正在用.sh
模式,只要在终端输入bash
,你所热爱的正是你的生活。
环境变量
请只修改你目录下的东西,因为你没有sudo权限不太可能会动到/etc/..底下的东西,但是千万记住,不要直接修改 /etc/profile
、/etc/bashrc
或其他全局的环境变量配置文件,因为这会影响所有用户。 你只能碰/home/username/..
里面的profile
、bashrc
。 请注意,安装Anaconda
没事,CUDA
小心一点,有点麻烦。
CUDA
这里同样你只能用师兄们给你装好的圣遗物 请创建文件touch switch-cuda.sh
通过source switch-cuda.sh
看可用的cuda,通过source switch-cuda.sh [VERSION]
切换,例如:
(base) user@root:~$ source switch-cuda.sh
The following CUDA installations have been found (in '/usr/local'):
* cuda-11.0
* cuda-12.4
(base) user@root:~$ source switch-cuda.sh 12.4
Switched to CUDA 12.4
文件内容(用Vim编辑):
#!/usr/bin/env bash
set -e
# ensure that the script has been sourced rather than just executed
if [[ "${BASH_SOURCE[0]}" = "${0}" ]]; then
echo "Please use 'source' to execute switch-cuda.sh!"
exit 1
fi
INSTALL_FOLDER="/usr/local" # the location to look for CUDA installations at
TARGET_VERSION=${1} # the target CUDA version to switch to (if provided)
# if no version to switch to has been provided, then just print all available CUDA installations
if [[ -z ${TARGET_VERSION} ]]; then
echo "The following CUDA installations have been found (in '${INSTALL_FOLDER}'):"
ls -l "${INSTALL_FOLDER}" | egrep -o "cuda-[0-9]+\\.[0-9]+$" | while read -r line; do
echo "* ${line}"
done
set +e
return
# otherwise, check whether there is an installation of the requested CUDA version
elif [[ ! -d "${INSTALL_FOLDER}/cuda-${TARGET_VERSION}" ]]; then
echo "No installation of CUDA ${TARGET_VERSION} has been found!"
set +e
return
fi
# the path of the installation to use
cuda_path="${INSTALL_FOLDER}/cuda-${TARGET_VERSION}"
# filter out those CUDA entries from the PATH that are not needed anymore
path_elements=(${PATH//:/ })
new_path="${cuda_path}/bin"
for p in "${path_elements[@]}"; do
if [[ ! ${p} =~ ^${INSTALL_FOLDER}/cuda ]]; then
new_path="${new_path}:${p}"
fi
done
# filter out those CUDA entries from the LD_LIBRARY_PATH that are not needed anymore
ld_path_elements=(${LD_LIBRARY_PATH//:/ })
new_ld_path="${cuda_path}/lib64:${cuda_path}/extras/CUPTI/lib64"
for p in "${ld_path_elements[@]}"; do
if [[ ! ${p} =~ ^${INSTALL_FOLDER}/cuda ]]; then
new_ld_path="${new_ld_path}:${p}"
fi
done
# update environment variables
export CUDA_HOME="${cuda_path}"
export CUDA_ROOT="${cuda_path}"
export LD_LIBRARY_PATH="${new_ld_path}"
export PATH="${new_path}"
echo "Switched to CUDA ${TARGET_VERSION}."
set +e
return
Conda
Anaconda
和 Miniconda
正常安装即可,包括换源
也是没问题的。目前Miniconda
挺爽的,小小的也很可爱! Miniconda
# 创建文件夹
mkdir miniconda3
cd miniconda3/
# 下载miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# 运行安装
bash Miniconda3-latest-Linux-x86_64.sh
# 让你的bash前面出现当前环境,例如:(base)
eval "$(/home/lvzhiwei/miniconda3/bin/conda shell.bash hook)"
# 让conda在你的环境变量里面打上烙印,不用担心,不会修改全局变量
conda init
APT?
这个时候问题来了,APT没换源啊? 求你的师兄去给你换源,没有sudo只能临时指定。 holyshit,原来大多数情况下你没有sudo
你根本没法apt install
,因为要写你没权限的文件
喂?谁在用显卡
这里介绍一些监看命令,帮你搞清楚谁在用显卡,并检验你能不能得罪得起kill
掉他们的进程 nvidia-smi
太基础了,这里介绍一种持续刷新的办法watch
~~视监你的师兄们~~
nvidia-smi
# 每隔2秒刷新一次,每次只在固定位置刷新
watch -n 2 -d nvidia-smi
根据上面nvidia-smi
查到的PID,查是谁的进程(PS命令)
# 查询是谁的进程以及开始时间
ps -p [PID] -o user=,lstart=
例如:
(base) you@your_lab:~$ ps -p 2662726 -o user=,lstart=
DaShiXiong+ Sun Jan 5 15:25:48 2025
我写了个程序,可以一键查询哪些人在用哪张显卡: touch check.sh
、./check.sh
#!/bin/bash
# 获取 nvidia-smi 输出中的 PID 列(排除表头)
pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader)
# 检查是否有进程在 GPU 上运行
if [ -z "$pids" ]; then
echo "No processes are running on the GPU."
exit 1
fi
# 输出表头
printf "%-10s %-20s %-25s %-5s\n" "PID" "USER" "CREATED" "GPU"
# 遍历每个 PID,查找进程的用户、GPU 和显存使用情况
while read pid; do
# 获取进程所属的用户
user=$(ps -p $pid -o user=)
# 获取进程创建时间
created=$(ps -p $pid -o lstart=)
# 获取该进程正在使用的 GPU
gpu=$(nvidia-smi pmon -c 1 | grep $pid | awk '{print $1}')
# 使用 printf 格式化输出,确保对齐
printf "%-10s %-20s %-25s %-5s\n" "$pid" "$user" "$created" "$gpu"
done <<< "$pids"
Vim
修改文件必备
vim ~/.vnc/xstartup
i
编辑、 esc
退出、 :wq
保存
Enjoy Reading This Article?
Here are some more articles you might like to read next: