How to Share a Server's GPUs Harmoniously with Everyone Else

A senior labmate arranged a server for me and gave me an Ubuntu account. How should I use it without disturbing anyone?

Why is my prompt just a bare $?

That's because your login shell is sh rather than bash. Just type bash in the terminal, and the life you love is back.
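If you want to see which shell the system actually starts for you at login, here is a minimal sketch. `login_shell` is a hypothetical helper name, and it assumes your account is visible through `getent passwd` (true for local accounts and most centrally managed ones).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the login shell recorded for a user (default: you),
# i.e. the 7th colon-separated field of that user's passwd entry.
login_shell() {
    getent passwd "${1:-$(id -un)}" | cut -d: -f7
}
```

Run `login_shell` in your terminal; if it prints /bin/sh, typing bash fixes the current session. To make bash stick, `chsh -s /bin/bash` normally lets you change your own login shell without sudo (it asks for your password), though it may be refused when accounts are managed centrally, in which case ask your senior labmate.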

Environment Variables

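Only change things inside your own home directory. Without sudo you are unlikely to touch anything under /etc/ anyway, but remember: never edit /etc/profile, /etc/bash.bashrc (called /etc/bashrc on some other distributions) or any other global environment-variable file directly, because that affects every user. The only ones you should touch are ~/.profile and ~/.bashrc in /home/username/. Installing Anaconda is fine; be a bit more careful with CUDA, which is somewhat trickier.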
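When you do add something to your own ~/.bashrc, it helps to append each line only once, so re-running a setup step does not stack duplicates. A minimal sketch; `add_to_rc` is a hypothetical helper, and the `~/.local/bin` path in the usage line below is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a per-user startup file (default ~/.bashrc)
# only if that exact line is not already in it, so running it twice is harmless.
add_to_rc() {
    local line="$1"
    local rc="${2:-$HOME/.bashrc}"
    touch "$rc"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$rc"; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

For example, `add_to_rc 'export PATH="$HOME/.local/bin:$PATH"'` followed by `source ~/.bashrc`. The single quotes keep `$HOME` and `$PATH` unexpanded, so they are evaluated each time a new shell starts rather than frozen at the moment you ran the command.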

CUDA

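Here, too, you can only use the sacred relics your senior labmates installed for you (the CUDA versions under /usr/local). Create a script with touch switch-cuda.sh, then run source switch-cuda.sh to list the available CUDA versions and source switch-cuda.sh [VERSION] to switch to one, for example: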

(base) user@root:~$ source switch-cuda.sh 
The following CUDA installations have been found (in '/usr/local'):
* cuda-11.0
* cuda-12.4

(base) user@root:~$ source switch-cuda.sh 12.4
Switched to CUDA 12.4.

File contents (edit with Vim):

#!/usr/bin/env bash
# switch-cuda.sh: list the CUDA toolkits under /usr/local, or switch to one of them.
# This file is sourced into your interactive shell, so it deliberately avoids
# `set -e`: a single failing command would otherwise close your terminal.

# ensure that the script has been sourced rather than just executed
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    echo "Please use 'source' to execute switch-cuda.sh!"
    exit 1
fi

INSTALL_FOLDER="/usr/local"  # the location to look for CUDA installations at
TARGET_VERSION="${1}"        # the target CUDA version to switch to (if provided)

# if no version to switch to has been provided, then just print all available CUDA installations
if [[ -z "${TARGET_VERSION}" ]]; then
    echo "The following CUDA installations have been found (in '${INSTALL_FOLDER}'):"
    for dir in "${INSTALL_FOLDER}"/cuda-*; do
        name="${dir##*/}"
        # keep only directories whose name looks like cuda-<major>.<minor>
        if [[ -d "${dir}" && "${name}" =~ ^cuda-[0-9]+\.[0-9]+$ ]]; then
            echo "* ${name}"
        fi
    done
    return 0
# otherwise, check whether there is an installation of the requested CUDA version
elif [[ ! -d "${INSTALL_FOLDER}/cuda-${TARGET_VERSION}" ]]; then
    echo "No installation of CUDA ${TARGET_VERSION} has been found!"
    return 1
fi

# the path of the installation to use
cuda_path="${INSTALL_FOLDER}/cuda-${TARGET_VERSION}"

# filter out those CUDA entries from the PATH that are not needed anymore
# (split on ':' only, so directories containing spaces survive)
IFS=':' read -ra path_elements <<< "${PATH}"
new_path="${cuda_path}/bin"
for p in "${path_elements[@]}"; do
    if [[ -n "${p}" && ! "${p}" =~ ^${INSTALL_FOLDER}/cuda ]]; then
        new_path="${new_path}:${p}"
    fi
done

# filter out those CUDA entries from the LD_LIBRARY_PATH that are not needed anymore
IFS=':' read -ra ld_path_elements <<< "${LD_LIBRARY_PATH}"
new_ld_path="${cuda_path}/lib64:${cuda_path}/extras/CUPTI/lib64"
for p in "${ld_path_elements[@]}"; do
    if [[ -n "${p}" && ! "${p}" =~ ^${INSTALL_FOLDER}/cuda ]]; then
        new_ld_path="${new_ld_path}:${p}"
    fi
done

# update environment variables
export CUDA_HOME="${cuda_path}"
export CUDA_ROOT="${cuda_path}"
export LD_LIBRARY_PATH="${new_ld_path}"
export PATH="${new_path}"

echo "Switched to CUDA ${TARGET_VERSION}."
return 0
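After sourcing the script, you may want a quick check of which toolkit now comes first on your PATH. A minimal sketch; `active_cuda` is a hypothetical helper, and it assumes the `/usr/local/cuda-X.Y` layout that switch-cuda.sh works with.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version of the first cuda-X.Y/bin directory
# on PATH, or "none" (with a non-zero status) if there is no such entry.
active_cuda() {
    local entry
    local re='/cuda-([0-9]+\.[0-9]+)/bin/?$'
    local -a entries
    IFS=':' read -ra entries <<< "${PATH}"
    for entry in "${entries[@]}"; do
        if [[ "${entry}" =~ ${re} ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    echo "none"
    return 1
}
```

It only inspects PATH, so it is cheap to run after every switch; `which nvcc` and `nvcc --version` remain the authoritative check that the compiler you get is the one you expect.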

Conda

Anaconda or Miniconda installs as usual, and switching conda to a mirror works fine too. Miniconda is my pick these days: small and cute! See: Miniconda

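# create the install directory in your home folder
mkdir -p ~/miniconda3
# download the Miniconda installer into it
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
# install: -b batch mode, -u allow installing into the existing directory, -p install prefix
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
# activate conda in this shell so the prompt shows the current environment, e.g. (base)
eval "$(~/miniconda3/bin/conda shell.bash hook)"
# let conda leave its mark in your ~/.bashrc; don't worry, it does not touch global config
conda init bash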
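To convince yourself that conda init only touched your own startup file, you can print the block it added. A minimal sketch; `show_conda_block` is a hypothetical helper, and it assumes the `# >>> conda initialize >>>` / `# <<< conda initialize <<<` markers that conda init wraps around its block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the block that `conda init` wrote into a startup
# file (default ~/.bashrc), from the opening marker to the closing marker.
show_conda_block() {
    sed -n '/>>> conda initialize >>>/,/<<< conda initialize <<</p' "${1:-$HOME/.bashrc}"
}
```

Running `show_conda_block` with no argument inspects your ~/.bashrc; since the block lives there and nowhere under /etc, removing it (or running `conda init --reverse`) undoes the change for you alone.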

APT?

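Now here comes the problem: APT hasn't been switched to a mirror! Beg your senior labmate to change the sources for you; without sudo you can only point APT at a mirror temporarily. Holy shit, it turns out that in most cases you can't apt install anything at all without sudo, because it has to write files you have no permission to touch.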

Hello? Who's Using the GPU?

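Here are some monitoring commands that help you figure out who is using the GPUs, and whether you can afford to offend them by killing their processes. nvidia-smi on its own is too basic, so here is a way to keep it refreshing with watch ~~so you can spy on your senior labmates~~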

nvidia-smi

# refresh every 2 seconds (-n 2), redrawing in place; -d highlights what changed since the last update
watch -n 2 -d nvidia-smi

Take a PID reported by nvidia-smi above and look up whose process it is with ps:

# show who owns the process and when it started
ps -p [PID] -o user=,lstart=

For example:

(base) you@your_lab:~$ ps -p 2662726 -o user=,lstart=
DaShiXiong+ Sun Jan  5 15:25:48 2025

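I wrote a small script that shows, in one go, who is using which GPU. Create it with touch check.sh, paste in the contents below, make it executable with chmod +x check.sh, and run it with ./check.sh: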

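#!/bin/bash
# check.sh: list which user owns each GPU process, when it started,
# which GPU it runs on and how much GPU memory it uses

# map GPU UUIDs to GPU indices (nvidia-smi prints lines like "0, GPU-1a2b...")
declare -A gpu_index
while IFS=', ' read -r idx uuid; do
    gpu_index["$uuid"]=$idx
done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader)

# one line per compute process: "<pid>, <gpu_uuid>, <used_memory> MiB"
apps=$(nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory --format=csv,noheader)

# check whether any process is running on the GPUs (an idle GPU is not an error)
if [ -z "$apps" ]; then
    echo "No processes are running on the GPU."
    exit 0
fi

# print the table header
printf "%-10s %-20s %-25s %-5s %-10s\n" "PID" "USER" "CREATED" "GPU" "MEMORY"

# look up the owner and start time of each PID
while IFS=', ' read -r pid uuid mem unit; do
    # user:32 widens the column so long usernames are not cut off with '+'
    user=$(ps -p "$pid" -o user:32= | xargs)
    created=$(ps -p "$pid" -o lstart=)
    # PIDs from inside containers may be invisible to ps here, hence the '?' fallbacks
    printf "%-10s %-20s %-25s %-5s %-10s\n" "$pid" "${user:-?}" "${created:-?}" "${gpu_index[$uuid]:-?}" "$mem $unit"
done <<< "$apps"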
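If you also want to know how much GPU memory each person is holding in total, a per-user sum can be built on the same kind of query. A minimal sketch; `gpu_mem_by_user` is a hypothetical helper that reads `<pid>, <MiB>` lines, which is the shape printed by `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<pid>, <used MiB>" lines on stdin and print the
# total GPU memory per process owner, largest first.
gpu_mem_by_user() {
    local pid mem user
    local -A total=()
    while IFS=', ' read -r pid mem; do
        [[ -z "${pid}" ]] && continue
        # values such as "[N/A]" are counted as 0 MiB
        [[ "${mem}" =~ ^[0-9]+$ ]] || mem=0
        # user:32 widens the column so long names are not cut off with '+'
        user=$(ps -p "${pid}" -o user:32= | xargs)
        user="${user:-unknown}"
        total["${user}"]=$(( ${total["${user}"]:-0} + mem ))
    done
    for user in "${!total[@]}"; do
        printf "%-20s %8s MiB\n" "${user}" "${total[${user}]}"
    done | sort -k2 -nr
}
```

Usage: `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits | gpu_mem_by_user`. PIDs that ps cannot see (for example, processes started inside a container) are grouped under `unknown` rather than dropped, so the totals still add up to what nvidia-smi reports.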

Vim

An essential tool for editing files:

vim ~/.vnc/xstartup

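Press i to start editing (insert mode), Esc to leave insert mode, and :wq to save and quit (:q! quits without saving).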



