UniGoal

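In robot navigation, existing methods usually design a separate framework for each task (object navigation, image navigation, or text navigation), which limits generalization and makes it hard to handle the complex multimodal instructions found in real scenes. To address this, this work proposes UniGoal, the first general-purpose zero-shot goal navigation framework. It combines a unified graph representation with large language model (LLM) reasoning to perform zero-shot navigation across three goal types: object category, image, and text description. Its core contributions are: (1) a unified representation built from a dynamic scene graph and a goal graph, which turns both environment perception and the goal description into structured graphs that preserve rich spatial and semantic relations; (2) a multi-stage exploration policy that adapts to the degree of graph matching: in the zero-matching stage it decomposes the goal graph into subgraphs and explores unknown regions step by step, in the partial-matching stage it infers the goal position through coordinate projection and anchor alignment, and in the perfect-matching stage it corrects and verifies the scene graph to ensure accurate localization. In addition, a blacklist mechanism avoids re-exploring regions that already failed, which improves exploration efficiency.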

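UniGoal's inference pipeline is centered on graph matching and relies on LLM reasoning for decision making. During scene graph construction, semantic information from RGB-D observations is fused in real time into a topological structure. The goal graph is generated from the input (object category, image, or text) by an LLM/VLM, so that its representation stays consistent with the scene graph. Experiments show that on datasets such as MatterPort3D and HM3D, UniGoal outperforms existing zero-shot task-specific methods (e.g., Mod-IIN [1], SG-Nav [2]) as well as general-purpose methods that require training (e.g., GOAT [3]), reaching the best performance on object navigation (success rate 41.0%), instance-image navigation (success rate 60.2%), and text navigation (success rate 20.2%). Because it needs no training and handles multiple goal modalities, it offers a new paradigm for flexible robot navigation in unknown environments and has potential for real-world deployment.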

Source: [CVPR 2025 UniGoal: Universal Zero-shot Goal Navigation, Navigate to Any Goal! - Zhihu (zhihu.com)](https://zhuanlan.zhihu.com/p/30973430092)
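
To make the multi-stage strategy described above more concrete, here is a minimal sketch of how an exploration stage could be picked from a graph-matching score, plus a blacklist filter. The names `select_stage` and `filter_blacklisted`, the [0, 1] score range, and both thresholds are assumptions made purely for illustration; none of them are taken from the UniGoal code.

```python
def select_stage(match_score, partial_threshold=0.3, full_threshold=0.9):
    """Map a graph-matching score in [0, 1] to one of three exploration stages.

    Both thresholds are hypothetical placeholders, not UniGoal's values.
    """
    if not 0.0 <= match_score <= 1.0:
        raise ValueError("match_score must be in [0, 1]")
    if match_score >= full_threshold:
        return "perfect_match"  # correct and verify the scene graph
    if match_score >= partial_threshold:
        return "partial_match"  # infer the goal via projection and anchor alignment
    return "zero_match"         # decompose the goal graph and explore unknown regions


def filter_blacklisted(candidates, blacklist):
    """Drop candidates that already failed, preserving the original order."""
    return [c for c in candidates if c not in blacklist]


print(select_stage(0.5))
print(filter_blacklisted(["sofa_1", "chair_2", "chair_3"], {"chair_2"}))
```

The point of the sketch is only the control flow: one score decides which of the three behaviours runs, and failed candidates are excluded before the next attempt.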

Environment Setup

I. Conda

Base image used here (the conda environment created below brings its own Python 3.8):

PyTorch  2.5.1
Python  3.12(ubuntu22.04)
CUDA  12.4

git clone https://github.com/bagh2178/UniGoal.git
cd UniGoal
conda create -n unigoal python==3.8
# conda init
conda activate unigoal

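# AutoDL "academic acceleration" (network proxy for GitHub and similar sites)
source /etc/network_turbo
# To turn the acceleration off again once it is no longer needed:
unset http_proxy && unset https_proxy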

II. Habitat-sim-0.2.3 & Habitat-lab

Note: the Python version here must be exactly 3.8.

conda install habitat-sim==0.2.3 withbullet headless -c conda-forge -c aihabitat --override-channels
pip install -e third_party/habitat-lab

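III. Install third-party packages

1. LightGlue (github.com/cvg/LightGlue.git)

Purpose: a deep-learning local feature matcher designed for sparse feature matching. Its adaptive computation adjusts the inference depth and the number of feature points on the fly, improving matching efficiency and accuracy.

  • Its introspection mechanism can stop inference early for easy image pairs
  • Performs well in latency-sensitive settings such as 3D reconstruction and SLAM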

2. Detectron2 (github.com/facebookresearch/detectron2.git)

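Purpose: Meta's open-source computer vision library, supporting object detection, instance segmentation, keypoint detection, and related tasks

  • Provides pretrained models such as Faster R-CNN and Mask R-CNN
  • Supports a flexible configuration system and distributed training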

3. Grounded-Segment-Anything (IDEA-Research/Grounded-Segment-Anything)

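Core components:

  • Segment Anything (SAM)

    Purpose: Meta's general-purpose image segmentation model. It supports zero-shot segmentation, meaning it can segment arbitrary objects without any additional training on the target categories.

    Model file: sam_vit_h_4b8939.pth, its pretrained weights, suited to high-accuracy segmentation

  • GroundingDINO

    Purpose: a text-prompted zero-shot object detector that links natural-language descriptions to objects in the image

    Model file: groundingdino_swint_ogc.pth, its pretrained weights; it supports complex semantic queries such as "detect the person wearing a hat"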

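4. Dependencies and model files

segment_anything: the PyTorch implementation of SAM, providing the basic image segmentation functionality

GroundingDINO dependencies: include the text-image feature fusion module and support multimodal object detection

What the model files are for:

  • sam_vit_h_4b8939.pth: SAM ViT-Huge weights, suited to high-accuracy segmentation.
  • groundingdino_swint_ogc.pth: GroundingDINO Swin Transformer weights, supporting text-guided detection.
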
pip install git+https://github.com/cvg/LightGlue.git
pip install git+https://github.com/facebookresearch/detectron2.git # make sure there is enough RAM for the build
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git third_party/Grounded-Segment-Anything
cd third_party/Grounded-Segment-Anything
git checkout 5cb813f # prints a "detached HEAD" notice, which is safe to ignore
pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
cd ../../
mkdir -p data/models/  # -p creates nested directories; the upstream instructions skip this step, but wget -O takes a full file path and does not create the directory
wget -O data/models/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# data/models/sam_vit_h_4b 100%[=================================>]   2.39G  13.5MB/s    in 3m 9s

# source /etc/network_turbo  # a proxy/VPN is needed for the download below
wget -O data/models/groundingdino_swint_ogc.pth https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# data/models/groundingdin 100%[=================================>] 661.85M   121MB/s    in 5.5s
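
After the two downloads, it can be handy to confirm that both weight files actually landed in data/models/ before moving on. The following is a small sketch for that; the helper name `missing_weights` is hypothetical, and it only checks that each file exists and is non-empty, not that the download is complete or uncorrupted.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_WEIGHTS = ("sam_vit_h_4b8939.pth", "groundingdino_swint_ogc.pth")


def missing_weights(model_dir="data/models", names=EXPECTED_WEIGHTS):
    """Return the expected weight files that are absent or empty."""
    root = Path(model_dir)
    missing = []
    for name in names:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_weights() or "all weight files present")
```

Run it from the UniGoal repository root so that the relative path data/models resolves correctly.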

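IV. Install the remaining small dependencies

pytorch::faiss-gpu is the GPU-accelerated build of Faiss, a library developed by Meta AI for large-scale vector similarity search and clustering; here it is probably related to graph search.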

conda install pytorch::faiss-gpu
# If the Tsinghua mirror is acting up:
# conda install pytorch::faiss-gpu -c pytorch -c nvidia -c anaconda --override-channels
pip install -r requirements.txt

Environment setup is now complete.
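
As a quick sanity check of the finished environment, the sketch below reports which of the packages installed above cannot be located by Python. The helper name `missing_modules` is hypothetical; the module names in `REQUIRED` are the usual top-level import names of those packages. Note that this only checks that a package can be found, not that it loads, so shared-library problems (such as the libEGL error in the Debugging section) will not show up here.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names of the packages installed in the steps above.
REQUIRED = [
    "habitat_sim",
    "lightglue",
    "detectron2",
    "segment_anything",
    "groundingdino",
    "faiss",
]

if __name__ == "__main__":
    print(missing_modules(REQUIRED) or "all packages found")
```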

Dataset (HM3D)

Downloading the dataset

To get the dataset layout right, create the folders first:

mkdir -p data/datasets/instance_imagenav/hm3d/v3/
mkdir -p data/scene_datasets/hm3d_v0.2/val/

Then download the two archives:

https://dl.fbaipublicfiles.com/habitat/data/datasets/imagenav/hm3d/v3/instance_imagenav_hm3d_v3.zip

https://mp-app-prod.s3.amazonaws.com/habitat/v1.0/hm3d-val-habitat-v0.2.tar

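Finally, for instance_imagenav_hm3d_v3.zip, copy val, train, and val_mini into data/datasets/instance_imagenav/hm3d/v3/.

For hm3d-val-habitat-v0.2.tar, do the same: place the extracted scene folders under data/scene_datasets/hm3d_v0.2/val/.

Note that the dataset must be organized exactly like this, with identical names: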

UniGoal/
└── data/
    ├── datasets/
    │   └── instance_imagenav/
    │       └── hm3d/
    │           └── v3/
    │               └── val/
    │                   ├── content/
    │                   │   ├── 4ok3usBNeis.json.gz
    │                   │   ├── 5cdEh9F2hJL.json.gz
    │                   │   ├── ...
    │                   │   └── zt1RVoi7PcG.json.gz
    │                   └── val.json.gz
    └── scene_datasets/
        └── hm3d_v0.2/
            └── val/
                ├── 00800-TEEsavR23oF/
                │   ├── TEEsavR23oF.basis.glb
                │   └── TEEsavR23oF.basis.navmesh
                ├── 00801-HaxA7YrQdEC/
                ├── ...
                └── 00899-58NLZxWBSpk/
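
Since the names have to match exactly, a short script can catch layout mistakes before the first run. This is only a sketch: the function name `check_layout` is hypothetical, and it checks a few representative paths from the tree above rather than every file.

```python
from pathlib import Path


def check_layout(root="."):
    """Return a list of problems found in the expected data layout under root."""
    data = Path(root) / "data"
    episodes = data / "datasets" / "instance_imagenav" / "hm3d" / "v3" / "val"
    content = episodes / "content"
    scenes = data / "scene_datasets" / "hm3d_v0.2" / "val"
    problems = []
    if not (episodes / "val.json.gz").is_file():
        problems.append(f"missing {episodes / 'val.json.gz'}")
    if not (content.is_dir() and any(content.glob("*.json.gz"))):
        problems.append(f"no episode files in {content}")
    if not (scenes.is_dir() and any(scenes.glob("*/*.basis.glb"))):
        problems.append(f"no scene meshes under {scenes}")
    return problems


if __name__ == "__main__":
    for problem in check_layout("."):
        print(problem)
```

Run it from the UniGoal/ directory; no output means the checked paths are in place.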

LLM and VLM

Option 1: Install Ollama. (This is the option used here.)

Reference: Linux documentation, Ollama Chinese docs

# source /etc/network_turbo
export OLLAMA_MODELS=/root/autodl-tmp/ollama_models
echo 'export OLLAMA_MODELS=/root/autodl-tmp/ollama_models' >> ~/.bashrc
source ~/.bashrc
sudo mkdir -p ~/autodl-tmp/ollama_models

# Environment="OLLAMA_MODELS=/root/autodl-tmp/ollama_models"
# sudo vim /etc/systemd/system/ollama.service

curl -fsSL https://ollama.com/install.sh | sh
# These warnings can be ignored:
# WARNING: systemd is not running
# WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
# Without a GPU, ollama shuts itself down and has to be restarted
ollama serve &

ollama pull llama3.2-vision
# ollama pull resumes interrupted downloads, so it is safe to kill it
export HF_ENDPOINT=https://hf-mirror.com

Option 2: Use LLM and VLM via your own API. Change the llm_model, vlm_model, api_key, base_url in the configuration file configs/config_habitat.yaml to your own.

Evaluation

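# Start the Ollama service if it is not running yet (use one of the two forms):
ollama serve &
nohup ollama serve &  # nohup keeps the service running in the background after the terminal closes
netstat -tuln | grep 11434  # check that port 11434 is being listened on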
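
If netstat is not installed in the container, the same check can be done from Python. This is a minimal sketch; the function name `is_port_open` is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that it is Ollama.

```python
import socket


def is_port_open(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("port 11434 open:", is_port_open())
```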

CUDA_VISIBLE_DEVICES=0 python main.py  # instance-image-goal

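# On the first run, loading from Hugging Face may fail with the error below; point HF_ENDPOINT at a mirror and run again:

export HF_ENDPOINT=https://hf-mirror.com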

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
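
If exporting the variable in every new shell gets tedious, the mirror can also be set from Python. The sketch below assumes that the Hugging Face libraries read HF_ENDPOINT when they are first imported, so the call has to come before any `import transformers` or `import huggingface_hub`; the helper name `use_hf_mirror` and the idea of placing it at the very top of main.py are suggestions, not part of the upstream code.

```python
import os


def use_hf_mirror(endpoint="https://hf-mirror.com"):
    """Set HF_ENDPOINT unless one is already exported; return the value in effect."""
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this before importing transformers or huggingface_hub.
print("HF_ENDPOINT =", use_hf_mirror())
```

Using `setdefault` means a value exported in the shell still takes precedence.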

Debugging

Error: ImportError: libEGL.so.1: cannot open shared object file: No such file or directory

Look familiar? ImportErrors are happening all over the world!

    import habitat_sim
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/habitat_sim-0.2.3-py3.8-linux-x86_64.egg/habitat_sim/__init__.py", line 13, in <module>
    import habitat_sim._ext.habitat_sim_bindings
ImportError: libEGL.so.1: cannot open shared object file: No such file or directory
Fix:
sudo apt-get update
sudo apt-get install libegl1 mesa-utils libgl1-mesa-glx

Error: ModuleNotFoundError: No module named 'openai'

Traceback (most recent call last):
  File "main.py", line 15, in <module>
    from src.graph.graph import Graph
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 9, in <module>
    from openai import OpenAI
ModuleNotFoundError: No module named 'openai'
Fix:
pip install openai

Error: ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "main.py", line 15, in <module>
    from src.graph.graph import Graph
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 28, in <module>
    from .utils.utils import filter_objects, gobs_to_detection_list
  File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 7, in <module>
    import faiss
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/__init__.py", line 16, in <module>
    from .loader import *
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/loader.py", line 65, in <module>
    from .swigfaiss import *
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/swigfaiss.py", line 13, in <module>
    from . import _swigfaiss
ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
Fix:
conda install mkl=2021

Error: ModuleNotFoundError: No module named 'pytorch3d'

num timesteps 0, episode_idx 0
update_observation 6...
    mapping3d...
        compute_spatial_similarities...
Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 151, in main
    graph.update_scenegraph()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 716, in update_scenegraph
    self.mapping3d()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 524, in mapping3d
    spatial_sim = compute_spatial_similarities(self.cfg, fg_detection_list, self.objects)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/mapping.py", line 21, in compute_spatial_similarities
    spatial_sim = compute_overlap_matrix_2set(cfg, objects, detection_list)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 331, in compute_overlap_matrix_2set
    iou = compute_3d_iou_accuracte_batch(bbox_map, bbox_new) # (m, n)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/iou.py", line 58, in compute_3d_iou_accuracte_batch
    import pytorch3d.ops as ops
ModuleNotFoundError: No module named 'pytorch3d'

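pip install "git+https://github.com/facebookresearch/pytorch3d.git" -v
# Optional: append "; /usr/bin/shutdown" to power off a rented instance automatically once the build finishes
# Be sure to install from source; the other installation routes cause serious trouble
# Lots of warnings will scroll by; ignore them, as long as the build keeps going it is fine
# Build in an environment with a GPU (on a rental platform, the GPU has to be attached)
# Takes about 20 minutes (measured: 23:44 to 00:07)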

Error: graph.py, lines 878 and 903: AttributeError: 'numpy.ndarray' object has no attribute 'cpu'

Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 166, in main
    goal = graph.explore()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
    goal = self.get_goal(goal)
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 903, in get_goal
    frontier_locations = frontier_locations.cpu().numpy()
AttributeError: 'numpy.ndarray' object has no attribute 'cpu'

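Edit lines 878 and 903 of /root/autodl-tmp/UniGoal/src/graph/graph.py:

# frontier_locations = frontier_locations.cpu().numpy()
# frontier_locations is already a NumPy array here, so convert only when it is a tensor
if torch.is_tensor(frontier_locations):
    frontier_locations = frontier_locations.cpu().numpy()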

Error: graph.py, get_goal(goal), line 917: ValueError: operands could not be broadcast together

    goal = graph.explore()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
    goal = self.get_goal(goal)
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 917, in get_goal
    scores += distances_16_inverse
ValueError: operands could not be broadcast together with shapes (1016,) (934,) (1016,)

Rewrite the get_goal(goal) function in full:

def get_goal(self, goal=None):
    fbe_map = torch.zeros_like(self.full_map[0,0])
    if self.full_map.shape[1] == 1:
        fbe_map[self.fbe_free_map[0,0]>0] = 1  # first free
    else:
        fbe_map[self.full_map[0,1]>0] = 1  # first free
    fbe_map[skimage.morphology.binary_dilation(self.full_map[0,0].cpu().numpy(), skimage.morphology.disk(4))] = 3  # dilate obstacle

    fbe_cp = copy.deepcopy(fbe_map)
    fbe_cpp = copy.deepcopy(fbe_map)
    fbe_cp[fbe_cp==0] = 4  # unknown space
    fbe_cp[fbe_cp<4] = 0  # free and obstacle
    selem = skimage.morphology.disk(1)
    fbe_cpp[skimage.morphology.binary_dilation(fbe_cp.cpu().numpy(), selem)] = 0  # dilate unknown space

    diff = fbe_map - fbe_cpp  # frontier area
    frontier_map = diff == 1
    frontier_map = frontier_map & (self.num_of_goal < 3).to(frontier_map.device)
    frontier_locations = torch.stack([torch.where(frontier_map)[0], torch.where(frontier_map)[1]]).T
    num_frontiers = frontier_locations.shape[0]
    if num_frontiers == 0:
        return None

    # Compute everything from the same initial set of frontier locations
    input_pose = np.zeros(7)
    input_pose[:3] = self.full_pose.cpu().numpy()
    input_pose[1] = self.map_size_cm/100 - input_pose[1]
    input_pose[2] = -input_pose[2]
    input_pose[4] = self.full_map.shape[-2]
    input_pose[6] = self.full_map.shape[-1]
    traversible, start = self.get_traversible(self.full_map.cpu().numpy()[0, 0, ::-1], input_pose)

    # Distance field from the agent's current position
    planner = FMMPlanner(traversible)
    state = [start[0] + 1, start[1] + 1]
    planner.set_goal(state)
    fmm_dist = planner.fmm_dist[::-1]

    # Convert the frontier locations to NumPy once and look up their distances
    frontier_locations_np = frontier_locations.cpu().numpy() + 1  # +1 compensates for the coordinate offset
    distances = fmm_dist[frontier_locations_np[:,0], frontier_locations_np[:,1]] / 20

    # Keep only frontiers that are far enough away
    distance_threshold = 3
    valid_mask = distances >= distance_threshold
    valid_distances = distances[valid_mask]
    valid_locations = frontier_locations_np[valid_mask]  # the surviving frontier locations

    if len(valid_distances) == 0:
        return None

    # Initialize the scores
    scores = 10 - (np.clip(valid_distances, 0, 10 + distance_threshold) - distance_threshold)

    # When a goal is passed in, score it on the same set of frontier locations
    if isinstance(goal, (list, np.ndarray)):
        try:
            # Distance field from the given goal
            planner_goal = FMMPlanner(traversible)
            state_goal = [int(goal[0]) + 1, int(goal[1]) + 1]
            planner_goal.set_goal(state_goal)
            fmm_dist_goal = planner_goal.fmm_dist[::-1]

            # Index with valid_locations directly so the shapes match scores
            goal_distances = fmm_dist_goal[valid_locations[:,0], valid_locations[:,1]] / 20
            goal_scores = 1 - (np.clip(goal_distances, 0, 10 + distance_threshold) - distance_threshold)/10
            scores += goal_scores  # same shape as scores by construction
        except Exception as e:
            print(f"Goal processing error: {str(e)}")

    # Final goal selection
    if len(scores) == 0:
        return None
    best_idx = np.argmax(scores)
    final_goal = valid_locations[best_idx] - 1  # undo the coordinate offset
    return final_goal
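
The broadcast error above presumably came from two score arrays that were computed over differently filtered frontier sets (this is a reading of the traceback, not something confirmed against the upstream code). The toy example below, with made-up distance values, reproduces that kind of mismatch in NumPy and shows why reusing a single mask, as the rewritten function does with `valid_mask`, keeps the shapes aligned.

```python
import numpy as np

# Made-up distances for five frontier cells.
distances_to_agent = np.array([1.0, 4.0, 5.0, 2.0, 6.0])
distances_to_goal = np.array([2.0, 1.0, 7.0, 8.0, 2.5])

# Filtering each array with its own mask yields different lengths,
# so adding the results raises a broadcasting ValueError.
a = distances_to_agent[distances_to_agent >= 3]  # 3 elements
b = distances_to_goal[distances_to_goal >= 3]    # 2 elements
try:
    a + b
    mismatch = False
except ValueError:
    mismatch = True

# Reusing one mask keeps both arrays aligned element by element.
valid = distances_to_agent >= 3
scores = (10 - distances_to_agent[valid]) + (10 - distances_to_goal[valid])
print(mismatch, scores.shape)
```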

Error: KeyError: 6

rank:0, episode:8, cat_id:0, cat_name:chair
Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 205, in main
    obs, _, done, infos, observations_habitat = agent.step(agent_input)
  File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 404, in step
    self.reset()
  File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 89, in reset
    self.envs.set_goal_cat_id(idx)
  File "/root/autodl-tmp/UniGoal/src/envs/habitat/instanceimagegoal_env.py", line 264, in set_goal_cat_id
    self.info['goal_name'] = self.index2name[idx]
KeyError: 6

Edit agent.py at line 692:

            if ((ins_whwh[0][2][0]+ins_whwh[0][2][2]-self.instance_imagegoal.shape[1])/2)**2 \
                    +((ins_whwh[0][2][1]+ins_whwh[0][2][3]-self.instance_imagegoal.shape[0])/2)**2 < \
                        ((self.instance_imagegoal.shape[1] / 6)**2 )*2:
                # return int(ins_whwh[0][0])
                cat_id = int(ins_whwh[0][0])
                # Validity check: only return ids that the environment knows about
                if cat_id in self.envs.index2name:
                    return cat_id
        return None
    ...
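
The same guard can also be expressed on the lookup side. The sketch below uses a hypothetical helper `safe_goal_name` and a made-up mapping (only the pair 0 → chair appears in the log above); it shows how `dict.get` returns None for an unknown id instead of raising KeyError.

```python
def safe_goal_name(index2name, idx):
    """Return the category name for idx, or None when idx is not in the mapping."""
    return index2name.get(idx)


# Hypothetical mapping for illustration; the real one lives in the environment object.
index2name = {0: "chair", 1: "sofa"}
print(safe_goal_name(index2name, 0))
print(safe_goal_name(index2name, 6))
```

Whether to guard at the caller (as in the agent.py edit) or at the lookup is a design choice; guarding at the caller keeps the environment code untouched.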

Adding episode selection to main.py




