UniGoal

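In robot navigation, existing methods usually build a separate framework for each task (object navigation, image navigation, or text navigation), which limits generalization and makes it hard to handle the complex multimodal instructions found in real scenes. This work therefore proposes UniGoal, the first universal goal-oriented navigation framework that works zero-shot: through a unified graph representation and large language model (LLM) reasoning, it performs zero-shot navigation across three task types, namely object category, image, and text description goals. Its core contributions are: (1) a unified representation of a dynamic scene graph and a goal graph, which turns environment perception and the goal description into structured graphs that preserve rich spatial and semantic relations; (2) a multi-stage exploration strategy that adapts to the degree of graph matching: in the zero-matching stage it explores unknown regions step by step through subgraph decomposition, in the partial-matching stage it infers the goal position through coordinate projection and anchor alignment, and in the perfect-matching stage it ensures accurate localization through scene graph correction and verification. In addition, a blacklist mechanism avoids re-exploring regions where matching has already failed, which improves efficiency.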

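UniGoal's inference pipeline is centered on graph matching and relies on the LLM's reasoning ability for decision-making. During scene graph construction, semantic information from RGB-D observations is fused in real time to form a topological structure. The goal graph is generated from the input (object category, image, or text) by an LLM/VLM so that it stays consistent with the scene graph representation. Experiments show that on datasets such as MatterPort3D and HM3D, UniGoal outperforms existing zero-shot task-specific methods (e.g., Mod-IIN[1], SG-Nav[2]) as well as universal methods that require training (e.g., GOAT[3]), reaching the best performance on object navigation (success rate 41.0%), instance-image navigation (success rate 60.2%), and text navigation (success rate 20.2%). Because it needs no training and handles multiple goal modalities, it offers a new paradigm for flexible robot navigation in unknown environments, with potential for real-world deployment.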

Source: [CVPR2025 UniGoal: Universal Zero-shot Goal Navigation, Navigate to Any Goal! - Zhihu (zhihu.com)](https://zhuanlan.zhihu.com/p/30973430092)
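As a rough illustration of the multi-stage strategy described in the abstract, the sketch below picks an exploration stage from a graph-matching score and keeps a blacklist of candidates that already failed. The score range, both thresholds, and every class and function name here are assumptions made for this example; none of them is taken from the UniGoal code.

```python
from enum import Enum


class Stage(Enum):
    ZERO_MATCHING = "zero"
    PARTIAL_MATCHING = "partial"
    PERFECT_MATCHING = "perfect"


# Hypothetical thresholds on a matching score in [0, 1]; illustration only.
PARTIAL_THRESHOLD = 0.3
PERFECT_THRESHOLD = 0.9


def select_stage(match_score):
    """Map a graph-matching score to one of the three exploration stages."""
    if match_score >= PERFECT_THRESHOLD:
        return Stage.PERFECT_MATCHING
    if match_score >= PARTIAL_THRESHOLD:
        return Stage.PARTIAL_MATCHING
    return Stage.ZERO_MATCHING


class Blacklist:
    """Remembers candidates whose matching failed so they are not retried."""

    def __init__(self):
        self._failed = set()

    def add(self, candidate):
        self._failed.add(candidate)

    def filter(self, candidates):
        # Keep the input order, drop anything that failed before.
        return [c for c in candidates if c not in self._failed]
```

In this sketch a candidate is any hashable key (for example a node id); how the real system identifies failed regions is not specified by the abstract.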

Environment setup

I. Conda

Base image of the host (the conda environment created below uses Python 3.8):

PyTorch  2.5.1
Python  3.12(ubuntu22.04)
CUDA  12.4

git clone https://github.com/bagh2178/UniGoal.git
cd UniGoal
conda create -n unigoal python==3.8
# conda init
conda activate unigoal

# AutoDL academic network acceleration
source /etc/network_turbo
# turn the acceleration off again once it is no longer needed
unset http_proxy && unset https_proxy

II. Habitat-sim-0.2.3 & Habitat-lab

Note: the Python version here must be exactly 3.8.

conda install habitat-sim==0.2.3 withbullet headless -c conda-forge -c aihabitat --override-channels
pip install -e third_party/habitat-lab
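Since the host image ships Python 3.12 while this step needs 3.8, it can help to confirm which interpreter is active inside the `unigoal` environment. The helper below is a minimal sketch; the function name `python_matches` is made up for this example.

```python
import sys


def python_matches(required=(3, 8), version_info=None):
    """Return True if the (major, minor) version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # Run this with the interpreter of the activated conda environment.
    print("Python 3.8 active:", python_matches())
```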

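III. Install third-party packages

1. LightGlue (`github.com/cvg/LightGlue.git`)

Purpose: a deep-learning-based local feature matching algorithm designed for sparse feature matching. An adaptive computation mechanism adjusts the inference depth and the number of feature points dynamically, which improves matching efficiency and accuracy.

Supports an introspection mechanism that stops inference early for easy image pairs
Performs well in latency-sensitive settings such as 3D reconstruction and SLAM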

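2. Detectron2 (`github.com/facebookresearch/detectron2.git`)

Purpose: Meta's open-source computer vision library, supporting object detection, instance segmentation, keypoint detection, and related tasks

Provides pretrained models such as Faster R-CNN and Mask R-CNN
Supports a flexible configuration system and distributed training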

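3. Grounded-Segment-Anything (`IDEA-Research/Grounded-Segment-Anything`)

Core components:

Segment Anything (SAM)

Purpose: a general-purpose image segmentation model from Meta that supports zero-shot segmentation (segmenting arbitrary objects without task-specific fine-tuning)

Model file: sam_vit_h_4b8939.pth, its pretrained weights, suited to high-accuracy segmentation

GroundingDINO

Purpose: a text-prompted zero-shot object detection model that links natural-language descriptions to objects in the image

Model file: groundingdino_swint_ogc.pth, its pretrained weights, supporting complex semantic queries such as "detect the person wearing a hat"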

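4. Dependencies and model files

segment_anything: the PyTorch implementation of SAM, providing the basic image segmentation functionality

GroundingDINO dependencies: include the text-image feature fusion module that enables multimodal object detection

What the model files are for:

sam_vit_h_4b8939.pth: SAM ViT-Huge weights, suited to high-accuracy segmentation.
groundingdino_swint_ogc.pth: GroundingDINO Swin Transformer weights, used for text-guided detection.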

pip install git+https://github.com/cvg/LightGlue.git
pip install git+https://github.com/facebookresearch/detectron2.git # make sure there is enough RAM for the build
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git third_party/Grounded-Segment-Anything
cd third_party/Grounded-Segment-Anything
git checkout 5cb813f # prints a "detached HEAD" warning, which can be ignored
pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
cd ../../
mkdir -p data/models/  # -p creates nested directories; the original instructions skip this step, and wget -O takes a full file path without creating directories
wget -O data/models/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# data/models/sam_vit_h_4b 100%[=================================>]   2.39G  13.5MB/s    in 3m 9s

# source /etc/network_turbo  (a VPN or network acceleration is needed for the download below)
wget -O data/models/groundingdino_swint_ogc.pth https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# data/models/groundingdin 100%[=================================>] 661.85M   121MB/s    in 5.5s
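Because `wget -O` creates the output file even when the download fails, a quick check that both weight files exist and are non-empty can save a confusing error later. This is a minimal sketch: the helper name `missing_or_empty` is made up for this example, and it only checks presence and size, not file integrity.

```python
from pathlib import Path

# File names taken from the wget commands above.
MODEL_FILES = ("sam_vit_h_4b8939.pth", "groundingdino_swint_ogc.pth")


def missing_or_empty(model_dir="data/models", names=MODEL_FILES):
    """Return the model files that are absent or have zero size."""
    directory = Path(model_dir)
    problems = []
    for name in names:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Run from the UniGoal repository root.
    print("Problem files:", missing_or_empty())
```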

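IV. Install the remaining small dependencies

pytorch::faiss-gpu is the GPU-accelerated build of Faiss, a library developed by Meta AI for large-scale vector similarity search and clustering. In this project it is imported by src/graph/utils/utils.py, so it is probably used for the graph-related search.

conda install pytorch::faiss-gpu
# If the Tsinghua mirror misbehaves:
# conda install pytorch::faiss-gpu -c pytorch -c nvidia -c anaconda --override-channels
pip install -r requirements.txt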

The environment setup is now complete.
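Several of the errors in the Debug section further down are missing Python modules, so it may be worth listing them up front. The sketch below uses only the standard library; the default module list is collected from the import statements that appear in this post's tracebacks (plus `torch`), and the function name `missing_modules` is made up for this example.

```python
import importlib.util

# Top-level module names seen in this post's tracebacks, plus torch.
DEFAULT_MODULES = ("torch", "habitat_sim", "faiss", "openai", "pytorch3d")


def missing_modules(names=DEFAULT_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing modules:", missing_modules())
```

Note that this only checks whether a module can be located; a module that is found can still fail at import time because of a missing shared library.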

Dataset (HM3D)

Downloading the dataset

To get the dataset layout right, create the folders first:

mkdir -p data/datasets/instance_imagenav/hm3d/v3/
mkdir -p data/scene_datasets/hm3d_v0.2/val/

Then download the two archives:

https://dl.fbaipublicfiles.com/habitat/data/datasets/imagenav/hm3d/v3/instance_imagenav_hm3d_v3.zip

https://mp-app-prod.s3.amazonaws.com/habitat/v1.0/hm3d-val-habitat-v0.2.tar

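Finally, for instance_imagenav_hm3d_v3.zip, copy val, train, and val_mini into data/datasets/instance_imagenav/hm3d/v3/.

For hm3d-val-habitat-v0.2.tar, do the same: extract it and copy the scene folders into data/scene_datasets/hm3d_v0.2/val/.

Note: the dataset must be organized exactly as follows, with identical names: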

UniGoal/
└── data/
    ├── datasets/
    │   └── instance_imagenav/
    │       └── hm3d/
    │           └── v3/
    │               └── val/
    │                   ├── content/
    │                   │   ├── 4ok3usBNeis.json.gz
    │                   │   ├── 5cdEh9F2hJL.json.gz
    │                   │   ├── ...
    │                   │   └── zt1RVoi7PcG.json.gz
    │                   └── val.json.gz
    └── scene_datasets/
        └── hm3d_v0.2/
            └── val/
                ├── 00800-TEEsavR23oF/
                │   ├── TEEsavR23oF.basis.glb
                │   └── TEEsavR23oF.basis.navmesh
                ├── 00801-HaxA7YrQdEC/
                ├── ...
                └── 00899-58NLZxWBSpk/
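The layout above can be checked with a few lines of Python before starting an evaluation. This is a minimal sketch that assumes it is given the repository root; the function names are made up for this example, and it checks only the paths shown in the tree.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the UniGoal root.
REQUIRED_PATHS = (
    "data/datasets/instance_imagenav/hm3d/v3/val/content",
    "data/datasets/instance_imagenav/hm3d/v3/val/val.json.gz",
    "data/scene_datasets/hm3d_v0.2/val",
)


def missing_paths(repo_root, required=REQUIRED_PATHS):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


def scenes_missing_glb(repo_root):
    """Return scene folders under val/ that contain no *.basis.glb file."""
    val_dir = Path(repo_root) / "data/scene_datasets/hm3d_v0.2/val"
    if not val_dir.is_dir():
        return []
    return sorted(
        scene.name
        for scene in val_dir.iterdir()
        if scene.is_dir() and not any(scene.glob("*.basis.glb"))
    )


if __name__ == "__main__":
    print("Missing paths:", missing_paths("."))
    print("Scenes without .basis.glb:", scenes_missing_glb("."))
```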

LLM and VLM

Option 1: Install Ollama. [This is the option used here]

See: Linux documentation - Ollama Chinese documentation

# source /etc/network_turbo
export OLLAMA_MODELS=/root/autodl-tmp/ollama_models
echo 'export OLLAMA_MODELS=/root/autodl-tmp/ollama_models' >> ~/.bashrc
source ~/.bashrc
sudo mkdir -p ~/autodl-tmp/ollama_models

# Alternative when ollama runs under systemd: edit the service file and add the Environment line to its [Service] section
# sudo vim /etc/systemd/system/ollama.service
# Environment="OLLAMA_MODELS=/root/autodl-tmp/ollama_models"

curl -fsSL https://ollama.com/install.sh | sh
# These warnings can be ignored:
# WARNING: systemd is not running
# WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
# If no GPU is available, ollama shuts itself down and has to be started again
ollama serve &

ollama pull llama3.2-vision
# ollama pull resumes interrupted downloads, so it is safe to kill it
# use the Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com

Option 2: Use LLM and VLM via your own API. Change the llm_model, vlm_model, api_key, base_url in the configuration file configs/config_habitat.yaml to your own.

Evaluation

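# Start the Ollama service if it is not running yet, using one of the two commands:
ollama serve &
nohup ollama serve & # nohup keeps the service running in the background after the terminal closes
netstat -tuln | grep 11434 # check that port 11434 is being listened on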

CUDA_VISIBLE_DEVICES=0 python main.py  # instance-image-goal

# The first run may fail to reach huggingface with the following error:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

# Set the mirror endpoint and run again:

export HF_ENDPOINT=https://hf-mirror.com
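The `netstat` check for port 11434 in the commands above can also be done from Python, which is handy on images where `netstat` is not installed. This is a minimal sketch using only the standard library; the function name `is_port_open` is made up for this example, and it only tests that something accepts TCP connections on the port, not that it is actually Ollama.

```python
import socket


def is_port_open(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Ollama port reachable:", is_port_open())
```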

Debug

Error: ImportError: libEGL.so.1: cannot open shared object file: No such file or directory

Look familiar? ImportError is happening all over the world!

    import habitat_sim
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/habitat_sim-0.2.3-py3.8-linux-x86_64.egg/habitat_sim/__init__.py", line 13, in <module>
    import habitat_sim._ext.habitat_sim_bindings
ImportError: libEGL.so.1: cannot open shared object file: No such file or directory

sudo apt-get update
sudo apt-get install libegl1 mesa-utils libgl1-mesa-glx
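Whether the fix worked can be checked without starting habitat_sim, by asking the dynamic loader for the library directly. The sketch below uses `ctypes` from the standard library; the function name `can_load_shared_library` is made up for this example. The same check applies to other "cannot open shared object file" errors, such as the libmkl_intel_lp64.so.1 one below.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for lib in ("libEGL.so.1", "libmkl_intel_lp64.so.1"):
        print(lib, "loadable:", can_load_shared_library(lib))
```

Libraries installed inside a conda environment may only be found when the check runs with that environment's interpreter.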

Error: ModuleNotFoundError: No module named 'openai'

Traceback (most recent call last):
  File "main.py", line 15, in <module>
    from src.graph.graph import Graph
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 9, in <module>
    from openai import OpenAI
ModuleNotFoundError: No module named 'openai'

pip install openai

Error: ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "main.py", line 15, in <module>
    from src.graph.graph import Graph
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 28, in <module>
    from .utils.utils import filter_objects, gobs_to_detection_list
  File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 7, in <module>
    import faiss
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/__init__.py", line 16, in <module>
    from .loader import *
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/loader.py", line 65, in <module>
    from .swigfaiss import *
  File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/swigfaiss.py", line 13, in <module>
    from . import _swigfaiss
ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

conda install mkl=2021 

Error: ModuleNotFoundError: No module named 'pytorch3d'

num timesteps 0, episode_idx 0
update_observation 6...
    mapping3d...
        compute_spatial_similarities...
Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 151, in main
    graph.update_scenegraph()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 716, in update_scenegraph
    self.mapping3d()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 524, in mapping3d
    spatial_sim = compute_spatial_similarities(self.cfg, fg_detection_list, self.objects)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/mapping.py", line 21, in compute_spatial_similarities
    spatial_sim = compute_overlap_matrix_2set(cfg, objects, detection_list)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 331, in compute_overlap_matrix_2set
    iou = compute_3d_iou_accuracte_batch(bbox_map, bbox_new) # (m, n)
  File "/root/autodl-tmp/UniGoal/src/graph/utils/iou.py", line 58, in compute_3d_iou_accuracte_batch
    import pytorch3d.ops as ops
ModuleNotFoundError: No module named 'pytorch3d'

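pip install "git+https://github.com/facebookresearch/pytorch3d.git" -v
# Optional: append "; /usr/bin/shutdown" to power off a rented machine once the build finishes
# Install from source; other installation routes went badly wrong
# The build prints many warnings, which can be ignored; as long as it keeps running, it is fine
# Build in an environment with a GPU (on a rental platform, the GPU must be switched on)
# Takes about 20 minutes (measured: 23:44 to 00:07)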

Error: graph.py lines 878 and 903, AttributeError: 'numpy.ndarray' object has no attribute 'cpu'

Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 166, in main
    goal = graph.explore()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
    goal = self.get_goal(goal)
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 903, in get_goal
    frontier_locations = frontier_locations.cpu().numpy()
AttributeError: 'numpy.ndarray' object has no attribute 'cpu'

Modify lines 878 and 903 of /root/autodl-tmp/UniGoal/src/graph/graph.py:

# frontier_locations = frontier_locations.cpu().numpy()
# wrapping in torch.tensor() accepts both a NumPy array and a tensor
frontier_locations = torch.tensor(frontier_locations).cpu().numpy()
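An alternative to wrapping the value in `torch.tensor(...)` is a small helper that converts only when the value is actually a tensor. This is a sketch, not part of the UniGoal code: the name `to_numpy` is made up, and it treats any object that has a `detach` method as a tensor, so it does not need to import torch itself.

```python
import numpy as np


def to_numpy(value):
    """Return value as a NumPy array, whether it is a tensor or already an array."""
    if hasattr(value, "detach"):
        # Tensor-like object: detach from the graph, move to CPU, then convert.
        return value.detach().cpu().numpy()
    return np.asarray(value)
```

With such a helper, the two patched lines would read `frontier_locations = to_numpy(frontier_locations)`.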

Error: graph.py get_goal(goal) line 917, ValueError: operands could not be broadcast together with shapes (1016,) (934,) (1016,)

    goal = graph.explore()
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
    goal = self.get_goal(goal)
  File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 917, in get_goal
    scores += distances_16_inverse
ValueError: operands could not be broadcast together with shapes (1016,) (934,) (1016,)

Rewrite the get_goal(goal) function as a whole:

# Needs at module level in graph.py: copy, numpy as np, skimage.morphology, torch, and the repo's FMMPlanner
def get_goal(self, goal=None):
    fbe_map = torch.zeros_like(self.full_map[0,0])
    if self.full_map.shape[1] == 1:
        fbe_map[self.fbe_free_map[0,0]>0] = 1  # first free
    else:
        fbe_map[self.full_map[0,1]>0] = 1  # first free
    fbe_map[skimage.morphology.binary_dilation(self.full_map[0,0].cpu().numpy(), skimage.morphology.disk(4))] = 3  # dilate obstacle

    fbe_cp = copy.deepcopy(fbe_map)
    fbe_cpp = copy.deepcopy(fbe_map)
    fbe_cp[fbe_cp==0] = 4  # unknown space
    fbe_cp[fbe_cp<4] = 0  # free and obstacle
    selem = skimage.morphology.disk(1)
    fbe_cpp[skimage.morphology.binary_dilation(fbe_cp.cpu().numpy(), selem)] = 0  # dilate unknown space

    diff = fbe_map - fbe_cpp  # frontier area
    frontier_map = diff == 1
    frontier_map = frontier_map & (self.num_of_goal < 3).to(frontier_map.device)
    frontier_locations = torch.stack([torch.where(frontier_map)[0], torch.where(frontier_map)[1]]).T
    num_frontiers = frontier_locations.shape[0]
    if num_frontiers == 0:
        return None

    # Use the same initial frontier positions for every computation below
    input_pose = np.zeros(7)
    input_pose[:3] = self.full_pose.cpu().numpy()
    input_pose[1] = self.map_size_cm/100 - input_pose[1]
    input_pose[2] = -input_pose[2]
    input_pose[4] = self.full_map.shape[-2]
    input_pose[6] = self.full_map.shape[-1]
    traversible, start = self.get_traversible(self.full_map.cpu().numpy()[0, 0, ::-1], input_pose)

    # Distance field from the agent's current position
    planner = FMMPlanner(traversible)
    state = [start[0] + 1, start[1] + 1]
    planner.set_goal(state)
    fmm_dist = planner.fmm_dist[::-1]

    # Convert the frontier positions to NumPy once and look up their distances
    frontier_locations_np = frontier_locations.cpu().numpy() + 1  # +1 compensates the coordinate offset
    distances = fmm_dist[frontier_locations_np[:,0], frontier_locations_np[:,1]] / 20

    # Initial frontier filtering
    distance_threshold = 3
    valid_mask = distances >= distance_threshold
    valid_distances = distances[valid_mask]
    valid_locations = frontier_locations_np[valid_mask]  # keep the valid frontier positions

    if len(valid_distances) == 0:
        return None

    # Initialize scores
    scores = 10 - (np.clip(valid_distances, 0, 10 + distance_threshold) - distance_threshold)

    # When a goal is passed in, score the same set of frontier positions
    if isinstance(goal, (list, np.ndarray)):
        try:
            # Distance field from the given goal
            planner_goal = FMMPlanner(traversible)
            state_goal = [int(goal[0]) + 1, int(goal[1]) + 1]
            planner_goal.set_goal(state_goal)
            fmm_dist_goal = planner_goal.fmm_dist[::-1]

            # Index with valid_locations so the shapes match scores
            goal_distances = fmm_dist_goal[valid_locations[:,0], valid_locations[:,1]] / 20
            goal_scores = 1 - (np.clip(goal_distances, 0, 10 + distance_threshold) - distance_threshold)/10
            scores += goal_scores  # same shape as scores by construction
        except Exception as e:
            print(f"Goal processing error: {str(e)}")

    # Final goal selection
    if len(scores) == 0:
        return None
    best_idx = np.argmax(scores)
    final_goal = valid_locations[best_idx] - 1  # undo the coordinate offset
    return final_goal
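The ValueError comes from adding two score arrays computed over different sets of frontiers (1016 versus 934 entries). The standalone sketch below isolates the idea behind the fix: build one validity mask and apply it to both distance arrays, so the two score terms always have the same length. The function name `score_frontiers` and the plain arrays are assumptions for this example; the formulas mirror the ones in get_goal above, with distances already converted to the same unit.

```python
import numpy as np


def score_frontiers(dist_to_agent, dist_to_goal, threshold=3.0):
    """Score frontiers using one shared mask for both distance arrays."""
    dist_to_agent = np.asarray(dist_to_agent, dtype=float)
    dist_to_goal = np.asarray(dist_to_goal, dtype=float)

    # One mask, applied to both arrays, keeps their shapes identical.
    valid = dist_to_agent >= threshold
    base = 10 - (np.clip(dist_to_agent[valid], 0, 10 + threshold) - threshold)
    bonus = 1 - (np.clip(dist_to_goal[valid], 0, 10 + threshold) - threshold) / 10
    return valid, base + bonus


if __name__ == "__main__":
    mask, scores = score_frontiers([1, 3, 5, 20], [0, 3, 13, 8])
    print(mask.tolist())    # [False, True, True, True]
    print(scores.tolist())  # [11.0, 8.0, 0.5]
```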

Error: KeyError: 6

rank:0, episode:8, cat_id:0, cat_name:chair
Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 205, in main
    obs, _, done, infos, observations_habitat = agent.step(agent_input)
  File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 404, in step
    self.reset()
  File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 89, in reset
    self.envs.set_goal_cat_id(idx)
  File "/root/autodl-tmp/UniGoal/src/envs/habitat/instanceimagegoal_env.py", line 264, in set_goal_cat_id
    self.info['goal_name'] = self.index2name[idx]
KeyError: 6

Modify agent.py at line 692:

            if ((ins_whwh[0][2][0]+ins_whwh[0][2][2]-self.instance_imagegoal.shape[1])/2)**2 \
                    +((ins_whwh[0][2][1]+ins_whwh[0][2][3]-self.instance_imagegoal.shape[0])/2)**2 < \
                        ((self.instance_imagegoal.shape[1] / 6)**2 )*2:
                # return int(ins_whwh[0][0])
                cat_id = int(ins_whwh[0][0])
                # Validity check: only return ids that index2name knows
                if cat_id in self.envs.index2name:
                    return cat_id
        return None
    ...
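The guard above boils down to "only return a category id that the lookup table knows". A standalone sketch of that pattern, with a made-up function name and a plain dict standing in for `self.envs.index2name`:

```python
def safe_category_id(detected_id, index2name):
    """Return the id as int if index2name contains it, otherwise None."""
    cat_id = int(detected_id)
    if cat_id in index2name:
        return cat_id
    return None


if __name__ == "__main__":
    # Hypothetical table with six categories (ids 0 to 5).
    table = {0: "chair", 1: "sofa", 2: "tv", 3: "plant", 4: "toilet", 5: "bed"}
    print(safe_category_id(0, table))  # 0
    print(safe_category_id(6, table))  # None
```

Returning None lets the caller fall back to its existing "no category detected" path instead of raising KeyError in set_goal_cat_id.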

Add episode selection in main.py