Ai2THOR-ProcTHOR
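ProcTHOR uses procedural generation to sample large-scale, diverse, realistic, interactive, customizable, and performant 3D environments for training simulated embodied agents. Below is an example of sampling virtual home environments.
Environment Setup
I. Installation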
pip install ai2thor
# or
conda install -c conda-forge ai2thor
II. Habitat-sim-0.2.3 & Habitat-lab
Note that the Python version here must be exactly 3.8.
conda install habitat-sim==0.2.3 -c conda-forge -c aihabitat
pip install -e third_party/habitat-lab
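Since the pinned habitat-sim build depends on the interpreter version, it can help to confirm the active environment before installing anything else. The following is a minimal sketch, assuming only the 3.8 requirement stated above; `python_matches` is a hypothetical helper name, not part of any of the packages.

```python
import sys

REQUIRED = (3, 8)  # assumption taken from the note above: Python must be 3.8


def python_matches(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_matches():
        print("Python version OK:", found)
    else:
        print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}")
```

Running this inside the conda environment before `conda install habitat-sim==0.2.3` gives an early warning if the wrong environment is active.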
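III. Install third-party packages
1. LightGlue (github.com/cvg/LightGlue.git)
Purpose: a deep-learning-based local feature matcher designed for sparse feature matching. Its adaptive computation mechanism adjusts the inference depth and the number of feature points dynamically, which significantly improves matching efficiency and accuracy.
- Supports an introspection mechanism that stops inference early on easy image pairs
- Performs well in latency-sensitive settings such as 3D reconstruction and SLAM
2. Detectron2 (github.com/facebookresearch/detectron2.git)
Purpose: Meta's open-source computer vision library, supporting object detection, instance segmentation, keypoint detection, and related tasks.
- Provides pretrained models such as Faster R-CNN and Mask R-CNN
- Supports a flexible configuration system and distributed training
3. Grounded-Segment-Anything (IDEA-Research/Grounded-Segment-Anything)
Core components:
- Segment Anything (SAM)
  Purpose: Meta's general-purpose image segmentation model, supporting zero-shot segmentation (it can segment arbitrary objects without task-specific fine-tuning).
  Model file: sam_vit_h_4b8939.pth holds its pretrained weights and is suited to high-accuracy segmentation.
- GroundingDINO
  Purpose: a text-prompted zero-shot object detection model that links natural-language descriptions to objects in an image.
  Model file: groundingdino_swint_ogc.pth holds its pretrained weights and supports complex semantic queries such as "detect the person wearing a hat".
4. Dependencies and model files
- segment_anything: the PyTorch implementation of SAM, providing the basic image segmentation functionality
- GroundingDINO dependencies: include the text-image feature fusion module, supporting multimodal object detection
Model file usage:
- sam_vit_h_4b8939.pth: SAM's ViT-Huge weights, suited to high-accuracy segmentation.
- groundingdino_swint_ogc.pth: GroundingDINO's Swin Transformer (Swin-T) weights, supporting text-guided detection.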
pip install git+https://github.com/cvg/LightGlue.git
pip install git+https://github.com/facebookresearch/detectron2.git # make sure there is enough RAM for the build
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git third_party/Grounded-Segment-Anything
cd third_party/Grounded-Segment-Anything
git checkout 5cb813f # prints a "detached HEAD" notice, which can be ignored
pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
cd ../../
mkdir -p data/models/ # -p creates nested directories; the original instructions skip this step, but wget -O takes a full file path and does not create missing directories
wget -O data/models/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# data/models/sam_vit_h_4b 100%[=================================>] 2.39G 13.5MB/s in 3m 9s
# source /etc/network_turbo # a VPN or proxy is needed for the next download ↓
wget -O data/models/groundingdino_swint_ogc.pth https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# data/models/groundingdin 100%[=================================>] 661.85M 121MB/s in 5.5s
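Because `wget -O` creates the output file even when a download fails, an interrupted transfer can leave an empty file behind. A small check like the following, a sketch that assumes the paths and file names used in the commands above, can confirm both weight files are in place; `missing_models` is a hypothetical helper.

```python
from pathlib import Path

# Directory and file names come from the mkdir/wget commands above.
MODEL_DIR = Path("data/models")
MODEL_FILES = ["sam_vit_h_4b8939.pth", "groundingdino_swint_ogc.pth"]


def missing_models(model_dir=MODEL_DIR, names=MODEL_FILES):
    """Return the names that are absent or empty (e.g. a failed download)."""
    model_dir = Path(model_dir)
    missing = []
    for name in names:
        path = model_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_models())
```

An empty list means both files exist and are non-empty; it does not verify that the contents are complete.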
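IV. Install the remaining dependencies
pytorch::faiss-gpu is the GPU-accelerated build of Faiss, a library developed by Meta AI for large-scale vector similarity search and clustering. Here it is probably related to the graph search.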
conda install pytorch::faiss-gpu
pip install -r requirements.txt
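The environment setup is now complete.
Dataset (HM3D)
Downloading the dataset
To keep the dataset organized correctly, create the folders first: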
mkdir -p data/datasets/instance_imagenav/hm3d/v3/
mkdir -p data/scene_datasets/hm3d_v0.2/val/
Then download the two archives:
https://dl.fbaipublicfiles.com/habitat/data/datasets/imagenav/hm3d/v3/instance_imagenav_hm3d_v3.zip
https://mp-app-prod.s3.amazonaws.com/habitat/v1.0/hm3d-val-habitat-v0.2.tar
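Finally, extract instance_imagenav_hm3d_v3.zip and copy val, train, and val_mini into data/datasets/instance_imagenav/hm3d/v3/.
Do the same for hm3d-val-habitat-v0.2.tar: extract it and copy the scene folders into data/scene_datasets/hm3d_v0.2/val/.
Note: the dataset must be organized exactly as follows, and the names must match exactly: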
UniGoal/
└── data/
    ├── datasets/
    │   └── instance_imagenav/
    │       └── hm3d/
    │           └── v3/
    │               └── val/
    │                   ├── content/
    │                   │   ├── 4ok3usBNeis.json.gz
    │                   │   ├── 5cdEh9F2hJL.json.gz
    │                   │   ├── ...
    │                   │   └── zt1RVoi7PcG.json.gz
    │                   └── val.json.gz
    └── scene_datasets/
        └── hm3d_v0.2/
            └── val/
                ├── 00800-TEEsavR23oF/
                │   ├── TEEsavR23oF.basis.glb
                │   └── TEEsavR23oF.basis.navmesh
                ├── 00801-HaxA7YrQdEC/
                ├── ...
                └── 00899-58NLZxWBSpk/
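Since a misplaced folder only shows up later as a runtime error, the layout can be checked up front. The sketch below assumes the directory tree shown above and only tests for the file patterns visible in it; `check_layout` is a hypothetical helper, not part of UniGoal.

```python
from pathlib import Path


def check_layout(root="."):
    """Return a list of problems found under `root` (the UniGoal checkout)."""
    data = Path(root) / "data"
    episodes = data / "datasets/instance_imagenav/hm3d/v3/val"
    scenes = data / "scene_datasets/hm3d_v0.2/val"
    problems = []
    if not (episodes / "val.json.gz").is_file():
        problems.append("missing val.json.gz")
    if not any((episodes / "content").glob("*.json.gz")):
        problems.append("no per-scene *.json.gz files in content/")
    if not any(scenes.glob("*/*.basis.glb")):
        problems.append("no *.basis.glb scene files")
    if not any(scenes.glob("*/*.basis.navmesh")):
        problems.append("no *.basis.navmesh files")
    return problems


if __name__ == "__main__":
    for problem in check_layout("."):
        print("layout problem:", problem)
```

Run it from the UniGoal directory; no output means every expected pattern was found at least once.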
LLM and VLM
Option 1: Install Ollama. (This is the option used here.)
# source /etc/network_turbo
curl -fsSL https://ollama.com/install.sh | sh
# Without a GPU, ollama shuts itself down and has to be restarted
ollama serve &
ollama pull llama3.2-vision
# ollama pull resumes interrupted downloads, so it is safe to kill it
export HF_ENDPOINT=https://hf-mirror.com
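To confirm that the pull finished, the local Ollama server can be asked which models it has. The sketch below assumes the default address `http://localhost:11434` and Ollama's `GET /api/tags` endpoint, whose JSON body lists models under a `models` key; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address of `ollama serve`


def model_names(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted="llama3.2-vision"):
    """True if `wanted` is present, ignoring the ':tag' suffix such as ':latest'."""
    return any(name.split(":")[0] == wanted for name in model_names(tags_payload))


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Query the local Ollama server; raises URLError if it is not running."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags())` should return True once `ollama pull llama3.2-vision` has completed.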
Option 2: Use LLM and VLM via your own API. Change the llm_model, vlm_model, api_key, base_url in the configuration file configs/config_habitat.yaml to your own.
Evaluation
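ollama serve & # start the Ollama server if it is not already running
nohup ollama serve & # alternative to the line above: nohup keeps the server running in the background after the terminal closes
netstat -tuln | grep 11434 # check that port 11434 is being listened on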
CUDA_VISIBLE_DEVICES=0 python main.py # instance-image-goal
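On minimal images `netstat` may not be installed. As an alternative to the port check above, a plain TCP connection attempt gives the same answer. This is a sketch that assumes Ollama's default port 11434; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "listening" if port_is_open() else "not reachable"
    print(f"Ollama port 11434 is {state}")
```

If the port is not reachable, start `ollama serve` again before launching `main.py`.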
Debugging
Error: ImportError: libEGL.so.1: cannot open shared object file: No such file or directory
Look familiar? ImportError is happening all over the world!
import habitat_sim
File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/habitat_sim-0.2.3-py3.8-linux-x86_64.egg/habitat_sim/__init__.py", line 13, in <module>
import habitat_sim._ext.habitat_sim_bindings
ImportError: libEGL.so.1: cannot open shared object file: No such file or directory
sudo apt-get update
sudo apt-get install libegl1 mesa-utils libgl1-mesa-glx
Error: ModuleNotFoundError: No module named 'openai'
Traceback (most recent call last):
File "main.py", line 15, in <module>
from src.graph.graph import Graph
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 9, in <module>
from openai import OpenAI
ModuleNotFoundError: No module named 'openai'
pip install openai
Error: ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "main.py", line 15, in <module>
from src.graph.graph import Graph
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 28, in <module>
from .utils.utils import filter_objects, gobs_to_detection_list
File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 7, in <module>
import faiss
File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/__init__.py", line 16, in <module>
from .loader import *
File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/loader.py", line 65, in <module>
from .swigfaiss import *
File "/root/miniconda3/envs/unigoal/lib/python3.8/site-packages/faiss/swigfaiss.py", line 13, in <module>
from . import _swigfaiss
ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
conda install mkl=2021
Error: ModuleNotFoundError: No module named 'pytorch3d'
num timesteps 0, episode_idx 0
update_observation 6...
mapping3d...
compute_spatial_similarities...
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 151, in main
graph.update_scenegraph()
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 716, in update_scenegraph
self.mapping3d()
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 524, in mapping3d
spatial_sim = compute_spatial_similarities(self.cfg, fg_detection_list, self.objects)
File "/root/autodl-tmp/UniGoal/src/graph/utils/mapping.py", line 21, in compute_spatial_similarities
spatial_sim = compute_overlap_matrix_2set(cfg, objects, detection_list)
File "/root/autodl-tmp/UniGoal/src/graph/utils/utils.py", line 331, in compute_overlap_matrix_2set
iou = compute_3d_iou_accuracte_batch(bbox_map, bbox_new) # (m, n)
File "/root/autodl-tmp/UniGoal/src/graph/utils/iou.py", line 58, in compute_3d_iou_accuracte_batch
import pytorch3d.ops as ops
ModuleNotFoundError: No module named 'pytorch3d'
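pip install "git+https://github.com/facebookresearch/pytorch3d.git" -v ; /usr/bin/shutdown
# The trailing "; /usr/bin/shutdown" shuts the machine down once pip exits (presumably to stop a rented instance after an unattended build); drop it if that is not wanted
# Be sure to install from source; other install routes caused serious problems
# Many warnings are printed; ignore them. As long as the build has not stopped, it is working
# The build needs an environment with a GPU (on a rented platform, the GPU has to be switched on)
# Takes about 20 minutes (measured: 23:44 to 00:07)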
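The import failures above surface one at a time, each only after `main.py` has been started again. A short preflight script can surface them together. The sketch below assumes only the module names that appear in the tracebacks in this section; `failed_imports` is a hypothetical helper.

```python
import importlib

# Module names taken from the tracebacks in this section.
MODULES = ["habitat_sim", "openai", "faiss", "pytorch3d"]


def failed_imports(names=MODULES):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    for name, message in failed_imports().items():
        print(f"{name}: {message}")
```

Because the modules are really imported, missing shared libraries such as libEGL.so.1 or libmkl_intel_lp64.so.1 are reported as well, not only missing packages.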
Error: AttributeError: 'numpy.ndarray' object has no attribute 'cpu' (lines 878 and 903 of graph.py)
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 166, in main
goal = graph.explore()
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
goal = self.get_goal(goal)
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 903, in get_goal
frontier_locations = frontier_locations.cpu().numpy()
AttributeError: 'numpy.ndarray' object has no attribute 'cpu'
Modify lines 878 and 903 of /root/autodl-tmp/UniGoal/src/graph/graph.py:
# frontier_locations = frontier_locations.cpu().numpy()
frontier_locations = torch.tensor(frontier_locations).cpu().numpy()
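The replacement line works by wrapping the array in a tensor and converting it straight back. A slightly more defensive variant handles both input types without the round trip. This is only a sketch: `to_numpy` is a hypothetical helper that does not exist in UniGoal, and the tensor check is done by duck typing so the snippet runs without torch installed.

```python
import numpy as np


def to_numpy(x):
    """Return `x` as a NumPy array whether it is a torch tensor or already an array."""
    if hasattr(x, "detach"):  # duck-typing check for a torch.Tensor
        return x.detach().cpu().numpy()
    return np.asarray(x)  # an ndarray passes through unchanged; lists are converted


if __name__ == "__main__":
    frontier_locations = np.array([[3, 4], [5, 6]])
    print(to_numpy(frontier_locations).shape)
```

In graph.py this would read `frontier_locations = to_numpy(frontier_locations)`, which behaves the same for NumPy input and also accepts a tensor.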
Error: ValueError: operands could not be broadcast together (line 917 of graph.py, in get_goal(goal))
goal = graph.explore()
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 732, in explore
goal = self.get_goal(goal)
File "/root/autodl-tmp/UniGoal/src/graph/graph.py", line 917, in get_goal
scores += distances_16_inverse
ValueError: operands could not be broadcast together with shapes (1016,) (934,) (1016,)
Rewrite the get_goal(goal) function in full:
def get_goal(self, goal=None):
    fbe_map = torch.zeros_like(self.full_map[0,0])
    if self.full_map.shape[1] == 1:
        fbe_map[self.fbe_free_map[0,0]>0] = 1 # first free
    else:
        fbe_map[self.full_map[0,1]>0] = 1 # first free
    fbe_map[skimage.morphology.binary_dilation(self.full_map[0,0].cpu().numpy(), skimage.morphology.disk(4))] = 3 # dilate obstacle
    fbe_cp = copy.deepcopy(fbe_map)
    fbe_cpp = copy.deepcopy(fbe_map)
    fbe_cp[fbe_cp==0] = 4 # unknown space
    fbe_cp[fbe_cp<4] = 0 # free and obstacle
    selem = skimage.morphology.disk(1)
    fbe_cpp[skimage.morphology.binary_dilation(fbe_cp.cpu().numpy(), selem)] = 0 # dilate unknown space
    diff = fbe_map - fbe_cpp # frontier area
    frontier_map = diff == 1
    frontier_map = frontier_map & (self.num_of_goal < 3).to(frontier_map.device)
    frontier_locations = torch.stack([torch.where(frontier_map)[0], torch.where(frontier_map)[1]]).T
    num_frontiers = frontier_locations.shape[0]
    if num_frontiers == 0:
        return None
    # Use the initial frontier locations for all computations ------------------
    input_pose = np.zeros(7)
    input_pose[:3] = self.full_pose.cpu().numpy()
    input_pose[1] = self.map_size_cm/100 - input_pose[1]
    input_pose[2] = -input_pose[2]
    input_pose[4] = self.full_map.shape[-2]
    input_pose[6] = self.full_map.shape[-1]
    traversible, start = self.get_traversible(self.full_map.cpu().numpy()[0, 0, ::-1], input_pose)
    # Initial distance computation
    planner = FMMPlanner(traversible)
    state = [start[0] + 1, start[1] + 1]
    planner.set_goal(state)
    fmm_dist = planner.fmm_dist[::-1]
    # Convert the frontier locations to NumPy once and compute the initial distances
    frontier_locations_np = frontier_locations.cpu().numpy() + 1 # +1 compensates for the coordinate offset
    distances = fmm_dist[frontier_locations_np[:,0], frontier_locations_np[:,1]] / 20
    # Initial frontier filtering
    distance_threshold = 3
    valid_mask = distances >= distance_threshold
    valid_distances = distances[valid_mask]
    valid_locations = frontier_locations_np[valid_mask] # keep the valid frontier locations
    if len(valid_distances) == 0:
        return None
    # Initialize scores
    scores = 10 - (np.clip(valid_distances, 0, 10 + distance_threshold) - distance_threshold)
    # Reuse the same set of frontiers when a goal is passed in -----------------
    if isinstance(goal, (list, np.ndarray)):
        try:
            # Compute the new distances on the already-filtered valid frontiers
            planner_goal = FMMPlanner(traversible)
            state_goal = [int(goal[0]) + 1, int(goal[1]) + 1]
            planner_goal.set_goal(state_goal)
            fmm_dist_goal = planner_goal.fmm_dist[::-1]
            # Index directly with valid_locations
            goal_distances = fmm_dist_goal[valid_locations[:,0], valid_locations[:,1]] / 20
            goal_scores = 1 - (np.clip(goal_distances, 0, 10 + distance_threshold) - distance_threshold)/10
            scores += goal_scores # shapes are guaranteed to match
        except Exception as e:
            print(f"Goal processing error: {str(e)}")
    # Final goal selection
    if len(scores) == 0:
        return None
    best_idx = np.argmax(scores)
    final_goal = valid_locations[best_idx] - 1 # undo the coordinate offset
    return final_goal
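The broadcast error came from adding two score arrays built from different frontier subsets, (1016,) and (934,) in the traceback. The rewrite avoids this by filtering once and indexing both distance maps with the same locations. The sketch below reproduces that idea on toy data: the distance values are made up for illustration, and only the scoring formulas and the threshold are taken from the function above.

```python
import numpy as np

distance_threshold = 3  # same value as in the rewritten get_goal

# Hypothetical distances from the agent and from a goal to five frontier cells.
agent_dist = np.array([1.0, 4.0, 6.0, 2.5, 9.0])
goal_dist = np.array([7.0, 3.0, 5.0, 8.0, 4.0])

# One mask, derived from the agent distances, selects the frontiers to keep.
valid = agent_dist >= distance_threshold

# Both score arrays are indexed with the same mask, so their shapes agree.
scores = 10 - (np.clip(agent_dist[valid], 0, 10 + distance_threshold) - distance_threshold)
goal_scores = 1 - (np.clip(goal_dist[valid], 0, 10 + distance_threshold) - distance_threshold) / 10
scores = scores + goal_scores

best = int(np.argmax(scores))                    # index within the filtered set
best_frontier = int(np.flatnonzero(valid)[best])  # index within the original five

if __name__ == "__main__":
    print(scores.shape, goal_scores.shape, best_frontier)
```

Filtering the goal distances with a separate mask, for example `goal_dist >= distance_threshold`, would select a different number of frontiers and bring back the shape mismatch.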
Error: KeyError: 6
rank:0, episode:8, cat_id:0, cat_name:chair
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 205, in main
obs, _, done, infos, observations_habitat = agent.step(agent_input)
File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 404, in step
self.reset()
File "/root/autodl-tmp/UniGoal/src/agent/unigoal/agent.py", line 89, in reset
self.envs.set_goal_cat_id(idx)
File "/root/autodl-tmp/UniGoal/src/envs/habitat/instanceimagegoal_env.py", line 264, in set_goal_cat_id
self.info['goal_name'] = self.index2name[idx]
KeyError: 6
Modify agent.py at line 692:
if ((ins_whwh[0][2][0]+ins_whwh[0][2][2]-self.instance_imagegoal.shape[1])/2)**2 \
    +((ins_whwh[0][2][1]+ins_whwh[0][2][3]-self.instance_imagegoal.shape[0])/2)**2 < \
    ((self.instance_imagegoal.shape[1] / 6)**2 )*2:
    # return int(ins_whwh[0][0])
    cat_id = int(ins_whwh[0][0])
    # Added validity check
    if cat_id in self.envs.index2name:
        return cat_id
    return None
...
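The KeyError arises because a detected category id (6 in the traceback) is not a key of `index2name`. The guard can be tried in isolation with the sketch below. The mapping contents are hypothetical apart from `0: "chair"`, which appears in the log line above, and `safe_cat_id` is an illustrative helper that does not exist in UniGoal.

```python
# Hypothetical mapping; in UniGoal the real one is self.envs.index2name.
index2name = {0: "chair", 1: "bed", 2: "plant"}


def safe_cat_id(raw_id, index2name):
    """Return the category id if it is a known key, otherwise None."""
    cat_id = int(raw_id)
    return cat_id if cat_id in index2name else None


if __name__ == "__main__":
    for raw in (0, 6):
        print(raw, "->", safe_cat_id(raw, index2name))
```

Returning None for an unknown id lets the caller treat the detection as "no category found" without reaching the failing dictionary lookup.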
Add episode selection to main.py