
Learning 3-D Scene Structure from a Single Still Image
Ashutosh Saxena, Min Sun and Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305
{asaxena,aliensun,ang}@cs.stanford.edu
Abstract
We consider the problem of estimating detailed 3-d struc-
ture from a single still image of an unstructured environment.
Our goal is to create 3-d models that are both quantitatively
accurate and visually pleasing.
For each small homogeneous patch in the image, we use a
Markov Random Field (MRF) to infer a set of “plane param-
eters” that capture both the 3-d location and 3-d orienta-
tion of the patch. The MRF, trained via supervised learning,
models both image depth cues and the relationships
between different parts of the image. Inference in our model
is tractable, and requires only solving a convex optimiza-
tion problem. Other than assuming that the environment is
made up of a number of small planes, our model makes no
explicit assumptions about the structure of the scene; this
enables the algorithm to capture much more detailed 3-d
structure than does prior art (such as Saxena et al., 2005,
Delage et al., 2005, and Hoiem et al., 2005), and also gives
a much richer experience in the 3-d fly-throughs created us-
ing image-based rendering, even for scenes with significant
non-vertical structure.
Using this approach, we have created qualitatively cor-
rect 3-d models for 64.9% of 588 images downloaded from
the internet, as compared to Hoiem et al.’s performance of
33.1%. Further, our models are quantitatively more accu-
rate than either Saxena et al. or Hoiem et al.
1. Introduction
When viewing an image such as that in Fig. 1a, a human
has no difficulty understanding its 3-d structure (Fig. 1b).
However, inferring the 3-d structure remains extremely chal-
lenging for current computer vision systems—there is an in-
trinsic ambiguity between local image features and the 3-d
location of the point, due to perspective projection.
Most work on 3-d reconstruction has focused on using
methods such as stereovision [16] or structure from mo-
tion [6], which require two (or more) images. Some methods
can estimate 3-d models from a single image, but they make
strong assumptions about the scene and work in specific set-
tings only. For example, shape from shading [18] relies on
purely photometric cues and is difficult to apply to surfaces
that do not have fairly uniform color and texture. Criminisi, Reid and Zisserman [1] used known vanishing points to determine an affine structure of the image.
Figure 1. (a) A single image. (b) A screenshot of the 3-d model generated by our algorithm.
In recent work, Saxena, Chung and Ng (SCN) [13, 14]
presented an algorithm for predicting depth from monocular
image features. However, their depthmaps, although use-
ful for tasks such as robot driving [12] or improving the per-
formance of stereovision [15], were not accurate enough to
produce visually-pleasing 3-d fly-throughs. Delage, Lee and
Ng (DLN) [4, 3] and Hoiem, Efros and Hebert (HEH) [9, 7]
assumed that the environment is made of a flat ground with
vertical walls. DLN considered indoor images, while HEH
considered outdoor scenes. They classified the image into
ground and vertical (and also sky, in the case of HEH) to produce a
simple “pop-up” type fly-through from an image. HEH focused on creating “visually-pleasing” fly-throughs, but did not produce quantitatively accurate results. More recently,
Hoiem et al. (2006) [8] also used geometric context to im-
prove object recognition performance.
In this paper, we focus on inferring detailed 3-d structure that is both quantitatively accurate and visually pleasing. Other than “local planarity,” we make no explicit
assumptions about the structure of the scene; this enables our
approach to generalize well, even to scenes with significant
non-vertical structure. We infer both the 3-d location and the
orientation of the small planar regions in the image using a
Markov Random Field (MRF). Using supervised learning, we learn both the relation between image features and the location/orientation of the planes, and the relationships between different parts of the image. For comparison, we
also present a second MRF, which models only the location
of points in the image. Although quantitatively accurate, this
method is unable to give visually pleasing 3-d models. MAP
inference in our models is efficiently performed by solving
a linear program.
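To make the plane-parameter representation and the LP-based inference concrete, here is a minimal sketch, not the exact formulation developed later in this paper. It assumes each patch i is summarized by a vector alpha_i in R^3 such that points q on the patch's plane satisfy alpha_i'q = 1, so the depth along a unit viewing ray R is d = 1/(R'alpha_i); the toy rays, feature-based depth estimates, patch adjacency, and the use of the cvxpy modeling library are all illustrative assumptions, not details taken from the paper.

# Minimal sketch (assumed formulation, see lead-in): MAP inference over
# per-patch plane parameters as a convex program with L1 penalties.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_patches = 4
rays = rng.normal(size=(n_patches, 3))             # one viewing ray per patch (toy values)
rays /= np.linalg.norm(rays, axis=1, keepdims=True)
d_hat = np.array([2.0, 2.1, 3.0, 3.2])             # depths predicted from image features (toy values)
neighbors = [(0, 1), (1, 2), (2, 3)]               # which patches are adjacent in the image

alpha = cp.Variable((n_patches, 3))                # plane parameters, one row per patch

# Data term: since depth d = 1/(R'alpha), penalize |R_i'alpha_i - 1/d_hat_i|.
data = sum(cp.abs(rays[i] @ alpha[i] - 1.0 / d_hat[i]) for i in range(n_patches))
# Smoothness term: neighboring patches should have similar plane parameters,
# a crude stand-in for the learned relationships between parts of the image.
smooth = sum(cp.norm1(alpha[i] - alpha[j]) for (i, j) in neighbors)

cp.Problem(cp.Minimize(data + 0.5 * smooth)).solve()
depths = 1.0 / np.sum(rays * alpha.value, axis=1)  # recover per-patch depths
print(depths)

Because absolute values and L1 norms are linear-program representable, MAP inference in a model of this shape reduces to an LP, which is the tractability property claimed above.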
Using this approach, we have inferred qualitatively correct 3-d models for 64.9% of 588 images downloaded from the internet, as compared to HEH's performance of 33.1%.