
Why LoopClosing uses a scale-stripped Sim3 for projection matching

Cc1924 2022-01-23 13:10:15

1. Problem description

  1. In LoopClosing, the first step is to find the loop candidate keyframes of the current keyframe. The bag of words is then used to match 2D feature points between the current keyframe and each candidate. This 2D matching simply takes the features that fall under the same vocabulary node in both frames and pairs those whose descriptors are closest. Every matched feature has an associated map point, because features without map points are filtered out during bag-of-words matching. The result is therefore a set of map-point matches between the two frames. Those map-point correspondences are later used to compute the Sim3 transform between the current frame's coordinate system and the candidate frame's coordinate system. A Sim3 is needed because monocular SLAM suffers from scale drift. The estimated pose of the current frame, and the world coordinates of the map points derived from it, are inaccurate. The coordinates of the current frame's map points in the current camera frame, however, are accurate, because they involve only a local relation.

  2. Once the preliminary Sim3 is available, bool LoopClosing::ComputeSim3() calls matcher.SearchBySim3() to search the loop candidate's map points for more matches with the current frame. The projections in this search use the preliminary Sim3. Because the transform between the two camera coordinate systems is modeled as a Sim3, scale drift is taken into account, so the projections are reasonably accurate. With the additional matches, g2o optimization then refines the Sim3.

  3. The question arises in the last step. Strictly speaking, the Sim3 computation is already complete at this point. However, to judge more carefully whether the loop closure is genuine, the program also calls matcher.SearchByProjection(). This projects the map points of the loop candidate keyframe and its covisible keyframes into the current frame once more and counts how many map points end up matched. The loop candidate and its covisible keyframes are close to each other, so one can assume there is no scale drift between them. The map points of the covisible keyframes can therefore be brought into the loop candidate's coordinate system with a Euclidean transform. As in step 2, they are then transformed into the current frame with the Sim3 to look for matches. There are, however, two differences:

    • matcher.SearchBySim3() only matches map points between two keyframes whose viewpoints do not differ much. Now, however, all map points of the loop candidate's covisible keyframe group are matched against the current frame, and the viewing angles may differ considerably. In ORB-SLAM, a match whose viewing angle differs by more than 60 degrees is considered unreliable. Checking this requires the direction vector from the current frame's optical center to each map point of the loop candidate's covisible keyframe group. That in turn requires the real world coordinates of the current camera's optical center.
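As far as I can tell from the ORB-SLAM2 source, ORBmatcher::SearchByProjection() handles this by first stripping the scale from the Sim3. It divides the rotation block and the translation by s, computes the optical center from the resulting Euclidean pose, and rejects map points seen more than 60 degrees away from their mean viewing direction. The sketch below reproduces that idea in plain Python under my own assumptions, and it is not the ORB-SLAM code itself. The Sim3 is assumed to be given as the 3x3 block sR plus a translation t mapping world points into the current camera frame. All function names are hypothetical, and the intrinsics fx, fy, cx, cy are placeholders.

```python
import math

# Plain-Python 3x3 helpers: matrices are lists of rows, vectors are lists of 3 floats.
def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def strip_scale(sR, t):
    """Split a world-to-camera Sim3 (sR, t) into s and the Euclidean pose (R, t/s)."""
    s = norm(sR[0])  # every row of a rotation matrix has unit norm, so row 0 of sR has norm s
    R = [[x / s for x in row] for row in sR]
    return s, R, [x / s for x in t]

def camera_center(R, t):
    """Optical center in world coordinates for a Euclidean pose: O = -R^T t."""
    return [-x for x in mat_vec(transpose(R), t)]

def project(M, t, X, fx, fy, cx, cy):
    """Pinhole projection of world point X via X_c = M X + t; None if behind the camera."""
    Xc = [a + b for a, b in zip(mat_vec(M, X), t)]
    if Xc[2] <= 0:
        return None
    return (fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy)

def viewing_angle_ok(X, O, normal, max_deg=60.0):
    """Accept X only if the ray O->X lies within max_deg of the point's unit mean viewing direction."""
    PO = [a - b for a, b in zip(X, O)]
    return dot(PO, normal) >= math.cos(math.radians(max_deg)) * norm(PO)
```

The pixel coordinates returned by project() are the same for (sR, t) and for the stripped (R, t/s), because s cancels in the division by depth. What the stripped pose provides is an orthonormal R. With it, the usual formula -R^T t gives the optical center in world coordinates, and the 60 degree viewing-angle test can then be applied.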

2. Why strip off the scale s

At first I thought I understood this, but on closer inspection I found there was still a lot I did not understand. The rough idea, though, fits in one sentence: after the scale s is stripped off, the Sim3 transform takes the following form:
$$X' = sRX + t = s\left(RX + \frac{1}{s}t\right)$$

Here $R$ and $\frac{1}{s}t$ together form a Euclidean transform, and a Euclidean transform can represent a pose. Stripping the scale in this way is therefore equivalent to recovering the real pose of the camera. What, then, does the scale s stand for? Here s is the scaling of the camera's coordinate axes. For example, two points 1 unit apart in the world coordinate system are still 1 unit apart after a pure Euclidean transform. The Sim3 transform, however, can be viewed as first applying the Euclidean transform made of $R$ and $\frac{1}{s}t$ and then scaling the resulting coordinates by s. It is as if the camera's axes, after the Euclidean transform, were rescaled so that a unit along each axis is no longer 1 but s. (Axes have no length, so the metaphor may not be exact, but that is the idea.)
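A quick numerical check of this decomposition, in plain Python. The helper names, the rotation about the z axis, and the values of s and t are arbitrary choices of mine, used only for illustration. The sketch computes the Sim3 both as sRX + t and as s(RX + t/s). The checks then confirm three things: the two forms agree, the Euclidean part (R, t/s) keeps two points 1 apart at distance 1, and the full Sim3 puts them s apart.

```python
import math

def rot_z(theta):
    """Rotation matrix about the z axis (rows as lists)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(M, X, t):
    """Affine map M X + t for a 3x3 matrix M and 3-vectors X, t."""
    return [sum(M[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]

def sim3(s, R, t, X):
    """Sim3 in its usual form: X' = s R X + t."""
    sR = [[s * x for x in row] for row in R]
    return apply(sR, X, t)

def euclidean_part(s, R, t, X):
    """The Euclidean transform (R, t/s) left after stripping the scale."""
    return apply(R, X, [x / s for x in t])

def sim3_stripped(s, R, t, X):
    """Same Sim3 written as s (R X + t/s): Euclidean transform, then scale by s."""
    return [s * y for y in euclidean_part(s, R, t, X)]
```

For example, with `R = rot_z(0.3)`, `s = 2.5`, and two points 1 apart, `math.dist` of their `euclidean_part` images stays at 1. The same distance for their `sim3` images becomes 2.5, matching the "axes scaled by s" picture above.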

Copyright statement
The author of this article is Cc1924. Please include a link to the original when reposting. Thank you.