Article

3D Structure from 2D Dimensional Images Using Structure from Motion Algorithms

Civil Engineering Department, College of Engineering, Najran University, King Abdulaziz Rd., P.O. Box 1988, Najran 11001, Saudi Arabia
Sustainability 2022, 14(9), 5399; https://doi.org/10.3390/su14095399
Submission received: 9 April 2022 / Revised: 27 April 2022 / Accepted: 28 April 2022 / Published: 30 April 2022

Abstract

Natural disasters and human interference have endangered heritage structures around the world. Three-dimensional modeling of buildings is therefore important for historical preservation, particularly in low-income and war-affected countries. Most 3D structure surveying approaches, such as terrestrial laser scanning (TLS), total station measurements, or traditional photogrammetry, require either high-cost technology or professional user supervision. Structure from motion (SfM) approaches address both of these issues by allowing a non-expert user to produce a dense point cloud of a real structure by taking a few 2D photographs with a digital camera and processing them with highly automated and freely available tools. The state of the art of the SfM technique is presented in this paper. Three well-known SfM software packages, Agisoft Metashape, VisualSFM, and Regard3D, were examined and compared. The 3D point cloud was scaled and transformed into a local coordinate system using ground control points (GCPs) measured with a total station. Ninety-six 2D digital photographs of the historical Emara Palace in Najran, Saudi Arabia, were used as input data, and the image matching, bundle adjustment (BA), completeness, and accuracy of the three packages were computed and compared.

1. Introduction

Heritage buildings around the globe are at risk due to natural catastrophes or human activity. Therefore, the 3D modeling of heritage buildings is significant and necessary for their identification, monitoring, preservation, and restoration [1]. Moreover, it is critical to obtain the 3D surface coordinates of heritage sites for conservation and digital documentation; the collection of geometric data for historic structures is a crucial part of the documentation. Excellent outcomes have been achieved in recent years as a result of remarkable advancements in surveying equipment and methodology [2].
The categorization of common styles for the digital recording of cultural heritage buildings includes classical surveying (e.g., total station or GNSS) and optical sensors (e.g., laser scanners or digital cameras) [3,4,5,6]. In recent years, much interest has been shown in active optical sensors, such as terrestrial laser scanning (TLS) equipment, for 3D modeling in the heritage field [7]. Passive sensors such as digital photogrammetry are well-known methods for collecting 3D surface information [8,9,10]. Each technique has its strengths and drawbacks. Although TLS is expensive, it has the advantage of directly recording 3D data for surfaces and creating point clouds. Moreover, photogrammetry requires the capturing and processing of overlapped photos; however, it is widely considered the most effective method for converting image data to the 3D surface data of an object [11,12]. This method also provides real-color data and is less costly than laser scanning.
Unfortunately, to process captured images and obtain high-quality 3D surface coordinates of any heritage buildings, most of the traditional photogrammetry processing methods still require expensive processing software and particular user knowledge. The development of structure from motion (SfM) techniques, on the other hand, allows for extremely low-cost 3D data gathering with minimal human guidance and knowledge [13]. Both classical photogrammetry and SfM photogrammetry use overlapping 2D images obtained from several viewpoints in their solution approaches [14]. The SfM solution method differs from traditional photogrammetric approaches in that it automatically determines internal camera geometry, as well as camera location and orientation, without the usage of ground control points (GCPs) located in the scene of interest [15,16]. Stereo matching and 3D point cloud generation have shown tremendous success with deep learning (DL) for SFM solutions [17,18,19,20]. Such methods ignore the relationship between camera motion and depth prediction while solving the camera motion as a regression problem [21].
Some researchers have attached digital cameras to unmanned aerial vehicles (UAVs) to collect detailed 3D data of large and complex historical buildings [22]. Esposito et al., 2014 [23] reported that the local and global accuracy of a point cloud of a building acquired by UAV and TLS techniques was between 0.05 m and 0.1 m. Altman et al., 2017 [24] addressed the final accuracy and cost of acquiring 3D information for a project using UAV photogrammetry and TLS technologies.
With the evolution of processing software and the steady advancement of digital camera specifications, digital photogrammetry has become more economical, making it easy to acquire 3D surface information, even for non-specialists. In addition, the majority of cameras produced today are inexpensive and have large storage capacities. Moreover, photogrammetry is significantly less expensive than laser scanning methods. The performance of the three most popular digital photogrammetry processing programs (Bentley ContextCapture, RealityCapture, and Agisoft Metashape) has been compared [25,26,27]. These studies indicate that 3D models created with different software are not always consistent. Moreover, some of the processing software is simple to use but requires more human interaction, and the majority of the programs provide limited insight into their internal algorithms. Several World Heritage sites have been documented through digital photogrammetry. Remondino et al., 2010 [28] indicated that photogrammetry allows for precise 3D reconstructions at various scales as well as hybrid 3D models (such as a terrain model plus archaeological structures).
In this paper, a low-cost digital camera and three types of SfM processing software were used to obtain 3D surface information for the massive and complex Emara Palace. Overlapping photos were captured with a Canon EOS 600D digital camera and processed with the SfM algorithms offered by Agisoft Metashape [29], VisualSFM (http://ccwu.me/vsfm/ (accessed on 18 March 2022)), and Regard3D (https://www.regard3d.org/ (accessed on 18 March 2022)) to produce point clouds and digital surface models (DSMs). The accuracy of the resulting data was compared with the total station measurements.

2. The Historical Emara Palace in Najran

The palace was built in 1961 as headquarters for the provincial principality and included a courthouse, a radio station, and the residence of the prince, his family, and his companions. The study area is shown in Figure 1.

3. Materials and Methods

3.1. Digital Camera Used

The Canon EOS 600D was used to take 96 photos with a resolution of 5184 × 3456 pixels and an 18 mm focal length with auto-adjust lighting turned off. Table 1 illustrates the characteristics of the digital sensor used to capture images. The most widely used photogrammetry software program, Agisoft Metashape, was used to build 3D models in this study, and the results were compared with coordinates collected using a total station.

3.2. Site Preparation and Imaging

A local geodetic network of six occupation points around the building was established to detect the 3D coordinates of all distributed coded targets. A Leica TS 06 Plus laser total station, whose specifications are displayed in Table 2, was used to collect control points. From the main traverse stations, 25 well-spread check points (CPs) were measured.
Figure 2a displays the main traverse points and CPs, as well as their distribution and labels.
Multiple images with 70–80% overlap were acquired from various stations. The camera was hand-held, and the settings for each exposure were kept within the specified limits, including the maximum f-stop, the lowest ISO to reduce noise, and a fixed focus. From the original batch of photos, a few unusable images, including blurry images and those covering only a small region, were eliminated.

3.3. SfM Workflow

The overall SfM process is divided into two primary stages: keypoint search and incremental reconstruction, as shown in Figure 3a. The keypoint search stage consists of three steps: feature extraction, feature matching, and geometric verification, which together yield consistent and well-distributed natural features for matching adjacent captured images. The second stage computes the image positions and builds the 3D point cloud of the imaged structure. This stage comprises five steps: establishment and selection of the best initial image pair, image registration and orientation, triangulation for 3D point computation, BA in local and global coordinate systems, and outlier removal. A general structure is described in a world coordinate system $(X, Y, Z)$ containing points of interest $P_p$, $p = 1, \dots, N$. Multiple images with large overlap are collected from different positions and viewing directions around the 3D structure (see Figure 3b). Each point projects onto the captured images as a 2D observation $(u_{f,p}, v_{f,p})$ in the image coordinate system $(u, v)$ of frame $f$. The SfM problem can therefore be stated as follows: given the set of corresponding 2D image points $(u_{f,p}, v_{f,p})$, find the 3D coordinates of the points (together with the camera poses). The first solution of the SfM problem was developed under an orthographic camera model assumption by [30].
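To make this notation concrete, the short sketch below (an illustrative aid, not code from the original study) forward-projects a few hypothetical 3D points $P_p$ through a pinhole camera with pose $(R, T)$; SfM solves the inverse problem, recovering the camera poses and the 3D points from the observed $(u_{f,p}, v_{f,p})$ values alone. The intrinsic values are rough figures assumed from Table 1.

```python
import numpy as np

def project(K, R, T, X):
    """Pinhole forward projection: world points X (N x 3) -> pixel (u, v).
    SfM observes only these (u, v) values and recovers R, T and X from them."""
    Xc = R @ X.T + T.reshape(3, 1)      # world -> camera frame (3 x N)
    uvw = K @ Xc                        # homogeneous pixel coordinates
    return (uvw[:2] / uvw[2]).T         # perspective division -> N x 2

# Rough intrinsics assumed from Table 1: 18 mm lens over 4.3 um pixels,
# principal point at the centre of the 5184 x 3456 image.
f_px = 18e-3 / 4.3e-6                               # ~4186 pixels
K = np.array([[f_px, 0.0, 2592.0],
              [0.0, f_px, 1728.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 10.0])        # hypothetical camera pose
X = np.array([[0.0, 0.0, 0.0],                      # hypothetical structure points P_p
              [1.0, 0.5, 0.2],
              [-1.0, 0.3, -0.1]])
print(project(K, R, T, X))                          # the (u_{f,p}, v_{f,p}) observations
```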
The identification of corresponding features in individual images is used to solve for the camera motion parameters $(R_i, T_i)$. SIFT is a popular solution for feature detection; when enough overlapping 2D image data are obtained, the photogrammetry software tools perform well [31]. The SfM approach makes use of natural features such as trees and other dense textures. Other programs use coded markers, which are well suited to scanning objects that are difficult to analyze because they lack features or have shiny and/or transparent surfaces. When used effectively, coded markers can provide greater precision than natural-feature matching. The common features or coded markers between images are used to determine both interior and exterior orientation parameters. The scale-invariant feature transform (SIFT) is among the most extensively used feature detectors; it takes the maxima of a difference-of-Gaussians (DoG) pyramid as features [32,33]. Speeded-up robust features (SURF) is another effective feature detector used by SfM algorithms [34,35]; in the SURF method, a Hessian-matrix-based blob detector replaces the DoG [36,37]. After all of the features from neighboring photos have been detected, they are matched. Some of the resulting matches are incorrect; consequently, the matches are filtered using the random sample consensus (RANSAC) algorithm [38,39].
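As a concrete illustration of this keypoint-search stage, the sketch below uses OpenCV's SIFT detector, Lowe's ratio test, and fundamental-matrix RANSAC to verify one image pair. It is a minimal, assumed reimplementation of the generic steps described above, not the internal code of any of the evaluated packages; the file names are hypothetical.

```python
import cv2
import numpy as np

def match_pair(path_a, path_b, ratio=0.75):
    """SIFT keypoints + ratio-test matching + RANSAC verification for one pair."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()                          # DoG keypoints + descriptors
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    # Lowe's ratio test on the two nearest neighbours of each descriptor.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # Geometric verification: keep only matches consistent with a single
    # fundamental matrix estimated robustly with RANSAC.
    F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 3.0, 0.99)
    inliers = mask.ravel().astype(bool)
    return pts_a[inliers], pts_b[inliers], F

# Example with hypothetical file names:
# pts_a, pts_b, F = match_pair("IMG_0001.JPG", "IMG_0002.JPG")
```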
The primary processes of this investigation were site preparation before image capture, point cloud production, and accuracy assessment, as illustrated in Figure 4. Before the images were taken, coded targets were distributed in the scene and used as reference points during image processing. Following the estimation of camera parameters from the keypoints, sparse point clouds are constructed; their density can then be improved using the clustering views for multi-view stereo (CMVS) and patch-based multi-view stereo (PMVS2) algorithms [40,41]. CMVS decomposes the overlapping input images into clusters using the camera parameters as input, and PMVS2 reconstructs a dense point cloud of the structure. Figure 4 shows the detailed processing steps for obtaining the 3D point cloud from captured 2D images using SfM algorithms.
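The sparse point construction that precedes the CMVS/PMVS2 densification can be illustrated, for a single verified image pair, with OpenCV's essential-matrix and triangulation routines. The sketch below is only a schematic two-view stand-in for the incremental solvers inside the evaluated packages; it assumes the camera matrix K and the verified correspondences pts_a and pts_b from the previous example.

```python
import cv2
import numpy as np

def two_view_sparse_points(pts_a, pts_b, K):
    """Sparse 3D points (arbitrary scale) from one verified image pair."""
    E, _ = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC,
                                prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)    # pose of image b w.r.t. image a

    # Projection matrices: camera a at the origin, camera b at (R, t).
    P_a = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_b = K @ np.hstack([R, t])

    X_h = cv2.triangulatePoints(P_a, P_b, pts_a.T, pts_b.T)  # 4 x N homogeneous
    return (X_h[:3] / X_h[3]).T                              # N x 3 points

# The recovered cloud has an arbitrary scale and orientation; the GCPs measured
# with the total station are what later fix it to the local metric coordinate system.
```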

4. Three-Dimensional Point Clouds Based on Image Data and Existing Software

A set of captured image datasets was utilized in the experiment to evaluate the quality of three well-known SfM software products: Agisoft Metashape, VisualSFM, and Regard3D.

4.1. Imagery Acquisition for Generation of Point Cloud

The Canon EOS 600D camera was mounted firmly on a tripod and used to photograph the entire palace. Around the palace, 200 photos were captured with an overlap of more than 80% in the imaging direction. Very low aperture values and very high gain values were avoided during image capture to provide optimal image quality, including sharpness and brightness. Only 96 of these images were of sufficiently high quality and were used to build the 3D model. In this study, the Agisoft Metashape, VisualSFM, and Regard3D software packages were used to reconstruct the palace's 3D model from 2D images alone using SfM tools.
Photogrammetry is the art and science of identifying the positions and shapes of objects using images taken from various perspectives. After images are captured, they can be processed through the conventional photogrammetric pipeline of sensor calibration, image orientation, surface measurement, feature extraction, and orthophoto creation [31]. In the 3D heritage community, fully automated digital image processing methods based on structure from motion (SfM) approaches are becoming increasingly popular [42]. Based on the scale-invariant feature transform (SIFT) technique, the SfM algorithm finds and matches common points in the overlapping areas of consecutive images [43,44]. After finding and matching the common points, SfM calculates their locations in a local coordinate system by producing sparse 3D points, a procedure known as triangulation [13,45]. SfM solutions differ from classic photogrammetry solutions in three ways: features can always be recognized and matched automatically; the solution is obtained without requiring the camera positions or GCPs; and camera calibration can be performed automatically [46].

4.2. Software Packages

A variety of SfM tools, including desktop image processing software and smartphone and web-based apps, are currently available on the market. They all provide similar services for 3D model construction but vary in the range of post-processing capabilities. Some of them, such as Autodesk Tinkercad (https://www.tinkercad.com/ (accessed on 1 March 2022)), Fusion 360 (https://www.autodesk.com/products/fusion-360/overview (accessed on 1 March 2022)), ReCap Pro (https://www.autodesk.com/products/recap/overview?term=1-YEAR&tab=subscription (accessed on 1 March 2022)), and Microsoft Photosynth (https://photosynth.en.softonic.com (accessed on 1 March 2022)), require the user to upload the captured photographs to the company's servers, where they are processed and the results are then downloaded. A brief description of the three selected software products follows.
  • Agisoft Metashape is commercial software that was used to process the overlapped captured images [29]. The general approach is as follows: the importation of photographs, alignment (creation of a sparse point cloud for image orientation using the SIFT technique and bundle block adjustment), the generation of point clouds, and the building of a mesh and texture.
  • VisualSFM, developed by Wu, 2013 (http://ccwu.me/vsfm/ (accessed on 2 March 2022)), is an open-source software package that processes the data on a local PC.
  • Regard3D is also a free and open-source SfM software package (https://www.regard3d.org/ (accessed on 3 March 2022)).
All of the SfM software products used have a similar workflow: importing the same 96 captured images, pairwise matching using the detected keypoints, and finally producing a dense 3D point cloud.
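For readers who prefer to script this workflow, the fragment below sketches the same import-match-align-densify sequence with Agisoft Metashape Professional's Python module. It is an assumption-laden illustration rather than part of the evaluation performed here: the module requires a Professional licence, the folder name is hypothetical, and method names differ slightly between versions (e.g., buildDenseCloud in 1.x versus buildPointCloud in 2.x).

```python
import glob
import Metashape  # Agisoft Metashape Professional scripting module (assumed available)

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(sorted(glob.glob("palace_images/*.JPG")))  # hypothetical image folder

chunk.matchPhotos()      # keypoint detection and pairwise matching
chunk.alignCameras()     # incremental orientation + sparse cloud (bundle adjustment)
chunk.buildDepthMaps()
chunk.buildDenseCloud()  # dense 3D point cloud (buildPointCloud in newer versions)

doc.save("emara_palace.psx")
```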

4.3. Results and Evaluation

The performance, completeness, and accuracy of the three software products were evaluated. The time spent on feature matching and BA was used as the measure of each package's performance. Once the SfM-based image processing was complete, the number of matched photos and the number of produced 3D points were compared to determine the completeness of each package. The root mean square error (RMSE) was calculated from an absolute orientation test against CPs from the total station observations to determine the accuracy of each product.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$
where n is the number of compared pairs, y_i is the reference coordinate, and ŷ_i is the predicted coordinate. All computations for the evaluation were performed on a Windows PC with a 2.20 GHz Intel(R) Core(TM) i3-2330M CPU and an NVIDIA GeForce GT 525M graphics card.
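As an illustration, the NumPy check below evaluates this formula, reading each per-point "Error (m)" value of Tables 4–6 as the 3D distance between the total-station and SfM coordinates (an assumed interpretation that is consistent with the tabulated RMSE values). Applied to the five Metashape GCP errors of Table 6, it reproduces the reported RMSE of about 0.004 m.

```python
import numpy as np

def rmse_3d(reference_xyz, predicted_xyz):
    """RMSE of the 3D point errors between reference (total station) and
    predicted (SfM) coordinates, both given as n x 3 arrays in metres."""
    diff = np.asarray(predicted_xyz, float) - np.asarray(reference_xyz, float)
    per_point = np.linalg.norm(diff, axis=1)   # the per-target 'Error (m)' values
    return np.sqrt(np.mean(per_point ** 2))

# Cross-check against the Agisoft Metashape column of Table 6:
errors = np.array([0.00429, 0.00615, 0.00350, 0.00287, 0.00050])
print(np.sqrt(np.mean(errors ** 2)))           # ~0.0039 m, i.e. the reported 0.004 m
```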

4.3.1. Performance and Completeness

Table 3 shows the completion times for feature matching and bundle adjustment. The Agisoft Metashape feature matching step is clearly the most efficient, even though it connects the largest number of images (58/98). Owing to its combined multi-core CPU and GPU acceleration, the VisualSFM software achieved the best bundle adjustment time. Table 3 also presents the completeness statistics.
In terms of completeness, none of the software packages used 3D points from all of the acquired photos in the dataset; only around 60% of the captured images were processed. Agisoft Metashape recovered more 3D points than the other assessed packages, as it uses a divide-and-conquer algorithm (DCA) for feature matching [47]. Figure 5 shows the 3D point clouds produced with the evaluated software packages.
The Agisoft Metashape software allows CPs to be detected and measured manually in the high-resolution photos, along with camera calibration parameters such as focal length, principal point, and radial lens distortion. The image dataset was reduced to 96 images, mainly of the external walls, for 3D data processing. During the check and control processing stage, only 16 of the 25 GCP locations were detected. Figure 6 shows the point cloud of the whole palace, manually georeferenced with some of the GCPs.
The RMSE value of the complete project using the Agisoft Metashape software was 0.007 m (Table 4). On the check points, residual values were found, with an RMS of 0.005 m (Table 5).

4.3.2. Accuracy

BA tests are used to examine the relative quality of the image orientation with and without GCP observations. Because none of the software packages could connect all of the photos in a single model, the images of the front wall of the building were processed to compare their accuracies. The front wall of the palace was covered by 24 photographs selected from the 98 images of the complete building and processed with the three programs to evaluate and compare their accuracy. The absolute orientation accuracy of the reconstructed models was assessed using five GCPs collected with a total station instrument. Regard3D is the only one of the tested packages that does not directly allow the user to compute a GCP-based orientation; the user must complete this stage with free and open-source software such as MeshLab (https://www.meshlab.net/ (accessed on 5 March 2022)) or CloudCompare (https://www.cloudcompare.org/ (accessed on 5 March 2022)). Table 6 shows the RMSE statistics of the BA and the obtained 3D points of the front wall for the three software packages. Figure 7 shows the 3D point clouds of the front façade produced by the three software packages.
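Where a package does not offer GCP-based orientation directly, the seven-parameter similarity (Helmert) transform that maps the arbitrary SfM frame onto the total-station coordinate system can be estimated from a few GCP pairs, for example with Umeyama's SVD solution. The sketch below is a generic illustration of that step under these assumptions, not the algorithm used inside CloudCompare or MeshLab.

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R and translation t such that
    dst ~ s * R @ src + t, from paired points: src in the arbitrary SfM
    frame, dst measured with the total station (Umeyama's SVD solution)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d

    U, S, Vt = np.linalg.svd(dst_c.T @ src_c / len(src))
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:             # guard against a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Applying the transform to every SfM point georeferences the whole cloud:
# cloud_local = s * (R @ cloud_sfm.T).T + t
```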
The Metashape and Regard3D software packages show no systematic errors, as indicated by their low residual magnitudes and random distribution patterns. The VisualSFM program was the least accurate, with an RMSE value of 0.62 m. Metashape achieved the highest accuracy for the BA calculation without GCPs, with an RMSE value of 0.79 pixels.

4.3.3. Direct Cloud-to-Cloud Comparison

The direct 3D point comparison is straightforward and does not require data gridding or meshing [48]. The distance between the two point datasets is used as the difference measure. The difference between the two point clouds was calculated using the MeshLab program [49]; the cloud-to-cloud distance tool computes the distances between the two compared clouds [50]. The point cloud generated from Agisoft Metashape was used as the reference, owing to the precision of its results (RMSE = 0.004 m, Table 6) and the absence of an independent reference model, while the clouds obtained from the VisualSFM and Regard3D software were used as the compared clouds.
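The same cloud-to-cloud statistics can be reproduced with a simple nearest-neighbour query. The sketch below is an illustrative reimplementation under stated assumptions (hypothetical array names, clouds loaded from the exported files), not the CloudCompare or MeshLab code: it computes the distance from every compared point to its closest reference point and summarizes the mean, standard deviation, and the fraction of points below a threshold.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_distances(reference, compared):
    """Nearest-neighbour distance from every point of 'compared' (M x 3)
    to the closest point of 'reference' (N x 3)."""
    distances, _ = cKDTree(reference).query(compared, k=1)
    return distances

# Hypothetical usage with clouds exported from the two packages:
# d = cloud_to_cloud_distances(metashape_xyz, regard3d_xyz)
# print(d.mean(), d.std(), (d < 0.125).mean())   # mean, std, fraction below 0.125 m
```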
When the Regard3D point cloud was compared to the Metashape 3D points, the maximum distance was 2.05 m, and the majority of the locations were blue, indicating variations of less than 5 cm (Figure 8a). The mean distance was 0.067 m with a standard deviation (std) of 0.125 m, and 90.9% of all points had deviations of less than 0.125 m (Figure 8b).
On the other hand, the VisualSFM point cloud performed worst, with noticeable discrepancies in the blue and green ranges (Figure 9a). The mean distance was 0.447 m with std = 0.264 m, and 90.9% of all points had deviations of less than 0.806 m (Figure 9b).

5. Discussion

This study assessed SfM algorithms in terms of keypoint search, image matching, and BA refinement. From the many available SfM software products (commercial and free open-source), Agisoft Metashape, VisualSFM, and Regard3D were selected to demonstrate SfM procedures and techniques on images taken with a digital camera. The three software products were compared in terms of SfM-based 3D point reconstruction efficiency, completeness, and accuracy. Using a Canon EOS 600D, 96 photos of the Emara Palace in Najran, Saudi Arabia, were captured with a resolution of 5184 × 3456 pixels and an 18 mm focal length, with auto-adjust lighting turned off.
Image matching is the most difficult challenge for terrestrial photogrammetry based on SfM algorithms, especially for oblique photos or tall buildings, because of the severe geometric and radiometric deformations produced by multiple perspective views and brightness variations. In addition, certain historical structures have few architectural details and little surface variation, leading to low-texture images. Furthermore, the streets around monuments are typically narrow; terrestrial photogrammetry images are therefore collected from short distances compared to aerial images, which leads to significant occlusions and inaccurate image matching. Because of these difficulties, many local feature-based image matching algorithms fail. To address these challenges, multi-primitive matching techniques, such as line or plane correspondences in addition to point features, can be used to produce acceptable and reliable matches between image pairs [51,52].
It is critical for SfM software packages to provide a complete 3D product for the scanned object. Metashape, as commercial software, aims to deliver a complete product, whereas VisualSFM and Regard3D need to improve their efficiency in order to produce a complete model. The best BA accuracy was achieved by the Metashape package, and the Regard3D results are acceptable but indicate that the user will need to do additional work with other software. With an RMSE value of 0.62 m, the VisualSFM software had the lowest accuracy. To create a comprehensive 3D representation, the SfM algorithm should be used to analyze and process not just the outside walls but also the rooms behind them. Correct 3D surface coordinates for multi-story buildings are difficult to obtain because the camera is normally at ground level when the photos are collected; lifting the camera with a scaffold or long poles is therefore a low-cost option. In addition, UAV photogrammetry can be used to document architectural heritage.

6. Conclusions

Three SfM photogrammetry packages were used to create a 3D model of the historical Emara Palace in Najran, Saudi Arabia. The efficiency of the packages was investigated with respect to image matching, BA calculation, and the accuracy of the generated 3D point cloud. GCP targets were distributed in the scene before the imaging stage, and their coordinates were collected with a laser total station to evaluate the accuracy of the final product. The Agisoft Metashape package achieved the best results in terms of BA accuracy and completeness, with an RMSE of 0.004 m and the ability to match 58 images out of 98. Regard3D created a 3D model with a BA accuracy of 0.04 m and matched 57 pictures, whereas VisualSFM had the worst BA accuracy and completeness. Although some SfM photogrammetry software is simple to use, inexpensive, and excellent for modeling complicated structures, it has limitations of which the user must be aware.

Funding

The Deanship of Scientific Research at Najran University funded this work under the National Research Priorities funding program.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the author upon reasonable request.

Acknowledgments

The author is thankful to the Deanship of Scientific Research at Najran University for funding this work under the National Research Priorities funding program (NU/NRP/SERC/11/27).

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Peña-Villasenín, S.; Gil-Docampo, M.; Ortiz-Sanz, J. 3-D Modeling of Historic Façades Using SFM Photogrammetry Metric Documentation of Different Building Types of a Historic Center. Int. J. Arch. Herit. 2017, 11, 871–890. [Google Scholar] [CrossRef]
  2. Mancini, F.; Pirotti, F. Innovations in photogrammetry and remote sensing: Modern sensors, new processing strategies and frontiers in applications. Sensors 2021, 21, 2420. [Google Scholar] [CrossRef] [PubMed]
  3. Sansoni, G.; Trebeschi, M.; Docchio, F. State-of-the-art and applications of 3D imaging sensors in industry, cultural heritage, medicine, and criminal investigation. Sensors 2009, 9, 568–601. [Google Scholar] [CrossRef] [PubMed]
  4. Karagianni, A. Terrestrial laser scanning and satellite data in cultural heritage building documentation. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, 46, 361–366. [Google Scholar] [CrossRef]
  5. Elkhrachy, I. Modeling and Visualization of Three Dimensional Objects Using Low-Cost Terrestrial Photogrammetry. Int. J. Arch. Herit. 2019, 14, 1456–1467. [Google Scholar] [CrossRef]
  6. Moyano, J.E.; Nieto-Julián, J.E.; Lenin, L.M.; Bruno, S. Operability of Point Cloud Data in an Architectural Heritage Information Model. Int. J. Arch. Herit. 2021, 1–20. [Google Scholar] [CrossRef]
  7. Kersten, T.P.; Mechelke, K.; Maziull, L. 3D model of Al Zubarah Fortress in Qatar - Terrestrial laser scanning vs. dense image matching. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 1–8. [Google Scholar] [CrossRef] [Green Version]
  8. Grussenmeyer, P.; Landes, T.; Voegtle, T.; Ringle, K. Comparison Methods of Terrestrial Laser Scanning, Photogrammetry and Tacheometry Data for Recording of Cultural Heritage Buildings. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2008, 37, 213–218. Available online: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84981192332&partnerID=40&md5=49320dead7a8eec25ae4c19010b5c0b9 (accessed on 5 March 2022).
  9. Yastikli, N. Documentation of cultural heritage using digital photogrammetry and laser scanning. J. Cult. Herit. 2007, 8, 423–427. [Google Scholar] [CrossRef]
  10. Lerma, J.L.; Navarro, S.; Cabrelles, M.; Villaverde, V. Terrestrial laser scanning and close range photogrammetry for 3D archaeological documentation: The Upper Palaeolithic Cave of Parpalló as a case study. J. Archaeol. Sci. 2010, 37, 499–507. [Google Scholar] [CrossRef]
  11. Atkinson, K.B. Introduction to Modern Photogrammetry. Photogramm. Rec. 2003, 18, 329–330. [Google Scholar] [CrossRef]
  12. Granshaw, S.I. Close Range Photogrammetry: Principles, Methods and Applications. Photogramm. Rec. 2010, 25, 203–204. [Google Scholar] [CrossRef]
  13. Iglhaut, J.; Cabo, C.; Puliti, S.; Piermattei, L.; O’Connor, J.; Rosette, J. Structure from Motion Photogrammetry in Forestry: A Review. Curr. For. Rep. 2019, 5, 155–168. [Google Scholar] [CrossRef] [Green Version]
  14. Micheletti, N.; Chandler, J.H.; Lane, S.N. Investigating the geomorphological potential of freely available and accessible structure-from-motion photogrammetry using a smartphone. Earth Surf. Process. Landf. 2014, 40, 473–486. [Google Scholar] [CrossRef] [Green Version]
  15. Westoby, M.; Brasington, J.; Glasser, N.F.; Hambrey, M.J.; Reynolds, J.M. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 2012, 179, 300–314. [Google Scholar] [CrossRef] [Green Version]
  16. Bemis, S.; Micklethwaite, S.; Turner, D.; James, M.R.; Akciz, S.; Thiele, S.T.; Bangash, H.A. Ground-based and UAV-Based photogrammetry: A multi-scale, high-resolution mapping tool for structural geology and paleoseismology. J. Struct. Geol. 2014, 69, 163–178. [Google Scholar] [CrossRef]
  17. Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE international conference on robotics and automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2043–2050. [Google Scholar]
  18. Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858. [Google Scholar]
  19. Klodt, M.; Vedaldi, A. Supervising the new with the old: Learning sfm from sfm. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 698–713. [Google Scholar]
  20. Liu, J.; Ding, H.; Shahroudy, A.; Duan, L.-Y.; Jiang, X.; Wang, G.; Kot, A.C. Feature boosting network for 3d pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 494–501. [Google Scholar] [CrossRef] [Green Version]
  21. Wei, X.; Zhang, Y.; Li, Z.; Fu, Y.; Xue, X. DeepSFM: Structure from Motion via Deep Bundle Adjustment. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 230–247. [Google Scholar] [CrossRef]
  22. Ulvi, A. Documentation, Three-Dimensional (3D) Modelling and visualization of cultural heritage by using Unmanned Aerial Vehicle (UAV) photogrammetry and terrestrial laser scanners. Int. J. Remote Sens. 2021, 42, 1994–2021. [Google Scholar] [CrossRef]
  23. Esposito, S.; Fallavollitaa, P.; Wahbeh, W.; Nardinocchic, C.; Balsia, M. Performance evaluation of UAV photogrammetric 3D reconstruction. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Quebec City, QC, Canada, 13–18 July 2014; pp. 4788–4791. [Google Scholar] [CrossRef]
  24. Altman, S.; Xiao, W.; Grayson, B. Evaluation of Low-Cost Terrestrial Photogrammetry for 3D Reconstruction of Complex Buildings. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 4, 199–206. [Google Scholar] [CrossRef] [Green Version]
  25. Kingsland, K. Comparative analysis of digital photogrammetry software for cultural heritage. Digit. Appl. Archaeol. Cult. Herit. 2020, 18, e00157. [Google Scholar] [CrossRef]
  26. Niederheiser, R.; Mokroš, M.; Lange, J.; Petschko, H.; Prasicek, G.; Elberink, S.O. Deriving 3D point clouds from terrestrial photographs—Comparison of different sensors and software. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 685–692. [Google Scholar] [CrossRef] [Green Version]
  27. Alidoost, F.; Arefi, H. Comparison of uas-based photogrammetry software for 3D point cloud generation: A survey over a historical site. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 4, 55–61. [Google Scholar] [CrossRef] [Green Version]
  28. Remondino, F.; Rizzi, A. Reality-based 3D documentation of natural and cultural heritage sites—Techniques, problems, and examples. Appl. Geomat. 2010, 2, 85–100. [Google Scholar] [CrossRef] [Green Version]
  29. Agisoft LLC. Agisoft PhotoScan User Manual: Professional Edition, Version 1.2; User Manuals; 2016; p. 97. Available online: http://www.agisoft.com/downloads/user-manuals/ (accessed on 15 March 2022).
  30. Tomasi, C.; Kanade, T. Shape and motion from image streams: A factorization method. Proc. Natl. Acad. Sci. USA 1993, 90, 9795–9802. [Google Scholar] [CrossRef] [Green Version]
  31. Remondino, F.; Fraser, C. Digital camera calibration methods: Considerations and comparisons. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2006, 36, 266–272. [Google Scholar]
  32. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  33. Li, Q.; Wang, G.; Liu, J.; Chen, S. Robust Scale-Invariant Feature Matching for Remote Sensing Image Registration. IEEE Geosci. Remote Sens. Lett. 2009, 6, 287–291. [Google Scholar] [CrossRef]
  34. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2006; Volume 3951, pp. 404–417. [Google Scholar] [CrossRef]
  35. Herbert, B.; Andreas, E.; Tinne, T.; Luc, V.G. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  36. Schonberger, J.L.; Frahm, J.-M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
  37. Stewart, J. Calculus, Concepts and Contexts; Cengage Learning: Boston, MA, USA, 1998. [Google Scholar]
  38. Fischler, M.A.; Bolles, R.C. Random sample consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  39. Kim, T.; Im, Y.-J. Automatic satellite image registration by combination of matching and random sample consensus. IEEE Trans. Geosci. Remote Sens. 2003, 41, 1111–1117. [Google Scholar] [CrossRef]
  40. Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1362–1376. [Google Scholar] [CrossRef]
  41. Agarwal, S.; Furukawa, Y.; Snavely, N.; Simon, I.; Curless, B.; Seitz, S.M.; Szeliski, R. Building Rome in a day. Commun. ACM 2011, 54, 105–112. [Google Scholar] [CrossRef]
  42. Vergauwen, M.; Van Gool, L. Web-based 3D reconstruction service. Mach. Vis. Appl. 2006, 17, 411–426. [Google Scholar] [CrossRef]
  43. Bolles, R.C.; Baker, H.H.; Marimont, D.H. Epipolar-plane image analysis: An approach to determining structure from motion. Int. J. Comput. Vis. 1987, 1, 7–55. [Google Scholar] [CrossRef]
  44. Fonstad, M.A.; Dietrich, J.T.; Courville, B.C.; Jensen, J.L.; Carbonneau, P.E. Topographic structure from motion: A new development in photogrammetric measurement. Earth Surf. Process. Landf. 2012, 38, 421–430. [Google Scholar] [CrossRef] [Green Version]
  45. Ullman, S. The interpretation of structure from motion. Proc. R. Soc. B 1979, 203, 405–426. [Google Scholar] [CrossRef]
  46. Akpo, H.A.; Atindogbé, G.; Obiakara, M.C.; Adjinanoukon, A.B.; Gbedolo, M.; Fonton, N.H. Accuracy of common stem volume formulae using terrestrial photogrammetric point clouds: A case study with savanna trees in Benin. J. For. Res. 2021, 32, 2415–2422. [Google Scholar] [CrossRef]
  47. Dwyer, R.A. A faster divide-and-conquer algorithm for constructing delaunay triangulations. Algorithmica 1987, 2, 137–151. [Google Scholar] [CrossRef]
  48. Girardeau-Montaut, D.; Roux, M.; Marc, R.; Thibault, G. Change detection on points cloud data acquired with a ground laser scanner. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2005, 36, W19. [Google Scholar]
  49. Ranzuglia, G.; Callieri, M.; Dellepiane, M.; Cignoni, P.; Scopigno, R. MeshLab as a complete tool for the integration of photos and color with high resolution 3D geometry data. CAA 2012 Conf. Proc. 2013, 2, 406–416. [Google Scholar]
  50. Cignoni, P.; Rocchini, C.; Scopigno, R. Metro: Measuring Error on Simplified Surfaces. Comput. Graph. Forum 1998, 17, 167–174. [Google Scholar] [CrossRef] [Green Version]
  51. Marapane, S.; Trivedi, M. Multi-primitive hierarchical (MPH) stereo analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 227–240. [Google Scholar] [CrossRef]
  52. Sun, Y.; Zhao, L.; Huang, S.; Yan, L.; Dissanayake, G. Line matching based on planar homography for stereo aerial images. ISPRS J. Photogramm. Remote Sens. 2015, 104, 1–17. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Emara Palace in Najran: (a) as seen from a Google Earth satellite image; (b) map of Saudi Arabia.
Figure 2. Coordinates collection of used GCPs and CPs: (a) their spatial distribution; (b) used total station.
Figure 3. The general process of SfM: (a) general workflow; (b) multi-camera configuration used for collecting image sets.
Figure 4. Main research process flowchart.
Figure 5. Overview of the complete 3D point cloud: (a) 8,588,854 points before cleaning, reduced to 6,359,946 points after mild cleaning, based on Agisoft Metashape; (b) based on VisualSFM; (c) 3,143,343 points based on Regard3D software.
Figure 6. The 3D point cloud of the outer walls of the whole palace based on Agisoft Metashape, manually georeferenced with some GCPs: 18,242,653 points before cleaning, reduced to 15,511,264 points after mild cleaning.
Figure 7. Three-dimensional point cloud of front façade based on used software: (a) Agisoft Metashape, (b) VisualSFM, and (c) Regard3D.
Figure 8. CloudCompare’s color-coded absolute distances: (a) between Agisoft Metashape point clouds and Regard3D point clouds; (b) absolute distance histogram.
Figure 9. CloudCompare’s color-coded absolute distances: (a) between Agisoft Metashape point clouds and VisualSFM point clouds; (b) absolute distance histogram.
Table 1. Canon 600D camera and sensor characteristics.
Sensor size: 22.3 mm × 14.9 mm
Pixel dimensions: 5184 × 3456
Camera model: Canon EOS Rebel T3i/600D
Megapixels: 18.7
Pixel size: 4.30 µm
Table 2. Technical specifications of the total station.
Angle measurement: display resolution 0.1″; accuracy 1″
Distance measurement: reflectorless 2 mm + 2 ppm; prism 1.5 mm + 2 ppm
Telescope: magnification 30×; field of view 1°30′ (1.66 gons), i.e., 2.7 m at 100 m
Table 3. The time spent matching features and adjusting bundles (in minutes), and completeness.
Software Product     Feature Matching   Bundle Adjustment   Completeness (Images)   Completeness (Points)
Agisoft Metashape    36.12              77                  58/98                   47,133
VisualSFM            52.17              14.03               55/98                   37,158
Regard3D             244.6              18.5                57/98                   45,029
Table 4. Statistics of GCP residuals for the whole palace using Agisoft Metashape software.
Nr.   Target Label   Error (m)   X Error (m)    Y Error (m)   Z Error (m)
1     T1             0.01206     −0.00419       −0.00535      0.009961
2     T3             0.011339    0.002024       0.001014      −0.01111
3     T11            0.002511    0.001865       −0.00144      −0.00087
4     T13            0.001467    0.000207       0.001451      −6.3 × 10⁻⁵
5     T15            0.003757    0.001207       0.003447      −0.00088
6     T17            0.005028    0.002652       0.003855      0.001841
7     T23            0.000654    −6.8 × 10⁻⁵    −0.00039      0.000522
8     T25            0.005267    0.003245       −0.00316      −0.00269
RMSE: 0.007 m
Table 5. Statistics of CP residuals for the whole palace using Agisoft Metashape software.
Nr.   Target Label   Error (m)   X Error (m)   Y Error (m)   Z Error (m)
1     T2             0.008603    0.000777      0.004934      −0.007
2     T4             0.009766    0.0003        −0.00155      0.009637
3     T10            0.002518    −0.00105      0.000635      −0.0022
4     T12            0.00461     −0.00307      −0.00313      0.001428
5     T14            0.003192    −0.0016       −0.00261      −0.00089
6     T16            0.002553    −0.00018      −0.0023       0.001102
7     T22            0.003538    −0.00227      0.002307      0.001436
8     T24            0.001872    0.000028      0.001862      −0.00018
RMSE: 0.005 m
Table 6. Statistics of the residuals and the RMSE of the BA with and without using GCPs.
Target Label        Agisoft Metashape Error (m)   VisualSFM Error (m)   Regard3D Error (m)
T1                  0.00429                       0.72945               0.03489
T2                  0.00615                       0.13299               0.02914
T3                  0.00350                       0.72811               0.02797
T4                  0.00287                       0.57834               0.04189
T5                  0.00050                       0.71424               0.02268
RMSE                0.004 m                       0.620 m               0.032 m
RMSE without GCPs   0.79 (pixels)                 Not provided          Not provided
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
