Photometric DIC: a unified framework for global Stereo Digital Image Correlation based on the construction of textured digital twins

An innovative approach allowing to rigorously address surface curvature and lighting effects in Digital Image Correlation (DIC) is proposed. We draw inspiration from the research work in Computer Vision regarding the physical modelling of a camera and adopt it to bring novel and significant capabilities for full-field measurements in experimental solid mechanics. It gives rise to a unified framework for global stereo DIC that we call Photometric DIC (PhDIC). It is based on the irradiance equation that relies on physical considerations and explicit assumptions, which stands for a clear breakthrough compared to the usual grey level conservation assumption. Most importantly, it allows to define a Digital Twin of the Region of Interest, which makes it possible to compare a model with different observations (real images taken from different viewpoints). This results in a consistent formalism throughout the framework, suitable for large-deformation and large-strain displacement measurements. The potential of PhDIC is illustrated on a real case. Multi-view images are first used to measure (or scan) the shape and albedo (sometimes called intrinsic texture) of an open-hole plate. The kinematic basis considered for the displacement measurement is associated to a Finite-Element mesh. Results for the shape and albedo measurement are compared for two completely different sets of pictures. Eventually, a large displacement of the structure is measured using a well-chosen single image.


Introduction
Ever-increasing interest is being shown in full-field measurement methods able to provide data-rich monitoring of structural tests (Avril et al. 2008;Dalémat et al. 2019;Leygue et al. 2019;Neggers et al. 2018;Schreier et al. 2009;Wittevrongel et al. 2015). Among them, Stereo Digital Image Correlation (SDIC) is a popular easy-to-setup method as it requires only a few cameras and lights together with an optical path to the Region Of Interest (ROI). It provides a 3D displacement field, on a possibly non-planar surface, associated with the matching of pictures (coming from at least two non-coplanar camera positions) taken at different deformation states of the ROI. Different approaches often referred to as local (Lucas and Kanade 1981;Kahn-Jetter and Chu 1990;Luo et al. 1993) and global exist (Beaubier et al. 2014;Dufour et al. 2015;Pierré et al. 2017;Réthoré et al. 2013). Figure 1 provides two diagrams illustrating the main differences between Subset-based SDIC, which belongs to the local approaches, and Finite-Element SDIC (FE-SDIC), which is associated to the global ones.
In both cases, we consider multi-view pictures of the ROI in the reference state (time 0 ) and in the deformed state (time 0 + Δ ), so as to observe a large part of the structure surface. Note that there are as many reference state images as deformed state ones, which naturally leads to setups where there are as many cameras as there are pictures at each state. A first step consists in calibrating the cameras.
In the case of Subset-based SDIC, in Figure 1(a), cameras are paired and for each camera pair a master camera is defined (the left one in our case) (Schreier et al. 2009). This allows to define the subsets (depicted as yellow squares in the left images of the reference state ( , ) ) which are used to generate a point cloud associated to a specific part of the ROI. Basically, each subset of the left image is sought in the right image thanks to standard 2D DIC technique. The knowledge of the positions of the cameras with respect to the scene allows then to obtain the position of each subset centre in the world reference frame (triangulation), which will be added to the corresponding point cloud. Finally, all point clouds are stitched together to generate a 3D representation of the reference state of the whole ROI which we denote . Regarding the deformed state, the procedure is the same, except that the subsets remain defined in the reference state left image , . Thus additional 2D DIC procedures are used to obtain the position of each subset in both left and right deformed state images , and , . Then, thanks again to triangulation, the position of the subset centres in the deformed state ′ are retrieved. Finally, the displacement is obtained as a difference between the two point clouds: = ′ − .
In contrast, in the case of FE-SDIC, in Figure 1(b), no so-called master camera is defined. Instead, pictures are interrogated at pixel positions corresponding to physical points defined on a FE mesh of the ROI thanks to projectors ( ) ( maps a 3D point in the world reference frame R to a 2D point in the camera reference frame). In the shape measurement step based on reference state images ( 0 ) , the position of each physical point is adjusted so that the different grey levels associated to the projection of this point in each picture match. The best shape minimising the discrepancies over all reference state image pairs is sought. In this context, shape measurement is an extremely ill-posed problem, as the mesh may slide on the object (Pierré et al. 2017, Figure 3). The cause is that the texture and the mesh are not attached to one another. Then, the displacement measurement step is performed by adjusting the new position of physical points so that the grey levels in the deformed state pictures ( ) associated to this point match. However, and unlike the previous step, the minimisation of grey level discrepancy is performed Raphaël Fouque et al. Photometric DIC: a unified framework for global Stereo Digital Image Correlation on a camera by camera basis only (i.e. registration between images and 0 ).
Subset-based SDIC is a robust method as it relies on well-studied 2D image registration techniques, but in comparison FE-SDIC exhibits two main advantages. The first one is related to the choice of the measurement kinematic basis. In FE-SDIC, this kinematic basis is the same as FE analysis, making it straightforward to compare measurements and simulations. Second, in FE-SDIC the displacement field is directly the solution of a minimisation problem, while in Subset SDIC, it is computed in a post-processing step using outputs of several minimisation problems. Formulating the problem as a minimisation with as variable allows to easily make use of a priori knowledge of the sought displacement field Rouwane et al. 2021). With such data, a wide variety of powerful algorithms allows to adjust physical parameters associated with numerical models (e.g. FEMU, IDIC) (Hild and Roux 2006;Lecompte et al. 2007;Passieux et al. 2015), densely measure shapes (Colantonio et al. 2020;Dufour et al. 2015;Etievant et al. 2020) or even identify mode shapes during dynamic tests (Berny et al. 2018b;Passieux et al. 2018).
Yet, SDIC is currently limited to experimental setups where the surface area visible by a given camera does not change excessively during an experiment. To clarify this point, we will distinguish, as in (Passieux and Bouclier 2019), the case of small deformations (displacements, rotations and strains are assumed to be small) and the small-strain or large-deformation one (only strains are small). The matching of surface areas that were not visible in previous pictures from the exact same camera is not possible in current SDIC frameworks, thus it is limited to a certain class of large deformations. Also, effects associated with light and surface pixel sampling (which are not accounted for) are strongly related to the local normal vector to the surface (Delaunoy and Pollefeys 2014;Tsiminaki et al. 2019), hence the current framework is rather limited to small-strain contexts. Overall, this results in the use of SDIC only in small-deformation cases. We believe this is because SDIC emerged from 2D DIC based on the same algorithms and assumptions, namely the grey level conservation equation (Horn and Schunck 1981;Lucas and Kanade 1981). This equation can be seen, in the case of 2D DIC, as a way to inverse a forward problem consisting in warping a flat picture with a given displacement field. Yet its extension to SDIC, although practical from an algorithmic standpoint, since it involves only slight modifications of the routines, is quite far-fetched from a physical and mathematical one. Among others, this results in the need for different functionals to be minimised in the current FE-SDIC framework (i.e. one functional for extrinsics calibration and shape measurements and another one for displacement measurements), but most importantly in the lack of physical understanding. In the current state-of-the-art, there is indeed no so-called 'forward problem' allowing to generate virtual images at various deformation states under given lighting conditions of structural tests integrated in a framework (yet methods to generate virtual images to compare measures to a so-called 'ground truth' do exist (Balcaen et al. 2017b)). Thus, we cannot compare these predictions with the observations in actual pictures. As explained by Tarantola (2005), łthe comparison of the predicted outcome and the observed outcome allows us to ameliorate the theoryž. Therefore, we believe a unified framework including a rendering model, mapping an 'intrinsic texture' (further details are provided regarding this expression in Section 2) to a grey level is needed in SDIC. This would allow to rely on an explicit physical modelling that could be further improved instead of on implicit assumptions.
Interestingly, formulations including rendering models already exist in the Computer Vision (CV) Community. They are very similar to those used in DIC (probably because DIC emerged from CV in the 80's). Not only do they propose a physical model, but they come up with astonishing results. Among others, these CV formulations encompass multi-view picture frameworks accounting for simple lighting effects (Birkbeck et al. 2006), frameworks accounting for spatial sampling (Goldlücke et al. 2014) or even frameworks allowing to retrieve shape, texture and camera poses without any prior knowledge (Jancosek and Pajdla 2011;Griwodz et al. 2021;Moulon et al. 2012). Up to the authors knowledge, in the computer vision literature, such approaches have never been used to track displacement or strain fields. This certainly explains why such recent works were hardly included as references in related DIC research works. In this paper, we modestly aim at (i) introducing such formulations to the DIC community and (ii) proposing a formalism to extend such enhanced computer vision methods to displacement measurement. The measurement of displacement fields in such a framework would open the door to very interesting perspectives such as, for instance, the instrumentation of a twisting sample (where it is not possible to see a given point in the reference and deformed state images from the same viewpoint).
The idea is to extend the Global SDIC framework (by relying on FEDIC in this work), in which a model of the structure is available (that is, a FE mesh herein) while taking advantage of the strong regularisation allowed by the knowledge of the initial object texture, such as the one provided by the master reference state image in Subset-based SDIC. To achieve this goal, we propose in the present work a framework relying on a Textured Digital Twin of the ROI constituted thanks to a physical modelling of the scene, see Figure 2. In a preliminary step 0 , denoted calibration herein, the object is 'scanned' thanks to numerous multi-view pictures ( ) allowing to elaborate a refined model of the structure, the Digital Twin, encompassing both a shape correction field and an albedo (or intrinsic texture). Note that in this step data are not compared to one another anymore. Instead, the Digital Twin is used as a common thread to compare pictures with. In this sense, a parallel can be drawn with the work proposed in Digital Volume Correlation by Leclerc et al. (2015). Then, in the displacement measurement step 1 , the deformed state images ( ) are compared to the model and the sought displacement field is identified. Note that both the number of deformed state pictures and the associated viewpoints may be different from the reference ones. The position of the specimen and the lighting conditions can also be different, for instance the calibration can be done outside the testing machine. We call this framework Photometric Stereo Digital Image Correlation (PhDIC) as both steps rely on the same photometric error. We denoted the time associated to the displacement measurement step 1 instead of 0 + Δ as in Figure 1. There are two reasons for that. First, time 0 may correspond to a scan of the structure without mechanical loading. Second, the displacement measurement step 1 can also stand for the positioning of the specimen in the test setup (which may apply loads that are usually neglected) and camera removal.
This article is structured as follows. We first propose in Section 2 a modelling of the camera as a sensor converting a surface power density into grey levels (in the same way as it is usually considered as a projector mapping a 3D point in the scene to a 2D point in the image). This offers the opportunity to provide a general framework and to show that further assumptions regarding surface response to lighting and light distribution are needed. Especially, simple models relevant in the DIC context can be introduced, namely the Lambertian (surface response) and distant point light source (light distribution) ones. This allows to build a general unified photometric functional. Then the Lambertian and distant point light source models are plugged into this functional. In this section, a particular care is also taken to review some important and delicate points discussed in CV in a condensed and comprehensible manner for their significance in the experimental mechanics community.
Next, Section 3 focuses on the photometric calibration phase (shape and albedo measurement) whereas Section 4 details the photometric displacement measurement phase. In these sections, we present a practical setup and further assumptions that we made in order to provide a working algorithm. It is based on the same functional for the extrinsics and shape measurement, and for the associated displacement measurement. As a first but illustrative example, we consider an open hole plate subjected to a 90°rigid-body rotation. Such a transformation would be extremely difficult to capture with usual SDIC approaches while it is direct with our methodology. The associated results are presented and discussed. This finally brings us to Section 5 were concluding remarks are provided together with some outlooks.

Photometric Digital Image Correlation
In this section we want to develop a photometric functional based on the comparison between a prediction and an observation, and then discuss some important features of the proposed framework. Before that, we propose to expose to the reader the general key elements of scene modelling derived from CV. Hence, readers already familiar with radiometric and photometric principles can directly jump to Section 2.2.

From the light source to the grey level value
Here below, we present the irradiance equation and the associated physical considerations. We first intend to put forward some challenges in computer graphics and rendering (among others). This general overview is then progressively simplified by explicitly making assumptions that are commonly implicitly adopted in the DIC community.

General overview
As explained earlier, we wish here to describe a general unified framework for SDIC, based on a photometric error. By photometric error we mean an error thought as a distance between a prediction from a model and an observation in a picture. This kind of error can be based on the image irradiance equation (Horn 1986). Irradiance (or ) is the amount of light falling on a surface (power per unit area) while radiance is the amount of light radiated from a surface (power per unit area per unit solid angle). As discussed by Horn (1986), the slightly more intricate unit for radiance comes from the fact that a surface can emit different amounts of light depending on the emission directions. The image irradiance equation states that the radiance coming from a point on an object (what we want to model) is proportional to the irradiance at the corresponding point in the image (what we observe).
To formalise this equation, we denote the pixel coordinates in the image of the associated 3D point in the world reference frame. In this case, the image irradiance equation reads where is the unit vector pointing from to the optical centre of the camera, see Figure 3, and , called the throughput (Cohen and Wallace 1993), depends on the -number and the angle formed by and the optical axis of the camera. However, this dependency of on the angle formed by and the optical axis is usually neglected (Horn 1986) so that is considered constant. In the following, we will not distinguish the image irradiance from the corresponding grey level value, as we assume the camera sensor to provide a linear relationship between these two quantities. The multiplicative constant is thus included in . Note that Equation (1)   Remark This model assumes a homogeneous and transparent medium. Hence the distance from to the viewpoint was omitted in Equation (1). The dependency with respect to light frequency was also omitted, as we usually deal with monochrome cameras in DIC.
Before choosing a simplified model for the radiance emitted by a surface under given lighting conditions (model ( , )), we will first introduce a more general framework. In the general case, Horn (1986) explains that the radiance emitted from a point depends on the amount of light falling on it, the irradiance , as well as on the irradiance fraction which is reflected per unit solid angle. The radiance also depends on the geometry and light position, as illustrated by specular reflections. Thus, we can locally parameterise the problem thanks to four degrees of freedom (dofs): two for the incident light direction (the incident polar and azimuth angles, respectively and ) which allow to define a unit incident vector ( , ) and two for the emission direction (respectively and ) which allow to define a unit emission vector ( , ). The definition of these angles with respect to the local normal vector ( ) and an arbitrary vector belonging to the tangent plane to the surface can be seen in Figure 3. The fraction of incident light coming from the direction ( , ) reflected in the direction ( , ) is usually denoted ( , , , ) (per unit solid angle) and is called the Bidirectional Reflectance-Distribution Function (BRDF). For the sake of simplicity, we omit, in and in following developments, the space dependency of each quantity (with respect to ).
Remark Since we assume that the only way out for incoming energy is to be reflected, and since the incident light should come from the outside of the surface, both and belong to [0, /2], as shown in Figure 3. Thus, effects such as transmission and subsurface scattering are not accounted for. For an even more general concept than BRDF which is called Bidirectional Scattering-Surface Reflectance-Distribution Function (BSSRDF), we refer the interested reader to (Nicodemus et al. 1977).
Hence, the radiance can be written as a function of the irradiance and the BRDF where ( , ) denotes the incident radiance coming from the direction − ( , ) and Ω the solid angle delimited by [ , + ] and [ , + ]. Thus, the total radiance emitted by the surface in direction is given by the integral over all elementary contributions coming from every single direction: Remark Equation (4) is called the reflection equation. In computer graphics, it is called the rendering equation and a second term in the right-hand side may be included. The latter is an outgoing radiance in the case where the surface emits light by itself, in addition to the reflection from incident light. We chose to discard this term as, in structural mechanics, materials generally do not act as light sources.
At this point, we can observe that the emitted radiance coming from ( , ) depends on the contributions of all incident radiances ( , ). Each of these incident radiances is in turn the solution of the same kind of equation as Equation (4) and so on. This is an infinite-dimensional problem and further assumptions are needed to be able to model this radiance.
Remark Note the difficulty to define an 'intrinsic' texture. The albedo is defined as the ratio of emitted irradiance over incident irradiance and depends in general on the incident radiance distribution:

The Lambertian model
Most of the time in SDIC, the grey level conservation equation is used. In Global SDIC, whether it be for shape measurement, where the grey level associated to a physical point is assumed to be the same in each camera, or for displacement measurement, where the grey level associated to a physical point is assumed to remain constant in time for a given camera, it relies on a Lambertian assumption. That is, the incoming light is assumed to be reflected with equal intensity in all directions ( , ) (i.e. ∀( , , ), ( , , ) = ( )). Obviously, this assumption is not correct if the motion/rotation of the object is significant for the displacement measurement step (e.g. if the surface orientation with respect to the light changes). Regarding the shape measurement one, the grey level conservation equation assumes that the throughput is the same for all cameras, which may not be the case. Some works tried to account for surface illumination changes and optical system differences in cameras. The first step toward this goal in DIC was to use a Zero-Mean Normalised Sum of Squared Differences cost function as a matching criterion between pictures (Tong 2005). The same idea was introduced earlier by Faugeras and Keriven (1998) in CV. Because a correction on the whole ROI was not always sufficient to explain higher residual values in some areas, local corrections were introduced in SDIC on a finite-element basis (Colantonio et al. 2020) for instance, or thanks to low-order polynomials (Dufour et al. 2015;Charbal et al. 2020).
Instead of a Lambertian assumption, or unphysical corrections, we use here a Lambertian model. It follows that ( , , , , ) = ( ) = ( )/ (Nicodemus et al. 1977, Appendix C). From Equation (4), we get Note that is independent of the viewpoint and light direction in the Lambertian model. Thus, we could refer to it as the intrinsic texture, but we prefer a physical designation: the albedo. From Equation (6), we can see that further assumptions are required to model the radiance , in particular regarding the incident light distribution .

Distant point light source
We further assume infinitely distant point light sources. This way, neither the surface power, nor the direction of the light depend on the position in the scene. These light sources are parameterised by ∈ 1, . It is also assumed that indirect illumination (contribution of the radiance emitted by all other points in the scene to the incident radiance to a point) is negligible compared to the direct illumination from the light sources. In this case, does not depend on where Φ denotes the irradiance and ( , ) the direction associated to the light source , see again Figure 3. Finally, we get In this case, Equation (1) becomes Remark From Equation (9), (Horn 1986, Lightness & Color) and (Woodham 1980), it is possible to identify ( ) 1 with one single light 1 but at least three different non-coplanar lighting conditions (three non-coplanar vectors ). An object in the scene of which the albedo is known allows to evaluate Remark Via Equation (9), we can make explicit the assumptions on which grey level conservation equation in DIC relies. In the displacement measurement step, it assumes that the scalar product ⟨ ( , ), ( )⟩ remains constant over time. This results in constraining DIC to operate in setups where displacements and strains are rather small.
Of course, more sophisticated parametric models can be derived in the same way, accounting for an ambient lighting term and/or specular reflections (Birkbeck et al. 2006;Yu et al. 2004).
Now that both a substitute to the grey level conservation equation and a way to model the scene radiance have been introduced, the corresponding framework developed for SDIC can be presented.

Photometric functional
Based on the work by Goldlücke et al. (2014), we consider a set of cameras. Each camera takes images of the ROIΩ (typically a surface of R 3 ). That is, we consider multi-view pictures ofΩ. The image ∈ 1, taken by the camera ∈ 1, is denoted where Π stands for the image plane. Note that Π depends on . We adopt a general formulation where camera positions may change. Some of the introduced notations are presented in Figure 4 for better understanding. is the camera model associated to Π , mapping a 3D point from the physical space W to 2D pixel coordinates in the image plane Π : where P denotes the space of camera parameters, whose definition depends on the chosen camera model. Usually in SDIC, it contains at least the 6 extrinsic parameters (three rotations and three translations) defining the rigid-body transformation between the world reference frame R and the camera reference frame R . It also contains at least four intrinsic parameters (focal/sampling parameters ( , ), optical centre pixel coordinates ( 0 , 0 )). The other possible parameters are distortion parameters (skew and/or radial, prismatic and decentring distortions). In the following, to reduce the amount of notation, we will simply write ( ) instead of ( , ). We also define the silhouettes = (Ω) ∩ I = (Ω vis ) where I (⊂ Π ) stands for the bounded domain of Π corresponding to the image.Ω vis denotes the visible part ofΩ in the picture taken by camera such that there is a one-to-one relation betweenΩ vis and thanks to the projection map .
Remark Strictly speaking, in Equation (10), is defined over I . Yet, as images are usually interpolated in DIC, this makes it possible to define over the whole image plane Π .
Finally, we need to introduce theoretically the backprojection operator which denotes the inverse function of the restriction of toΩ vis . As of now, we note that this operator is only introduced for the sake of consistency in our theoretical developments but will not be used in our implementation.

Figure 4
Diagram introducing different applications and notations used herein. Note that the sphere will not be used in this study for the construction of the light model. (1) is an ill-posed problem, similar to the measurement of a displacement field based on the grey level conservation equation between a reference and a deformed state image. Equation (1) does not account indeed for spatial averaging (pixelation), that is we have a finite set of equations. On top of that, grey level values, which are the result of the sum of an integral of the irradiance over each photosensor together with noise, are quantised. The existence of a solution is thus not guaranteed. The usual way to deal with these issues is to reformulate the problem (whether it be shape measurement or displacement measurement) as a functional minimisation. Thus, a norm of the residual associated to Equation (1) is integrated and the functional is built up by adding the integrals of all images together. Finally, the configurationΩ is sought in a smaller space (finite dimension because of the finite set of equations). Based on CV literature (Faugeras and Keriven 1998;Soatto et al. 2003;Goldlücke et al. 2014), we pretend that the right place to compute these integrals are the silhouettes . The idea is that the relevant quantum of information is the pixel. The unit weight associated to the residual norm should thus be assigned in the image plane. We will discuss that later on (see Section 2.3). According to these considerations, the photometric functional denoted by reads

RetrievingΩ on the sole basis of Equation
Raphaël Fouque et al. Remark For the sake of simplicity, we assumed in Equation (13) that images ( ) provide equally reliable data. Otherwise, residuals may be scaled appropriately (as in (Mathieu et al. 2015, noise variance) for instance).
Since = (Ω vis ), we can then express this functional over the visible parts of the ROĨ Ω vis thanks to integrations by substitutions, in the same way as Goldlücke et al. (2014): where and = ( , ) ⊤ . det(∇ ) and det(∇ ) denote the area elements of the corresponding projection maps and • obviously stands for the composition between two applications. Note that, as in (Goldlücke et al. 2014), we denoted here the differential (Jacobian matrix) in the same way as the gradient operator, for the sake of notational simplicity.
Remark For a simpler understanding of the integration by substitution from Equation (13) to Equation (14), note that J • could be substituted by |det(∇ )| in Equation (13). Yet, Equation (13) actually was preferred to make a parallel with the field of CV where the weighting defined in the images is introduced, i.e. J .
In a general framework, computing J is complex and costly. For this reason, and by assuming a pinhole camera model without distortions, we give an analytical expression for this area element: where = , and , respectively denote the world reference frame origin and the camera reference frame origin associated to the picture taken by camera , , stands for the Z coordinate of point in the camera reference frame associated to the picture taken by camera , and = − , /∥ , ∥ 2 , see Figure 4. Be careful that as a subscript denotes a vector or a point associated to a camera reference frame R whereas as a superscript is an index ∈ 1, . Expression (16) can be found in the CV literature (Soatto et al. 2003;Delaunoy and Pollefeys 2014). However, no proof is given in the literature reviewed. For this reason, a detailed demonstration and physical interpretation of this equation are given in Appendix A.
We can then integrate over the whole observed ROIΩ introducing a visibility function : with: The condition ( ) ∈ makes sure that the projection of lies in the image frame I while the condition • ( ) = ensures that the considered point is not hidden due to self-occlusion for instance.
Remark J and naturally appear when the residual is defined with unit weight in the images, no further assumptions are needed for this weighting scheme. This is a direct consequence of the adopted variational formulation (Goldlücke et al. 2014). Further discussions regarding this matter are presented in Section 2.3.
At this point, we should note that the residual is computed over what is observed in the images. Thus, what is observed is a deformed (or uncalibrated) stateΩ in Equation (17). Yet,Ω is one of the unknowns which should be described. Considering that a model is available, a simple way to do so is to introduce the discrepancy map . In the standard SDIC framework (Pierré et al. 2017), stands for either a shape correction field , or a displacement field defined on the reference configuration Ω (which is consistent with the solid mechanics formalism) standing respectively for the nominal geometry or the undeformed state. Usually, this discrepancy map belongs to the linear span of a set of chosen shape functions (e.g. FE shape functions (Pierré et al. 2017), splines (Dufour et al. 2015)), but it should be stressed that no prior assumptions are needed regarding the discrepancy map which still belongs to an infinite-dimensional space at this point. We define and we denoteΩ = (Ω), where Ω stands for the ROI in the reference state. Finally, we can express the functional over the reference state Ω, which actually makes sense in the context of solid mechanics, relying on Lagrangian approaches: . (20) Remark ∇ does correspond to the gradient of the mechanical transformation.
Note that with this method, the functional used to identify a shape correction or a displacement field is exactly the same. This offers a consistent, unified formalism throughout the entire framework. Usually in Global SDIC, the functional associated to the extrinsics and shape measurement problem enforces in a weak way that the grey level associated to a physical point should be the same for all cameras, see Figure 1(b). Thus, it consists of a sum over all camera pairs of the residual norm squared, while the functional associated to the displacement measurement is built as a sum of another kind of residual norm squared. This other residual is based on the conservation over time of the grey level associated to a given point on a camera by camera basis only (Pierré et al. 2017).
Introducing the Lambertian reflectance and the distant point light source models from Section 2.1 in Equation (20) allows finally to write a functional taking into account a Lambertian model: It also offers the possibility to make explicit an often implicit assumption in DIC. If the pattern deposited on the ROI is assumed to exactly follow the deformation of the specimen, and does not depend on the displacement or strain level, we can write where and˜respectively stand for the albedos in the reference and deformed states. Eventually the Photometric DIC (PhDIC) functional reads, in the case of a Lambertian BRDF and distant point light sources: Raphaël Fouque et al.
Photometric DIC: a unified framework for global Stereo Digital Image Correlation

Discussions
As already evoked, it seems logical to compute the discrepancy between images and the model in the image domain, as the pixel stands for the elementary unit of information. Besides this heuristic justification, the weighting term J , that naturally arises when substitutingΩ for between Equations (13) and (17), is a key driver for defining with unit weight in the images. It accounts for the foreshortening of the surface in input views (e.g. a surface is well described in a picture when viewed straight on). Hence, this term acts as an automatic regularisation of the variational problem while making the problem intrinsic, i.e. independent of the parameterisation chosen for the ROI. This is clearly established in CV (Goldlücke et al. 2014;Faugeras and Keriven 1998;Soatto et al. 2003). Also, the weighting term J would allow to define a consistent framework with multiple cameras with different resolutions and distances with respect to the specimen since it accounts for the spatial sampling of the surface, as shown in Appendix A with the distance , and the focal lengths and in the case of a pinhole camera model. No arbitrary relative weights would be needed for more resolved or near-field cameras as this formulation intrinsically defines a weighting scheme through J .
In Equation (23), the weighting term [|det(∇ )|((J • ) ) • ] alone explains why SDIC is restricted to a certain class of displacements and strains. Indeed, in most SDIC framework, both terms are assumed to be the same for all pictures and to remain constant over time. Thus, it means |det(∇ )| ∼ 1 and ((J • ) ) • ∼ (J • ) (i.e. + ( ) ∼ or equivalently ∼ where denotes the identity function). As stated above, the use of a model enables us to define a functional based on the sum of actual errors, that is the difference between a model and an observation. Thus the uncertainty associated to the identified discrepancy map (standing equivalently for a displacement field or a shape correction field ) in PhDIC would be reduced compared to the usual SDIC framework. This explains why some authors aimed at forming a substitute reference state image, in applications where the level of confidence in this reference is low for instance, by taking a mean over all available pictures (Berny et al. 2018a).
Finally, in the present work, and in contrast to the usual DIC framework, the camera model encompasses not only a projection model, but also a model to define the grey level value depending on the amount of energy received by the camera sensor. This requires the definition of a radiance model for the experimental setup encompassing both a light model and a Digital Twin of the structure. For a Lambertian surface, the albedo depends only on the position on the structure and a Textured Digital Twin based on physical quantities can be defined.

Intrinsics, extrinsics, shape and albedo measurements
In the following we propose a practical application on real images of the formalism presented in Section 2. In the present section, we showcase the calibration procedure prior to displacement measurement, the latter being, in turn, described in Section 4.
Before being able to perform a displacement measurement thanks to SDIC, several prerequisites must first be fulfilled (Pierré et al. 2017, Figure 2). The cameras should be calibrated (intrinsics and positions relative to one another), and the extrinsics and shape should be measured (position of the model with respect to the camera rig and corrections between nominal and actual shape). The difficulty in this prior phase concerns the shape measurement problem which is extremely ill-posed (see for instance Pierré et al. 2017, Figure 3). Regularisation strategies must therefore be adopted to circumvent this issue. They usually consist in restricting the subspace in which the shape is sought (whether it be in a strong or a weak sense) (Colantonio et al. 2020;Etievant et al. 2020;Benning and Burger 2018;Chapelier et al. 2021).
Another path that could be followed is an increase of the amount of available data (Passieux et al. 2015), but as detailed by Goldlücke et al. (2014), obtaining numerical schemes which scale favourably with the number of cameras is not straightforward (in the case of FE-SDIC, it scales as 2 as each picture has to be compared with every other one). This may explain why this possibility has not been fully investigated in SDIC yet.

Setup
In the present work, a single camera ( = 1) was used to take multi-view pictures of a rectangular plate with a circular hole, see Figure 5(a). The specimen was 20 cm long, 2.35 cm wide and 6 mm (a) Example of an input image used for the shape and albedo scan. Superimposed blue dots stand for the light calibration points. The world reference frame is shown in orange. neither of the two were present in the original picture.  thick, while the diameter of the bore was 7 mm. A classic black and white pattern was created by spraying paint on the surface of the sample. A Jai GO-5000C-USB 5 Megapixel camera and a 25 mm macro lens were used. The distance between the sample and the camera was about 1 m. The spatial sampling provided by the pictures was about 8 pixels per mm. Also, a single halogen light was placed right behind the camera so that a point visible by the camera was lit as well. The beam was attached on a custom calibration target composed of 8 points printed on an A4 sheet. The target was then fixed on a turntable allowing to take 360-degree pictures of the coupon, as indicated in Figure 5(b). Let us stress that, with such a setup, the direction of the light with respect to the beam changes for each picture, while remaining the same in the camera reference frames. Also, the turntable was only a convenient way to take multi-view pictures of the specimen. It served no metrological purpose. As described below, camera poses were rather identified thanks to the target.
A classic photogrammetric calibration (Garcia 2001) was performed on the pictures containing both the coupon and the target thanks to an in-house calibration software. Usually, at the end of this step, the camera intrinsic parameters are identified and saved but the relative position of the target with respect to the camera, which is also one of the identified quantities, is discarded. Here, we use this knowledge to initialise the extrinsic and shape measurement procedure since it allows to directly estimate the pose of the images with respect to the coupon which is assumed to be fixed in the target reference frame.
We now place ourselves in a SDIC Finite-Element framework (Passieux 2018). A perfect CAD model of the specimen is first meshed using T3 elements, see Figure 5(b). The typical size for elements was 5 mm but 2 mm-elements were used for mesh refinement care around the hole. The hole inner surface was not meshed.
Remark Let us underline at this stage that, since our method belongs to the class of global DIC approaches, the developed formalism is generic enough to incorporate any geometrical and kinematic description. In particular, a CAD-compatible discretisation made of splines could be used to perform the SDIC measurements (see, e.g., (Réthoré et al. 2010;Dufour et al. 2015;Dufour et al. 2016)), which would have the interest to establish a full link between the geometrical description in CAD, measurement, and simulation using IsoGeometric Analysis (IGA) (Hughes et al. 2005 we decided to consider a FE mesh as the geometrical input in this work since it remains the common practice in the field. We actually adopt the point of view of the experimenter who starts with a FE mesh provided by the analysts and who needs to perform a measurement using this FE mesh to communicate with standard FE-based simulation. Finally, as for regularisation, let us stress that our framework increases the number of data with the multi-view setup so it can be used with a rather fine FE mesh while alleviating the problem ill-posedness.

Assumptions
Since here we only focus on shape measurement, we assume that the available FE model allows to consider only slight corrections , see Equation (19), compared to the specimen characteristic length; in other words, the true shape is expected to be close to the nominal shape of the specimen. In this case, quantities in Equation (23) can be computed on the reference state geometry: This simplifies the formulation and allows to compute once and for all the normal field and the weighting term on the reference geometry.
On top of that, we will assume that the only light contribution comes from the light source mentioned in Section 3.1 (thus we omit the index in the following). We further consider that for each picture ∈ 1, , the light can be modelled as an infinitely distant point light source with a vector ( , ) given by the -vector of the camera reference frame associated to picture : , (in our convention, , is the unit vector colinear with the optical axis and pointing from the scene towards the camera, see Figure 4). Hence, for image , Equation (9) becomes This infinitely distant point light source assumption is valid if the size of the coupon is negligible with respect to the distance between the coupon and the light, which was the case here (ratio of approximately one order of magnitude). We further assume that the camera can be well described by a pinhole camera model, without distortions. This assumption, which is practically true with the optical system used in this work, comes with the benefit to have an analytical expression for J (see Section 2.2) which can be computed exactly (see Appendix A) and Remark The interest of the linear model is that it allows to express a closed-form solution for J whereas it would be a challenge with a non-linear model. However, non-linear camera models can be used straightforwardly for the practical implementation of the method. The above closed-form of J (exact for a pinhole model) would then be a correct first order approximation.
In the examples, we used a pinhole model to preserve the homogeneity of the paper. In addition, a pinhole model was considered łpractically true" since the reprojection errors estimated after calibration with and without non-linear distortions were approximately the same. A last benefit associated to the pinhole camera model is the smaller number of parameters and thus the need for a smaller number of calibration pictures.
As the light and camera are close to each other, the size of the coupon is also negligible with respect to the distance between the coupon and the camera. We can thus further simplify the weight J . Indeed ∀( , 0 ) ∈ (Ω vis ) 2 , ∥ , ( ) ∥ 2 ∼ | , ( )| ∼ | , ( 0 )| and, since there is only one camera, we do not need to consider the factor / 2 , and may take J = |⟨ , ⟩|, as in (Birkbeck et al. 2006).
One of the delicate tasks in our framework is to compute in practice the visibility function, since its value at a point depends on the camera position and orientation but also on the model geometry. Here, since the coupon is a convex shape (notwithstanding the hole) we propose to assess the value of ( ) based on the sign of ⟨ ( ), ( )⟩. In addition, since |⟨ ( ), ( )⟩| is equal to J • ( ), we can simplify the expression of the weighting as follows: [(J • ) ] ( ) = (⟨ ( ), ( )⟩) + , where (·) + denotes the positive part function. We recall at this stage that the visibility function is only introduced to say that we do not integrate the residual in the parts of the specimen that are not well seen by the camera.
Since the camera extrinsics (with respect to the coupon) have already been calibrated, we know the positions of the images relative to one another. Thus we can consider that we do not need to estimate each image pose with respect to the coupon but rather the position of the coupon (only 6 parameters) with respect to the virtual camera rig formed by the pictures (see Figure 5(b)). Ultimately, the functional to be minimised reads which is very close to the functional used by Birkbeck et al. (2006). ext 0 denotes here the extrinsic parameters of image 0 with respect to the coupon (all other images are then positioned through the photogrammetric calibration). Here is the sum of the shape correction field and the rigid-body displacement ext associated to ext 0 ( = + ext ). Now that all required assumptions have been clearly stated and expressed in mathematical terms, it is possible to define the process used to minimise this functional.

Discretisation and interpolation
In order to compute the integrals in Equation (27) for instance, the specimen surface is discretised via integration points and the integral over the physical domain is rewritten as a sum over all these integration points. The idea is to define integration points once and for all in the physical domain following the same strategy as in (Pierré et al. 2017, Figure 5d), where as many integration points as the number of pixels in a finite-element are used. There is however a slight difference because of our multi-view setup. Depending on the image used to observe it, the number of pixels in an element can be very different. For this reason, we decided to define an integration point density in points/mm. The number of points along each direction of our T3 elements was then defined as the product of the density with the associated edge length. To choose the value for the density, we used the pictures where the greatest number of pixels was reached for a given physical area. That is, we made the opposite choice to Dufour et al. (2015) where the coarsest mapping is used. In our case, this led us to choose = 8 points/mm.
To avoid undesirable oscillations at the free edges of the specimen (edges that belong to only one element), we removed integration points closer than to these edges (Baconnais et al. 2020). In this work the chosen distance was = 0.5 mm.
Regarding both picture subpixel interpolation and gradient computation, a regular bi-cubic spline interpolation was used.
The FE discretisation is performed through the shape functions associated to each dof such that with the condition ∀ ∈ Ω, ( ) = ( ) = ( ) .
Remark An important speedup was obtained by computing the integrals only over Ω vis (making use of the visibility function to exclude the integration points with zero weight).

Minimisation strategy
The process that we used to minimise (27) was a fixed-point algorithm consisting in an alternating optimisation algorithm. The reason for that is the problem ill-posedness. In addition to the usual sliding modes (Pierré et al. 2017), one should also be aware of the bas-relief ambiguities. We refer the interested reader to (Belhumeur et al. 1999) explaining (for orthographic projection models though) that a surface object is indistinguishable from a generalised bas-relief transformation of the geometry and an appropriate scaling of the albedo. Before describing in detail the way it was implemented, we describe the search directions we used and how we managed to minimise with respect to each of these directions.

Extrinsics and shape
To minimise with respect to extrinsics and shape, the extrinsic displacement field ext and the correction field were sought in subspaces of lower dimension than the linear span of the shape functions associated to the FE mesh. A Gauss-Newton iterative minimisation scheme together with a Ritz-Galerkin reduced order method were used. Both and ext were written as a linear combination of elementary displacement fields, that is a treatment similar to (Colantonio et al. 2020, Eqs. (4) and (5)).
Extrinsics One difficulty with this approach is that it is not straightforward to write the coupon rigid-body displacement as a linear combination, as rotations involve sine and cosine functions of rotation angles. To circumvent this issue, we used infinitesimal rotations around the centre of the coupon (the position is assumed to be well initialised). From an implementation point of view, we write ext = ext ext 0 where ext 0 collects the extrinsics. Shape It is common to measure the shape correction along the normal at the nodes of the mesh (Colantonio et al. 2020;Pierré et al. 2017;Chapelier et al. 2021). Defining the normal at a node is not straightforward, and it is usually done by computing the mean over the normals of neighbouring elements. This definition is satisfactory for nodes located in the bulk of the surface, it is not when considering nodes located on an edge or a corner. In this work, we used a k-means clustering algorithm (Jiawei et al. 2000) to be able to detect nodes where two (edge) or even three (corner) different dofs were needed to consistently measure the shape. We defined a maximum value for the half-angle of the cone circumscribed to all the normals of a cluster. For each node, a -means clustering algorithm was called with only one cluster over the set of normals of neighbouring elements. Then the number of clusters was increased until either there were three clusters or in each cluster the angle formed by all normals and the cluster centre was less than the maximum defined half angle. Finally, the node was affected the cluster centres as dofs. Recapitulating, we end up, as outputs of the k-means clustering strategy, with three types of nodes: those that lie within the face and have one dof, those that are on an edge to which we attribute two dofs in the two perpendicular directions of the edge, and those that are located on a vertex that have three dofs. With such a parametrisation, we did not encounter instabilities in our numerical tests. Note finally that, following Section 3.2, the normal will not be updated during the minimisation algorithm. From a practical point of view, we can write = shape , where shape gathers all the required information. Gauss-Newton algorithm To minimise with respect to extrinsics and shape, a Gauss-Newton algorithm was used (Pierré et al. 2017). The Gauss-Newton update at each iteration results from the linear system where ∇ denotes the image gradient. The Ritz-Galerkin method then reads In the case of extrinsics, = ext and = ext 0 , in the case of shape, = shape and = .
Remark Compared to the usual framework (Pierré et al. 2017), considering only ext 0 as parameter and not every single ext allows to use the exact same algorithm as for the shape measurement to calibrate the extrinsics. Remark Practically, we used a slightly different visibility function ′ , as in (Birkbeck et al. 2006). We considered that a point was visible not only when ⟨ ( ), ( )⟩ > 0 but when ⟨ ( ), ( )⟩ > VisionThre > 0. Because, we found that results tend to be more accurate when increasing VisionThre, the value VisionThre = 0.4 was determined as the greatest possible value leading to a non-singular matrix (the top face of the beam was not 'seen' for higher values of VisionThre). In other words, we remove some points that are seen with an angle larger than 66°, which is consistent with well-known developments in experimental mechanics (required angles between the cameras for appropriate SDIC measurements, see, e.g. (Balcaen et al. 2017a)). Thus, in the following, (⟨ ( ), ( )⟩) + will rather stand for the product of (J • ) and ′ .

Light intensity
We chose to calibrate the light once and for all. To do so, we arbitrarily set the value for the albedo of the white sheet standing for the target to 1. We then considered four points located on the sheet indicated as blue dots in Figure 5(a). These points are denoted by ( light ) . An overdetermined system was solved in the least-squares sense for each picture to retrieve through Equation (25) by taking ( light ) = − : Remark We attempted to use the framework of (Birkbeck et al. 2006) where spheres are used to calibrate both the light intensity and light direction. This has the benefit to allow the identification of an ambient term, which is not the case here because the normals ( light ) are all the same (the matrix associated to the overdetermined system would not have full rank). The specular reflection can easily be detected and allows to obtain the direction of the source while the Lambertian part of the surface allows to get the other parameters. However, in our case this yielded poorer results than the method described above. We believe that this is because the infinitely distant point light source assumption is not completely valid. There is only one sphere and thus the light intensity information is only valid around the sphere. The four points used above allow to obtain a less accurate but more general value for .

Albedo estimation
This part is probably the easiest one since a closed-form solution for the albedo minimising (27) can be derived (standard linear least-squares problem): Remark This definition for ( ) is in some sort a weighted average of all available observations of the physical point . In this sense, it makes it similar to the definition ofˆin (Dufour et al. 2015). However, considering both foreshortening and lighting effects shows the benefit to obtain a much sharper albedo, see Figure 6.
Remark In order to obtain a speedup in computation time, no interpolation scheme was used to evaluate the numerator in Equation (33), that is the nearest neighbour pixel was used to evaluate • ( + ( )). No significant changes in the identified shape nor albedo were observed regardless of the interpolation scheme.

Alternating optimisations
The structure of the iterative algorithm used herein to minimise the functional is presented in Figure 7.
On top of the procedure detailed above consisting in minimising with respect to different variables, we also made use of a multiscale (or coarse-graining) initialisation process (Pierré et al. 2017;Colantonio et al. 2020). At the beginning, the discrepancy map was initialised to 0 and pictures were considered at a scale scale = scale max = 3. At the scale scale, pixels in the Raphaël Fouque et al.
Photometric DIC: a unified framework for global Stereo Digital Image Correlation (a) Texture identified at initialisation without accounting for lighting effects, similar toˆin (Dufour et al. 2015). Results are presented in grey levels.
(b) Albedo (dimensionless) identified at initialisation accounting for lighting effects. Figure 6 Texture and albedo identified at the initialisation step. Note that the units are different: we make a distinction between the texture (in grey levels) and the albedo (dimensionless). To avoid bias, the scale chosen is the amplitude of each data set. Accounting for lighting effects clearly results in a much sharper gradient for the albedo. initial pictures were aggregated by groups of 2 scale × 2 scale resulting in coarser images. The density of points introduced in Section 3.3 was set accordingly ( /2 scale ). This has two main benefits. First, it drastically reduces computational time associated to the evaluation of the image and its gradient, since much less integration points are considered (4 scale times less). Second, it allows to obtain less accurate but much greater corrections at each iteration, leading in another way to a faster convergence. A Tikhonov regularisation term (Pierré et al. 2017) was added to the functional. As scale increases, the amount of available data shrinks while keeping the exact same number of unknown, making the problem more and more ill-posed, which is counterbalanced by the Tikhonov regularisation term. This term had a decreasing amplitude with the scale, until no regularisation was used for scale = 0.
Two different values were used as stopping criteria, namely stagnation with respect to the discrepancy map and with respect to the functional, defined respectively by Res = 10 −5 and Res = LoopRes /4 scale with LoopRes = 10 −3 in Figure 7. Dividing LoopRes by 4 scale for each scale allows to demand a better precision at the fast-to-compute coarsest scales which, as explained earlier, are known to be less accurate.

Results
In Figure 8 are shown the initialisation and convergence states with the mesh superimposed on pictures. We can see that our method allows to recover the specimen shape even though the object size was overestimated. Note that, in this framework, there is neither a need for selecting points 'by hand' or automatically (Passieux et al. 2015;Pierré et al. 2017) nor for fiducial marks in the pattern on the object. The regularisation of the extrinsics and shape measurement problem was rather obtained via more images than usual and the multi-view setup associated to the 3D mesh which allow to measure the specimen edges. The total number of available pictures of the specimen was 72 (approximately 5°between each pose). To evaluate the methodology described herein, we decided to form two different disjoint sets of pictures. Each of them contained 36 pictures with approximately 10°between each pose, see Figure 5(b). This allowed us to apply independently our method on these two sets to compare the identified shapes and albedos. The results presented in Figure 8 were obtained using one of these two sets.

Camera calibration
In Figure 9 is shown the reprojection error standard deviation associated to the calibration step described in Section 3.1, for one of the 36-picture sets. These results are satisfactory since the total reprojection error standard deviation is equal to 0.18 pixel.

Picture number
Error STD (pixel)

Albedo
In order to compare the retrieved albedos between the two different 36-picture image sets, we decided to compute the normalised albedo difference, defined as 2( 1 − 2 )/( 1 + 2 ) where 1 and 2 stand for the albedo associated to each image set, and 1 and 2 denote their mean values. The distribution of this quantity is plotted in Figure 10. The difference mean value (0.004) is small compared to the difference standard deviation (2.7 %). We chose to compare this last quantity to the normalised camera noise level, since we believe it is the relevant quantity to compare the normalised albedo to, in the same way as we will compare the shape measurement error to the calibration reprojection error, see Section 3.5.3. We estimated the camera noise through nine pictures taken for six different poses (a total of 54 pictures) and obtained a normalised camera noise mean and a normalised camera noise standard deviation respectively equal to 1.9 × 10 −16 and 1.3 %. We can see that the standard deviations of the normalised camera noise and normalised albedo differences are of the same order of magnitude. However, we will see later on, that the camera noise is not the only error source that we identified.

Shape
In Figure 12 are shown the projections of the integration points coordinates difference along each direction of the world reference frame, see Figure 5(a). For the and direction, the mean value is small compared to the standard deviation associated to the projected coordinates difference. Distinguishing the three directions allow to see a particularity in the direction since in this case the mean is much greater than the standard deviation. We interpreted these results based on the Figure 11 From the same mesh (black one), one can obtain two different parameterisations of the surface. We can interpret that by the coupling between shape and extrinsics measurement. The black mesh, corresponding to the initialisation step, is slightly larger than the real object. During the extrinsics calibration step, the mesh can slide indifferently in one direction or another, which is represented respectively by the yellow and purple mesh. Finally, the same shape is measured, but integration points lie at different positions. It is due to the problem ill-posedness (solution non-uniqueness). The crosses stand for the integration points.
solution non-uniqueness and an illustration is shown in Figure 11. The basic idea is that the framework described herein does not prevent from converging to different parameterisations of a same geometry. Thus, integration points can describe the same geometry and lie at different places on the surface of the object, even though they were defined at the exact same place on the initialisation mesh. This can also partly explain the slightly larger standard deviation associated to normalised albedo difference than the one associated to the normalised camera noise. Each integration point stands for an albedo at a slightly different place in each one of the considered image set. The reason we do not end up with odd results in Figure 10 is thanks to the pattern which smoothly varies in space.

Displacement measurement
In this application-oriented section, we aim to show, through an easy-to-setup test case, the potential of the developed PhDIC methodology to measure displacement fields that would be extremely delicate, if not impossible, to capture with the usual SDIC framework. Before really entering into the details of the specific test case, let us present the general way to use this framework to perform arbitrary displacement measurements.  Figure 12 Distribution of the coordinates difference (defined at the integration points) for the two different sets of pictures at convergence state. The reference frame chosen to define , and is the beam (or world) reference frame. The unit chosen for measuring a distance is the pixel in order to compare it to the calibration reprojection error. One should keep in mind that it makes only little sense to measure distances in pixels, as in a general multi-view setup it is not straightforward to convert a distance from mm to pixels.
Here, because the camera stood always about the same distance from the beam, an 8-pixel/mm constant of proportionality was used to convert the measures. For and directions, the mean value is smaller than the standard deviation. This last quantity is of the same order of magnitude as the calibration reprojection error (0.18 pixel). Regarding the direction, we can clearly see a bias (∼ −0.15 pixel), much larger than the standard deviation, corresponding to a shift along the direction. We suggest an interpretation of this bias in the measurements in Figure 11 based on the solution non-uniqueness.

Ground-breaking methodology for displacement measurement
To underline the novelty of the approach, we will denote the deformed state images by ( ) . Note that since we are able to generate virtual images (with the digital twin) to compare our observations to, they do not need to be associated to so-called reference state images (see Figure 2 as a reminder). Among others, this implies that the number of deformed state images does not have to be equal to the number of reference state images and can be greater or smaller, and also can correspond to different camera poses. For the displacement measurement, the functional writes (substituting to in Equation (23)): Remark We chose to update all variables identified in previous section. More precisely, in the equation above, Ω stands for an update of Ω by , and together with for the associated albedo and normal vector field.
With the same assumptions as in Section 3 regarding the weights J and the light model, we can simplify Functional (34). We also make explicit the variables with respect to which will be minimised: Remark is no longer an unknown since it was identified in Section 3.
At this point, we suggest two ways to proceed. Either the integral is rewritten on an initialΩ, which is very close to the actual deformed geometry, and the same assumptions as in Section 3 apply, that is Raphaël Fouque et al.
Photometric DIC: a unified framework for global Stereo Digital Image Correlation or we keep integrating on the reference geometry Ω as in Equation (23). Choosing the first approach may lead to two main benefits. First, the very same Gauss-Newton minimisation scheme as in Section 3.4.1 can be employed because a satisfactory initialisation does require to be built as it is a prerequisite for any gradient-based iterative minimisation scheme. Second, choosing the second approach is extremely costly computationally speaking, since, for instance, the normals and the visibility function should be constantly updated.
Remark Note that in both cases, the functional is unchanged. What changes is the integration domain.
Remark The two proposed resolution procedures can be viewed as the counterparts of the two main variants in computational solid mechanics to perform geometrically non-linear analysis, i.e. the updated Lagrangian (first method) and the total Lagrangian (second method) strategies (Bouclier et al. 2015;ten Thije et al. 2007;Oliver and Oñate 1984).
However, for the sake of pedagogy, we chose to follow the second approach, that is integrating over the reference domain Ω, as it allows to showcase a slightly different minimisation algorithm.

Example of a large rigid-body displacement measurement
The simple but illustrative test case that we considered was a large rotation (90°). In a usual SDIC framework, this would yield to two whole faces from the reference image replaced by two others as shown in Figures 13(a) and 13(b), and thus would certainly make the usual SDIC framework fail. We can further simplify Equation (35). As we want to measure a rigid-body rotation, we have Displacement measurement Calibration Shape and albedo measurement

Displacement measurement
Calibration Shape and albedo measurement (a) One of the multi-view pictures used for the extrinsics, shape and albedo calibration. Above is shown the position of all these multi-view pictures with respect to the beam. In a usual SDIC framework, this picture would stand for the reference state image 0 for displacement measurement.

Displacement measurement
Calibration Shape and albedo measurement (b) One of the multi-view pictures taken for the displacement measurement of a 90°rotation. We chose this picture (instead of the actual picture used) for illustration purposes: it is really close to Figure 13(a). The actual position of the only picture used to perform the displacement measurement is shown above. The mesh superimposed on the picture is the initialisation used (94°rotation). In a usual SDIC framework, this picture would stand for the deformed state image corresponding to 0 in Figure 13(a).

Displacement measurement
Calibration Shape and albedo measurement (c) Position of the mesh superimposed on a deformed state image at convergence state. Figure 13 Explanation of the proposed approach and how difficult it would be to measure this 90°rotation in a usual SDIC framework. As the displacement measurement step in classical SDIC would be based on the comparison between reference state images 0 , Figure 13(a), and deformed state ones , Figure 13(b), that observe different faces of the specimen. |det(∇ )| = 1. Also, the light intensity ′ is obtained in the same way as in Section 3.4.2. Finally, we used only one picture, since we theoretically have only six degrees of freedom, a single image should yield enough information. Once again, a Gauss-Newton minimisation algorithm is used: with Since an acceptable initialisation is needed for the Gauss-Newton minimisation algorithm to work, we kept on using the same linearisation for the extrinsics measurement ( ext ) as in Section 3.4.1.
For initialisation purposes, we computed the position of the camera relatively to the target thanks to the same kind of algorithm as the calibration one. At each step (each time a new was computed), both and were updated. The algorithm successfully converged with a single picture as long as we initialised with a displacement field 0 corresponding to a position of the mesh not further away than 4°from the actual specimen, see Figure 13(c).
Remark This rigid-body rotation measurement can also be seen as a repositioning of the camera with respect to the object. Thus, it opens the possibility of experimental setups where (some) cameras move around the object during tests.

Conclusion and outlooks
In this work, we propose a novel approach to SDIC inspired by CV research. The key idea is to model the grey level value by relying on physical quantities (radiance). This allows to build a photometric functional for DIC to measure indifferently extrinsics, shape and albedo or displacement fields. Unlike previous approaches to Global SDIC, where different functionals are used for these two objectives, the above-mentioned PhDIC provides a consistent and unified framework for DIC. This framework can be seen as an extension of Global DIC (as FEDIC for instance), where a geometric model of the structure exists. This model is enriched in PhDIC by the identification of the BRDF, which in a Lambertian model relies on one single parameter: the albedo. It allows to define a textured digital twin of the structure. In addition, the effects of light can be taken into account and modelled. The photometric functional is thus based on the comparison between a prediction (or a model) and an observation, instead of arbitrarily correcting observations to make them match, as done in classic (FE or Subset) DIC for instance. In addition, grey scale residuals are considered the most objective way to probe the ability of a model to reproduce an experiment from images in DIC (Neggers et al. 2017) or to determine areas where the geometry should be refined (Kleinendorst et al. 2015). It is therefore extremely important to analyse and model finely the different sources of grey level variation during an experiment. These variations cannot always be related to displacement alone, especially in stereo where light-geometry interaction effects can be substantial. One of the main features of this work, clearly established in CV, is that when defining functionals, unit weight should be assigned to residuals in the images, not in the physical space. It results in an automatic problem regularisation. By following this path towards displacement measurement on the reference geometry, and changing variables in the integrals step by step, we were able to naturally reveal a consistent weighting scheme. This weighting scheme exhibits interesting properties, particularly in a multiscale context (Passieux et al. 2015), since it allows to define a functional for DIC encompassing viewpoints with different resolutions and distances to the ROI. Some implicit assumptions in DIC (for instance that the pattern is assumed to exactly follow the deformation of the surface on which it is deposited) needed to be made explicit to establish the final expression of the functional that we considered.
The PhDIC framework, in the case of a Lambertian model for the BRDF and a single distant point light source, was described and applied to a parallelepiped beam with an open hole. For the calibration phase, a turn table was used to scan the object thanks to 360°multi-view pictures. This allowed to recover both the shape and albedo of the beam. The results between two completely different sets of input images were compared, demonstrating the accuracy and robustness of the approach.
Of course, building a textured digital twin comes at a price, but this price is related to the increase in the amount of available data. In return, associated data assimilation allows a strong regularisation of the shape measurement step. This step, which may be very delicate in usual Global SDIC frameworks, is critical for performing accurate displacement measurements.
Then the PhDIC functional was applied to measure a large 3D rigid-body displacement of the specimen using one single image from a different viewpoint than those of the calibration phase.
The displacement was such that the sample was completely flipped, making it impossible for other SDIC algorithms to measure the kinematic field.
There are many outlooks to this work, and we will focus only on a few of them. Investigating other parameterisations for BRDF's and light models seems to be an interesting and challenging task. It would make it possible to consider specular reflections, for instance. These phenomena are usually avoided but we believe that this is a compulsory step in order to extend SDIC application domain to large-deformation (and possibly large-strain) contexts and/or to large-scale applications with multiscale setups.
In this work, we used a turn table and a fixed light and camera in the calibration phase. Two other approaches may be considered in future works, especially adapted to address large-scale applications, where moving the considered structure may not be done easily. First, keeping the cameras and object at the same place while taking images for different positions of the light (Shape-from-Shading) (Mélou et al. 2018) could be investigated. Second, moving around cameras and light thanks to drones to scan the structure stands for another interesting outlook (Kalaitzakis et al. 2021;Jovancevic et al. 2015). Relying on the PhDIC framework together with the latter approaches and a relevant pattern, see for instance (Fouque et al. 2020), would be particularly suitable to tackle multiscale structural tests. Hence, another topic that should be tackled beforehand is the camera pose estimation in a more general setup where the whole target cannot be seen in each scan picture.
The application of this formalism to the measurement of a complex displacement field is a direct perspective of this work. The calibration and displacement measurement phases in completely different configurations will require a step 1 to calibrate the camera setup of the displacement measurement phase using the digital twin texture. The method that we proposed should allow us to consider tests that conventional stereo methods can hardly or simply not instrument. We think for example of folded tape springs (Kwok and Pellegrino 2012), slender elastic rods (Lazarus et al. 2013;Miller et al. 2014), flapping wings (Wu et al. 2011), large torsional deformation of flexible parts (Sicard and Sirohi 2014) or elastic ribbons (Charrondière et al. 2020) to name a few. We could also be able to track large rigid or elastic body translations and rotations of projectiles (Passieux et al. 2014). This finally would allow us to consider experiments in which the cameras can move, which could definitely offer new opportunities such as Stereo-DIC with camera mounted on drones (Kalaitzakis et al. 2021) or on robotic arms (Khrenov et al. 2018).
Finally, the work described herein allows to predict the grey level value associated to an integration point (located in the physical space) in order to compare it to the actual observation. Strictly speaking, it does not allow to generate virtual images from the scene (that could be compared pixelwise to actual pictures). Together with the work (Balcaen et al. 2017b), it may stand for an interesting outlook to envisage.

A Functional weighting term: analytical expression and physical interpretation
We have the identity J = ∥ × ∥ −1 2 . However, the backprojection operator is very costly to evaluate for a camera model accounting for distortions, and its gradient even more. Instead, we where 0 and 0 denote the optical centre pixel coordinates, and the product of the focal length and the camera sampling parameter along each direction, ( , , ) the coordinates of a point in the camera reference frame (denoted R ), and and the pixel coordinates of in the image. We also introduce a world reference frame (denoted R ). A rigid-body transformation makes the link between the coordinates of in the different reference frames:          = 11 + 12 + 13 + = 21 + 22 + 23 + = 31 + 32 + 33 + (A.2) which will alternatively be written: with and the origin of R and R respectively; is a rotation matrix. We also consider that the surface to which belongs can be parameterised as ( , , ) = 0. (A.4) To evaluate J , we first compute both = , and = , . For that, we can derive Equation (A.1) with respect to and . Since the algebra is very similar, we will present only the detailed steps for , : pixel length in the image plane on one of the contributions is illustrated. For this, the orange dash-dotted line and the green dashed line have the exact same values for all parameters except those from the considered contribution, also both correspond to the same length in pixel in the image plane. The orange dash-dotted line stands for a unit contribution. Thus, the ratio of the orange dash-dotted line to the green dashed line lengths allows to retrieve the contribution. In Figure A.1(a), we have similar triangles and thus the ratio of the orange dash-dotted line to the green dashed line lengths is / . In Figures A.1(b) and A.1(c), one must recall that J is defined locally. Hence, both the orange dash-dotted line and the green dashed line should be considered as infinitesimal and the solid blue rays going from the optical centre to the edges of these lines can be considered parallel, although it is obviously not the case in the illustrations. For this reason, the ratio of the orange dash-dotted line to the green dashed line lengths is |⟨ , ⟩| in Figure A.1(b). Similarly, in Figure A.1(c) this ratio equals (sin ) −1 = ∥ ∥ 2 / .