Robustness of the Data-Driven Identification algorithm with incomplete input data

Identifying the mechanical response of a material without presupposing any constitutive equation is possible thanks to the Data-Driven Identification (DDI) algorithm developed by the authors. It allows stresses to be measured from displacement fields and forces applied to a given structure; the peculiarity of the technique is the absence of an underlying constitutive equation. In the case of real experiments, the algorithm has been successfully applied to a perforated elastomer sheet deformed under large strain. Displacements are gathered with Digital Image Correlation and net forces with a load cell. However, such real data are incomplete for two reasons: some displacement values, close to the edges or in noise-affected areas, are missing, and the force information is incomplete with respect to the requirements of the original DDI algorithm. The present study shows that, with appropriate data handling, stress fields can be identified in a robust manner. The solution relies on recovering the missing data in such a way that no assumption, except the balance of linear momentum, has to be made. The influence of the input parameters of the method is also discussed. The overall study is conducted on synthetic data: perfect and incomplete data are used to demonstrate the robustness of the proposed solutions. The paper can therefore be considered as a practical guide for implementing the DDI method.


Introduction
Constitutive equations are historically essential in Mechanics of Materials to perform analytical or numerical calculations: they close the problem when combined with mechanical equilibrium. In practice, the identification procedure consists in choosing or deriving a constitutive model that describes the material response well. Then, the model parameters have to be calibrated, ideally considering multiple deformation states (uniaxial tension, pure shear, biaxial tension...), which makes the experimental process difficult. These steps are often carried out iteratively to end up with a robust and well-calibrated constitutive model. With the evolution of full-field measurement techniques such as Digital Image Correlation (Sutton et al. ), identification methods are constantly being improved, specifically with non-standard tests. For example, Avril et al. ( ) and Roux and Hild ( ) proposed an overview of identification techniques such as the Virtual Fields Method or the Finite Element Model Updating Method. Concomitantly, with the emergence of Data Sciences, other methods have been proposed: for example, Furukawa and Yagawa ( ) and Yang et al. ( ) trained a neural network for identification purposes.

Here, a new path is chosen: identifying the material response with no underlying constitutive equation. Indeed, it is possible to use the previous techniques (full-field methods and Data Sciences) to create rich databases that can be used for identification but also for simulation. This overcomes the difficulties in getting a robust identification of the model parameters. This idea has been introduced in (Kirchdoerfer and Ortiz ) where the constitutive equation is replaced by a discrete database of strain-stress couples. The corresponding approach is referred to as Data-Driven Computational Mechanics (DDCM). Slightly different formulations of this solver are proposed in (Ayensa-Jiménez et al. ; Kanno ; Kirchdoerfer and Ortiz ; Nguyen and ). Concerning material characterization, non-parametric approaches are proposed in (Latorre and Montáns ; Crespo and Montáns ) in which the strain energy function of a hyperelastic material is not presupposed but expressed with splines. In (Amores et al. ) splines are further used to build a structure-based non-parametric constitutive manifold of the material, using simple experimental tests which explicitly provide the stress values. Furthermore, it is possible to account for the thermodynamic consistency of the data-driven procedure through a well-established formalism, as proposed by González et al. ( ). For more complex testing conditions, the stress field is heterogeneous and cannot be obtained in a straightforward manner. In (Réthoré et al. ), a decomposition of the strain field obtained with Digital Image Correlation is made in order to compute the stress field without a constitutive equation. In (Seghir and Pierron ), experimental dynamic measurements are used in the balance equations so that stress fields can be directly computed. Additionally, several manifold learning approaches have been proposed and validated on synthetic data to identify a material constitutive manifold, see for example (Ibañez et al. ; Kanno ).

In the present paper, a specific algorithm called Data Driven Identification (DDI) is considered; it has been recently proposed in (Leygue et al. ). It allows heterogeneous stress fields to be identified from measured displacement fields and external forces, without a constitutive equation. It relies on the availability of heterogeneous and rich data which can be smartly clustered so that a strain-stress database is built without constitutive equations. It has been validated with synthetic data (Leygue et al. ) and its application to real data has been recently assessed (Dalémat et al. ). It is an innovative tool to measure stress fields from DIC-gathered displacement fields and net forces measured by load cells.
The difficulty in applying the DDI algorithm to real data lies mainly in the incompleteness and noisiness of the data in some areas of the samples. Indeed, unlike a synthetic problem where everything is perfectly known, neither all forces nor all displacements can be perfectly measured; these difficulties are overcome by making preprocessing choices (both on the two intrinsic parameters of the algorithm and on the experimental input data). The so-called preprocessing step transforms raw input data into well-conditioned input data with consistent parameters for the DDI algorithm. The present work demonstrates the robustness of the DDI algorithm when applied to incomplete data: several possible preprocessing choices are compared so that the proper one can be applied with confidence. Note that although the discussion is illustrated here on a single non-linear hyperelastic case study, it stems from the experience accumulated in applying DDI to many cases involving synthetic and real data, and linear and non-linear material behaviors (Leygue et al. ; Dalémat et al. ; Stainier et al. ).

The paper is organized as follows. First, the algorithm is briefly recalled in order to highlight its optimal parameters and input data. Then, a case study is built to study several preprocessing choices. Synthetic data, for which the reference stress response is known, are considered. These synthetic data are modified to simulate incomplete data representative of reality. Then, a parametric study is conducted to find the proper preprocessing choice. It focuses on: (i) the intrinsic parameters of the algorithm when using the DDI with perfect data; (ii) the preprocessing step for incomplete input data (missing displacements and forces). Finally, the proper preprocessing choice is summarized so that the DDI method can be applied with confidence to real (i.e. partial and noisy) data. It gives readers the possibility to implement the DDI method for real data themselves.

Recall of the Data Driven Identification algorithm
This section recalls the DDI algorithm so that its optimal parameters and input data are highlighted. The Data Driven Identification (DDI) method (Leygue et al. ) is the inverse counterpart of Data Driven Computational Mechanics (DDCM) derived in (Kirchdoerfer and Ortiz ). This method identifies the complete response of a structure from a large database, without using a constitutive equation.

Input data
We consider a 2D-meshed geometry, deformed over $N_j$ increments indexed by $j$. For this geometry, the following data are the inputs of the algorithm and are considered to be available:
(I-1) the nodal displacements $\mathbf{u}_i^j$, $i$ being the node number. The strain derived from the displacements is the Hencky true strain tensor $\ln\mathbf{v}$. It is defined from $\mathbf{b}$, the left Cauchy-Green strain tensor, by $2\ln\mathbf{v} = \ln\mathbf{b}$ with $\mathbf{b} = \mathbf{F}\mathbf{F}^{\top}$, $\mathbf{F}$ being the deformation gradient tensor. In practice, a Digital Image Correlation software provides displacement fields on a grid. A mesh with associated connectivity is built from it to compute $\mathbf{F}$,
(I-2) the matrix $\mathbf{B}_{ei}^{j}$ which encodes both geometry and connectivity, $e$ being the quadrature point number. In particular, the mechanical balance can be evaluated at all nodes by:
$$\sum_e w_e^j\, \mathbf{B}_{ei}^{j} \cdot \boldsymbol{\sigma}_e^j = \mathbf{f}_i^j \quad \forall\, i, j,$$
where $w_e^j$ is the integration weight of point $e$ at loading step $j$,
(I-3) the nodal forces $\mathbf{f}_i^j$. These are zero in the absence of body forces, except for boundary nodes.

Additionally, the method has two intrinsic parameters:
(Inp-1) the size $N^*$ of the (stress-strain) database that samples the material response,
(Inp-2) the positive definite tensor $\mathbf{C}$ that defines the distance between two points in the phase space (here, the stress-strain space).
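The strain measure of input (I-1) can be illustrated with a short sketch. It evaluates $\ln\mathbf{v} = \tfrac{1}{2}\ln(\mathbf{F}\mathbf{F}^{\top})$ for a single 2D deformation gradient, using the closed-form spectral decomposition of a symmetric 2x2 matrix. The function name `hencky_strain_2d` and the nested-list data layout are assumptions made for this illustration only; they are not part of the original DDI implementation.

```python
import math


def hencky_strain_2d(F):
    """Return ln v = 0.5 * ln(F F^T) for a 2x2 deformation gradient F.

    F is a nested list [[F11, F12], [F21, F22]] (hypothetical layout).
    """
    (f11, f12), (f21, f22) = F
    # Left Cauchy-Green tensor b = F F^T (symmetric, positive definite).
    b11 = f11 * f11 + f12 * f12
    b12 = f11 * f21 + f12 * f22
    b22 = f21 * f21 + f22 * f22
    # Closed-form eigenvalues of the symmetric 2x2 matrix b.
    mean = 0.5 * (b11 + b22)
    radius = math.hypot(0.5 * (b11 - b22), b12)
    if radius < 1e-14 * mean:
        # b is numerically spherical: ln b = ln(mean) * I.
        h = 0.5 * math.log(mean)
        return [[h, 0.0], [0.0, h]]
    lam1, lam2 = mean + radius, mean - radius
    l1, l2 = math.log(lam1), math.log(lam2)
    # Projector on the first eigenvector: P1 = (b - lam2 I) / (lam1 - lam2);
    # the second projector is P2 = I - P1.
    d = lam1 - lam2
    p11, p12, p22 = (b11 - lam2) / d, b12 / d, (b22 - lam2) / d
    # ln b = l1 * P1 + l2 * P2, then ln v = 0.5 * ln b.
    return [
        [0.5 * (l1 * p11 + l2 * (1.0 - p11)), 0.5 * (l1 - l2) * p12],
        [0.5 * (l1 - l2) * p12, 0.5 * (l1 * p22 + l2 * (1.0 - p22))],
    ]
```

For a pure stretch `F = [[2, 0], [0, 0.5]]` the function returns the diagonal logarithmic strains, and for a pure rotation it returns a zero tensor, as expected from a strain measure based on $\mathbf{b}$.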

Output of the method
After convergence is achieved, the mechanical problems are solved and the method provides:
(O-1) the stress fields $\boldsymbol{\sigma}$ that satisfy the mechanical balance at each node, according to the nodal balance equation of the Input data section. The stress $\boldsymbol{\sigma}$ (calculated) and the strain $\ln\mathbf{v}$ (measured) are referred to as a mechanical state, as they are mechanically admissible (balanced and compatible),
(O-2) the $N^*$ material states $(\ln\mathbf{v}^*, \boldsymbol{\sigma}^*)$, $N^*$ being chosen by the user (Inp-1). These material states can be interpreted as a sampling of the material strain-stress response surface. Their distance from mechanical states is defined by the norm $\|\cdot\|^2_{\mathbf{C}}$ given in the Solver section, where $\mathbf{C}$ is a fourth-order positive definite tensor also chosen by the user (Inp-2).

Solver
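The algorithm aims at finding material states that are as close as possible to statically and kinematically admissible mechanical states (the latter being half known: the strain field is known, the stress field is not), according to the norm $\|\cdot\|^2_{\mathbf{C}}$ defined by:
$$\|(\ln\mathbf{v}, \boldsymbol{\sigma})\|^2_{\mathbf{C}} = \tfrac{1}{2}\left(\ln\mathbf{v} : \mathbf{C} : \ln\mathbf{v} + \boldsymbol{\sigma} : \mathbf{C}^{-1} : \boldsymbol{\sigma}\right).$$
Although this norm has the form and the units of an energy density through $\mathbf{C}$ [Pa], it is not related to any actual energy in the system: the magnitude of $\mathbf{C}$ simply weights the respective contributions of strain and stress. The problem is formulated as follows:
$$\min_{\boldsymbol{\sigma}_e^j,\ (\ln\mathbf{v}^*_k, \boldsymbol{\sigma}^*_k),\ k(e,j)}\ \sum_j \sum_e w_e^j \left\| \left(\ln\mathbf{v}_e^j - \ln\mathbf{v}^*_{k(e,j)},\ \boldsymbol{\sigma}_e^j - \boldsymbol{\sigma}^*_{k(e,j)}\right) \right\|^2_{\mathbf{C}}$$
subject to the constraints:
• the stresses $\boldsymbol{\sigma}_e^j$ satisfy the nodal balance equation,
• the material state $(\ln\mathbf{v}^*_{k(e,j)}, \boldsymbol{\sigma}^*_{k(e,j)})$ associated with element $e$ of increment $j$ belongs to the database $\{(\ln\mathbf{v}^*_k, \boldsymbol{\sigma}^*_k)\}_{k=1}^{N^*}$.
Therefore, the DDI outputs are:
• the mechanical states,
• the database of material states, and
• the mapping between mechanical and material states.

In (Leygue et al. ), the validity of the method has been demonstrated with perfect synthetic data, from (I-1) to (I-3).
In the experimental validation (Dalémat et al. ), the algorithm has been applied with incomplete data that are well preprocessed. The purpose of this paper is to carefully study the preprocessing choices and their influence on the robustness of the algorithm.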
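The role of the distance and of the mapping between mechanical and material states can be sketched as follows. This is a minimal illustration under stated assumptions, not the solver itself: the tensor $\mathbf{C}$ is assumed spherical with a scalar amplitude `c`, strain and stress are stored as flat lists of tensor components, the distance is assumed to take the form $\tfrac{1}{2}(c\,|\Delta\ln\mathbf{v}|^2 + |\Delta\boldsymbol{\sigma}|^2/c)$, and the stresses are taken as given, whereas the actual algorithm updates them under the equilibrium constraints. All function names are hypothetical.

```python
def distance_sq(state_a, state_b, c):
    """Assumed squared distance between two (strain, stress) states for C = c * I."""
    (strain_a, stress_a), (strain_b, stress_b) = state_a, state_b
    d_strain = sum((x - y) ** 2 for x, y in zip(strain_a, strain_b))
    d_stress = sum((x - y) ** 2 for x, y in zip(stress_a, stress_b))
    return 0.5 * (c * d_strain + d_stress / c)


def assign_states(mechanical, material, c):
    """Mapping step: index of the closest material state for each mechanical state."""
    return [
        min(range(len(material)), key=lambda k: distance_sq(m, material[k], c))
        for m in mechanical
    ]


def update_material_states(mechanical, weights, mapping, material):
    """Clustering step: each material state becomes the weighted mean of the
    mechanical states mapped to it (states with no member are left unchanged)."""
    updated = []
    for k, old in enumerate(material):
        members = [(w, m) for w, m, i in zip(weights, mechanical, mapping) if i == k]
        if not members:
            updated.append(old)
            continue
        total = sum(w for w, _ in members)
        strain = [sum(w * m[0][q] for w, m in members) / total for q in range(len(old[0]))]
        stress = [sum(w * m[1][q] for w, m in members) / total for q in range(len(old[1]))]
        updated.append((strain, stress))
    return updated
```

With this assumed form, a large `c` makes the strain term dominate, so the mapping is driven by the measured strains; a small `c` lets the (still evolving) stresses drive it.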

Building the case study
In this section, a case study is developed to study several preprocessing choices. First, the features of typical real data are presented; then, several preprocessing options are proposed, with a focus on missing data. Finally, the methodology for the next section is summarized.

From idealized to realistic input data
Experimental data might have missing information and can be noisy. Here, the construction of the actual realistic problem from perfect synthetic data is explained using a 2D example. Noise on the displacement field measurements has to be taken into account. The typical noise in DIC is considered to have an amplitude of the order of pixel, independently of the measured displacement. The discussion of the effect of noisy displacement values is beyond the scope of this paper and has already been partially addressed in the original DDI publication (Leygue et al. ).

Why are some data missing?
The DDI method is applied to a perforated hyperelastic membrane subjected to uniaxial tension.
The mechanical problem and the different notations are provided in Figure 1. Synthetic data are rendered incomplete according to usual experimental constraints:
• the nodal forces cannot be measured, only a net force,
• displacements are sometimes missing in areas called clusters (which are larger than just a few pixels): this is the DIC result when the software does not provide displacements that it considers unreliable (due to large strain, noise or loss of speckles, for example),
• displacements are also missing close to edges: neither the camera nor the DIC software, which works on a manually preselected region, can resolve the edges of the part. In addition, most correlation software use rectangular patterns that cannot account for curved edges.

The mechanical problem with real boundaries is thus depicted in Figure 1(b). In Figure 1, the theoretical boundaries are the top boundary ($\Gamma_T$) where the force is applied, the side boundaries ($\Gamma_S$) that are free edges, the displacement-free bottom boundary ($\Gamma_B$) which is clamped, and the hole boundary ($\Gamma_H$) which is stress-free. In the real problem, all the boundaries are close but not exactly identical to the actual ones. They are denoted $\tilde{\Gamma}_T$ for the top, $\tilde{\Gamma}_S$ for the sides, $\tilde{\Gamma}_B$ for the bottom and $\tilde{\Gamma}_H$ for the hole boundaries. In addition, the cluster of missing data is defined by its boundary, denoted $\partial C$.

Possible preprocessing options for missing data
The preprocessing choices concern both the intrinsic parameters of the DDI method and the way of dealing with raw data. First, the preprocessing choices regarding the missing data are detailed.
With such experimental data, it is necessary to rewrite some equations of the initial DDI algorithm recalled above. Indeed, properly handling the areas where data are missing is fundamental to ensure robustness. Several possibilities are proposed to deal with the missing data:
• In the area near the grip, where the force is measured (in the following, "t-" stands for the top boundary):
(t-1) In the synthetic case, each nodal force $\mathbf{f}_i^j$ at the top boundary $\Gamma_T$ is known;
(t-2) Using a load cell, only the sum $F^j$ of the forces on the top boundary $\Gamma_T$ in the loading direction $\mathbf{n}_{sol}$ is known:
$$F^j = \sum_{i \in \Gamma_T} \mathbf{f}_i^j \cdot \mathbf{n}_{sol}.$$
It is thus possible to define a global equilibrium condition on the boundary, by combining the nodal balance equation with this net force definition:
$$\sum_{i \in \Gamma_T} \Big( \sum_e w_e^j\, \mathbf{B}_{ei}^{j} \cdot \boldsymbol{\sigma}_e^j \Big) \cdot \mathbf{n}_{sol} = F^j.$$
(t-3) In the real case, displacements close to the grips are not measured and the true boundary cannot be considered in the algorithm. Thus, the boundary $\Gamma_T$ is replaced by $\tilde{\Gamma}_T$. To deal with the force information, the simplest solution is to assume that the global equilibrium condition also applies on $\tilde{\Gamma}_T$, as follows:
$$\sum_{i \in \tilde{\Gamma}_T} \Big( \sum_e w_e^j\, \mathbf{B}_{ei}^{j} \cdot \boldsymbol{\sigma}_e^j \Big) \cdot \mathbf{n}_{sol} = F^j.$$
• For clusters of missing displacement values, the objective function cannot be evaluated in some elements, which should be removed from the problem along with the associated nodes (in the following, "c-" stands for clusters).
(c-1) A simple and naive solution is to discard the equilibrium constraint for these nodes.
(c-2) Another solution is to consider that the boundary of a cluster is the boundary of a mechanically balanced subset: a global balance condition is prescribed on the boundary $\partial C$. This is equivalent to considering a zero net force on this boundary, which follows from the Ostrogradsky-Gauss theorem in the continuous formulation:
$$\int_{C} \operatorname{div} \boldsymbol{\sigma}\, \mathrm{d}V = \oint_{\partial C} \boldsymbol{\sigma} \cdot \mathbf{n}\, \mathrm{d}S = \mathbf{0},$$
which gives, for the discrete formulation:
$$\sum_{i \in \partial C} \sum_e w_e^j\, \mathbf{B}_{ei}^{j} \cdot \boldsymbol{\sigma}_e^j = \mathbf{0}.$$
• For edges close to holes, the perfect case is the one where the mesh boundary coincides with the real edge of the hole and the free edge condition applies. It is denoted (h-1) and will be the reference case (in the following, "h-" stands for the hole boundary). In the real case, due to the imperfect edge definition, the displacement values in the vicinity of the hole edges are not known. Therefore, the data on the real boundary $\Gamma_H$ are not known and $\tilde{\Gamma}_H$ must be considered instead. On this boundary, several assumptions can be made:
(h-2) The free edge assumption can be adopted if we consider that $\tilde{\Gamma}_H$ is close enough to $\Gamma_H$ for the edge to be treated as free. This incorrect assumption is likely to introduce a bias in the predictions.
(h-3) A weaker assumption consists in applying a zero net force on this boundary. It holds because the missing matter should be mechanically balanced (as in (c-2)):
$$\sum_{i \in \tilde{\Gamma}_H} \sum_e w_e^j\, \mathbf{B}_{ei}^{j} \cdot \boldsymbol{\sigma}_e^j = \mathbf{0}.$$
These strategies to deal with missing data are summarized in Figure 2.

Figure 2 Summary of the possible preprocessing choices. They concern missing data for three particular boundaries: top boundary with grip, cluster of missing data and imperfectly defined edges close to the hole.
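The net-force options (t-2), (t-3), (c-2) and (h-3) share the same mechanics: the individual nodal balance equations of a boundary are replaced by their sum. The sketch below illustrates this on a scalar system of balance equations written as rows acting on the vector of unknown stresses. The function `aggregate_balance` and the row-based storage are hypothetical choices made for this illustration; an actual implementation would work on the assembled $\mathbf{B}$ matrices.

```python
def aggregate_balance(rows, rhs, boundary, net_force=0.0):
    """Replace the per-node balance rows of `boundary` by one net balance row.

    rows[i] holds the coefficients of the balance equation of node i,
    rhs[i] the corresponding nodal force. The nodes listed in `boundary`
    lose their individual equation; a single equation stating that the sum
    of their internal forces equals `net_force` is appended instead.
    Use the measured load for a grip boundary, 0.0 for a cluster or a hole.
    """
    n_cols = len(rows[0])
    net_row = [sum(rows[i][k] for i in boundary) for k in range(n_cols)]
    kept = [(rows[i], rhs[i]) for i in range(len(rows)) if i not in boundary]
    return kept + [(net_row, net_force)]


# Toy example: a chain of 3 bar elements (unit cross-section) and 4 nodes.
# The balance at node i reads s_{i-1} - s_i = f_i.
rows = [[-1, 0, 0], [1, -1, 0], [0, 1, -1], [0, 0, 1]]
rhs = [-5.0, 0.0, 0.0, 5.0]
# Nodes 1 and 2 bound a "cluster" (element 1) where no data are available.
constraints = aggregate_balance(rows, rhs, boundary={1, 2})
```

In this toy case the aggregated row is `[1, 0, -1]` with a zero right-hand side: the stress entering the cluster equals the stress leaving it, while nothing is imposed on the stress inside the cluster. Discarding the two rows altogether, as in option (c-1), would lose this coupling.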

Inputs and parameters
The methodology used to investigate the robustness of the DDI is to compare several cases of synthetic input data that are deteriorated on purpose. The cases are studied with respect to the intrinsic parameters of the DDI algorithm. As a reminder, the inputs of the algorithm are:
• the algorithm parameters (intrinsic to the resolution method): $N^*$ and $\mathbf{C}$,
• the measured data, especially displacements and forces, which can be incomplete.
Therefore, the discussion is organized as follows:
1. First, the effects of the intrinsic parameters are analyzed on a case where the input data are perfect;
2. Second, the influence of incomplete measured data is analyzed: the cases (t-1), (t-2) and (t-3) related to the top grip are compared, the cases (c-1) and (c-2) related to the clusters of missing data are discussed, and finally the cases (h-1), (h-2) and (h-3) related to the edges close to the holes are considered.

Reference model
It is necessary to build synthetic data for which the reference response is known. Thus, a standard Finite Element model (built with the software Abaqus™) is used. The geometry is shown in Figure 3, where both the initial and deformed meshes of the problem are presented. The initial height is denoted $h_0$. The Ogden model (Ogden ) is chosen with the corresponding strain energy density. The parameters are listed in Table 1. They were identified in (Ogden ) to fit the experimental data of Treloar.
A total of $N_e = 6108$ linear triangular finite elements under plane stress conditions are chosen. The displacements are prescribed using a $(x, y)$ coordinate system corresponding to the horizontal and vertical directions, respectively. They are given for the top and bottom boundaries (denoted $\Gamma_T$ and $\Gamma_B$) by ( ). The finite element computation is decomposed into $N_j = 21$ increments under quasi-static loading conditions. It gives the reference stresses in each element, denoted $\boldsymbol{\sigma}_{FE}$. The strain fields, meshes and loading conditions are used as inputs in the DDI algorithm with the preprocessing choices introduced in the previous section, resulting in an identified stress field, denoted $\boldsymbol{\sigma}_{DDI}$.
Table 1 Ogden parameters to build the reference solution (Ogden ).

Coefficient    Value    Units

Error in stress identification
As the purpose of the DDI is to measure the stress field without a constitutive equation, the global error between the stress field identified by the DDI, $\boldsymbol{\sigma}_{DDI}$, and the reference one, $\boldsymbol{\sigma}_{FE}$, is computed over all loading increments and all elements.
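As an illustration of what such a global error measure can look like, the sketch below computes a weighted relative $L^2$ error between two stress fields. This particular expression is an assumption chosen for the example (a common choice for comparing fields) and is not claimed to be the exact formula used in this study; the function name, the flat-list storage of stress components and the optional integration weights are likewise hypothetical.

```python
import math


def global_stress_error(sigma_ddi, sigma_fe, weights=None):
    """Assumed relative L2 error between identified and reference stresses.

    sigma_ddi and sigma_fe are lists with one entry per (increment, element)
    pair; each entry is a flat list of stress components. `weights` optionally
    holds the matching integration weights (all equal to 1 if omitted).
    """
    num = 0.0
    den = 0.0
    for k, (s_ddi, s_fe) in enumerate(zip(sigma_ddi, sigma_fe)):
        w = 1.0 if weights is None else weights[k]
        num += w * sum((a - b) ** 2 for a, b in zip(s_ddi, s_fe))
        den += w * sum(b ** 2 for b in s_fe)
    return math.sqrt(num / den)
```

With this definition, an identified field equal to the reference gives a zero error, and a field uniformly overestimated by 10 % gives an error of 0.1.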

Results and discussion
This section presents the results obtained by comparing several cases of incomplete data. The aim is to determine the proper preprocessing choices that ensure robustness and reliability for stress identification. The influence of the intrinsic parameters of the DDI algorithm is first discussed with perfect input data. Then, the incompleteness of input data and the associated preprocessing choices are discussed.

Influence of intrinsic parameters
The number of material states $N^*$ is the parameter that controls how finely the response of the material is sampled. It is to be compared to the total number of degrees of freedom of the problem: $N_e \times N_j = 128\,268$. We define the sampling ratio $r^* = (N_e \times N_j)/N^*$ and consider that it varies between 2 and $10^4$.
The distance to mechanical states $(\ln\mathbf{v}, \boldsymbol{\sigma})$ is defined by the norm $\|\cdot\|^2_{\mathbf{C}}$ introduced in the Solver section. The simplest form for the tensor $\mathbf{C}$ is spherical with an amplitude $C$, that is $\mathbf{C} = C\,\mathbf{I}$ where $\mathbf{I}$ is the fourth-order identity tensor. This form aims to weight all components of the strain and stress fields equally. The tensor is defined with respect to a pseudo-tangent elasticity modulus of the behavior model used in the finite element analyses: $C_0 = 2.3 \times 10^6$ Pa. It is computed as the slope of the straight line fitted by the least squares method in the $(\|\ln\mathbf{v}\|_{VM}, \|\boldsymbol{\sigma}\|_{VM})$ space (von Mises norm). Practically, we choose values of $C$ ranging from $10^{-6} C_0$ to $10^{6} C_0$. Figure 4 shows the identification error after convergence. Figure 4(a) presents the error as a function of the sampling ratio $r^*$ for different values of $C$. For each value of $C$, the minimum error (with respect to $r^*$) is reported. Then, the minimum error (for the optimal value of $r^*$) in relation to $C/C_0$ is shown in Figure 4(b). The error is minimal for $r^* \approx 20$ (10 to 50 depending on the value of $C$). A large ratio (not enough material states) implies a sub-sampling of the response and therefore a significant error. Conversely, a ratio that is too small (too many material states) does not provide enough regularization to the stress estimation problem, as the behavior is no longer averaged sufficiently, which also leads to a significant error. It is therefore necessary to choose a value between these two extremes; similar results are reported in (Leygue et al. ).
In addition, it is shown that $C$ significantly contributes to the convergence of the method: the higher it is (to a certain extent), the lower the error. Indeed, the distance defined by the norm influences the mapping between material and mechanical states. By choosing a large value of $C$, the mapping based on strain values is favored, which is relevant since strains are measured (and so reliable), unlike stresses, which evolve during the convergence of the algorithm. Finally, $r^*$ is more influential than $C$: without missing data, a bad choice of $r^*$ will never be compensated by a good choice of $C$.
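The two intrinsic parameters can be set up with a few lines of code. The sketch below converts a target sampling ratio into a number of material states, using the mesh size and number of increments of the case study, and estimates a pseudo-tangent modulus as a least-squares slope. Both helper names are hypothetical, and fitting a straight line with an intercept is an assumption: the text only states that a straight line is fitted by least squares.

```python
def n_material_states(n_elements, n_increments, sampling_ratio):
    """Number of material states N* for a target ratio r* = (Ne * Nj) / N*."""
    return max(1, round(n_elements * n_increments / sampling_ratio))


def least_squares_slope(x, y):
    """Slope of the least-squares straight line y = a * x + b (intercept assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


# Case-study values quoted in the text: 6108 elements, 21 increments.
n_star = n_material_states(6108, 21, sampling_ratio=20)  # 6413 material states

# Sweep of the amplitude C around the reference modulus C0 = 2.3e6 Pa.
c0 = 2.3e6
c_values = [c0 * 10.0 ** p for p in range(-6, 7, 3)]
```

Here `least_squares_slope` would be fed with the von Mises norms of the strains and stresses of the reference data to obtain an estimate of $C_0$; the sweep then spans $10^{-6} C_0$ to $10^{6} C_0$ in steps of three decades, a coarser grid than a real parametric study would likely use.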

Influence of the incompleteness of input data

Force input
First, we consider the preprocessing choices related to force information: the force is either given through all nodal forces (t-1), represented by their net value on the true boundary (t-2), or represented by the net force on the approximate boundary (t-3). The influence of $r^*$ on the error is reported in Figure 5 for $C = 10^3 C_0$. Global errors are similar: a sampling ratio $r^*$ from to is preferable. This shows that the DDI results are only slightly influenced by the way these equilibrium conditions are prescribed on the top boundary.
For a local insight, Figure 6 presents the nodal forces computed from the stress identified with the DDI in a case where only the net force is prescribed. They are compared to those of the reference case (t-1). They are very similar, which means that the stresses computed with the DDI are almost as accurate as the reference ones, even if the force input is the net force only.

Cluster of missing data
We consider the influence of the preprocessing choices related to a cluster: handling it naively (c-1) or in a mechanically optimal way (c-2). The influences of $r^*$ and $C$ on the error are shown in Figure 7.
In the case of a naively handled cluster (c-1), it is difficult to achieve a small error. Too many or too few material states lead to larger errors. Here, the choice of $C$ is crucial: the larger it is (within a certain limit), the closer we get to a mapping based on strains (which are known). By simply adding the zero net force condition (c-2) on the cluster boundary, as proposed above in the discrete formulation, a robustness similar to that of the results without missing data is recovered. In this case, the choice of $C$ is much less critical than that of $r^*$.

Imperfect resolution close to holes
Finally, we consider the influence of the preprocessing choices in the case of an imperfect resolution close to the edges, on the boundary $\tilde{\Gamma}_H$. The case (h-2) of the free edge assumption on this boundary and the case (h-3) of a zero net force over the boundary $\tilde{\Gamma}_H$ are compared to the perfect case with no missing data close to the edge (h-1). Errors are plotted for a given $C$ with respect to the sampling ratio $r^*$ in Figure 8. Considering the free edge assumption leads to a large error, whereas the globally balanced assumption again induces a small error, close to the ideal case. Then, it is interesting to study the stress distribution as one approaches the hole: the von Mises stress is plotted along a line of the sample for the three cases, as depicted in Figure 9. For the free edge assumption (h-2), the algorithm predicts a misplaced stress increase close to the wrongly presumed free edge. Stresses are overestimated around the hole and this overestimation propagates to the bulk through the equilibrium relations, which are global. Therefore, the best way to handle an imperfect edge consists in adopting a mechanically correct assumption: only a zero net force condition must be enforced. In this case (h-3), the error is similar to the one with no missing data. The optimal ratio $r^*$ is again between 20 and 100.

Figure 8 Influence of the preprocessing choice for the hole edge definition on the error as a function of $r^*$ for a given $C$, for the cases of an imperfectly defined edge close to the hole (with (h-2) the assumption of free edge, (h-3) the global balance condition) and of a perfectly defined edge (h-1) (left subfigure). Nodes/elements used in the calculations (right subfigure).

Closure: implementation of the DDI with real data
In this work, the input parameters of the DDI algorithm have been examined, with the objective of correctly identifying the stress field without a constitutive equation. A study of its intrinsic parameters confirms our previous work. In particular, the consequences of incomplete data (inherent to experimental data) are analyzed through two aspects: the availability of net forces instead of nodal forces on the computational mesh, and the difference between the actual part geometry and the computational mesh. This last aspect appears either through clusters of missing data (areas of a few pixels/elements) or through the imperfect edge definitions close to holes and boundaries. We show that the robustness of the method is ensured when incomplete data are managed from a strict mechanical point of view. Although these recommendations are illustrated here on a single example, they are drawn from our experience in applying DDI to many cases involving synthetic and real data, and linear and non-linear material behaviors (Leygue et al. ; Dalémat et al. ; Stainier et al. ). To conclude, we propose to adapt the original DDI algorithm to real experimental data.

Figure 1 Problem formulation: (a) theoretical and (b) real problems with three particular modified boundaries: top boundary with grip, cluster of missing data and imperfectly defined edges close to the hole.

Figure 3 Case study of a perforated hyperelastic membrane under uniaxial tension: (a) initial geometry, (b) mesh after 200 % of total macroscopic strain. On the top nodes, information on force is available while the bottom ones are clamped.

Figure 4 Influence of the intrinsic parameters $N^*$ (related to $r^*$) and $C$ without missing data: (a) error as a function of $r^*$ and (b) minimum error as a function of $C$.

Figure 7 Influence of the preprocessing choice with a cluster on the error as a function of $r^*$ and $C$ for (a) case (c-1), naive, and (b) case (c-2), mechanically optimal.

Figure 9 von Mises stress field (right) and reported values along a horizontal line going through the sample (left), for the three cases between reference and identified stress fields.

Figure 10 Comparison between reference stress fields and identified stress fields with DDI for a geometry with the proper preprocessing choices for missing data. Colors in 2D histograms represent the histogram bin probability for each stress component.
Figure 5 Influence of force inputs on the error as a function of $r^*$ for (t-1) the given nodal forces, and (t-2) and (t-3) the given net force on the true boundary and the approximate boundary, respectively.