Machine Learning Applied to CPT Interpretation

Geotechnical site characterization is the process of collecting in situ data to determine the characteristics of the subsurface, such as its material properties and spatial variability. This allows engineers to develop the ground models needed for geotechnical design. The Cone Penetration Test (CPT) is commonly used to gather in situ geotechnical data because it provides a large amount of data relatively quickly. This data allows engineers to build ground models with more confidence, but interpreting it manually takes a significant amount of time.

To reduce the time it takes to interpret CPT data, we explore the field of machine learning. Clustering methods, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and its hierarchical counterpart, Hierarchical DBSCAN (HDBSCAN), separate data into distinct groups, where each group consists of data points that have similar parameter values.
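
To make the idea of density-based grouping concrete, the following is a minimal, simplified sketch of the DBSCAN procedure in plain Python. It is not the implementation used in this study; the function names, the eps and min_samples values, and the synthetic three-parameter points are all assumptions chosen for illustration. In practice, a library implementation such as scikit-learn's DBSCAN would normally be used instead.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Synthetic, already-scaled points with three parameters each (illustrative only).
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.1), (0.1, 0.1, 0.0),
    (2.0, 2.0, 2.0), (2.1, 2.0, 2.1), (2.0, 2.1, 2.1), (2.1, 2.1, 2.0),
    (5.0, 5.0, 5.0),  # isolated point, expected to be labeled as noise
]
labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)
```

The two tight groups receive separate labels and the isolated point is flagged as noise, which is the behavior that makes this family of methods attractive for separating material types without fixing the number of groups in advance.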

These methods are particularly useful for analysing CPT data because raw CPT data can be correlated to many different soil properties, and the resulting combinations are often difficult to classify. Normally, an engineer uses their best judgement to manually separate the data into distinct groups of subsurface materials. This leads to a subjective ground model and can take a significant amount of time to complete. DBSCAN and HDBSCAN, on the other hand, can perform an analogous classification process in just a few seconds, leading to a more objective ground model in a fraction of the time.

To illustrate this process, we use data obtained at a site containing a dam built over an 800-m-long shallow valley. This site was characterized using many in situ tests, including 206 distinct CPT soundings. Originally, a ground model was manually created for this site. This process took more than three weeks due to the vast amount of data and the variation of subsurface materials.  

In comparison, both DBSCAN and HDBSCAN were used to classify data from a group of 24 CPT soundings along the crest of the dam. Three parameters were used: elevation, normalized tip resistance, and friction ratio. In principle, any parameters can be used for classification, but these three were chosen because they are commonly used to classify subsurface material types (e.g., Been and Jefferies, 1992; Robertson, 2016). The results show that these clustering methods can classify soil similarly to a manual approach, but in much less time.
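
As an illustration of how the three clustering parameters might be assembled, the sketch below builds an (elevation, normalized tip resistance, friction ratio) tuple for each CPT reading and rescales each parameter to zero mean and unit variance. The definitions used here are assumptions, since the exact formulations used in this study are not stated: Qt = (qt - sigma_v0) / sigma'_v0 and Fr = 100 * fs / (qt - sigma_v0), which are commonly used normalizations. The rescaling step, the function names, and the input values are likewise hypothetical.

```python
from statistics import mean, pstdev


def cpt_features(elevation_m, qt_kpa, fs_kpa, sigma_v0_kpa, sigma_v0_eff_kpa):
    """Build one (elevation, Qt, Fr) tuple from a single CPT reading.

    Assumed definitions (not confirmed by the source):
      Qt = (qt - sigma_v0) / sigma'_v0       normalized tip resistance
      Fr = 100 * fs / (qt - sigma_v0)        normalized friction ratio, in percent
    """
    q_net = qt_kpa - sigma_v0_kpa
    return (elevation_m, q_net / sigma_v0_eff_kpa, 100.0 * fs_kpa / q_net)


def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


# Hypothetical readings: (elevation_m, qt_kpa, fs_kpa, sigma_v0_kpa, sigma_v0_eff_kpa)
readings = [
    (100.0, 1100.0, 20.0, 100.0, 50.0),
    (99.0, 5200.0, 25.0, 200.0, 100.0),
    (98.0, 900.0, 30.0, 300.0, 150.0),
]
features = [cpt_features(*r) for r in readings]
scaled = standardize(features)
print(features[0])
```

Rescaling is included because elevation, tip resistance, and friction ratio have very different numeric ranges, and a distance-based method would otherwise be dominated by whichever parameter has the largest values. Other transformations, such as taking logarithms of the tip resistance and friction ratio, are equally plausible choices.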