A Digital Terrain Model (DTM) is a representation of the bare earth as elevations at regularly spaced intervals. The underlying data is captured via aerial imagery or airborne laser scanning. Before use, all above-ground natural features (trees, bushes, etc.) and man-made objects (houses, cars, etc.) need to be identified and removed so that the surface of the earth can be interpolated from the remaining points. Elevation data that still includes above-ground objects is called a Digital Surface Model (DSM). A DTM is mostly generated by removing these objects from a DSM with the help of a human operator. Automating this workflow is an opportunity to reduce manual work, and this study aims to do so using conditional adversarial networks.
In theory, a sufficient number of raw and cleaned (DSM and DTM) data pairs should be good training input for a machine learning system that translates raw data (DSM) into cleaned data (DTM). Recent progress in topics like 'Image-to-Image Translation with Conditional Adversarial Networks' makes a solution to this problem possible. In this study, a specific conditional adversarial network (CAN) implementation, "pix2pix", is adapted to this domain.
Data for "elevations at regularly spaced intervals" is similar to image data: both can be represented as two-dimensional arrays (in other words, matrices). Every elevation point maps to exactly one image pixel, and even with 1 millimeter precision on the z-axis, any real-world elevation value can be safely stored in a data cell that holds 24-bit RGB pixel data, since 24 bits cover a range of roughly 16.8 km at that precision. As a result, the total pixel count of the image equals the total number of elevation points in the elevation data. Elevation data for large areas therefore produces correspondingly large images, which are sub-optimal input for "pix2pix" and require tiling. Consequently, the challenge becomes finding the most appropriate image representation of elevation data to feed into the pix2pix training cycle. This involves iterating over elevation-to-pixel-value mapping functions and dividing the elevation data into sub-regions to obtain images that perform better in pix2pix.
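As an illustration of the two steps described above, the following is a minimal sketch of one candidate elevation-to-pixel-value mapping together with a simple tiling routine. It is not necessarily the mapping used in this study. The function names (elevation_to_rgb, rgb_to_elevation, split_into_tiles), the offset_m parameter for handling negative elevations, and the tile size of 256 are all assumptions made for the example. The mapping packs each elevation, expressed in whole millimeters, into the three 8-bit channels of an RGB pixel.

```python
import numpy as np

MM_PER_M = 1000
MAX_CODE = 2**24 - 1  # largest integer a 24-bit RGB pixel can hold


def elevation_to_rgb(elev_m, offset_m=0.0):
    """Pack elevations (meters) into 24-bit RGB pixels at 1 mm resolution.

    offset_m is a hypothetical parameter: the elevation that maps to
    pixel value 0, so that negative elevations can be represented.
    """
    elev = np.asarray(elev_m, dtype=np.float64)
    code = np.rint((elev - offset_m) * MM_PER_M).astype(np.int64)
    if code.min() < 0 or code.max() > MAX_CODE:
        raise ValueError("elevation outside the 24-bit range; adjust offset_m")
    r = (code >> 16) & 0xFF  # most significant byte
    g = (code >> 8) & 0xFF
    b = code & 0xFF          # least significant byte (1 mm steps)
    return np.stack([r, g, b], axis=-1).astype(np.uint8)


def rgb_to_elevation(rgb, offset_m=0.0):
    """Inverse of elevation_to_rgb: recover elevations in meters."""
    rgb = np.asarray(rgb).astype(np.int64)
    code = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    return code / MM_PER_M + offset_m


def split_into_tiles(grid, tile=256):
    """Yield (row, col, block) for non-overlapping tile x tile blocks.

    Partial blocks at the right and bottom edges are dropped in this sketch.
    """
    rows, cols = grid.shape[:2]
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            yield r, c, grid[r:r + tile, c:c + tile]


if __name__ == "__main__":
    # Synthetic stand-in for a DSM grid; real data would come from a raster file.
    dsm = np.random.default_rng(0).uniform(0.0, 500.0, size=(600, 520))
    image = elevation_to_rgb(dsm)
    tiles = list(split_into_tiles(image, tile=256))
    print(len(tiles), tiles[0][2].shape)
```

With this particular packing, small elevation differences mostly change the blue channel, while the red channel changes only once every 65.536 m. Whether a network learns well from such a representation is exactly the kind of question that iterating over alternative mapping functions is meant to answer.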