There are hundreds of distinct 3D, CAD, and engineering file formats. As engineering design and analysis have become increasingly digital, the proliferation of file formats has created many problems for data preservation, data exchange, and interoperability. In some situations, physical file objects exist on legacy media and must be identified and interpreted for reuse. In other cases, file objects may have varying representational expressiveness.
We introduce the problem of automated file recognition and classification in emerging digital engineering environments, where all design, manufacturing, and production activities are "born digital." As a result, massive quantities and varieties of data objects are created during the product lifecycle.
This paper presents an approach to the automated identification of engineering file formats. The approach operates independently of any modeling tools and can identify families of related file objects as well as variations in versions. The problem is challenging because no a priori knowledge about the nature of the physical file object can be assumed. These methods support emerging applications in areas such as forensic analysis, data translation, digital curation, and long-term data management.