IfcObjAsm - an experiment mapping IFC entity instances to files

Hey all,

I’ve recently been wondering what workflows become possible if, instead of storing one IFC file per project, asset, or trade, you stored one IFC file per IFC entity. I wrote what amounts to an “executable napkin sketch” to find out: github.com/devonsparks/IfcObjAsm

I’ve included the summary description from the README below. Curious to get this group’s thoughts on the general approach and prior art. Thanks! – Devon


IfcObjAsm is an experimental command-line utility for transforming IFCXML files into networks of small, hyperlinked documents. It does this by exploding an IFCXML file into its constituent IFC entities, with one entity per file. Each entity file is named by its IFC GloballyUniqueId and placed in a common folder repository (named “objects”) on the local file system. IfcObjAsm then rewrites all references between IFC entities as references into this common repository (a rough sketch of the transformation follows the list below). This approach has a few consequences:

  • Monolithic IFC(XML) files are transformed into networks of small, linked objects, represented as files.

  • Subsets of data can be extracted on demand by traversing the object graph. This affords a modular decomposition of IFC data and reduces demand on authoring environments. You only ever transact the subgraph of data you need.

  • Because IFC entities are stored as files, they can be managed directly by version control systems like git. This offers new patterns of open, distributed collaboration based on IFC.

  • By using standard XML linking technologies, stakeholders can create “views” into their object repositories organized by use case. XIncludes, for example, can be used to symlink IFC objects into arbitrary containers, classification structures, or folders (a small example follows below). Because the XIncludes only ever reference object data, it’s possible to build up as many “views” onto the object repository as needed.

  • Composition of IFC data reduces to folder merging. Conflicts now occur at the level of individual IFC entities, not project files.
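
For concreteness, here is a minimal Python sketch of the explode-and-rewrite idea. The element layout, attribute names, and file names are simplified assumptions for illustration; the actual tool works on real ifcXML via XSLT.

```python
# Minimal sketch of the "explode" step, not the actual IfcObjAsm code (the tool
# itself is built on vanilla XSLT). Assumes a simplified ifcXML layout in which
# every entity element carries a GlobalId attribute and inter-entity references
# use a 'ref' attribute holding the target entity's GlobalId.
import os
import xml.etree.ElementTree as ET

def explode(ifcxml_path, repo="objects"):
    os.makedirs(repo, exist_ok=True)
    root = ET.parse(ifcxml_path).getroot()
    for entity in root:                          # one IFC entity per child element
        guid = entity.get("GlobalId")
        if guid is None:
            continue                             # skip non-entity children in this sketch
        # Rewrite references between entities as links into the shared repository.
        for el in entity.iter():
            target = el.get("ref")
            if target is not None:
                el.set("href", os.path.join(repo, target + ".xml"))
        # One file per entity, named by its GloballyUniqueId.
        ET.ElementTree(entity).write(os.path.join(repo, guid + ".xml"))

explode("model.ifcxml")
```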
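
And a sketch of the “views” bullet: a view is just an ordinary XML document whose xi:include elements point into the objects folder, resolvable with off-the-shelf tooling (lxml here, purely as an example; the GUID file names are hypothetical and must already exist in objects/ for the resolution to succeed).

```python
# Hypothetical "view" document: a grouping structure that only links to entity
# files in the objects/ repository; it never copies their content.
from lxml import etree

view_xml = """\
<view xmlns:xi="http://www.w3.org/2001/XInclude">
  <level name="Ground Floor">
    <xi:include href="objects/2O2Fr$t4X7Zf8NOew3FLKr.xml"/>
    <xi:include href="objects/0jMRt7Lj93deflWc3ZAB6k.xml"/>
  </level>
</view>
"""

root = etree.fromstring(view_xml)
root.getroottree().xinclude()   # pulls the referenced entity files into the view
print(etree.tostring(root, pretty_print=True).decode())
```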

Hi @devonsparks, I believe this idea is a promising path for using IFC data in alternative ways. We have an ongoing research project which also covers the generation of granular IFC files per entity (GUID). However, we apply linked data principles and generate TTL files, whereas your approach, as I understand it, produces one XML file per entity. The purpose and output look quite similar, though. You can see the GitHub repo of our work here.

I think this approach is open to many research questions. This topic in the forum might be a good one for some brainstorming.
As of now, our approach covers the storage, distribution, semantic querying (per model or cross-model), and security of the generated TTL (or XML, in your case) files in such scenarios.
We tested the IPFS platform as the decentralized storage and file-sharing mechanism and ran some performance tests on it. Although decentralized networks have many parameters to consider in performance benchmarks (number, location, and network of nodes, etc.), we measured better download/upload performance when granular files are stored in IPFS than when one whole IFC file is stored in IPFS.
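For example, the two storage patterns we compare look roughly like this (a sketch only, assuming a running local IPFS daemon and the standard ipfs CLI; file and directory names are placeholders):

```python
# Sketch of the two storage patterns compared in our tests (illustrative only).
# Assumes a running local IPFS daemon and the standard `ipfs` CLI on the PATH.
import subprocess

def ipfs_add(path, recursive=False):
    cmd = ["ipfs", "add", "--quieter"] + (["-r"] if recursive else []) + [path]
    # --quieter prints only the final (root) hash
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

# Pattern 1: one monolithic file -> one hash for the whole model.
whole_model_cid = ipfs_add("model.ttl")

# Pattern 2: one file per entity -> a root hash for the directory, so consumers
# can later fetch only the entities they need.
granular_root_cid = ipfs_add("entities/", recursive=True)

# Retrieving a single entity file by path under the directory hash:
entity_ttl = subprocess.run(
    ["ipfs", "cat", granular_root_cid + "/0jMRt7Lj93deflWc3ZAB6k.ttl"],
    capture_output=True, text=True, check=True).stdout
```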
A paper on our test results is under review and will be published soon. The next studies will be about running semantic queries on granular files distributed over the network and developing role-based encryption methods to secure the files and query results during sharing.
The research plan includes testing some other platforms/development paradigms over time. But I would indeed like to hear others’ ideas on the granular approach.

Both of you, @devonsparks and @ylcnky, and even @jonm, have taken up the idea I shared in the IfcXtreme project: to see each entity as a component, an object, and to start improving each one independently, removing/adding data, datatypes, attributes, templates, etc., object by object.

I think XML and graphs may be short-term solutions for using existing IFC, but the future will be different.

I believe that in labs and institutes all around the world, some have started to think about my idea of developing a “relational” IFC schema, especially one based on SQL,
because SQL is the future.
And JSON is good for realtime approaches like IoT.

The built environment industry is not the media industry: the majority of the data we see is, or can be, structured data, so we could even set aside semi-structured solutions like XML.

Hey @ylcnky,

Definite overlaps in approach, which is both encouraging and not surprising. I’d love to read the paper when it’s available. Here’s what I could parse from the repo and links; let me know where I go awry :wink:

Your system takes an IFC file and:

  1. rewrites it as RDF triples, mapping GUIDs to URIs
  2. segments these triples into groups (by entity?)
  3. normalizes the results to canonical RDF/XML
  4. pushes each resulting document to IPFS, with a map between the GUID URI and the IPFS multihash (IPLD? something custom?)

…and though I couldn’t find it, I’m assuming there’s:
  5. an adapter that exposes the published triples as a SPARQL endpoint, Triple Pattern Fragments, etc., so triples can be read back by consuming applications, and
  6. some kind of encryption preflight before step 4 that keeps published triples secure from prying eyes
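
To make sure I’m reading it right, here’s roughly how I picture steps 1–3 (a sketch with rdflib; the file names, grouping rule, and GUID-to-URI scheme are my guesses, not necessarily what your code does):

```python
# Rough sketch of steps 1-3 as I understand them -- not the actual pipeline.
# Assumes the IFC model has already been lifted to RDF (e.g. ifcOWL) and that
# each IFC entity is a distinct subject whose URI ends in its GUID.
import os
from collections import defaultdict
from rdflib import Graph

os.makedirs("entities", exist_ok=True)

g = Graph()
g.parse("model.ttl", format="turtle")        # step 1: triples with GUID-based URIs

# Step 2: segment the triples into one group per subject (i.e. per entity).
groups = defaultdict(Graph)
for s, p, o in g:
    groups[s].add((s, p, o))

# Step 3: normalize each group to RDF/XML, one document per entity, named by
# the last path segment of its subject URI (assumed here to be the GUID).
for subject, subgraph in groups.items():
    guid = str(subject).rsplit("/", 1)[-1]
    subgraph.serialize(destination=f"entities/{guid}.rdf", format="xml")
```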

I’ve targeted the same steps 1-3, replacing ifcOWL+RDF/XML with ifcXML. This is to keep the required runtime intentionally lean and “off the shelf” (vanilla XSLT, no external servers/services) at the cost of some flexibility.

Data consumption and exchange are handled separately, with the latter outsourced to a DVCS like git or mercurial. A user or application consuming or publishing data only ever references the local file system (no network traffic). All @refs between entities are local @refs into the same object repository. Exchanging data requires intentional push/merge between collaborators in the DVCS. This keeps the runtime small (which hopefully means fewer failure points), the data transparent (everything is a file that you can open in a text editor, put on a USB drive), and offloads “the hard stuff” (network transmission, conflict management, authentication) to external tools. This is clearly a point in a larger space of possible approaches (like yours :slight_smile:).
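
To illustrate the “local file system only” point: consuming a subgraph amounts to a small walk over entity files, roughly like this (attribute names are simplified and the GUID is hypothetical; the real entity files follow ifcXML conventions):

```python
# Sketch: load the subgraph reachable from one entity, touching only local
# files in the objects/ repository.
import os
import xml.etree.ElementTree as ET

def load_subgraph(root_guid, repo="objects"):
    seen, queue = {}, [root_guid]
    while queue:
        guid = queue.pop()
        if guid in seen:
            continue
        entity = ET.parse(os.path.join(repo, guid + ".xml")).getroot()
        seen[guid] = entity
        # Follow hrefs that point back into the same repository -- no network involved.
        for el in entity.iter():
            href = el.get("href")
            if href and href.startswith(repo):
                queue.append(os.path.splitext(os.path.basename(href))[0])
    return seen

# Only entities actually reachable from this one are ever read from disk.
subgraph = load_subgraph("2O2Fr$t4X7Zf8NOew3FLKr")
```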

Good stuff!

I know why you focused on XML; even the majority in the industry have realized that XML is better than graphs (even triples).

However, XML inherently has some issues, one of which is file size for “data/information exchange”, an issue that IFC (in all its formats) and all “closed file formats like RVT” have as well.

And your approach is something like microservices.

There are a lot of issues one could list, for instance “rule checking” in XML, which could maybe be solved with Schematron.
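
For example, something like this toy rule with lxml’s Schematron support (the rule, element, and attribute names are made up, not a real IFC rule set):

```python
# Toy example of XML rule checking with Schematron (via lxml); the rule and
# element names are invented, not an actual IFC validation rule.
from lxml import etree, isoschematron

schema = etree.XML("""
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="IfcWall">
      <assert test="@GlobalId">Every IfcWall must carry a GlobalId.</assert>
    </rule>
  </pattern>
</schema>
""")

doc = etree.XML('<IfcWall Name="W-01"/>')   # missing GlobalId on purpose

checker = isoschematron.Schematron(schema)
print(checker.validate(doc))                # False -> rule violation
```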

But still there are a lot of obstacles :wink:

@devonsparks your summary is correct. The first three steps are shared with your work; we segment the triples by entity. Step four actually still requires more investigation before we can move on, and it holds open questions. The content of each file already contains the GUID URI within its triples, but as you know IPFS generates the hash from the content. There is still some work to be done on the meta-level hash<-->content index. IPLD looked suitable, but we didn’t go through it in detail; most likely we will end up with a custom solution. However, since IPFS is a decentralized platform and we keep it as the core platform, creating a SPARQL endpoint would require shared computing resources. Currently we “assume” that catting the hashes of the entire directory is the way to go to do anything with the data. That’s why we limited the scope to conversion, data upload, and retrieval performance for now.
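Purely as an illustration (nothing like this is in the repo yet), such a custom index could be as small as a JSON map from each GUID URI to the hash returned by ipfs add:

```python
# Illustration of one possible custom GUID-URI <-> IPFS-hash index; not what
# the repo currently does. Assumes the `ipfs` CLI and entity files named by GUID;
# the URI scheme below is only an example.
import json
import pathlib
import subprocess

index = {}
for entity_file in pathlib.Path("entities").glob("*.ttl"):
    cid = subprocess.run(["ipfs", "add", "--quieter", str(entity_file)],
                         capture_output=True, text=True, check=True).stdout.strip()
    index["https://example.org/ifc/" + entity_file.stem] = cid

pathlib.Path("index.json").write_text(json.dumps(index, indent=2))
```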
Regarding the encryption preflight (your last point), that should not be in the repo yet, but your summary describes the path we will take. I can see a DVCS playing the role of IPFS (and vice versa) in our case, and I think it is an interesting path to experiment with.
An alternative topic could be handling and visualizing the geometry in this approach. Currently we separate data and geometry and handle only the “data” part of entities. Perhaps this requires more expertise (at least for me) in computer graphics and networking, but a rendering mechanism using the shared compute/graphics resources of other nodes in the network sounds like an exciting topic to me. Let’s see how it plays out.