Future of IFC5 (academic paper)

I would like to discuss some minor topics related to this paper:

"Future of the Industry Foundation Classes: towards IFC 5 "

This paper suggests that IFC5 should be modelled as a UML class diagram.
The EXPRESS schema language is also criticized.

I would like to express the following opinions:

  1. It is important to use a machine-readable, non-proprietary, openly available schema definition language that can be parsed using non-proprietary tools.

EXPRESS is machine readable. UML is mainly a visual language. Some text-based file-format versions of UML exist, but as far as I know, they are all proprietary file formats. It makes no sense to publish IFC using a schema definition stored in a proprietary UML file format.

Summary: Let’s keep it possible to work with IFC using only openly available non-proprietary tools and formats.

  2. It is important to keep the IFC schema definition machine readable.

The visual UML class diagrams could be helpful for people who are designing data models, such as IFC. But the most important aspect of IFC is not to design the IFC.

The most important aspect of IFC is to use the IFC in applications. To use IFC (in any application) you first have to convert the IFC schema definitions into class definitions in some object oriented programming language (OOPL). Nobody uses EXPRESS directly. Nobody can convert EXPRESS to OOPL class definitions by hand. It is too complex for that. It would take too much time, and there would be a spelling error somewhere.

If the IFC data model is not presented in a machine readable format, then we can all forget about using IFC in real life applications. Since UML is not machine readable (at least not if the rule is to implement the full UML standard) there are no algorithms that can fully and correctly convert visual UML to class definitions.

Summary: Let’s keep the IFC data model machine readable. It is more important that the model can be read correctly by machines than that it is user friendly, because no user can convert IFC to class definitions manually.

  3. A method for round-tripping IFC data in graph databases is important.

IFC data is persisted as text files. I think STEP is OK, but it is still a text file. To read or analyze data in a STEP file it first has to be loaded from the hard drive and parsed into the internal memory of the computer. This is not efficient if the STEP file is huge, or if you need to search through many different STEP files to find the result of a query and retrieve just a tiny bit of information (about a window), especially if this has to be done repeatedly across many different STEP files.
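As a rough illustration of that cost, the sketch below answers a "find all windows" query by scanning STEP instance lines one by one. It assumes one instance per line (real STEP files may wrap an instance over several lines), and the function name `find_instances` and the sample lines are made up for this example.

```python
import io
import re

# Matches simplified instance lines such as: #2=IFCWINDOW('g2',$,'W-01');
INSTANCE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\);\s*$")

def find_instances(lines, type_name):
    """Yield (step_id, raw_arguments) for every instance of type_name.

    Every call walks the complete input again; nothing is indexed.
    """
    wanted = type_name.upper()
    for line in lines:
        match = INSTANCE.match(line.strip())
        if match and match.group(2) == wanted:
            yield int(match.group(1)), match.group(3)

# Hypothetical, simplified sample data (not schema-valid IFC).
sample = io.StringIO(
    "#1=IFCWALL('g1',$,'Wall-A');\n"
    "#2=IFCWINDOW('g2',$,'W-01');\n"
    "#3=IFCWINDOW('g3',$,'W-02');\n"
)
windows = list(find_instances(sample, "IfcWindow"))
```

Each new query, and each additional file, repeats the full scan, which is the point being made about text files versus a database with indexes.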

Nowadays we prefer to store big data in databases instead of text files. This makes it easier to analyze and query loads of complex data compared to using text files. Text files were used for data storage and retrieval back in the days of COBOL. Today we use databases.

IFC data consists of a complex network of relationships and entities that inherit from many classes. Such object-oriented data is not well suited to storage or analysis in a regular relational database using SQL, because the number of JOINs grows quickly: one JOIN for each relationship traversed. For more information on this topic, please read about the “object-relational impedance mismatch” on the internet.
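To make the JOIN count concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table layout (`entity`, `rel`), the relationship names and the sample rows are assumptions made up for this example, not an established IFC-to-SQL mapping.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one window, opening, wall and storey."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
        CREATE TABLE rel (src INTEGER, dst INTEGER, kind TEXT);
        INSERT INTO entity VALUES
            (1, 'IfcWindow', 'W-01'),
            (2, 'IfcOpeningElement', 'O-01'),
            (3, 'IfcWall', 'Wall-A'),
            (4, 'IfcBuildingStorey', 'Level 1');
        INSERT INTO rel VALUES
            (1, 2, 'FILLS'),
            (2, 3, 'VOIDS'),
            (3, 4, 'CONTAINED_IN');
    """)
    return conn

def storey_of_window(conn, window_name):
    """Walk window -> opening -> wall -> storey: three hops, three JOINs on rel."""
    sql = """
        SELECT storey.name
        FROM entity AS window
        JOIN rel AS fills    ON fills.src = window.id   AND fills.kind = 'FILLS'
        JOIN rel AS voids    ON voids.src = fills.dst   AND voids.kind = 'VOIDS'
        JOIN rel AS contains ON contains.src = voids.dst AND contains.kind = 'CONTAINED_IN'
        JOIN entity AS storey ON storey.id = contains.dst
        WHERE window.name = ?
    """
    row = conn.execute(sql, (window_name,)).fetchone()
    return row[0] if row else None

conn = build_demo_db()
storey = storey_of_window(conn, "W-01")
```

Even this tiny question needs four JOINs, and every further hop in the IFC relationship network adds at least one more.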

Data with a network-like structure and lots of relationships is in general well suited to storage in graph databases.

The Graph Query Language was just recently adopted as an international ISO standard: GQL. GQL is the only ISO standard database language besides SQL. This means that it is now possible to use graph databases using an open international standardized query language, the GQL.

You will find lots of information about the new GQL ISO standard on the internet (links are not allowed in posts here).

The GQL is based on openCypher which is already used by several open-source and proprietary graph databases, such as Memgraph for example.
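As a small sketch of what this looks like in practice, the Python helpers below build openCypher statements as plain strings. The label and relationship names (`IfcWindow`, `FILLS`, `VOIDS`), the `step_id` property and the helper names are assumptions for illustration only; a real loader would normally pass values as query parameters through the database driver instead of formatting them into the text.

```python
def cypher_literal(value):
    """Render a Python string or number as a Cypher literal."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return str(value)

def node_statement(step_id, ifc_type, props):
    """Build a CREATE statement for one IFC instance, labelled by its type."""
    fields = {"step_id": step_id, **props}
    body = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in fields.items())
    return f"CREATE (:{ifc_type} {{{body}}})"

def edge_statement(src_id, dst_id, rel_type):
    """Build a statement that links two existing nodes by their step_id."""
    return (
        f"MATCH (a {{step_id: {src_id}}}), (b {{step_id: {dst_id}}}) "
        f"CREATE (a)-[:{rel_type}]->(b)"
    )

# A traversal that would need several JOINs in SQL is a single pattern here.
WALL_OF_WINDOW = (
    "MATCH (w:IfcWindow {name: 'W-01'})-[:FILLS]->(:IfcOpeningElement)"
    "-[:VOIDS]->(wall:IfcWall) RETURN wall.name"
)

create_window = node_statement(2, "IfcWindow", {"name": "W-01"})
link = edge_statement(2, 1, "FILLS")
```

The query string shows the main attraction: the path through the relationships is written once as a pattern, and the database does the traversal.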

An algorithm for storing IFC data in graph databases using Cypher was published in a scientific article called:

“IFC-graph for facilitating building information access and query”.

You will find this article on the internet.

However, the algorithm in that article only covers storage and should be complemented with an algorithm for “round-tripping”. I also made an implementation of this algorithm using a newer driver for Python. You will find it on GitHub if you search for “IFC graph Github” or similar.
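To show what “round-tripping” means here, the following is a minimal in-memory sketch, not the algorithm from the article: it turns simplified STEP instance lines into nodes and edges and then writes them back unchanged. It assumes one instance per line without extra whitespace, only treats single `#n` references as edges (lists such as `(#1,#2)` stay as raw text), and all names and sample lines are hypothetical.

```python
import re

LINE = re.compile(r"^#(\d+)=([A-Z0-9_]+)\((.*)\);$")
REF = re.compile(r"^#(\d+)$")

def split_args(text):
    """Split a STEP argument list on top-level commas only."""
    parts, depth, in_string, current = [], 0, False, ""
    for ch in text:
        if ch == "'":
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append(current)
                current = ""
                continue
        current += ch
    parts.append(current)
    return parts

def to_graph(lines):
    """Return (nodes, edges); each edge is (source id, argument index, target id)."""
    nodes, edges = {}, []
    for line in lines:
        step_id, ifc_type, raw = LINE.match(line).groups()
        values = split_args(raw)
        for index, value in enumerate(values):
            ref = REF.match(value)
            if ref:
                edges.append((int(step_id), index, int(ref.group(1))))
                values[index] = None  # the slot is now represented by an edge
        nodes[int(step_id)] = {"type": ifc_type, "values": values}
    return nodes, edges

def to_step(nodes, edges):
    """Rebuild the STEP lines by putting each edge back into its argument slot."""
    restored = {sid: list(node["values"]) for sid, node in nodes.items()}
    for src, index, dst in edges:
        restored[src][index] = f"#{dst}"
    return [
        f"#{sid}={nodes[sid]['type']}({','.join(restored[sid])});"
        for sid in sorted(nodes)
    ]

# Simplified sample lines, not schema-valid IFC.
original = [
    "#1=IFCWALL('g1',$,'Wall-A');",
    "#2=IFCOPENINGELEMENT('g2',$,'O-01');",
    "#3=IFCRELVOIDSELEMENT('g3',$,#1,#2);",
]
nodes, edges = to_graph(original)
roundtrip = to_step(nodes, edges)
```

Keeping the argument index on each edge is the design choice that makes the export lossless: without it, the writer could not know which attribute position a reference belongs to.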

Summary: I suggest using graph databases and ISO GQL as a new and more efficient way of persisting IFC data (besides STEP).

  4. Algorithms for converting EXPRESS schemas to class definitions in new OOPL would be great.

C++ has often been used for IFC. The EXPRESS schema has been developed in, or converted into, data models in C++. There exist algorithms for converting EXPRESS schema definitions into C++ class definitions.

Most developers want to use their favourite OOPL. EXPRESS is just a way to communicate the data model in a machine readable way. Nobody uses EXPRESS directly to get an application running.

Today there are many new object oriented languages besides C++. But to get started using IFC in any OOPL, there first has to be a way to convert the IFC EXPRESS schemas to class definitions in this OOPL.

Such algorithms are lacking for most new modern OOPL. Hence IFC cannot be used in most new modern OOPL.

Few organisations have interest or resources to start implementing an EXPRESS to OOPL converter, before starting the real project in the OOPL of their choice.

Why not publish standardized methods to convert the IFC EXPRESS schema definitions into class definitions in the most popular modern object oriented languages?

Or why not just publish IFC as class definitions in some object oriented languages, directly?

Summary: buildingSMART could facilitate the conversion of EXPRESS schemas into class definitions in many new OOPL, such as Python, Java, TypeScript, C#, Ruby, Go and Rust.
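To give an idea of what such a converter involves, here is a deliberately tiny sketch that turns a simplified EXPRESS fragment into Python dataclass source text. It only handles plain `ENTITY` declarations with a single supertype and simple attributes; WHERE rules, aggregates (LIST, SET), SELECT types, DERIVE and INVERSE are all ignored, which is exactly the part that makes a real converter hard. The sample schema is a shortened fragment, the function name `express_to_python` is hypothetical, and the generated `kw_only=True` option needs Python 3.10 or newer.

```python
import re

ENTITY = re.compile(
    r"ENTITY\s+(\w+)(?:\s+SUBTYPE\s+OF\s*\(\s*(\w+)\s*\))?\s*;(.*?)END_ENTITY\s*;",
    re.DOTALL,
)
ATTRIBUTE = re.compile(r"(\w+)\s*:\s*(OPTIONAL\s+)?(\w+)\s*;")

def express_to_python(schema_text):
    """Generate Python dataclass source for each simple ENTITY found."""
    lines = ["from dataclasses import dataclass", "from typing import Optional", ""]
    for name, parent, body in ENTITY.findall(schema_text):
        base = f"({parent})" if parent else ""
        # kw_only avoids ordering errors when required fields follow optional ones.
        lines += ["@dataclass(kw_only=True)", f"class {name}{base}:"]
        attributes = ATTRIBUTE.findall(body)
        for attr, optional, type_name in attributes:
            if optional:
                lines.append(f"    {attr}: Optional['{type_name}'] = None")
            else:
                lines.append(f"    {attr}: '{type_name}'")
        if not attributes:
            lines.append("    pass")
        lines.append("")
    return "\n".join(lines)

# Shortened, simplified fragment for illustration only.
SCHEMA = """
ENTITY IfcBuildingElement;
  Name : OPTIONAL IfcLabel;
END_ENTITY;

ENTITY IfcWindow
  SUBTYPE OF (IfcBuildingElement);
  OverallHeight : OPTIONAL IfcPositiveLengthMeasure;
  OverallWidth : OPTIONAL IfcPositiveLengthMeasure;
END_ENTITY;
"""
source = express_to_python(SCHEMA)
```

Swapping the few `lines.append(...)` templates is enough to target another language for this subset, which suggests that a shared, published parser with per-language back ends could spare each organisation from writing its own.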

Hi Martin,

That paper introduced some fundamental ideas but is not the state of the art of the IFC5. The group has already progressed since it was published three years ago.

  1. It is important to use a machine readable non-proprietary openly available schema definition language that can be parsed using non proprietary tools

Agree!

Of course, the IFC5 schema will be in a machine-readable form. The intentions were to make it technology-independent: Towards a technology independent IFC · buildingSMART/NextGen-IFC Wiki · GitHub.

The way the IFC4.3 is managed today, the source definitions are defined in UML/XML and markdown files which can be serialized into EXPRESS (GitHub - buildingSMART/IFC4.3.x-development: Repository to collect updates to the IFC4.3 Specification). And no, UML isn’t proprietary.

Yes, we are well aware of the Junxiang Zhu paper and in direct contact. The graph representation of the data is considered, along with Linked Data, ECS architecture (opposed to OOP) and Universal Scene Description (USD) for coupling geometry representation. This presentation from @gschleusner answers some of your questions: https://www.youtube.com/watch?v=GgN1he00dpc

cc: @berlotti, @aothms

I understand that UML is not proprietary. UML is an ISO-standard.

However, I have always thought of ISO UML as a visual language. And hence ISO UML is not machine readable.

Of course there is image recognition, but this technology is too complicated for most people.

I know there is UML modelling software that saves UML diagrams as text files.

But I do not know of any open and free and well established international standard for serializing visual UML diagrams into text files.

I would thus like to ask:

  • In which open and free text-based file format is UML used and serialized, before it is converted to EXPRESS?

  • Or in which open and free text-based file format would UML be used in the future?

I would say the opposite: UML diagrams can be visualized, but they are primarily text-based. I’m not sure about the details, but I think 4x3 uses XMI (XML Metadata Interchange) for that.

Nevertheless, it’s not decided how IFC5 will be expressed.

According to my knowledge, UML (as defined in the ISO standard) is only a visual language (of diagrams, symbols, drawn tables, arrows etc.).

Yes, UML can be represented by text/code. XMI is a text based language, and might be used to describe UML diagrams. But on wikipedia, I can read the following about using XMI to represent UML:

“For diagrams, the Diagram Interchange (DI, XMI[DI]) standard is used. There are currently several incompatibilities between different modeling tool vendor implementations of XMI, even between interchange of abstract model data. The usage of Diagram Interchange is almost nonexistent. This means exchanging files between UML modeling tools using XMI is rarely possible.”

See the page for “XML Metadata Interchange” on English Wikipedia.

This means that XMI in reality can not be used as an open text-based machine readable format for sharing information about data models or UML diagrams.

Each software uses its own flavour of XMI that can not be opened by other software that claim to be able to use XMI.

And if this software is a proprietary software, then I guess that you will have to use that particular proprietary software, to open that XMI file, with that flavour of XMI.

XMI is supposed to be an open standard, and yes it is published as a standard - but in reality (according to Wikipedia) it is not implemented to work in that way. Exchanging XMI data between modelling tools, is rarely possible.

My concerns about UML not being machine readable thus remain. I also have concerns about XMI not being open and free in practical use, and instead being tied to a specific modelling tool product.

@aothms
@berlotti

Having worked on the topic extensively, I can provide informed comments on the UML question. UML might appear to have diagrammatic notation first and XML serialization second, whereas EXPRESS could be seen as having textual syntax first, diagrammatic notation second. In fact, both UML and EXPRESS are ISO-approved formal modelling languages. Historically, UML had a strong emphasis on its graphical notation and most people learn only the graphical notation and with that a little bit about the semantics of the language. Nowadays, however, the graphical notation is an appendix only as is the XMI serialization. EXPRESS on the other hand, had the graphical notation (EXPRESS-G) always in the appendix, kind of as an add-on and the semantics are intertwined with the syntax.

I think there is no doubt that a graphical notation is not sufficient for documenting a model specification - usually the graphical notation is not as expressive as the textual notation - it only covers a part of the model specification and that goes for both UML and EXPRESS. When it comes to the semantic model and syntax, both languages have their strengths and weaknesses. UML is much more powerful, but also more complex (potentially complicated) compared to EXPRESS - with the 4-tier MOF (meta model layers), XMI for serialization of models on all layers, from the instance to the meta-model.

There is both proprietary as well as free and opensource software for both UML and EXPRESS. I think the impression that UML is “proprietary” stems from issues with model exchange because of different dialects and additional information in the models. But in fact, the exchange of the actual models in UML works pretty well. The issues mainly originate from the additional information related to the diagrams which are based on the model and illustrate certain parts and aspects of the model. There is a specification for exchanging “the purely graphical aspects of UML models that modelers have control over, such as the position of shapes on a diagram and line routing points”, but the UML-DI (diagram interchange) is not so well-implemented as to provide good interoperability. However, in the context of the IFC specification we are mainly interested in the actual model, not in the diagram.

Thanks for an interesting reply.

Yes, I agree, we are interested in the data model. Not in the diagram.

I am afraid that the complexity of XMI is a problem for implementers of parsers and converters.

How can developers of XMI parsers and converters know which parts of the complex XMI spec must be implemented, and which parts are unnecessary to implement?

A feeling that exchange of the XMI data model “works pretty well” sounds nice, but it can in reality also be a bit scary, with such fuzzy logic.

If not every aspect of the IFC data model represented as UML/XMI can be correctly parsed and converted to an OOPL data model, then lots of real IFC data will eventually get corrupted…

…by all software that uses a data model created using a specific XMI parser and converter.

Before choosing the UML/XMI-path, maybe it is a good idea to:

  1. make sure that no data is lost from the part of the XMI spec that covers the data model when transferring XMI data between data modelling software,
  2. make sure there exist XMI parsers and converters that can convert the IFC data model from UML/XMI generated in any software to a variety of OOPL data models,
  3. verify that those parsers and converters actually cover the whole IFC specification and give correct results.

Otherwise maybe it is better to just publish the IFC in data models of some common OOPL directly: No parsing, no conversion of the data model needed. No data loss, no corrupt data.

And a last question: Is it possible to publish the IFC data model using XMI, but only using those parts of XMI that cover the data model, not using any part of XMI that relates to visualizations of diagrams?

(because as I understand your reply, those parts of XMI that relate to visualization of diagrams tend to vary more, depending on the software tool that generated the XMI).
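From the consumer side, a reader that looks only at the model part could be sketched as below, using Python's standard `xml.etree.ElementTree`. The sample document, the namespace URIs (assumed here to be those of XMI 2.5.1 and UML 2.5) and the function name `read_classes` are assumptions for illustration; this is not the structure of any particular buildingSMART file, and real tool exports usually carry additional vendor-specific content.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/spec/XMI/20131001"  # assumed XMI 2.5.1 namespace
XMI_TYPE = f"{{{XMI_NS}}}type"                   # qualified name of xmi:type

def read_classes(xmi_text):
    """Return {class name: [attribute names]} from packagedElement nodes only.

    Diagram elements are never visited, so their presence or dialect
    does not affect the result.
    """
    root = ET.fromstring(xmi_text)
    classes = {}
    for element in root.iter("packagedElement"):
        if element.get(XMI_TYPE) == "uml:Class":
            classes[element.get("name")] = [
                attr.get("name") for attr in element.findall("ownedAttribute")
            ]
    return classes

# Hypothetical minimal document: one model class plus one diagram element.
SAMPLE = """
<xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001"
         xmlns:uml="http://www.omg.org/spec/UML/20131001"
         xmlns:umldi="http://www.omg.org/spec/UML/20131001/UMLDI">
  <uml:Model name="Demo">
    <packagedElement xmi:type="uml:Class" name="IfcWindow">
      <ownedAttribute name="OverallHeight"/>
      <ownedAttribute name="OverallWidth"/>
    </packagedElement>
  </uml:Model>
  <umldi:UMLClassDiagram name="Overview"/>
</xmi:XMI>
"""
classes = read_classes(SAMPLE)
```

Whether this is enough in practice depends on how consistently different tools write the model elements themselves, which is the open question raised above.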

@tauscher