Implementation Guidance page added to technical.buildingsmart.org

I have just updated the bSI Technical site, adding a new page under the Resources section, called “IFC Implementation Guidance”:

It lists the latest versions of the official published resources that the Model Support Group has authored.

I’ll also be adding a new page with a listing of all the official Implementation Agreements for IFC2x3… but please be patient as this will take some time.

Thanks @jwouellette

Regarding IfcXML, I asked a question here that I think could help improve it.

CC: @TLiebich

Great job, Jeffrey!

I’m in the process of formatting the Implementation Agreements and getting them posted to the new Standards Server with links from the Technical website. Be on the lookout for this to be up and running before the end of the month.

Can the IFC GUID page be updated to describe the actual procedure for base64-encoding a GlobalId? See this thread, which describes how the IFC base64 encoding differs from a standard base64 encoding.

@Moult I’m going to need more input/guidance from the @Model_Support_Group (like @TLiebich, @jonm, or @mweise) before I can do that. Hopefully they’ll see this soon and respond.

Also pinging @aothms since he did respond on the discussion thread (perhaps his wording can be used verbatim, as he described it pretty succinctly?), and is an MSG member, and also his Python implementation is already linked to on that page anyway :slight_smile:

There are at least two more code examples:

I am not sure whether there has been any better documentation of GUID compression in the past. I guess it was seen as a straightforward bit-encoding task.

My short explanation: a 128-bit UUID (32 hex characters, i.e. 32 groups of 4 bits) needs to be regrouped into 22 groups of 6 bits, with the regrouping starting from the rightmost bit. As 128 is not divisible by 6, the leftmost character is based on only two bits, so it can only be 0, 1, 2 or 3 (as already discussed).
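To make the regrouping concrete, here is a minimal Python sketch of my reading of that explanation. It assumes the 64-character set quoted elsewhere in this thread, and the function name `compress_by_regrouping` is made up for illustration; it is not taken from any buildingSMART code.

```python
import uuid

# Character set as quoted in this thread (assumed to be the IFC one).
IFC_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"


def compress_by_regrouping(u: uuid.UUID) -> str:
    """Regroup the 128 bits into 22 groups of 6 bits, starting from the right."""
    value = u.int
    chars = []
    for _ in range(22):
        chars.append(IFC_ALPHABET[value & 0x3F])  # lowest 6 bits
        value >>= 6
    # Groups were collected right to left, so reverse them for display.
    return "".join(reversed(chars))


example = uuid.UUID("DA85046CCE8A4226A38D732788C5C1E7")
guid = compress_by_regrouping(example)
print(guid, len(guid))
```

Because 21 groups cover 126 bits, the leftmost group holds only the top 2 bits of the UUID, which is why the first character is always 0, 1, 2 or 3.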

Does that explanation help? I am fine with adding additional information to the documentation or the buildingsmart website.

Clarification of the UUID generation algorithm is another topic.

@mweise, it sounded like a straightforward bit encoding task, but as you can see when implemented using completely standard libraries as a one-liner in Python, the results are different (due to the order of bits). A lack of clarification of byte ordering results in this problem. You can see the code that shows standard libraries output something else here.

You can see that the Wikipedia article on base64 encoding also specifies:

Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from left to right

This base64 conversion website also confirms that encoding is left to right.

Therefore, a clarification that the base64 encoding deviates from the standard and instead is done right to left should be documented.
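For anyone who wants to see the difference for themselves, here is a small sketch using only Python's standard `base64` module on the raw 16 bytes. It shows the standard left-to-right behaviour, not the IFC one, and the variable names are my own.

```python
import base64
import uuid

example = uuid.UUID("DA85046CCE8A4226A38D732788C5C1E7")

# Standard RFC 4648 encoding of the 16 raw bytes: 6-bit groups are taken from
# the left, and the final group is padded on the right with zero bits.
standard = base64.b64encode(example.bytes).decode("ascii").rstrip("=")

# The last character carries only 2 data bits followed by 4 zero bits,
# so it can take only four distinct values regardless of the input.
possible_last_chars = {
    base64.b64encode(bytes(15) + bytes([b])).decode("ascii").rstrip("=")[-1]
    for b in range(256)
}

print(standard)
print(sorted(possible_last_chars))
```

With the standard encoding the restricted character ends up at the end of the 22-character string, whereas in an IFC GlobalId it is at the start.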

@aothms's explanation also specifies some rules which are clearly not as simple as a standard base64 encode, which is what the documentation currently implies.

In addition, clarification on which UUID generation algorithm is preferred or suitable should be made on that page. I have seen IFC files created where the UUID is clearly an incrementing number and not random. I believe this is not a good thing, as it creates potential for collision in a future where we might have city-scale BIM datasets, but without any mention of it in the official buildingSMART documentation, I have no official source to turn to.

As a bonus, an explanation on why buildingSMART decided to use a non-standard base64 character set would be good. It always seemed very strange to me since base64 has been around forever.

The attached figure nicely illustrates the principle described above (found on the German Wikipedia: https://de.wikipedia.org/wiki/Base64). The main question is where to fill in the missing bits to make the length divisible by 6.
Base64-de

Regarding the character set: I cannot really comment on this. I guess there are historical reasons dating back to the beginnings of IFC. You have already posted a simple mapping algorithm between different Base64 character sets. So, it seems to me a minor issue.

Regarding the UUID algorithm: My understanding is that it should follow the rules as documented in ISO/IEC 9834-8:2005. However, I do not know which version according to RFC4122 should be used or if we should enforce one specific version.
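As a small illustration of what "version" means here, the sketch below uses Python's standard `uuid` module. It does not imply that IFC prefers any particular version (that is still open in this thread); it only shows that a random version 4 UUID carries RFC 4122 version and variant bits, while a bare counter value does not.

```python
import uuid

# A random (version 4) UUID as defined in RFC 4122.
random_id = uuid.uuid4()
print(random_id, random_id.version, random_id.variant == uuid.RFC_4122)

# A plain incrementing number squeezed into 128 bits. Python reports no
# version for it because the RFC 4122 variant bits are not set.
counter_id = uuid.UUID(int=1)
print(counter_id, counter_id.version)
```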

The standard is left to right. Therefore if the base64 standard is followed, the last character of the 22 character string should have 4 variations.

For some reason, buildingSMART’s implementation does it right to left (hence the first character has 4 variations). This goes against the standard implementation in programming languages.

For using the Base64 standard (https://tools.ietf.org/html/rfc4648) the following procedure should work:

  • add two zero bytes at the beginning of the UUID
  • convert using standard Base64 conversion
  • strip the first two characters (should be AA always) from 24 character result to 22 characters
  • map from the character set ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ to the character set used by buildingSMART 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$

Example (UUID in hex notation, without the hyphen separators):

  • DA85046CCE8A4226A38D732788C5C1E7 must be converted to 0000DA85046CCE8A4226A38D732788C5C1E7
  • base64 conversion will result in AADahQRszopCJqONcyeIxcHn
  • remove first two characters to get DahQRszopCJqONcyeIxcHn
  • map the character set: 3....
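The procedure above could be sketched in Python roughly as follows. This is only my reading of the four steps, using the standard `base64` module and `str.maketrans` for the character mapping; the function name `compress_via_base64` is hypothetical.

```python
import base64
import uuid

STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
IFC_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"
TO_IFC = str.maketrans(STANDARD_ALPHABET, IFC_ALPHABET)


def compress_via_base64(u: uuid.UUID) -> str:
    # Step 1: prepend two zero bytes (18 bytes = 144 bits = 24 groups of 6).
    padded = b"\x00\x00" + u.bytes
    # Step 2: standard base64; 18 bytes need no '=' padding.
    encoded = base64.b64encode(padded).decode("ascii")
    # Step 3: drop the two leading characters produced by the zero bytes.
    # Step 4: map the standard character set onto the IFC one.
    return encoded[2:].translate(TO_IFC)


example = uuid.UUID("DA85046CCE8A4226A38D732788C5C1E7")
intermediate = base64.b64encode(b"\x00\x00" + example.bytes).decode("ascii")
guid = compress_via_base64(example)
print(intermediate)
print(guid)
```

The two leading zero bytes push the leftover bits to the front of the stream, so the standard left-to-right encoder ends up producing the same groups as a right-to-left regrouping of the 128-bit value.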

Would it help if we added this to the documentation?

Replacing the 22-character GUID with some other solution needs more discussion with the community.

@mweise, yes, your procedure describes the process correctly to my understanding.

I compressed your four points into three by using the non-standard character set directly in the second step, since encoding with an alternative alphabet is a well-known practice:

  • add two zero bytes at the beginning of the UUID
  • convert using standard Base64 conversion, using the character encoding 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$
  • strip the first two characters (should be 00 always) from the 24 character result to get a 22 character string

Would you be comfortable with this wording, @mweise?

I would recommend that the bullet points should be added after this existing sentence on that IFC GUID page, and also for the sentence to be slightly modified (I have bolded my edits):

Given that each IFC object instance required a unique identifier containing a 128-bit number, a custom compression algorithm using the RFC4648 base 64 standard with an IFC-specific character encoding was devised as shown below:

  • bullet points go here

The words “custom” and “IFC-specific” highlight the two areas where IFC deviates from the norm, which is why I have added them. For completeness I also mentioned the RFC standard.

@jwouellette, after we confirm the revised wording with @mweise, would that be sufficient for you to make a modification?

Once that edit is made, the only step left is to confirm what the desired UUID generation algorithm should be, and then document that too, since it is currently unmentioned. :slight_smile:

I think the confusion is based on the misconception that the GUID encoding in IFC has too much to do with the base64 encoding specified in RFC4648 and other standards. They just use the same radix 64. Those binary-to-text encodings are made for binary streams of variable length. You know where to start, encode packet by packet and eventually may have to pad in the end. On the contrary, for the GUID, you have a single fixed-length number which is just expressed to a larger base. You would not right pad a numerical value, would you?

Think of the “encoding” just as representing the whole 128-bit number with a different radix. You then have

  • the binary representation with radix 2, which has length 128,
  • the decimal representation with radix 10, which has length 39,
  • the hexadecimal representation with radix 16, which has length 32,
  • the IFC GUID representation with radix 64, which has length 22.

So to convert from the standard UUID “encoding” to the IFC GUID “encoding” you just have to convert a number from radix 16 to radix 64, which should not take much. I have a two-line implementation to generate IFC GUIDs from UUIDs in GroovyIFC. Look at the method strongRandom(). You can also use the UUID bits and convert from radix 2 to 64.
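The lengths listed above are easy to check. The following is just a quick Python sketch that counts how many digits the largest 128-bit number needs in each radix; the helper name `digits_needed` is my own.

```python
MAX_128 = (1 << 128) - 1  # largest value a 128-bit UUID can hold


def digits_needed(value: int, radix: int) -> int:
    """Count the digits of a non-negative integer in the given radix."""
    count = 0
    while value:
        value //= radix
        count += 1
    return max(count, 1)  # zero still needs one digit


lengths = {radix: digits_needed(MAX_128, radix) for radix in (2, 10, 16, 64)}
print(lengths)  # {2: 128, 10: 39, 16: 32, 64: 22}
```

Smaller values need fewer digits, so a fixed-width representation pads with leading zeros on the left, just as with any other number.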

I would like to keep things simple. Those base64 standards are currently not mentioned anywhere in the IFC4 specification and there is no need to add them. How about a reformulation that replaces the word “base” with “radix” and removes the confusing association?

I agree; however, I am not sure that renaming it to Radix-64 is any clearer. It seems that this term is also used by some conversion algorithms. My proposal for the documentation is therefore:

  • mention that the conversion algorithm does not follow the Base64 encoding from RFC4648
  • provide some information on how you can make use of the Base64 encoding (in case you do not want to deal with the conversion yourself).

We should perhaps also highlight the fact that the GUID is a 128-bit number, so you cannot append missing bits/bytes to the end.

The revised version from Moult sounds good and hopefully helps to avoid misinterpretation.

I am OK with either @mweise or @tauscher’s suggestions. Both are correct. Maybe a link to the Groovy implementation can be added too to help Java folks, since currently we only have a C# and Python link.

I definitely agree that it should be made explicit that the conversion does not follow the standard base64 encoding from RFC4648.

@mweise, as you represent the MSG, it is appropriate for you to make a decision on the final wording :slight_smile:

Bump - would be good to fix this issue, it is rather trivial to fix.

In addition, the Implementer Agreements page has many broken links (the top few work, but as you scroll to the bottom, many links are broken). It would be good to fix this :slight_smile:

Someone needs to type up a “final and clean” version and let me know what existing text is being deprecated. Then send it to technical@buildingsmart.org. Ideally, it is from @mweise @TLiebich or another MSG member who has conferred with the rest of the MSG.

I have fixed the IA list (there weren’t that many, maybe 3). Thanks for catching though, @Moult. I just put it up last night, but have not made a public notice yet. There is still some QA/QC to do, and maybe adding some agreements I don’t know about, but it should be a good enough reference for now. If anyone sees any other issues, please note it here: https://forums.buildingsmart.org/c/site-feedback/

@jwouellette Thanks! I still see broken links, though, for example CV-2x3-149. It looks like any link which links to a “page*.html” page is broken. Can you confirm?

Oy… you are right. Far worse than I thought. The fixes have been made.

The offending authoring tool will be tried, summarily sentenced, taken out back and shot. Leave it to me to trust a piece of software to try to automatically catch user… oversights… ;-}P>
