IfcGloballyUniqueIds spec description is incorrect - proposal to simplify

A little investigation shows that this problem exists in many implementations. Here are the ones I have tested so far:

  • IfcOpenShell
  • FreeCAD
  • GeometryGym
  • ArchiCAD
  • Revit

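For example, this is an IfcGloballyUniqueId string produced by Revit: 18mPNPiNXBUv50hwee2Yod. Because it ends in d, it doesn't seem to be standard base64 encoded. Encoding 128 bits takes 22 base64 characters, which can hold 132 bits, so in standard base64 the last character carries only 2 data bits and its remaining 4 bits are zero. If a string like this one is decoded as standard base64, those 4 low bits are simply discarded: 18mPNPiNXBUv50hwee2Yod, 18mPNPiNXBUv50hwee2Yoe, 18mPNPiNXBUv50hwee2Yof, etc. (any last character in the set {WXYZabcdefghijkl}) all resolve to the exact same 128-bit number, which increases the likelihood of collisions. This is somewhat contrary to the purpose of a UUID.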
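To make the collision claim concrete, here is a minimal Python 3 sketch. It assumes the GUID is decoded as plain base64 with the two '=' padding characters stripped, using the same two alphabets as the reference implementation in this post; the helper name decode_as_standard_base64 is hypothetical. It decodes the Revit GUID with each of the 16 candidate final characters and counts the distinct results.

```python
import base64

# Index-aligned alphabets: character i of one maps to character i of the other
B64_CHARSET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
IFC_CHARSET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
IFC_TO_B64 = str.maketrans(IFC_CHARSET, B64_CHARSET)


def decode_as_standard_base64(guid):
    """Hypothetical helper: decode a 22-char IFC GUID as padding-stripped base64."""
    return base64.b64decode(guid.translate(IFC_TO_B64) + '==')


revit_guid = '18mPNPiNXBUv50hwee2Yod'
prefix = revit_guid[:-1]

# 'd' has IFC index 39 (0b100111); every character with index 32..47
# shares its top 2 bits, which are the only bits that survive decoding
variants = [prefix + c for c in 'WXYZabcdefghijkl']
decoded = {decode_as_standard_base64(g) for g in variants}
print(len(decoded))  # 1: all 16 strings collapse to the same 16 bytes
```

Python's default (non-validating) decoder does not reject non-zero leftover bits in the final character, so nothing flags these 16 spellings as different.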

I have written a short reference implementation in Python, which is platform independent.

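```python
import base64
import uuid


class IfcGloballyUniqueId:
    def __init__(self):
        # Index-aligned alphabets: standard base64 and the IFC character set
        self.b64_charset = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
        self.ifc_charset = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
        # Translation tables in both directions (str.maketrans, Python 3)
        self.b64_to_ifc = str.maketrans(self.b64_charset, self.ifc_charset)
        self.ifc_to_b64 = str.maketrans(self.ifc_charset, self.b64_charset)

    def generate(self):
        return self.encode_uuid_to_ifc_guid(uuid.uuid4().bytes)

    def encode_uuid_to_ifc_guid(self, uuid_bytes):
        # 16 bytes encode to 24 base64 characters ending in '=='; keep the first 22
        b64 = base64.b64encode(uuid_bytes).decode('ascii')[:22]
        return b64.translate(self.b64_to_ifc)

    def decode_ifc_guid_to_uuid(self, guid):
        # Map back to the base64 alphabet and restore the stripped padding
        b64 = guid.translate(self.ifc_to_b64) + '=='
        return uuid.UUID(bytes=base64.b64decode(b64))
```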

Here’s a little demo that generates a GUID, then takes some other application-agnostic UUID (say, from a stored database of BIM data), converts it to an IFC GUID and back again.

```python
# Note how the only possible ending characters are 0, G, W and m
ifc_guid = IfcGloballyUniqueId()
print(ifc_guid.generate())  # e.g. 9Fs5IkRuIgIQsQaT_sdI00
some_uuid = uuid.uuid4()
some_ifc_guid = ifc_guid.encode_uuid_to_ifc_guid(some_uuid.bytes)
some_uuid_again = ifc_guid.decode_ifc_guid_to_uuid(some_ifc_guid)
print(some_uuid)        # e.g. 316d658d-6db1-43c2-b9bf-c3f5104ac16b
print(some_ifc_guid)    # e.g. CMrbZMsnGyAvlyFr44h1Qm
print(some_uuid_again)  # same UUID again: 316d658d-6db1-43c2-b9bf-c3f5104ac16b
```
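The restriction on ending characters follows from the bit arithmetic: 16 bytes are 128 bits, 22 base64 characters hold 132 bits, so the last character carries 2 data bits followed by 4 zero bits, and its alphabet index must be a multiple of 16. Here is a minimal, self-contained Python 3 sketch that checks this against freshly generated UUIDs (the sample size of 10,000 is arbitrary):

```python
import base64
import uuid

B64_CHARSET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
IFC_CHARSET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
B64_TO_IFC = str.maketrans(B64_CHARSET, IFC_CHARSET)

# 16 bytes = 128 bits, but 22 base64 characters hold 22 * 6 = 132 bits.
# The last character holds 2 data bits followed by 4 zero bits,
# so its alphabet index is a multiple of 16: 0, 16, 32 or 48.
expected_endings = {IFC_CHARSET[i] for i in (0, 16, 32, 48)}
print(sorted(expected_endings))  # ['0', 'G', 'W', 'm']

# Encode many random UUIDs and collect the final character of each GUID
observed = set()
for _ in range(10_000):
    b64 = base64.b64encode(uuid.uuid4().bytes).decode('ascii')[:22]
    observed.add(b64.translate(B64_TO_IFC)[-1])
print(observed <= expected_endings)  # True
```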

To resolve this fully, there are a few options:

  1. Implementers will have to follow the spec and use standard base64 encoding. Unfortunately, this will break existing IFC middleware that uses the GUID in this fashion.
  2. Simplify the specification, and then update implementations:
      • Use the standard b64 charset.
      • Specify treatment of the null GUID (already questioned by @lassi.liflander and @jonm).
      • Specify UUID v4 to maximise randomness.
      • Maybe consider removing the b64 encoding? Just make it a UUID string.
  3. Change the spec completely and just say that the IFC GUID is a random 22-character string whose characters are drawn from the IFC character set. This is kinda bad because it abandons the UUID standard, whose whole point is to prevent collisions and keep things unique, and it also encourages developers to go rogue and use weak random number generators, which increases collisions.
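As one illustration of what option 2 could mean for a validator, here is a hypothetical Python 3 check. The function name is made up, and its rules are assumptions drawn from the option 2 bullets rather than anything in the current spec: canonical standard base64 (zero padding bits in the last character, so each 128-bit value has exactly one spelling), a rejected null GUID (just one possible answer to the open null-GUID question), and a required UUID v4 with the RFC 4122 variant.

```python
import base64
import uuid

B64_CHARSET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
IFC_CHARSET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
IFC_TO_B64 = str.maketrans(IFC_CHARSET, B64_CHARSET)


def is_valid_option2_guid(guid):
    """Hypothetical validator for the rules sketched under option 2."""
    if len(guid) != 22 or any(c not in IFC_CHARSET for c in guid):
        return False
    # Canonical form only: the 4 padding bits of the last character must be
    # zero, i.e. its alphabet index is a multiple of 16 (0, G, W or m)
    if IFC_CHARSET.index(guid[-1]) % 16 != 0:
        return False
    value = uuid.UUID(bytes=base64.b64decode(guid.translate(IFC_TO_B64) + '=='))
    # Assumption: the null GUID is treated as invalid
    if value.int == 0:
        return False
    # Assumption: only random (version 4, RFC 4122 variant) UUIDs are allowed
    return value.version == 4 and value.variant == uuid.RFC_4122


print(is_valid_option2_guid('18mPNPiNXBUv50hwee2Yod'))  # False: ends in d
print(is_valid_option2_guid('0' * 22))                  # False: null GUID
```

Under these assumed rules, a GUID produced by standard base64 encoding of uuid.uuid4().bytes always passes, while the Revit example above is rejected on its final character alone.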

My own recommendation is to do option 2, but keep the b64 encoding (to match the format of older IFC files, for legacy reasons).

Thoughts?
