I am reading the specification, in particular how an `IfcGloballyUniqueId` is generated.
I am curious because it’s a rather concise and simple description: generate a 128-bit number, base64-encode it using the charset provided, and you get a resultant 22-character string.
I have a few concerns.
Firstly, given that it is a 128-bit number, and UUIDs are indeed 128-bit, it would make sense that I’d use one of the UUID versions (although the IFC spec doesn’t specify anything about this). I guess I could equally well make one up in my head, say “1”, and call it a day. It doesn’t say if the Nil UUID should be treated specially, or if any particular UUID version is preferred for collision prevention.
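For what it’s worth, the reading I’ve been assuming (and it is only an assumption, since the spec is silent) is “any 128-bit value is fine, but a random version-4 UUID is the sensible choice for collision avoidance”:

```python
import uuid

# My assumption: use a random (version 4) UUID as the 128-bit payload.
# The IFC spec doesn't mandate any version, or say anything about the
# Nil UUID, so this is just one plausible interpretation.
u = uuid.uuid4()
print(u.int.bit_length() <= 128)  # True -- a UUID is always 128 bits
print(len(u.bytes))               # 16 (bytes)
```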
Regardless, my 128-bit number translates into a 24-character base64 string (128 = 21×6 with a 2-bit remainder, i.e. 22 data characters padded out to 24), which means that there will be two `=` padding characters. Given that the EXPRESS specification wants a 22-character string, it makes sense that this is probably because they want to truncate the `==` padding. This is not explicit in the definition; it is merely a sensible assumption. It would be good to make this clearer in the spec.
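To illustrate the arithmetic above with standard base64 (not the IFC charset), 16 bytes always encode to 24 characters ending in `==`, and stripping the padding leaves exactly the 22 characters the EXPRESS definition asks for, which is where I assume the 22 comes from:

```python
import base64
import uuid

# Standard base64 of 16 bytes: 24 characters, the last two being '='
# padding. Dropping them leaves 22 characters -- my assumed source of
# the EXPRESS string length.
raw = uuid.uuid4().bytes
encoded = base64.b64encode(raw).decode()
print(len(encoded))              # 24
print(encoded.endswith("=="))    # True
print(len(encoded.rstrip("=")))  # 22
```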
Finally, given that I have a 2-bit remainder, my last (i.e. 22nd) base64 character will be padded with 4 bits. The four resulting possibilities are `000000`, `010000`, `100000`, and `110000`. These translate to 0, 16, 32, and 48, and, given the charset defined, are the characters `0`, `G`, `W`, and `m`. Therefore, any `IfcGloballyUniqueId` must end in one of those four characters. Yet the examples show values that end in other characters. Testing an implementation such as IfcOpenShell with
`ifcopenshell.guid.new()` also gives me values which don’t end in those characters.
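Here is the little computation behind my claim, under my assumption that the 2 leftover bits land in the *last* character, padded with four trailing zero bits (the charset string is the one from the spec):

```python
# The IFC base64 charset as given in the specification.
IFC_CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"

# If the final 6-bit group is XX0000 (2 data bits + 4 zero padding bits),
# its value is one of 0b000000, 0b010000, 0b100000, 0b110000.
last_chars = [IFC_CHARSET[v] for v in (0b000000, 0b010000, 0b100000, 0b110000)]
print(last_chars)  # ['0', 'G', 'W', 'm']
```

Since real GlobalIds don’t obey this, I suspect my assumption about where the 2-bit remainder sits is exactly what’s wrong, but the spec doesn’t say.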
It also seems odd that IFC doesn’t use the standard `A-Za-z0-9` + 2 special characters base64 charset but instead defines its own.
Am I misunderstanding the spec? Or are the examples / implementations wrong?
It also doesn’t help that GUID is a rather Microsoft-oriented term; it’d be a little bit more politically correct to call it a UUID, eh?