Task to document known and unknown ID needs for IFC+ Data
Thanks @gschleusner ! Some possible questions and notes to start discussion.
- What objects within IFC and adjacent data standards should get a name/identity? How should those names be aligned across platforms and serializations, if at all?
- Should we align on a particular identification syntax (e.g., URIs)?
- How do users learn more about an object (e.g., an IFC entity or instance) given its unique name, or generate new names others can look up?
Unique names within IFC instances
There are at least 2-3 types of names/identifiers within IFC instance populations, depending on how you count.
- The instance IDs local to the instance population (e.g., "#84 = IFCPROPERTYSINGLEVALUE(…)" in IFC-STP, <IfcPropertySingleValue id="i84">…</…> in ifcXML). These are heavily used to describe the graph model of an instance population.
- The GlobalIds of all instances of IfcRoot or its descendants, which the spec describes as “Assignment of a globally unique identifier within the entire software world.” Each GlobalId is a 128-bit number encoded in base64 (see encoding notes). Elements in the IFC Resource Layer, including things like IfcPerson and IfcOrganization, do not have GlobalIds, so they must define their own identity attributes (e.g., IfcPerson’s Identification).
- With the release of Part 21v3, anchors and references allow us to map local instance IDs to URIs, linking IFC instance populations into the web’s global namespace.
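To make the GlobalId bullet above concrete, here is a minimal sketch of the 128-bit-to-base64 encoding as I understand it from the published encoding notes. The function names `compress` and `expand` are my own, and the alphabet is the IFC-specific one rather than the RFC 4648 base64 alphabet, so please check it against the official notes before relying on it.

```python
import uuid

# Alphabet used for IFC GlobalIds (assumed from the encoding notes;
# note that it differs from the standard RFC 4648 base64 alphabet).
IFC_B64 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"


def compress(u: uuid.UUID) -> str:
    """Encode a 128-bit UUID as a 22-character GlobalId-style string."""
    n = u.int
    digits = []
    for _ in range(22):  # 22 digits of 6 bits each cover all 128 bits
        n, rem = divmod(n, 64)
        digits.append(IFC_B64[rem])
    return "".join(reversed(digits))


def expand(global_id: str) -> uuid.UUID:
    """Decode a 22-character GlobalId-style string back into a UUID."""
    if len(global_id) != 22:
        raise ValueError("expected exactly 22 characters")
    n = 0
    for ch in global_id:
        n = n * 64 + IFC_B64.index(ch)  # raises ValueError on unknown characters
    return uuid.UUID(int=n)  # raises ValueError if the value exceeds 128 bits


sample = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")  # made-up UUID
assert expand(compress(sample)) == sample
```

Because 22 base64 digits hold 132 bits, the first character of a valid GlobalId can only be 0, 1, 2, or 3.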
There’s been some discussion about omitting local instance IDs from IFCJSON, and doing lookups based only on GlobalId. AFAIK, this means all Resource Layer instances would need to be copied by value (increasing file size), and it leaves integration with P21v3 unresolved. Conversation is ongoing, but I figured it was worth mentioning here.
Persistent IDs for IFC instance files
There’s an ongoing discussion on GitHub about proposing a method to add persistent IDs to IFC-SPF files to support integration with CDEs. Several alternative proposals are described there, so I’m linking to that conversation, as the two themes seem related.
Unique names within IFC’s Schema
At least in EXPRESS, Schemas, Keywords, Entities, Types, and Attributes are identified by name in a hierarchy, e.g.,
Where Universe usually corresponds to a specific release of IFC. Members of each Universe seem to map to different URIs depending on which IFC serialization you’re using. ifcOWL says the URI namespace for IFC4 is “http://ifcowl.openbimstandards.org/IFC4#”. ifcXML for IFC4 uses “http://www.buildingsmart-tech.org/ifcXML/IFC4/”. Should those be merged?
Most of the entities within BCF have unique IDs, although I couldn’t find much detail on their constraints. Are there any hard rules around what a “ProjectID” must be, for example?
See the XSD schema for examples here.
bSDD seems to use namespaceUris for identifying its assets. In the current build, the IFC namespace is different from that of both ifcXML and ifcOWL (e.g., http://bsdd.buildingsmart.org/a/buildingsmart/ifc-4.3/class/IfcDoor)
The newly minted ISO 23386 seems to put a heavy emphasis on Properties as first-class object types, each of which has a GUID. The spec is behind a paywall, so I haven’t been able to dig into what the constraints on those GUIDs are. Does anyone know?
The folks at GS1 are doing neat work with their Digital Link standard, which adds URIs and linksets to existing GS1 IDs like barcodes. This seems especially relevant for our industry, where persistent identifiers must span physical and digital representations of assets over a comparatively long lifespan.
There are surely related initiatives that have their own take on naming things. Feel free to add them.
@devonsparks That’s a good overview of existing identification methods!
The persistent ID discussion seems to target “files”. Another interesting thing is uniquely identifying “object” revisions, and linking those together. That is something IfcOwnerHistory also doesn’t seem to do in full.
@jbrouwer - re: persistent ID discussion: exactly
A specific question for discussion:
Should all IFC serializations and web services - including ifcXML, ifcOWL, and bSDD - use the same namespace URIs?
- ifcOWL uses http://ifcowl.openbimstandards.org/IFC4#
- ifcXML uses http://www.buildingsmart-tech.org/ifcXML/IFC4/
- bSDD uses http://bsdd.buildingsmart.org/a/buildingsmart/ifc-4.3
If all serializations represent the same underlying data model of IFC (they do, right?) then shouldn’t the identifiers used to name parts of that data model be the same across the various serializations?
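To illustrate the mismatch, here is a rough sketch that compares three identifiers that plausibly name the same entity. Only the bSDD form is quoted in this thread; the ifcOWL and ifcXML forms below simply append "IfcDoor" to the namespaces listed above, which is my assumption and not something either spec is confirmed to mandate. `local_name` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Candidate identifiers for the same entity. The first two are assumed
# concatenations of namespace + entity name, for illustration only.
CANDIDATES = [
    "http://ifcowl.openbimstandards.org/IFC4#IfcDoor",
    "http://www.buildingsmart-tech.org/ifcXML/IFC4/IfcDoor",
    "http://bsdd.buildingsmart.org/a/buildingsmart/ifc-4.3/class/IfcDoor",
]


def local_name(uri: str) -> str:
    """Return the fragment if there is one, else the last path segment."""
    parts = urlsplit(uri)
    if parts.fragment:
        return parts.fragment
    return parts.path.rstrip("/").rsplit("/", 1)[-1]


# All three agree on the local name...
names = {local_name(u) for u in CANDIDATES}
# ...but as URIs they are three distinct identifiers.
distinct_uris = len(set(CANDIDATES))
```

A generic tool that compares URIs by string equality would treat these as three unrelated things, even though a human reads all of them as "IfcDoor".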
Not any form of solution, alas, but having GlobalId or GUID in itself is not enough. Yes, statistically it is unique (although in practice we encounter re-used IDs). And with a GUID or UUID alone, you don’t have any clue as to where to go look for it.
E.g. if we say a property is defined as UUID “123456789…” it is uniquely defined, but there is no way to know in which data dictionary you need to retrieve its other metadata and attributes.
Having a URI provides a method to also assign a location (where to find the resource), so you indeed need both of them. But where do we add the URI? At file level? Can it be different per object? Or even per property?
With ISO 23386 we get the concept of a Data Template and can refer to properties or characteristics which are defined and managed globally, in Property Servers and Data Dictionaries (in plural, as there can be many).
And after all of this, we don’t have clear directions on what the name of an object actually is… Is it the IFC GlobalId/Guid for the object? Is it the internal guid of the source software (sometimes exported as IfcTag)? Is it the human-readable Type or Occurrence Name? Or LongName? Or Description? Or Number? (as it differs a lot between classes).
Or a combination of different properties and attributes?
My thoughts went to an abstract list first, to point out the needs, as people tend to conflate these concepts and talk about the technical implementation before agreeing on the kinds of IDs that are needed.
This is a start on the kinds of IDs that building projects need …
My list starts to look like this:
Locally defined (The source application generates these IDs)
- Instance ID on an Instance of an element
- Type ID on an Instance of an element
- Type ID on a Type of an Element
Examples = Wall Instance, Wall Type , Owner History
There might be more of these that are not as clean.
Multiple Externally defined (The Source Application must reference externally defined IDs, but must support multiple sources and multiple instances of each kind of ID. For example, our convention in the US is to list 3 potential products for each item in a specification, so an element should be able to have those 3 linked to an instance/type if the project is in design.)
- Product Man. Global Instance ID on an element
- Product Man. Global Type ID on an element
- Product Man. Global Type ID on a Type
- Local Classification1 on Instance ID on an element
- Local Classification1 on a Type
- Local Classification2 on Instance ID on an element
- Local Classification2 on a Type
- Local Classification3 on Instance ID on an element
- Local Classification3 on a Type
- Local Classification4 on Instance ID on an element
- Local Classification4 on a Type
Globally Defined (The Source Application must reference externally defined IDs, that have a global definer - Like a BSI Data Dictionary Property ID)
- Global Type. Global ID for Property on an Instance
- Global Type. Global ID for Property on a Type
Example = “Fire rating” hosted by a data dictionary
Then there are the IDs that must exist on the “plumbing” of the data: concepts in IFC, like “Element”, that are not translations of real-world concerns. Those need to be added to this list too.
There might be a better way to articulate these…
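One possible way to articulate the list above is as a small data structure. This is only a sketch: the names `Scope`, `Identifier`, and `Asset` are hypothetical, and every ID value is made up. The point it tries to capture is that a single element or type can carry one locally generated ID alongside several externally and globally defined ones.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    LOCAL = "local"        # generated by the source application
    EXTERNAL = "external"  # defined elsewhere; several sources may apply
    GLOBAL = "global"      # defined by a single global authority


@dataclass(frozen=True)
class Identifier:
    scope: Scope
    source: str  # who defines the ID (application, manufacturer, dictionary)
    value: str   # the ID itself; all values below are invented


@dataclass
class Asset:
    name: str
    identifiers: list[Identifier] = field(default_factory=list)

    def by_scope(self, scope: Scope) -> list[Identifier]:
        """Return every identifier on this asset with the given scope."""
        return [i for i in self.identifiers if i.scope is scope]


# A wall type in design, with three candidate products from a specification.
wall_type = Asset("Wall Type", [
    Identifier(Scope.LOCAL, "source-application", "type-0042"),
    Identifier(Scope.EXTERNAL, "manufacturer-a", "A-100"),
    Identifier(Scope.EXTERNAL, "manufacturer-b", "B-200"),
    Identifier(Scope.EXTERNAL, "manufacturer-c", "C-300"),
    Identifier(Scope.GLOBAL, "data-dictionary", "fire-rating"),
])

candidates = wall_type.by_scope(Scope.EXTERNAL)  # the three candidate products
```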
Excellent questions and points, @stefkeB.
GUIDs lose some of their value if they’re not discoverable. I’m in favor of normalizing GUIDs as URIs. Ideally we’d make those GUID URIs dereferenceable with delegation (this is how the GS1 Digital Link Resolver works, à la DNS), so you can dereference a GUID URI to learn more about it and its links to other assets. WebID offers some similar capabilities. By making all GUIDs URIs, we can leverage existing solutions to resource identification, discovery, and alignment. Are there any parts of the community that would be negatively impacted by such a proposal, assuming we can address backwards compatibility?
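As a rough sketch of what "normalizing GUIDs as URIs" could look like: Python’s standard `uuid` module already produces the `urn:uuid:` form, which is a valid URI but not dereferenceable. The HTTPS form below uses a resolver base (`https://id.example.org/guid/`) that is purely hypothetical; no such service is named in this thread.

```python
import uuid

# Hypothetical resolver base; an assumption for illustration only.
RESOLVER_BASE = "https://id.example.org/guid/"


def guid_to_urn(guid: str) -> str:
    """Normalize a GUID string into its urn:uuid: form (unique, not resolvable)."""
    return uuid.UUID(guid).urn


def guid_to_http_uri(guid: str, base: str = RESOLVER_BASE) -> str:
    """Build a dereferenceable URI by appending the canonical GUID to a base."""
    return base + str(uuid.UUID(guid))


raw = "123E4567E89B12D3A456426614174000"  # made-up GUID, no hyphens, upper case
urn = guid_to_urn(raw)
url = guid_to_http_uri(raw)
```

Both forms carry the same 128 bits. The difference is that the second one tells a client where to go and ask about it, which is the part a bare GUID is missing.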
I gave some demos of how we can support URI identification of files, instances, entities, and entity attributes in a backwards compatible way at the BSi Summit last week. See my JSON-LD presentation and especially the appendix we didn’t have time to cover. See also a concrete example here, or on the JSON-LD Playground.
I use JSON-LD above, but the concept is general: if you define “base URIs” for instance populations and schema versions, you can then refer to all other identified assets using relative URIs, and build the full URIs as a postprocess (this is what the JSON-LD Processor is doing). I think (and hope) it’s a solvable problem in the short term if we want it.
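Outside of JSON-LD, the same base-plus-relative idea can be sketched with plain URI reference resolution from the Python standard library. The instance base URI below is hypothetical, and the schema base is the ifcOWL namespace quoted earlier in the thread with the trailing "#" removed. This is not a JSON-LD processor, just the underlying resolution step.

```python
from urllib.parse import urljoin

# Hypothetical base URI for one instance population (an assumption).
INSTANCE_BASE = "https://example.org/models/tower-a.ifc"
# ifcOWL namespace from this thread, without the trailing '#'.
SCHEMA_BASE = "http://ifcowl.openbimstandards.org/IFC4"


def resolve(base: str, relative: str) -> str:
    """Resolve a relative reference against a base URI (RFC 3986 rules)."""
    return urljoin(base, relative)


# A local instance ID becomes a globally unique URI.
instance_uri = resolve(INSTANCE_BASE, "#i84")
# An entity name becomes a URI in the schema's namespace.
entity_uri = resolve(SCHEMA_BASE, "#IfcPropertySingleValue")
```

The file itself only needs to store the short relative forms ("#i84", "#IfcPropertySingleValue"), which is what keeps the approach backwards compatible.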
re: “what the name of an object actually is”: I’ve been thinking about this a lot too, especially as I review the public IFC5 documentation. The proposed “core” schema includes IfcName, IfcIdentifier, and IfcGUID types. I don’t understand the semantic difference between them for exactly the reasons you state. I’ve put this question to the IFC5 Task Force directly.
Thanks @gschleusner! This breakdown, independent of technical approach, makes sense. Recasting in my own words to check my understanding:
We deal with a lot of assets - walls, doors, entity definitions, instances, files, projects - all of which need identities so we can talk about them unambiguously. Assets can have canonical/global identities, as well as zero or more variant/local identities. Agents may look up assets by their canonical identity or local identity depending on the use case. Secondarily, assets can have relationships to other assets. “type” is one possible relationship; “hasClassification” might be another. This lets us build up networks of knowledge around unambiguous names (identities).
If we accept that relationships should also get unique identities (so we can distinguish between “type” in a data model and “type” in a lithography application), we end up with the RDF data model the web allows for, along with the rich ontologies (like the Simple Knowledge Organization System) built on top of it.
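A minimal sketch of that idea, using plain tuples rather than an RDF library: each statement is a (subject, predicate, object) triple, and the predicates are URIs too. The `rdf:type` and SKOS `prefLabel` URIs are the standard W3C ones; the `example.org` URIs, including the lithography "type", are hypothetical.

```python
# Standard W3C predicate URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
# Hypothetical predicate from an unrelated domain that also calls itself "type".
LITHO_TYPE = "https://example.org/lithography#type"

DOOR = "https://example.org/models/tower-a.ifc#i84"  # hypothetical instance URI
PLATE = "https://example.org/prints/plate-7"          # hypothetical asset URI

triples = {
    (DOOR, RDF_TYPE, "http://ifcowl.openbimstandards.org/IFC4#IfcDoor"),
    (DOOR, SKOS_PREF_LABEL, "Front door"),
    (PLATE, LITHO_TYPE, "offset"),
}


def objects(graph, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


door_types = objects(triples, DOOR, RDF_TYPE)
# The lithography "type" never collides with rdf:type, because the two
# predicates are different URIs even though their local names match.
door_litho = objects(triples, DOOR, LITHO_TYPE)
```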