XSD Optional Attributes

Devon has raised an issue about optional XSD attributes here:

This has been puzzling me too, but the best explanation I’ve heard/identified is that by making all attributes optional, you can use XML validation tools when taking advantage of XML href for common references. Note this isn’t specific to just this attribute; I can see in IfcDoc that all attributes are forced to be optional.

Below is a sample of ifcXML. Note that the properties are defined within the first IfcRelDeclares, and then referenced within the property set template definition. If the GlobalId attribute is required, then the Notepad++ validation tool fails this. I’m not an expert in XML, but I assume there has to be a better way to handle this?

<?xml version="1.0" encoding="utf-8"?>
<ifcXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.buildingsmart-tech.org/ifc/IFC4x1/final" xsi:schemaLocation="http://standards.buildingsmart.org/IFC/RELEASE/IFC4_1/FINAL/XML/IFC4x1.xsd">
	<header>
		<time_stamp>2019-08-01T02:09:24</time_stamp>
	</header>
	<IfcProjectLibrary GlobalId="0xNUMqo4v2JQgvm1eG8byl" Name="DemoLibrary">
		<Declares>
			<IfcRelDeclares GlobalId="1p$MsXBQP4fRog0yDp8OSD" Name="Properties">
				<RelatedDefinitions>
					<IfcSimplePropertyTemplate id="MyProperty1_3eWOEVX55F2v9g_SnOYB3V" GlobalId="3eWOEVX55F2v9g_SnOYB3V" Name="MyProperty1" TemplateType="p_singlevalue" PrimaryMeasureType="IfcLabel" />
					<IfcSimplePropertyTemplate id="MyProperty2_104NlF0cH7JhA_NN2cERA" GlobalId="104NlF0cH7JhA_NN2$cERA" Name="MyProperty2" TemplateType="p_singlevalue" PrimaryMeasureType="IfcLabel" />
				</RelatedDefinitions>
			</IfcRelDeclares>
			<IfcRelDeclares GlobalId="2bm9EKgi9Cxh48FZsSQWev" Name="PropertySets">
				<RelatedDefinitions>
					<IfcPropertySetTemplate GlobalId="01h32OutHFgxkDsa4HdM$w" Name="MyPropertySet" TemplateType="pset_occurrencedriven" ApplicableEntity="IfcSystem">
						<HasPropertyTemplates>
							<IfcSimplePropertyTemplate xsi:nil="true" href="MyProperty1_3eWOEVX55F2v9g_SnOYB3V" />
							<IfcSimplePropertyTemplate xsi:nil="true" href="MyProperty2_104NlF0cH7JhA_NN2cERA" />
						</HasPropertyTemplates>
					</IfcPropertySetTemplate>
				</RelatedDefinitions>
			</IfcRelDeclares>
		</Declares>
	</IfcProjectLibrary>
</ifcXML>

There was this discussion on GitHub where we had a proposal with minOccurs and maxOccurs.

Thanks both. These links do a good job at capturing the problem. How to go about fixing it?

  1. Do nothing. There may be good reason for the existing spec being the way it is.

  2. Use minOccurs=1 on required attribute references like GlobalId
    Pros:
  • (Appears to be a) relatively small change to schema output
    Cons:
  • Conceptually unclean because use=“optional” attributes aren’t really.

  3. Use xsd:unique and xsd:keyref in place of ID/IDREF
    Pros:
  • More flexibility in joining elements vs ID/IDREF
    Cons:
  • key maps need to be “in scope” of the elements that use them, meaning most of the keys would probably need to bloat something like uos:ifc
  • can’t make use of substitution groups afaik

  4. Use the approach outlined in ISO 10303-28, with either early or late binding, to create distinct types for references.

Late binding (ISO 10303-28, Section 7.4.4):


Early binding (ISO 10303-28, Section 8.1.2):

Pros:

  • Avoids conflicts between required attributes because instance declaration and reference are made from separate types.
    Cons:
  • Every entity now has a corresponding reference type, increasing the size of the schema.
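To make option 3 more concrete, here is a minimal sketch of xs:key / xs:keyref standing in for ID/IDREF. Everything in it is made up for illustration: the `urn:example:keyref-sketch` namespace, the `Library`, `Template` and `TemplateRef` names, and the `templateKey` constraint are not part of any IFC schema. It also shows the scoping issue listed under the cons, since both constraints have to sit on an element that contains every declaration and every reference.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="urn:example:keyref-sketch"
           targetNamespace="urn:example:keyref-sketch"
           elementFormDefault="qualified">

  <!-- Hypothetical root element. The identity constraints are declared here
       so that all Template declarations and all TemplateRef references are in scope. -->
  <xs:element name="Library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Template" type="ex:Template" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="TemplateRef" type="ex:TemplateRef" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

    <!-- Every Template must carry an id, and ids must be unique within Library. -->
    <xs:key name="templateKey">
      <xs:selector xpath="ex:Template"/>
      <xs:field xpath="@id"/>
    </xs:key>

    <!-- Every TemplateRef/@href must match the id of some Template. -->
    <xs:keyref name="templateRef" refer="ex:templateKey">
      <xs:selector xpath="ex:TemplateRef"/>
      <xs:field xpath="@href"/>
    </xs:keyref>
  </xs:element>

  <!-- The declaration type can require GlobalId ... -->
  <xs:complexType name="Template">
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="GlobalId" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- ... because the reference is a separate type that only carries href. -->
  <xs:complexType name="TemplateRef">
    <xs:attribute name="href" type="xs:string" use="required"/>
  </xs:complexType>
</xs:schema>
```

Note that the sketch only works because declaration and reference use different element names and types, which is really the idea behind option 4. With a single element doing both jobs, as in the current ifcXML, keyref alone would not remove the required/optional conflict.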

The approach suggested in ISO 10303-28 seems reasonable. What were the design decisions against adopting this approach?

Other perspectives/ideas?

Thanks! – Devon

The original reason for transforming all mandatory IFC EXPRESS attributes with simple data types, like GlobalId, into optional XSD attributes was that otherwise there was always an error when using the EXPRESS class / XSD element with xsi:nil=“true”.

Note: in the resulting ifcXML there is always the choice to use an xs:element either by reference or by value - check the ifcXML4 methodology at p. 13

When using an element by reference, as in e.g.:

<IfcRelAggregates GlobalId="2YBqaV_8L15eWJ9DA1sGmT">
  <RelatingObject xsi:nil="true" ref="i1895" xsi:type="IfcProject"/>
  <RelatedObjects>
    <IfcBuilding xsi:nil="true" ref="i1928"/>
  </RelatedObjects>
</IfcRelAggregates> 

then having e.g. GlobalId as a mandatory attribute would have generated a parsing error here, complaining:

  • if <IfcBuilding xsi:nil="true" ref="i1928"/>, that the mandatory attribute GlobalId is missing, and
  • if <IfcBuilding xsi:nil="true" ref="i1928" GlobalId="2YBqaV_8L15eWJ9DA1sGmF"/>, that the additional attribute GlobalId is not expected when using xsi:nil

Note: this is based on work/experience back in 2013 - so I would be happy to hear any better proposal or experience that we could use to improve the situation.

@devonsparks - it seems that you are quoting the previous version of ISO 10303-28. ifcXML is based on the second edition from 2007, using the “configured XML schema binding” to obtain a more “XML-like” result.

Thanks @TLiebich. That helps clarify things. I was referencing the older spec (2001). The 2007 release (renewed in 2018) is more complete. Is it possible to work around the optional/mandatory issue by following the 10303-28:2007 spec more strictly?

Following the “default XML binding” requirements of *-28:2007, we can map EXPRESS Entities to xs:complexTypes and their attributes to xs:elements. As an example, here’s a toy Entity, MyElement, that has a required EXPRESS attribute GlobalId.

<xs:element name="MyElement" type="ifc:MyElement" abstract="false" substitutionGroup="ifc:Entity" nillable="true"/>

<!-- MyElement -->
<xs:complexType name="MyElement" abstract="false">
    <xs:complexContent>
        <xs:extension base="ifc:Entity">
            <xs:all>
                <!-- 
                    IfcGlobalId is not an EXPRESS Entity, so it is handled by ISO 10303-28:2007, clause 7.6.3.1 
                    
                     "If the data type of the EXPRESS attribute is not an entity data type, the data type of the accessor
                      element shall be declared to be the XML schema data type that corresponds to the data type of the
                      EXPRESS attribute."
                -->
                <xs:element name="GlobalId" type="ifc:IfcGloballyUniqueId" />
            </xs:all>

        </xs:extension>
    </xs:complexContent>
</xs:complexType>

We can then make another EXPRESS Entity, MyContainer, that relates to zero or more MyElements by way of an “Elements” attribute:

<xs:element name="MyContainer" type="ifc:MyContainer" abstract="false" substitutionGroup="ifc:Entity" nillable="true"/>

<xs:complexType name="MyContainer" abstract="false">
    <xs:complexContent>
        <xs:extension base="ifc:Entity">
            <xs:all>
                <xs:element name="Elements">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:group ref="ifc:MyElement-complexEntity-group" minOccurs="0" maxOccurs="unbounded"/>
                        </xs:sequence>
                        <xs:attribute name="ref" type="xs:IDREF" use="optional"/>
                        <xs:attribute ref="ifc:itemType" fixed="ifc:MyElement"/>
                        <xs:attribute ref="ifc:cType" fixed="list"/>
                        <xs:attribute ref="ifc:arraySize" use="optional"/>
                    </xs:complexType>
                </xs:element>
            </xs:all>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

We add some other bits required by *-28:2007, like complexEntity-group types for each Entity to support constraint handling over subtypes, but that’s the gist. The full sample xsd is here.

Then we can use the schema to write MyElement instances that respect both the GlobalId attribute requirement and call-by-reference:

 <MyContainer>
    <Elements>
        <MyElement>
            <GlobalId>0MFlxN6vfEZBQQP9VvN2Cg</GlobalId>
        </MyElement>
        <MyElement ref="e1" xsi:nil="true"/>
    </Elements>
</MyContainer>

<MyElement id="e1">
    <GlobalId>0MFlxN6vfEZBQQP9VvN2Cg</GlobalId>
</MyElement>

Full sample here.

Removing GlobalId from the definition of MyElement will trigger a validation error, but you can still call MyElements by reference. This assumes the default XML schema binding of the spec, but the “configured” binding (clause 8.x) has similar provisions. The spec also has defined strategies for handling uniqueness constraints, keyrefs, and other handy stuff.

Some questions around this:

  1. Would BuildingSmart ever consider updating the ifcXML schema to strictly follow ISO 10303-28:2007? It seems to have figured out a lot of the hard stuff in a reasonable and well-documented way.
  2. If so, do we know (or know someone who would know) of an existing compiler for EXPRESS to ISO 10303-28:2007 STEP-XML? That would let generation of the XSD be (fully?) automated, so changes can be made directly to the EXPRESS schema.
  3. If there is no existing compiler, is there an appetite to write one? The spec provides sections describing how every supported EXPRESS feature is mapped, along with examples we could use to build a test suite. Leveraging the work @ian’s done with IFC-gen might be a starting point, as we could map clauses of the spec to ANTLR listeners - one for each EXPRESS feature.

Thoughts?

Hey Devon,

Thanks for posting. I ran a quick test on your proposal and it seems like an excellent strategy to me.

At present, there is a master definition of the data model which is then published to express and xsd within the ifcdoc tool. You can find the xsd generation code here: https://github.com/buildingSMART/IfcDoc/blob/master/FormatXSD.cs

This should be more carefully reviewed and implemented ASAP in my opinion.

Cheers,

Jon

@devonsparks and @jonm

A few remarks: the conversion between EXPRESS and XSD follows ISO 10303-28:2007 (with a workaround regarding mandatory EXPRESS attributes mapped into xs:attribute). The generation of all XSD schemas published for IFC is always fully automatic (there is no “hand-written” conversion).

  • it uses ISO 10303-28:2007 chapter 8 “configured XML Schema Binding”
  • how EXPRESS attributes are mapped into XSD is governed by the configuration “exp-attribute” as explained in clause 8.6 “XML Schema declarations for EXPRESS attributes”
  • when the configured mapping for IFC4 was created in 2013, the goal was to have rather compact representations, and therefore the choice was to map by default all attributes having a simple data type (besides string) into xs:attribute using exp-attribute=“attribute-content”

The configuration file that governs the mapping into XSD is published for each IFC schema; for the latest, IFC4.0.2.1, it is here.

The line:

<cnf:option inheritance="true" concrete-attribute="attribute-content" naming-convention="preserve-case" generate-keys="false"/>

defines it globally for all EXPRESS attributes in one schema; see 10.2.7 “concrete-attribute” in 10303-28. If we changed it to

<cnf:option inheritance="true" concrete-attribute="attribute-tag" naming-convention="preserve-case" generate-keys="false"/>

then the result would be as @devonsparks proposes:

<MyElement>
     <GlobalId>0MFlxN6vfEZBQQP9VvN2Cg</GlobalId>
</MyElement>

and it should run automatically for all 10303-28:2007 aware parsers.

The question again is how to deal with upward compatibility. Although there are still far fewer ifcXML files around compared to IFC STEP files, it would still cause an issue.

to 1) see above
to 2) Jotne edm-developer has the capability to map according to ISO 10303-28:2007, and so does steptools st-developer (maybe there are some restrictions, so it should be tested). In general, those tools also convert data files from *.stp or *.ifc to *.xml
to 3) as @jonm wrote, the ifcDoc tool has code to automatically convert to xsd but also to convert sample data files from *.ifc to *.xml. So if there is appetite, one option is to start here.

Thanks @TLiebich. Am I right to think the XML schema output from IfcDoc doesn’t fully conform to 10303-28:2007? I couldn’t find references to the required complexEntity-groups or keyrefs, for example. I ask because (unless I’m reading the spec wrong, which is possible :slight_smile:), strictly conforming to 10303-28 output should solve the optional/mandatory issue at the start of the thread regardless of the configuration chosen.

8.6.3 - Accessor Attributes - says:

So to conform to the configured form of *-28, attributes like GlobalId would need to include use=“required”. That introduces the mandatory/optional issue we’ve been discussing. But another requirement in 8.6.3 seems to clean up the mess:

That means the value of attributes holding other entities have to be id-references. So in Jon’s example above, HasPropertyTemplates would refer to a Seq-IfcSimplePropertyTemplate Entity with a corresponding Id, not include the Entities inline. This turns the whole XML file into a big relational soup, but it ensures required attributes are only defined once on the Entities they describe, with associations mapped to attribute IDREFs.

Some ideas for next steps:

  1. If anyone has access to a strictly conforming EXPRESS -> 10303-28:2007 compiler (edm? st-dev?), run the IFC4x2 spec through it and share the resulting XSD. This only needs to be done once for each IFC release. The hope would be that a single license for one of these tools would be better/faster/cheaper than rolling something new. If nothing exists on the market, consider writing it.
  2. Pick one binding configuration from 10303-28:2007, conform to it strictly, and make it the de facto serialization format for ifcXML. All new IFC releases could generate their XML schema by running through (1). The bet here is that 10303-28 will change slowly, limiting breaking changes between IFC releases. Developers shouldn’t have to tune their XPaths per IFC release.
  3. For every breaking change to the serialization format, consider delivering an XSLT/XQuery script that can migrate instance files from the last format version. For example, if (2) were adopted, an XSLT would be released to migrate IFC4-xml instance files to the new format. Chaining these migration scripts using something like XProc effectively produces a “migration service”, so that the poor folk in 50 years who discover ifcXML files on a ZipDisk in a basement somewhere have a strategy for data recovery.
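As a rough idea of what a migration script from step 3 might look like, here is a minimal XSLT 1.0 sketch that moves a GlobalId attribute into a child element of the same name. It is only an illustration under assumptions: it handles a single attribute, it assumes the target format wants GlobalId as the first child element in the parent's namespace, and it leaves every other attribute (including id/href/ref and xsi:nil) untouched. A real migration would have to follow whatever element order and reference style the new schema actually prescribes.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For any element carrying a GlobalId attribute: copy the other attributes,
       emit GlobalId as a child element, then copy the existing children. -->
  <xsl:template match="*[@GlobalId]">
    <xsl:copy>
      <xsl:apply-templates select="@*[name() != 'GlobalId']"/>
      <xsl:element name="GlobalId" namespace="{namespace-uri()}">
        <xsl:value-of select="@GlobalId"/>
      </xsl:element>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to run this against an instance file. Elements without a GlobalId attribute, such as the nil/href references, pass through the identity template as they are.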

The net of the proposal is to follow 10303-28:2007 verbatim in ifcxml to ensure stronger conformance against the EXPRESS data model. The missing piece is a conforming compiler. I do hope it’s out there, but eager to continue discussions on improving schema integrity even if it isn’t.

Thanks!

Hi, thanks for all the in-depth analysis. The intent was that the ifcXML XSD conforms to ISO 10303-28:2007, but surely there are some glitches.

The ISO document is actually difficult to read and not always clear (at least in my view), if you look into 8.6.4 Accessor elements, you find:

If exp-attribute=“double-tag” applies to the attribute, the XML data type of the accessor
element shall be as specified in 8.6.4.3. (note: “double-tag” is the default option).

in 8.6.4.3 you read “This is the default mapping of an EXPRESS attribute whose data type is an entity data type.”, and:

The XML data type of the accessor element shall be an anonymous complexType, of the form:

<xs:complexType>
  <xs:sequence>
    <element-or-group ref="instance-name"/>
  </xs:sequence>
</xs:complexType>

which is exactly what we used (as the default mapping for all aggregates of entity data types). The reason was and is that the resulting ifcXML file should not be an ID/IDREF-only XML.

But coming back to the original question - in my personal view, the best solution would be to change the mapping of all attributes not having an entity (or select) data type into xs:elements (instead of xs:attribute) using concrete-attribute=“attribute-tag”. But then we need some XSLT scripts to cross-convert and a release strategy about when to introduce the change.