buildingSMART Forums

mvdXML Rule Grammar and UTF-8?

I have a question: the “Lexer Rules” in the “Rule Grammar” don’t contain any language-specific characters, such as the German ß, ö, ä, ü, and so on. Why is that? Why not allow UTF-8 (or something similar)?


I am not exactly sure what you are looking for. Could you provide an example of what you would like to achieve? UTF-8 is the default character encoding for XML, and mvdXML is XML.

In case you do need an escape mechanism, the lexer rules include this:
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
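
To illustrate what that rule matches, here is a minimal Python sketch that decodes such escapes. It assumes HEX_DIGIT means 0-9, a-f and A-F (its definition is not quoted in this thread), and the function name is only illustrative, not part of any mvdXML tooling.

```python
import re

# Mirrors the UNICODE_ESC lexer rule: a backslash, 'u', then exactly four hex digits.
# Assumption: HEX_DIGIT covers 0-9, a-f and A-F.
UNICODE_ESC = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Replace every \\uXXXX escape in text with the character it denotes."""
    return UNICODE_ESC.sub(lambda m: chr(int(m.group(1), 16)), text)


# The escaped form of the German word "Straße".
decoded = decode_unicode_escapes(r"Stra\u00dfe")
print(decoded)
```

With this, a rule string can carry ß as `\u00df` even if the lexer itself only accepts ASCII letters.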

I’m talking about the lexer and parser rules for the strings (for example, in a template rule) given in the buildingSMART mvdXML documentation, in case you want to write a parser for the string part of mvdXML.
You are right that mvdXML is an XML format, but the string has its own extra grammar, doesn’t it?
So I’m just asking why the grammar doesn’t use UTF-8 in general. It’s just a question out of curiosity.

As I suggested with the UNICODE_ESC snippet, you can find it in the lexer rules. In my opinion it is not necessary (because XML already uses UTF-8), but it is available if you would like to encode any special characters.
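
A small Python sketch of the XML side of this, using only the standard library. The element and attribute content below are a made-up fragment for illustration, not taken from a real mvdXML file. It shows that an XML parser hands the rule string over with non-ASCII characters intact; whether the rule grammar's own lexer then accepts those characters is a separate matter, which is what the original question was about.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a rule string containing a literal "ü".
fragment = '''<TemplateRule Parameters="Name[Value]='Tür'"/>'''

# Encode as UTF-8 bytes with no XML declaration; XML parsers assume UTF-8 by default.
xml_bytes = fragment.encode("utf-8")

root = ET.fromstring(xml_bytes)
params = root.get("Parameters")

# The attribute value arrives as a normal Unicode string.
print(params)
```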