Use YEAR + ERA for XSD 1.0 calendar year numbering#1685
Use YEAR + ERA for XSD 1.0 calendar year numbering#1685gdesrosiers1805 wants to merge 2 commits into
Conversation
| case class DFDLDate( | ||
| calendar: Calendar, | ||
| override val hasTimeZone: Boolean, | ||
| prolepticBCE: Boolean = false |
There was a problem hiding this comment.
I'm not sure I understand the need for this proleptic flag. Doesn't the Calendar contain all the necessary era information in the ERA field?
We also still have a number of references in our codebase to EXTENDED_YEAR and patterns with uuuu--that feels like a bug to me. Should we instead be setting/getting the YEAR and ERA components everywhere and using yyyy in all of our patterns? And removing all references to EXTENDED_YEAR and uuuu?
So the idea is we let ICU parse things according to dfdl:calendarPattern. When we want to create a string for the infoset or in one of the path fn:year-from-date*() functions we get the YEAR and ERA fields, and if the ERA field is BC we negate the year. Whether or not the pattern has a G in it shouldn't matter I think?. And for unparse, we read a date from the infoset and set the YEAR to the absolute value of the year and set the ERA to BC if the year was negative. It is a little extra work (which it looks like you already have logic for in a few places), but it avoids the potential of confusing EXTENDED_YEAR with YEAR. Seems we don't really want EXTENDED_YEAR or uuuu appearing anywhere in our codebase, except for tests making sure 'uuuu' patterns work correct.
Note I think this change would mean that it's impossible to use dfdl:calendarPattern to model BC years using yyyy without a G indicator (which is maybe the original reason for switching to uuuu and EXTENDED_YEAR many years ago?). But I think that's correct. Many DFDL semantics are based on ICU, and that is just a limitation of the yyyy pattern. If you want a pattern that is of the form yyyy-MM-dd without a G era indicator, and you want negative years to indicate the era, then you must use uuuu with the assumption that your years are astronomical. If your years aren't astronomical, then DFDL can't currently model them, unless it's added a new feature to support negative yyyy patterns.
There was a problem hiding this comment.
So the prolepticBCE flag is not recording the ERA. It looks at which pattern is used, which is why it checks that the pattern does not contain a G. No G means there is no explicit era designator, so for a BCE value the era was carried by the sign and the year was parsed astronomically (the uuuu form). If it is BCE but the pattern contains G, the era is named explicitly and ICU parses it correctly, so the flag stays false and nothing is adjusted. But if it is BCE and the pattern has no G, the date is represented astronomically which has a year zero and so is off by one from XSD 1.0 for BCE, and the flag is set true to mark that the read-out needs an adjustment to match XSD 1.0. Perhaps a better name for that flag would have been astronomicalYear?
The need for that flag stemmed from the fact that I changed EXTENDED_YEAR to YEAR + ERA (for the most part, other than where special logic was needed to help correct the astronomical representation of BCE dates), but still kept the uuuu pattern as the default/implicit pattern. So if we change the default pattern to yyyy, it eliminates the need for that. The only issue, as you already mentioned, is that we won't be able to represent BCE years without the G marker, which seems to be the correct thing to do. So I'll update any BCE tests that previously used the implicit uuuu pattern to explicitly use a yyyy pattern with a G marker and era-worded input (e.g. BC), since negative years won't parse under yyyy on their own. This should simplify things — it removes the flag, the read-side and unparse adjustments, and the EXTENDED_YEAR/uuuu references.
b1e550a to
4ff33bc
Compare
stevedlawrence
left a comment
There was a problem hiding this comment.
Looks like, this feels liek the write approach. But I think we need to take it a bit farther and remove remaining patterns.
4ff33bc to
40cbb3a
Compare
Change date/dateTime year handling from astronomical numbering (uuuu/EXTENDED_YEAR) to year-of-era semantics (yyyy with YEAR/ERA), so years conform to XSD 1.0 (no year zero; 1 BCE = -0001). This covers the implicit/default calendar pattern, lexical infoset conversion, facet parsing/narrowing, and the binaryCalendarEpoch. Parsing, formatting, comparison, and XPath year extraction now read years via YEAR/ERA instead of the astronomical EXTENDED_YEAR. Facet conversion and the binaryCalendarEpoch route through DFDLCalendarConversion rather than hardcoded calendar patterns; the latter gains negative-year support. BCE years require an explicit era designator (G) with yyyy-based patterns. Explicit uuuu patterns still parse astronomically, but extracted years and infoset output are always rendered in XSD 1.0 numbering. DAFFODIL-3084
292e31e to
a5d067f
Compare
Replace the FacetTypes object and its (Facet.Type, Value) tuple approach with a sealed FacetOrdered trait hierarchy where each XSD facet type is its own class carrying both the original XML string value and its converted BigDecimal representation. Value conversion now happens once, eliminating the redundant String->BigDecimal->String->BigDecimal round trips described in the existing TODO comment. Narrowing logic moves into each facet class, replacing scattered helper methods in Facets.scala. NodeInfo.fromXMLString is used as the single source of truth for XSD primitive conversions in value-space facets. Also fixes two pre-existing bugs: - signum() != 1 incorrectly rejected maxLength="0" as invalid since non-negative integers include zero - Invalid facet values that previously threw raw NumberFormatException now produce Schema Definition Errors with facet name context Also removes dead code checkRangeReturnsValue and checkRange from RestrictionUnion.
a5d067f to
929958f
Compare
Change date/dateTime year handling from astronomical numbering
(uuuu/EXTENDED_YEAR) to year-of-era semantics (yyyy with YEAR/ERA), so
years conform to XSD 1.0 (no year zero; 1 BCE = -0001). This covers the
implicit/default calendar pattern, lexical infoset conversion, facet
parsing/narrowing, and the binaryCalendarEpoch.
Parsing, formatting, comparison, and XPath year extraction now read years
via YEAR/ERA instead of the astronomical EXTENDED_YEAR. Facet conversion and
the binaryCalendarEpoch route through DFDLCalendarConversion rather than
hardcoded calendar patterns; the latter gains negative-year support.
BCE years require an explicit era designator (G) with yyyy-based patterns.
Explicit uuuu patterns still parse astronomically, but extracted years and
infoset output are always rendered in XSD 1.0 numbering.
DAFFODIL-3084