Skip to content

Commit 5517b36

Browse files
committed
feat: enhance JSON schema generation to allow empty strings for optional fields
1 parent 5c2c5dd commit 5517b36

3 files changed

Lines changed: 178 additions & 10 deletions

File tree

src/extensions/score_metamodel/README.md

Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,127 @@ JSON Schema cannot express:
8181
| `check_id_contains_feature` | `id_contains_feature.py` | Need IDs must contain the feature abbreviation from the file path |
8282
| `check_standards` | `standards.py` | Standard compliance link validation |
8383

84+
### Coverage comparison
85+
86+
Schema column: **yes** = implemented, **feasible** = could be added, **--** = not possible.
87+
88+
| Rule | Schema (`sn_schemas.py` + sphinx-needs) | S-Core metamodel (`checks/`) | Notes |
89+
|---|:---:|:---:|---|
90+
| ID required | yes | -- | `needs_id_required` (sphinx-needs built-in) |
91+
| ID basic regex | yes | -- | `needs_id_regex` (sphinx-needs built-in) |
92+
| Dead link detection | yes | -- | `allow_dead_links` (sphinx-needs built-in) |
93+
| Mandatory field presence | yes | yes | Both enforce `required` |
94+
| Mandatory field regex | yes | yes | Same pattern from metamodel |
95+
| Optional field regex | yes | yes | Schema: only if field present |
96+
| Mandatory link presence | yes | yes | Schema: `minItems: 1` in local |
97+
| Mandatory link target type | yes | yes | Schema: `validate.network` |
98+
| Mandatory link ID regex | feasible | yes | Can add `items.pattern` in local; TODO in code |
99+
| Optional link target type | feasible | yes (info) | Split into separate schema with `severity: "info"` |
100+
| Optional link ID regex | feasible | yes (info) | Same split-severity approach |
101+
| Mixed regex+plain link type | -- | yes | `ValidateSchemaType` has no `anyOf`/`oneOf` |
102+
| ID structure (parts count) | feasible | yes | Per-type pattern from `parts` field; cannot check file-path part |
103+
| Prohibited words | feasible | yes | Negative lookahead regex on `title`; less precise than Python |
104+
| Graph constraints | -- | yes | Cross-need traversals beyond JSON Schema |
105+
| Undefined extra options | -- | yes | `unevaluatedProperties` would reject sphinx-needs internal fields |
106+
107+
#### Rule explanations
108+
109+
**ID required** --
110+
Every need directive must have a manually set ID (e.g. `.. feat_req:: feat_req__my_feature__001`).
111+
Enforced by sphinx-needs' `needs_id_required = True` in `__init__.py`.
112+
113+
**ID basic regex** --
114+
The ID must match `^[A-Za-z0-9_-]{6,}` (at least 6 alphanumeric/underscore/hyphen characters).
115+
Enforced by sphinx-needs' `needs_id_regex` in `__init__.py`. The build stops if a need
116+
has an invalid ID.
117+
118+
**Dead link detection** --
119+
A link like `satisfies: nonexistent_need_id` that points to a need that does not exist
120+
triggers a sphinx-needs warning. Controlled per link type via `allow_dead_links` in
121+
`needs_extra_links`.
122+
123+
**Mandatory field presence** --
124+
A `feat_req` must have a `status` field. If it is missing, both the schema
125+
(`"required": ["status"]`) and the Python check flag it.
126+
127+
**Mandatory field regex** --
128+
The `status` field on `feat_req` must match `^(valid|invalid)$`. Both the schema
129+
(`"pattern": "^(valid|invalid)$"`) and the Python check validate this. Writing
130+
`status: approved` is rejected.
131+
132+
**Optional field regex** --
133+
`document` has `optional_options: { author: ^.*$ }`. If `author` is present, it must
134+
match the pattern. If absent, no error. The schema includes it in `properties` but
135+
not in `required`.
136+
137+
**Mandatory link presence** --
138+
`feat_req` has `mandatory_links: { satisfies: stkh_req }`. At least one target must
139+
be provided. The schema enforces this with `"satisfies": {"type": "array", "minItems": 1}`
140+
in `validate.local`.
141+
142+
**Mandatory link target type** --
143+
`feat_req.satisfies` must point to a need of type `stkh_req`. The schema enforces
144+
this with `validate.network`: each linked need is checked for
145+
`{"type": {"const": "stkh_req"}}`. If a `feat_req` links to a `comp` via `satisfies`,
146+
the schema rejects it.
147+
148+
**Mandatory link ID regex** (feasible) --
149+
`feat` has `mandatory_links: { includes: ^logic_arc_int(_op)*__.+$ }`. The link
150+
target IDs (strings like `logic_arc_int__something`) must match this regex.
151+
Currently the schema only enforces that at least one link exists (`minItems: 1`),
152+
not the ID pattern. *Feasible*: add `"items": {"pattern": "^logic_arc_int(_op)*__.+$"}`
153+
to the local schema. There is a TODO in the code for this.
154+
155+
**Optional link target type** (feasible) --
156+
`tool_req` has `optional_links: { satisfies: gd_req, stkh_req }`. If provided, targets
157+
should be `gd_req` or `stkh_req`. The Python check validates this but treats violations
158+
as informational (non-fatal). The schema currently skips this because all schema entries
159+
use `severity: "violation"` and there is no way to set a different severity for one
160+
rule within the same schema entry. *Feasible*: create a second schema entry for the
161+
same need type with `severity: "info"` that only checks optional link targets.
162+
163+
**Optional link ID regex** (feasible) --
164+
Same as above, but for regex-based link IDs on optional links (e.g.
165+
`optional_links: { links: ^.*$ }` on `tsf`). Same severity-split approach would work.
166+
167+
**Mixed regex+plain link type** (not possible) --
168+
`workproduct` has `optional_links: { complies: std_wp, ^std_req__aspice_40__iic.*$ }`.
169+
A `complies` target is valid if it is either a need of type `std_wp` OR has an ID
170+
matching the regex. The `validate.network` `items` schema applies to ALL linked needs
171+
identically, so it cannot express "match type X *or* match regex Y".
172+
sphinx-needs' `ValidateSchemaType` does not support `anyOf`/`oneOf`.
173+
These mixed fields are validated only by the Python check.
174+
175+
**ID structure (parts count)** (feasible) --
176+
`feat_req` has `parts: 3`, meaning its ID must have 3 segments separated by `__`
177+
(e.g. `feat_req__my_feature__001`). The Python check (`check_id_format`) splits on
178+
`__` and counts parts. *Feasible*: generate a per-type regex like
179+
`^feat_req__[^_]+(__[^_]+){1}$` in the schema. However, the Python check also
180+
validates that the ID contains the feature abbreviation from the file path
181+
(`check_id_contains_feature`), which depends on runtime context and cannot be
182+
expressed in a schema.
183+
184+
**Prohibited words** (feasible) --
185+
The metamodel forbids words like "shall", "must", "will" in need titles (for
186+
requirement types). The Python check splits the title into words and checks each one.
187+
*Feasible*: add a negative lookahead regex on the `title` field, e.g.
188+
`^(?!.*\b(shall|must|will)\b).*$`. This is less precise than the Python check
189+
(which normalizes case, strips punctuation) but catches most violations.
190+
191+
**Graph constraints** (not possible) --
192+
`graph_checks` in the metamodel define rules like "an ASIL_B need must link to at
193+
least one non-QM requirement via `satisfies`". This requires traversing the need
194+
graph across multiple levels, which is fundamentally beyond what JSON Schema can
195+
express. Only the Python check (`check_metamodel_graph`) can do this.
196+
197+
**Undefined extra options** (not possible) --
198+
The Python check (`check_extra_options`) warns when a need has fields not defined
199+
in the metamodel (e.g. a typo like `saftey` instead of `safety`). In theory,
200+
`unevaluatedProperties: false` could reject unknown fields. In practice, sphinx-needs
201+
adds many internal fields to needs (e.g. `docname`, `lineno`, `is_external`, computed
202+
fields from dynamic functions) that are not in the metamodel. Enabling this would
203+
cause false positives on every need.
204+
84205
## File layout
85206

86207
```

src/extensions/score_metamodel/sn_schemas.py

Lines changed: 30 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -136,13 +136,14 @@ def _build_local_validator(
136136
if field in IGNORE_FIELDS:
137137
continue
138138
required.append(field)
139-
properties[field] = get_field_pattern_schema(field, pattern)
139+
properties[field] = get_field_pattern_schema(field, pattern, is_optional=False)
140140

141141
# Optional fields: if present, must match the regex pattern
142+
# Allow empty strings to align with Python checker behavior
142143
for field, pattern in optional_fields.items():
143144
if field in IGNORE_FIELDS:
144145
continue
145-
properties[field] = get_field_pattern_schema(field, pattern)
146+
properties[field] = get_field_pattern_schema(field, pattern, is_optional=True)
146147

147148
# Mandatory links (regex): must have at least one entry
148149
# TODO: regex pattern matching on link IDs is not yet enabled
@@ -265,28 +266,47 @@ def add_network_entry(field: str, target_types: list[str]) -> None:
265266
return type_schema
266267

267268

268-
def get_field_pattern_schema(field: str, pattern: str) -> dict[str, Any]:
269+
def get_field_pattern_schema(field: str, pattern: str, is_optional: bool = False) -> dict[str, Any]:
269270
"""Return the appropriate JSON schema for a field's regex pattern.
270271
271272
Array-valued fields (like ``tags``) get an array-of-strings schema;
272273
scalar fields get a plain string schema.
274+
275+
For optional fields, the schema allows empty strings to align with the
276+
Python metamodel checker's behavior (which treats empty strings as absent).
273277
"""
274278
if field in SN_ARRAY_FIELDS:
275-
return get_array_pattern_schema(pattern)
276-
return get_pattern_schema(pattern)
279+
return get_array_pattern_schema(pattern, is_optional=is_optional)
280+
return get_pattern_schema(pattern, is_optional=is_optional)
281+
277282

283+
def get_pattern_schema(pattern: str, is_optional: bool = False) -> dict[str, Any]:
284+
"""Return a JSON schema that validates a string against a regex pattern.
278285
279-
def get_pattern_schema(pattern: str) -> dict[str, str]:
280-
"""Return a JSON schema that validates a string against a regex pattern."""
286+
For optional fields, allows either an empty string OR a string matching
287+
the pattern, matching the Python checker's behavior where empty strings
288+
are treated as "absent" and not validated.
289+
"""
290+
if is_optional:
291+
# Allow empty strings for optional fields (Python checker treats "" as absent)
292+
# Use regex alternation to match either empty string or the original pattern
293+
return {
294+
"type": "string",
295+
"pattern": f"^$|{pattern}",
296+
}
281297
return {
282298
"type": "string",
283299
"pattern": pattern,
284300
}
285301

286302

287-
def get_array_pattern_schema(pattern: str) -> dict[str, Any]:
288-
"""Return a JSON schema that validates an array where each item matches a regex."""
303+
def get_array_pattern_schema(pattern: str, is_optional: bool = False) -> dict[str, Any]:
304+
"""Return a JSON schema that validates an array where each item matches a regex.
305+
306+
For optional fields, allows empty strings in the array to align with the
307+
Python checker's behavior.
308+
"""
289309
return {
290310
"type": "array",
291-
"items": get_pattern_schema(pattern),
311+
"items": get_pattern_schema(pattern, is_optional=is_optional),
292312
}

src/extensions/score_metamodel/tests/test_sn_schemas.py

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,16 @@ def test_preserves_complex_regex(self) -> None:
4545
assert result["type"] == "string"
4646
assert result["pattern"] == pattern
4747

48+
def test_optional_allows_empty_string(self) -> None:
49+
result = get_pattern_schema("^https://github.com/.*$", is_optional=True)
50+
assert result == {"type": "string", "pattern": "^$|^https://github.com/.*$"}
51+
52+
def test_mandatory_does_not_allow_empty_string(self) -> None:
53+
result = get_pattern_schema("^[A-Z]+$", is_optional=False)
54+
assert result == {"type": "string", "pattern": "^[A-Z]+$"}
55+
# Should not have alternation with empty string
56+
assert "^$|" not in result.get("pattern", "")
57+
4858

4959
# =============================================================================
5060
# Tests for get_array_pattern_schema
@@ -64,6 +74,11 @@ def test_items_match_get_pattern_schema(self) -> None:
6474
result = get_array_pattern_schema(pattern)
6575
assert result["items"] == get_pattern_schema(pattern)
6676

77+
def test_optional_array_allows_empty_string_items(self) -> None:
78+
result = get_array_pattern_schema("^tag_.*$", is_optional=True)
79+
assert result["type"] == "array"
80+
assert result["items"] == {"type": "string", "pattern": "^$|^tag_.*$"}
81+
6782

6883
# =============================================================================
6984
# Tests for get_field_pattern_schema
@@ -85,6 +100,16 @@ def test_unknown_field_returns_string_schema(self) -> None:
85100
result = get_field_pattern_schema("some_custom_field", "^.*$")
86101
assert result["type"] == "string"
87102

103+
def test_optional_scalar_field_allows_empty_string(self) -> None:
104+
result = get_field_pattern_schema("mitigation_issue", "^https://github.com/.*$", is_optional=True)
105+
assert result == {"type": "string", "pattern": "^$|^https://github.com/.*$"}
106+
107+
def test_mandatory_scalar_field_does_not_allow_empty_string(self) -> None:
108+
result = get_field_pattern_schema("status", "^(valid|invalid)$", is_optional=False)
109+
assert result == {"type": "string", "pattern": "^(valid|invalid)$"}
110+
# Should not have alternation with empty string
111+
assert "^$|" not in result.get("pattern", "")
112+
88113

89114
# =============================================================================
90115
# Tests for _classify_links
@@ -157,6 +182,8 @@ def test_optional_fields_not_required(self) -> None:
157182
result = _build_local_validator({}, optional, {}, {})
158183
assert "comment" not in result["required"]
159184
assert "comment" in result["properties"]
185+
# Optional fields should allow empty strings via pattern alternation
186+
assert result["properties"]["comment"] == {"type": "string", "pattern": "^$|^.*$"}
160187

161188
def test_ignored_fields_excluded(self) -> None:
162189
mandatory = {field: "^.*$" for field in IGNORE_FIELDS}

0 commit comments

Comments
 (0)