
Validator: Answer relevance custom LLM judge #109

Open
rkritika1508 wants to merge 4 commits into main from feat/answer-relevance-llm-judge

Conversation


@rkritika1508 rkritika1508 commented May 8, 2026

Summary

Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?

  • New validator answer_relevance_custom_llm evaluates whether an LLM's answer is relevant to a user query using an LLM as judge (YES/NO).
  • Input to the guardrail must be a JSON string {"query": "...", "answer": "..."}. Uses a configurable prompt template with {query} and {answer} placeholders; defaults to a built-in prompt when none is provided.
  • Custom prompt storage API (/guardrails/answer_relevance_prompts): full CRUD endpoints (multi-tenant, X-API-KEY auth) for NGOs to store, version, and manage domain-specific evaluation prompts. Prompts are validated at write time to enforce both {query} and {answer} placeholders. Reference a stored prompt at runtime via custom_prompt_id in the validator config.
  • Tests: validator unit tests, route unit tests, integration tests against a real DB, validator-config unit tests covering the full stack — validation logic, API CRUD, tenant isolation, pagination, schema enforcement, and config resolution.
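The input contract and placeholder substitution described above can be sketched as follows. This is a minimal illustration only: the default prompt wording is taken from the PR docs, and build_judge_prompt is a hypothetical helper, not the validator's actual method.

```python
import json

# Default prompt wording as shown in the PR docs; treat the helper as illustrative.
DEFAULT_PROMPT = (
    "Query: {query}\n"
    "Answer: {answer}\n\n"
    "Does the answer fully satisfy the query and constraints?\n"
    "Answer only YES or NO."
)

def build_judge_prompt(value: str, template: str = DEFAULT_PROMPT) -> str:
    """Parse the JSON-string guardrail input and fill {query}/{answer}."""
    data = json.loads(value)
    return template.format(query=data["query"], answer=data["answer"])

payload = json.dumps(
    {"query": "What is the capital of France?", "answer": "Paris."}
)
print(build_judge_prompt(payload))
```

The judge's raw completion is then compared against YES/NO to produce a PassResult or FailResult.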

Added files:

  • app/core/validators/answer_relevance_custom_llm.py — validator: parses JSON input, formats the prompt, calls LiteLLM, returns PassResult (YES) or FailResult (NO / error)
  • app/core/validators/config/answer_relevance_custom_llm_safety_validator_config.py — config class (type: "answer_relevance_custom_llm"), build() resolves to the validator; raises if OPENAI_API_KEY is missing
  • app/models/config/answer_relevance_prompt.py — SQLModel table answer_relevance_prompt scoped to organization_id + project_id
  • app/schemas/answer_relevance_prompt.py — Pydantic schemas with placeholder validation on prompt_template
  • app/crud/answer_relevance_prompt.py — standard CRUD
  • app/api/routes/answer_relevance_prompts.py — REST endpoints
  • app/alembic/versions/008_add_answer_relevance_prompt.py — DB migration
  • app/api/docs/answer_relevance_prompts/ — OpenAPI description files
  • app/tests/validators/test_answer_relevance_custom_llm.py — validator unit tests
  • app/tests/test_answer_relevance_prompts_api.py — route unit tests
  • app/tests/test_answer_relevance_prompts_api_integration.py — integration tests

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.


coderabbitai Bot commented May 8, 2026

📝 Walkthrough

This PR introduces answer_relevance_custom_llm, a new multi-tenant validator that uses OpenAI to assess answer relevance. It includes: a database table and migration for storing per-tenant prompts; schemas and CRUD operations; five API endpoints for prompt management; the validator implementation that parses JSON input, formats prompts with query/answer substitution, and calls OpenAI; integration into the guardrails resolver to fetch stored prompts; and comprehensive unit, integration, and validator tests.

Changes

Answer Relevance Custom LLM Validator with Multi-Tenant Prompt Management

  • Database & Migration — backend/app/alembic/versions/008_add_answer_relevance_prompt.py: creates the answer_relevance_prompt table with a UUID primary key, org/project foreign keys, prompt metadata, template, activation flag, and timestamps; adds indexes for tenant/status lookups.
  • Domain Model — backend/app/models/config/answer_relevance_prompt.py: defines the SQLModel table with all required fields, primary key generation, foreign key references, and automatic timestamp management via onupdate.
  • Request/Response Schemas — backend/app/schemas/answer_relevance_prompt.py: Pydantic schemas for create (required fields), update (optional fields), and response (with persistence fields); validates that prompt_template contains both {query} and {answer} placeholders.
  • Validator Registration — backend/app/core/enum.py, backend/app/core/validators/config/answer_relevance_custom_llm_safety_validator_config.py: adds the AnswerRelevanceCustomLLM enum member; defines the config class with llm_callable, optional prompt_template, and custom_prompt_id fields; build() validates OPENAI_API_KEY presence and returns the validator instance.
  • Validator Implementation — backend/app/core/validators/answer_relevance_custom_llm.py: core logic that parses JSON input, validates non-empty query/answer, formats the prompt with placeholders, calls OpenAI completion, and returns PassResult on "YES" or FailResult on "NO"/unexpected responses.
  • CRUD Operations — backend/app/crud/answer_relevance_prompt.py: implements create, get, list, update, and delete with tenant scoping (org_id/project_id); converts IntegrityError to HTTP 400; pagination via offset/limit; atomic transaction management.
  • API Routes — backend/app/api/routes/answer_relevance_prompts.py: five FastAPI endpoints — POST / (create), GET / (list), GET /{id} (retrieve), PATCH /{id} (update), DELETE /{id} (delete); all wrapped in APIResponse and loading descriptions from markdown docs.
  • Route Registration & Guardrails Integration — backend/app/api/main.py, backend/app/api/routes/guardrails.py, backend/app/schemas/guardrail_config.py: registers the answer_relevance_prompts router in main.py; extends _resolve_validator_configs() in guardrails.py to fetch the stored prompt via CRUD when custom_prompt_id is provided; updates the discriminated union to accept the new config type.
  • API Documentation — backend/app/api/API_USAGE.md, backend/app/api/docs/answer_relevance_prompts/*.md, backend/app/api/docs/guardrails/run_guardrails.md, backend/app/core/validators/README.md: complete guides covering the API endpoint reference with examples, validator behavior notes, tenant scoping, error cases, placeholder requirements, the OPENAI_API_KEY dependency, and end-to-end usage patterns.
  • Unit Tests — backend/app/tests/test_llm_validators.py, backend/app/tests/validators/test_answer_relevance_custom_llm.py, backend/app/tests/test_validate_with_guard.py: config build tests (API key requirement, template handling); validator tests (YES/NO parsing, input validation, placeholder substitution, LLM error handling, llm_callable forwarding); resolution tests for custom prompt lookup.
  • API Route Unit Tests — backend/app/tests/test_answer_relevance_prompts_api.py: mocked tests for each route verifying CRUD delegation with correct args, response wrapping, and success reporting.
  • Integration Tests — backend/app/tests/test_answer_relevance_prompts_api_integration.py: full end-to-end tests covering successful CRUD, validation errors (missing/invalid placeholders, length limits, empty fields), pagination, tenant isolation, and 404 handling.
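The write-time placeholder check described for the schemas layer can be sketched like this. It is a stand-in function; the actual check is a Pydantic validator on prompt_template in app/schemas/answer_relevance_prompt.py.

```python
REQUIRED_PLACEHOLDERS = ("{query}", "{answer}")

def validate_prompt_template(template: str) -> str:
    # Reject templates missing either required placeholder before the
    # prompt is stored, so runtime formatting cannot fail.
    missing = [p for p in REQUIRED_PLACEHOLDERS if p not in template]
    if missing:
        raise ValueError(
            f"prompt_template is missing required placeholders: {', '.join(missing)}"
        )
    return template
```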

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • nishika26
  • AkhileshNegi
  • dennyabrain

Poem

🐰 A bunny hops through prompts so fine,
Checking relevance with OpenAI's sign,
Queries dance with answers bright,
YES or NO—the truth shines light!
Multi-tenant safety, tests in place,
Guardrails bounding every space! 🛡️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 2.13%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the PR title accurately describes the primary change, adding a new validator type for evaluating answer relevance using a custom LLM judge, which aligns with the comprehensive changeset across schemas, CRUD operations, API routes, validator implementation, and tests.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Description check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.





@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
backend/app/alembic/versions/008_add_answer_relevance_prompt.py (1)

35-49: ⚡ Quick win

Add a composite index for the tenant-scoped list pattern.

Lines 35-49 add only single-column indexes. For queries filtered by organization_id + project_id and ordered by created_at, id, a composite index will scale better.

Suggested migration change
     op.create_index(
         "idx_answer_relevance_prompt_is_active",
         "answer_relevance_prompt",
         ["is_active"],
     )
+    op.create_index(
+        "idx_answer_relevance_prompt_tenant_created_id",
+        "answer_relevance_prompt",
+        ["organization_id", "project_id", "created_at", "id"],
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/008_add_answer_relevance_prompt.py` around lines
35 - 49, The migration currently only creates single-column indexes via
op.create_index for "idx_answer_relevance_prompt_org",
"idx_answer_relevance_prompt_project", and
"idx_answer_relevance_prompt_is_active" on the answer_relevance_prompt table;
add a composite index for the tenant-scoped list pattern to support queries
filtered by organization_id + project_id and ordered by created_at, id by
creating a new composite index (e.g. name it
"idx_answer_relevance_prompt_org_project_created_at_id") on columns
["organization_id","project_id","created_at","id"]; also ensure the
corresponding downgrade drops that composite index (and keep or remove the
single-column org/project indexes as desired) so the migration is reversible.
backend/app/api/routes/guardrails.py (1)

133-142: ⚡ Quick win

Avoid DB lookup when prompt_template is already provided inline.

Currently, custom_prompt_id triggers a fetch unconditionally. Guarding on missing prompt_template would reduce unnecessary I/O and avoid overriding explicit runtime templates.

Proposed patch
         elif isinstance(validator, AnswerRelevanceCustomLLMSafetyValidatorConfig):
-            if validator.custom_prompt_id is not None:
+            if (
+                validator.custom_prompt_id is not None
+                and not validator.prompt_template
+            ):
                 prompt_config = answer_relevance_prompt_crud.get(
                     session=session,
                     id=validator.custom_prompt_id,
                     organization_id=payload.organization_id,
                     project_id=payload.project_id,
                 )
                 validator.prompt_template = prompt_config.prompt_template
🤖 Prompt for AI Agents

In `@backend/app/api/routes/guardrails.py` around lines 133 - 142, The code
unconditionally looks up a DB prompt when validator.custom_prompt_id is set and
then overwrites validator.prompt_template; change the logic in the
AnswerRelevanceCustomLLMSafetyValidatorConfig branch to only call
answer_relevance_prompt_crud.get (using session, payload.organization_id,
payload.project_id) when validator.custom_prompt_id is present AND
validator.prompt_template is missing/empty, so any inline-provided
validator.prompt_template is preserved and unnecessary I/O is avoided; after the
conditional fetch assign validator.prompt_template only from the retrieved
prompt_config.
backend/app/tests/validators/test_answer_relevance_custom_llm.py (1)

113-155: 💤 Low value

Optional: add non-dict JSON / non-string-field cases.

Consider adding tests for inputs like validator._validate("123"), validator._validate("null"), and {"query": 1, "answer": "x"} so the parsing edge cases (raised on the validator file) stay covered going forward.

🤖 Prompt for AI Agents

In `@backend/app/tests/validators/test_answer_relevance_custom_llm.py` around
lines 113 - 155, Add tests to cover non-dict JSON and non-string field cases so
parsing edge cases remain covered: extend
backend/app/tests/validators/test_answer_relevance_custom_llm.py with new test
functions that call validator._validate on JSON primitives (e.g., "123", "null")
and on a JSON object with non-string field types (e.g., {"query": 1, "answer":
"x"}), and assert they return FailResult (using isinstance(result, FailResult))
and include appropriate error messages where relevant; reference
validator._validate and existing test patterns (e.g.,
test_fails_with_non_json_input, test_fails_with_missing_query_key) to mirror
structure and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04d6d867-db71-48df-ab0c-a3c4b234f1bc

📥 Commits

Reviewing files that changed from the base of the PR and between 3395096 and 64e40aa.

📒 Files selected for processing (24)
  • backend/app/alembic/versions/008_add_answer_relevance_prompt.py
  • backend/app/api/API_USAGE.md
  • backend/app/api/docs/answer_relevance_prompts/create_prompt.md
  • backend/app/api/docs/answer_relevance_prompts/delete_prompt.md
  • backend/app/api/docs/answer_relevance_prompts/get_prompt.md
  • backend/app/api/docs/answer_relevance_prompts/list_prompts.md
  • backend/app/api/docs/answer_relevance_prompts/update_prompt.md
  • backend/app/api/docs/guardrails/run_guardrails.md
  • backend/app/api/main.py
  • backend/app/api/routes/answer_relevance_prompts.py
  • backend/app/api/routes/guardrails.py
  • backend/app/core/enum.py
  • backend/app/core/validators/README.md
  • backend/app/core/validators/answer_relevance_custom_llm.py
  • backend/app/core/validators/config/answer_relevance_custom_llm_safety_validator_config.py
  • backend/app/crud/answer_relevance_prompt.py
  • backend/app/models/config/answer_relevance_prompt.py
  • backend/app/schemas/answer_relevance_prompt.py
  • backend/app/schemas/guardrail_config.py
  • backend/app/tests/test_answer_relevance_prompts_api.py
  • backend/app/tests/test_answer_relevance_prompts_api_integration.py
  • backend/app/tests/test_llm_validators.py
  • backend/app/tests/test_validate_with_guard.py
  • backend/app/tests/validators/test_answer_relevance_custom_llm.py

Comment on lines +19 to +25
```
Query: {query}
Answer: {answer}

Does the answer fully satisfy the query and constraints?
Answer only YES or NO.
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint.

Both fenced blocks should declare a language (e.g., text) to clear MD040 warnings.

Proposed patch
-```
+```text
 Query: {query}
 Answer: {answer}
 
 Does the answer fully satisfy the query and constraints?
 Answer only YES or NO.

 @@
-```
+```text
 You are evaluating a maternal health assistant.
 Query: {query}
 Answer: {answer}

 Does the answer directly address the maternal health query with accurate information?
 Answer only YES or NO.

Also applies to: 30-37

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@backend/app/api/docs/answer_relevance_prompts/create_prompt.md` around lines
19 - 25, The two fenced code blocks in create_prompt.md (the blocks beginning
with the lines "Query: {query} ... Answer only YES or NO." and the one starting
"You are evaluating a maternal health assistant.") need explicit language
identifiers to satisfy markdownlint MD040; update both opening fences from ```
to ```text so each block reads ```text and leave the block contents unchanged.

- For `ban_list`, `ban_list_id` can be resolved to `banned_words` from tenant ban list configs.
- For `topic_relevance`, `topic_relevance_config_id` is required and is resolved to `configuration` + `prompt_schema_version` from tenant topic relevance configs. Requires `OPENAI_API_KEY` to be configured; returns a validation failure with an explicit error if missing.
- For `llm_critic`, `OPENAI_API_KEY` must be configured; returns `success=false` with an explicit error if missing.
- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Pass `custom_prompt_id` to use a tenant-stored prompt template, or `prompt_template` inline. Requires `OPENAI_API_KEY`.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify precedence/mutual exclusivity for custom_prompt_id vs prompt_template.

Line 11 says “or”, but doesn’t define behavior if clients send both. Please document whether they are mutually exclusive or which one wins.

Suggested doc tweak
-- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Pass `custom_prompt_id` to use a tenant-stored prompt template, or `prompt_template` inline. Requires `OPENAI_API_KEY`.
+- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Use `custom_prompt_id` for a tenant-stored prompt template or `prompt_template` inline, and document the behavior when both are provided (mutually exclusive vs precedence). Requires `OPENAI_API_KEY`.
🤖 Prompt for AI Agents

In `@backend/app/api/docs/guardrails/run_guardrails.md` at line 11, Update the
docs for the answer_relevance_custom_llm operation to explicitly state the
precedence and mutual-exclusivity behavior when both custom_prompt_id and
prompt_template are provided: specify whether they are mutually exclusive
(reject requests containing both) or define a deterministic precedence rule
(e.g., "custom_prompt_id takes precedence over prompt_template if both are
set"), and show a short example of the accepted input JSON {"query":"...",
"answer":"..."} with the chosen behavior. Ensure the text mentions the parameter
names custom_prompt_id and prompt_template and that OPENAI_API_KEY is required.
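One deterministic precedence rule the docs could adopt — inline template wins, stored prompt fetched only as a fallback — can be sketched as follows. resolve_prompt_template and fetch_stored are hypothetical names, not functions in the PR.

```python
def resolve_prompt_template(prompt_template, custom_prompt_id, fetch_stored):
    # Inline template takes precedence; the stored prompt is fetched only as
    # a fallback, avoiding a DB round-trip when a template was supplied.
    if prompt_template:
        return prompt_template
    if custom_prompt_id is not None:
        return fetch_stored(custom_prompt_id)
    return None
```

Whichever rule is chosen, the docs should state it explicitly so clients sending both fields get predictable behavior.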

Comment on lines +44 to +57
```python
    def _validate(self, value: str, metadata: dict = None) -> ValidationResult:
        try:
            data = json.loads(value)
            query = data.get("query", "")
            answer = data.get("answer", "")
        except (json.JSONDecodeError, TypeError):
            return FailResult(
                error_message="Input must be a JSON string with 'query' and 'answer' fields."
            )

        if not query.strip() or not answer.strip():
            return FailResult(
                error_message="Both 'query' and 'answer' fields must be non-empty."
            )
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against non-dict JSON and non-string field values.

json.loads(value) may return a non-dict (e.g., 123, null, [...], "str"), and even on a dict, query/answer may be non-string (null, numbers). In both cases the subsequent .get(...) / .strip() calls raise an unhandled AttributeError, bypassing the intended FailResult and propagating as an exception.

🛡️ Suggested defensive parsing
-        try:
-            data = json.loads(value)
-            query = data.get("query", "")
-            answer = data.get("answer", "")
-        except (json.JSONDecodeError, TypeError):
-            return FailResult(
-                error_message="Input must be a JSON string with 'query' and 'answer' fields."
-            )
-
-        if not query.strip() or not answer.strip():
+        try:
+            data = json.loads(value)
+        except (json.JSONDecodeError, TypeError):
+            return FailResult(
+                error_message="Input must be a JSON string with 'query' and 'answer' fields."
+            )
+
+        if not isinstance(data, dict):
+            return FailResult(
+                error_message="Input must be a JSON object with 'query' and 'answer' fields."
+            )
+
+        query = data.get("query", "")
+        answer = data.get("answer", "")
+        if not isinstance(query, str) or not isinstance(answer, str):
+            return FailResult(
+                error_message="'query' and 'answer' must be strings."
+            )
+
+        if not query.strip() or not answer.strip():
             return FailResult(
                 error_message="Both 'query' and 'answer' fields must be non-empty."
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Current:

```python
    def _validate(self, value: str, metadata: dict = None) -> ValidationResult:
        try:
            data = json.loads(value)
            query = data.get("query", "")
            answer = data.get("answer", "")
        except (json.JSONDecodeError, TypeError):
            return FailResult(
                error_message="Input must be a JSON string with 'query' and 'answer' fields."
            )

        if not query.strip() or not answer.strip():
            return FailResult(
                error_message="Both 'query' and 'answer' fields must be non-empty."
            )
```

Suggested:

```python
    def _validate(self, value: str, metadata: dict = None) -> ValidationResult:
        try:
            data = json.loads(value)
        except (json.JSONDecodeError, TypeError):
            return FailResult(
                error_message="Input must be a JSON string with 'query' and 'answer' fields."
            )

        if not isinstance(data, dict):
            return FailResult(
                error_message="Input must be a JSON object with 'query' and 'answer' fields."
            )

        query = data.get("query", "")
        answer = data.get("answer", "")
        if not isinstance(query, str) or not isinstance(answer, str):
            return FailResult(
                error_message="'query' and 'answer' must be strings."
            )

        if not query.strip() or not answer.strip():
            return FailResult(
                error_message="Both 'query' and 'answer' fields must be non-empty."
            )
```
🤖 Prompt for AI Agents

In `@backend/app/core/validators/answer_relevance_custom_llm.py` around lines 44 -
57, In _validate (in answer_relevance_custom_llm.py) guard against non-dict JSON
and non-string fields by first verifying the result of json.loads(value) is a
dict and returning FailResult if not, then extract query and answer and ensure
both are instances of str before calling .strip(); if either is missing or not a
string (or empty after strip) return FailResult with the existing error
messages. This prevents AttributeError from .get/.strip on non-dict or non-str
values while preserving the current ValidationResult/FailResult flow.

Comment on lines +519 to +525
```
Query: {query}
Answer: {answer}

Does the answer fully satisfy the query and constraints?
Answer only YES or NO.
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced default prompt block (markdownlint MD040).

This keeps docs lint-clean and consistent with the rest of the file.

Proposed patch
-```
+```text
 Query: {query}
 Answer: {answer}
 
 Does the answer fully satisfy the query and constraints?
 Answer only YES or NO.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 519-519: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@backend/app/core/validators/README.md` around lines 519 - 525, The fenced
code block containing the prompt that starts with "Query: {query}" and ends with
"Answer only YES or NO." should be annotated with a language to satisfy
markdownlint MD040; update the opening fence from ``` to ```text for that block
(the block that contains the lines "Query: {query}" and "Answer: {answer}") so
the README.md stays lint-clean and consistent with other fenced blocks.

