Commit ee4d97a
Use context to enhance recognition by default (#25856)
* Implement custom context enhancement for Presidio recognizers
Presidio's default context enhancement relies heavily on NLP and often fails
when analyzing individual values rather than full text. This implements a
custom context enhancement that:
- Boosts recognizer scores to MAX when context keywords match
- Applies a minimum score threshold (0.3) before enhancement
- Skips already-enhanced results to prevent double-boosting
- Introduces a decorator pattern for composing recognizer enhancements
- Adds eager_us_bank_recognizer with higher base scores for better results
The enhancement works by checking if any context words from the recognizer
match the provided context list, then boosting the confidence score to
maximum and setting the IS_SCORE_ENHANCED_BY_CONTEXT_KEY metadata flag.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
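
The enhancement rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code from this commit: the `Result` dataclass stands in for Presidio's `RecognizerResult`, the function name `enhance_with_context` is hypothetical, and both the case-insensitive exact-word matching and the strict `<` comparison against the 0.3 threshold are assumptions. The constants mirror the values named in the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Values taken from the commit message; the names are stand-ins.
MAX_SCORE = 1.0
MIN_SCORE_FOR_ENHANCEMENT = 0.3
IS_SCORE_ENHANCED_BY_CONTEXT_KEY = "is_score_enhanced_by_context"


@dataclass
class Result:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    score: float
    metadata: Dict[str, object] = field(default_factory=dict)


def enhance_with_context(
    results: List[Result],
    recognizer_context: List[str],
    context: List[str],
) -> List[Result]:
    """Boost scores to MAX_SCORE when any recognizer context word
    appears in the provided context list (assumed: exact, case-insensitive)."""
    provided = {word.lower() for word in context}
    if not any(word.lower() in provided for word in recognizer_context):
        return results  # no keyword match, leave scores untouched

    for result in results:
        # Skip results that were already boosted to avoid double-boosting.
        if result.metadata.get(IS_SCORE_ENHANCED_BY_CONTEXT_KEY):
            continue
        # Weak matches below the minimum score are not enhanced.
        if result.score < MIN_SCORE_FOR_ENHANCEMENT:
            continue
        result.score = MAX_SCORE
        result.metadata[IS_SCORE_ENHANCED_BY_CONTEXT_KEY] = True
    return results
```

For example, with recognizer context `["account", "bank"]` and a provided context of `["Bank", "id"]`, a result scored 0.4 would be raised to the maximum, while a result scored 0.1 would be left as is.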
* Integrate context enhancement into recognizer factory
Updates PresidioRecognizerFactory to apply the new decorator pattern:
- All recognizers now use enhance_using_context decorator
- Confidence threshold filtering applied via filter_enhanced_results_below_threshold
- Decorators composed using decorate_recognizer for clean application
- Context passed to PatternRecognizer during creation
This ensures all enabled recognizers benefit from custom context enhancement
while maintaining backward compatibility with confidence thresholds.
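
A rough sketch of how such decorators might compose is shown below. The names `enhance_using_context`, `filter_enhanced_results_below_threshold`, and `decorate_recognizer` come from the commit message, but their signatures here are assumptions: this sketch wraps plain analyze functions and uses dictionaries for results, whereas the actual factory works with Presidio recognizer objects. The `base_recognizer` function is purely illustrative.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]
Analyze = Callable[[str, List[str]], List[Result]]
Decorator = Callable[[Analyze], Analyze]


def enhance_using_context(keywords: List[str]) -> Decorator:
    """Assumed shape: boost scores to 1.0 when a keyword is in the context."""
    def decorator(analyze: Analyze) -> Analyze:
        def wrapped(text: str, context: List[str]) -> List[Result]:
            results = analyze(text, context)
            provided = {word.lower() for word in context}
            if any(keyword.lower() in provided for keyword in keywords):
                for result in results:
                    if not result.get("enhanced"):
                        result["score"] = 1.0
                        result["enhanced"] = True
            return results
        return wrapped
    return decorator


def filter_enhanced_results_below_threshold(threshold: float) -> Decorator:
    """Assumed shape: drop results whose final score is below the threshold."""
    def decorator(analyze: Analyze) -> Analyze:
        def wrapped(text: str, context: List[str]) -> List[Result]:
            return [r for r in analyze(text, context) if r["score"] >= threshold]
        return wrapped
    return decorator


def decorate_recognizer(analyze: Analyze, *decorators: Decorator) -> Analyze:
    """Apply decorators in order; the first one listed wraps the recognizer first."""
    for decorator in decorators:
        analyze = decorator(analyze)
    return analyze


def base_recognizer(text: str, context: List[str]) -> List[Result]:
    """Illustrative recognizer: any all-digit value is a weak bank-number match."""
    if text.isdigit():
        return [{"entity_type": "US_BANK_NUMBER", "score": 0.4}]
    return []


recognizer = decorate_recognizer(
    base_recognizer,
    enhance_using_context(["bank", "account"]),
    filter_enhanced_results_below_threshold(0.8),
)
```

Under these assumptions, calling `recognizer("12345678", ["bank"])` keeps the boosted match, while the same value with an unrelated context such as `["email"]` is filtered out because its score stays at 0.4.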
* Add migration to update PII tag recognizers with enhanced configs
Adds a v1.1.22 migration that updates existing PII tags with improved
recognizer configurations featuring context keywords and optimized patterns.
Changes:
- Add patchRecognizers method in CollectionDAO for updating tag recognizers
- Implement setRecognizersForSensitiveTags in MigrationUtil to load and apply
  recognizer configs from piiTagsWithRecognizers.json
- Update piiTagsWithRecognizers.json with context keywords for better
  classification accuracy
- Execute migration as post-DDL script for both MySQL and PostgreSQL
This migration ensures existing deployments benefit from the improved context
enhancement logic without manual reconfiguration.
* Fix potential NPE and type mismatch in migration utility
Addresses code review feedback:
1. Fix potential NullPointerException in setRecognizersForSensitiveTags
   - Use Boolean.TRUE.equals() instead of auto-unboxing for nullable Boolean
   - Prevents NPE when autoClassificationEnabled is absent from JSON
   - Follows existing pattern from v1120/MigrationUtil.java
2. Fix Boolean vs boolean type mismatch in updateTagRecognizers
   - Change isForceMigration parameter from boxed Boolean to primitive boolean
   - Matches caller signature and eliminates latent NPE risk
   - Maintains consistency across method signatures
* Add context to NHSRecognizer
* Fix typing
* Fix broken unit tests
* Fix broken integration test
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
1 parent 36ec306 commit ee4d97a
10 files changed
Lines changed: 807 additions & 19 deletions
File tree
- ingestion
- src/metadata/pii/algorithms
- tests
- integration/auto_classification
- unit/pii/algorithms
- openmetadata-service/src/main
- java/org/openmetadata/service
- jdbi3
- migration
- mysql/v1122
- postgres/v1122
- utils/v1122
- resources/json/data/tags
Lines changed: 13 additions & 4 deletions
Lines changed: 7 additions & 1 deletion