feat: One Dataset, One Result – Smart Deduplication in Knowledge Space Search ✔ #99
Areeba-Tahir-18 wants to merge 1 commit into INCF:main from
Merging this PR will close issue #68. @visakhmr and @QuantumByte-01, I have updated my PR and removed the large files that were included in it, as requested by @QuantumByte-01. Kindly review the PR when you have time; I would appreciate your feedback. Thanks!
Just a point that came to mind: when the system shows duplicate datasets, users cannot trust the search results.
Summary
This PR solves issue #68 and introduces a robust deduplication mechanism for the Knowledge Space search tool to ensure cleaner, more accurate search results.
Problem #68
Search results were showing duplicate datasets due to:
Solution
This PR implements:
Impact of the Feature in a Real-World Use Case
Example
Previously, “Anesthesia EEG Dataset” appeared 3 times from DANDI. Now, only a single clean entry is returned.
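To illustrate the kind of behavior described in the example, here is a minimal sketch of title-based deduplication. It is not the code from this PR: the result fields `title` and `source`, the helper names `normalize_title` and `deduplicate`, and the choice of matching on a normalized title within the same source are all assumptions made for illustration.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return cleaned.strip()


def deduplicate(results):
    """Keep the first result for each (source, normalized title) pair.

    Assumption: each result is a dict with hypothetical 'title' and
    'source' keys; the real search results may use different fields.
    """
    seen = set()
    unique = []
    for result in results:
        key = (
            result.get("source", "").lower(),
            normalize_title(result.get("title", "")),
        )
        if key in seen:
            continue  # an equivalent entry was already kept
        seen.add(key)
        unique.append(result)
    return unique


# Hypothetical search results: three variants of the same DANDI dataset.
results = [
    {"title": "Anesthesia EEG Dataset", "source": "DANDI"},
    {"title": "anesthesia  EEG dataset", "source": "DANDI"},
    {"title": "Anesthesia EEG Dataset.", "source": "DANDI"},
    {"title": "Sleep EEG Dataset", "source": "DANDI"},
]

unique = deduplicate(results)
for entry in unique:
    print(entry["source"], "|", entry["title"])
```

With this sample input, the three "Anesthesia EEG Dataset" variants collapse into the first entry, and the unrelated dataset is kept. Keeping the first occurrence preserves the original ranking order of the search results; a real implementation might instead prefer the entry with the most complete metadata.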