|
| 1 | +======================================================================== |
| 2 | +Compute summary for all detected packages. |
| 3 | +======================================================================== |
| 4 | + |
| 5 | + |
| 6 | +| **Organization:** `AboutCode <https://aboutcode.org>`_ |
| 7 | +| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_ |
| 8 | +| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_ |
| 9 | +| **Mentors:** Philippe Ombredanne, AyanSinhaMahapatra, AvishrantSh, Jonathan Yang, Jay Kumar |
| 10 | +
|
| 11 | +Overview |
| 12 | +-------- |
| 13 | + |
| 14 | +Previously, we computed the summary only at the codebase level, which includes ``license_clarity_score``, |
| 15 | +``declared_holder``, ``other_license_expressions`` and many more attributes. This project aims to improve scanning accuracy |
| 16 | +by computing the summary and license clarity score for each package and its files, rather than for the entire scan. |
| 17 | +This involves enhancing the package models and ensuring proper attribute collection for all package ecosystems. |
| 18 | + |
| 19 | +Implementation |
| 20 | +-------------- |
| 21 | + |
| 22 | +All of my work is contained in `this single PR <https://github.com/aboutcode-org/scancode-toolkit/pull/3792>`_. |
| 23 | +I added a new command line option, ``--package-summary``, that produces a package-level summary |
| 24 | +for each package within a single codebase. The package-level summary includes the |
| 25 | +``license_clarity_score`` calculation and the population of package attributes such as ``copyright``, |
| 26 | +``holder``, ``other_license_expression`` and ``notice_text``. This option must be used together with two |
| 27 | +other options. The ``--classify`` option helps ScanCode further classify scanned files and directories |
| 28 | +into categories such as ``legal``, ``readme``, ``top-level`` and ``manifest``. The ``--package`` (or ``-p``) option |
| 29 | +detects package manifests, lockfiles and package-like data, assembles codebase-level packages |
| 30 | +and dependencies from the package data detected in files, and tags files that are part of a package. |
| 31 | + |
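As a rough illustration of how the three options fit together, the sketch below builds a ``scancode`` command line and checks that ``--package-summary`` is accompanied by ``--classify`` and ``--package``. The helper names (``build_scan_command``, ``check_options``) and the example paths are hypothetical, and ``--package-summary`` comes from the PR above, so it may not be available in a released ScanCode version.

```python
import shlex

# Options that, per the description above, must accompany --package-summary.
REQUIRED_WITH_PACKAGE_SUMMARY = ("--classify", "--package")


def build_scan_command(input_path, output_path, package_summary=True):
    """Return an argv list for a package scan (hypothetical helper)."""
    argv = ["scancode", "--package", "--classify"]
    if package_summary:
        # Assumption: this flag only exists once the PR is merged.
        argv.append("--package-summary")
    # --json-pp writes pretty-printed JSON results to output_path.
    argv += ["--json-pp", output_path, input_path]
    return argv


def check_options(argv):
    """Raise ValueError if --package-summary lacks its companion options."""
    if "--package-summary" in argv:
        missing = [
            opt for opt in REQUIRED_WITH_PACKAGE_SUMMARY
            # -p is the short form of --package.
            if opt not in argv and not (opt == "--package" and "-p" in argv)
        ]
        if missing:
            raise ValueError("--package-summary also needs: " + ", ".join(missing))
    return argv


command = check_options(build_scan_command("path/to/codebase", "scan.json"))
print(shlex.join(command))
```

The printed command can be run in a shell where ScanCode is installed from the PR branch. The check runs client-side only to make the option dependency explicit; it is not a description of how ScanCode itself validates its options.
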
| 32 | +This change gives users a more refined summary for each individual package present in a codebase. |
| 33 | +It also improves package assembly for various package ecosystems such as npm, python-whl, rust and rubygems. |
| 34 | + |
| 35 | + |
| 36 | +Finally, all these changes are tested through multiple unit tests validating both correct |
| 37 | +behavior and error handling as needed. |
| 38 | + |
| 39 | +Post GSoC |
| 40 | +--------- |
| 41 | + |
| 42 | +I would like to get this PR merged into ScanCode Toolkit so that users can leverage |
| 43 | +this feature to expand their package and codebase scanning capabilities. |
| 44 | + |
| 45 | +Links |
| 46 | +----- |
| 47 | + |
| 48 | +`Project idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_ |
| 49 | + |
| 50 | +`Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_ |
| 51 | + |
| 52 | +`GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_ |
| 53 | + |
| 54 | +Acknowledgements |
| 55 | +---------------- |
| 56 | +I would like to thank my mentors: |
| 57 | + |
| 58 | +- `@pombredanne <https://github.com/pombredanne>`_ |
| 59 | +- `@AyanSinhaMahapatra <https://github.com/AyanSinhaMahapatra>`_ |
| 60 | +- `@AvishrantSh <https://github.com/AvishrantSsh>`_ |
| 61 | +- `@35C4n0r <https://github.com/35C4n0r>`_ |
| 62 | + |
| 63 | +The weekly calls were very helpful, and the special 1:1 calls with ``@AyanSinhaMahapatra`` and ``@pombredanne`` |
| 64 | +were amazing. Thank you for your time and your patience! |