feat: Add file_converter extension module (Issue #54) #56
Open: vaibhav45sktech wants to merge 9 commits into dbpedia:main from vaibhav45sktech:feature/file-converter-extension
Changes from 4 commits
Commits
e9c7e12 feat: Add file_converter extension module (Issue #54) (vaibhav45sktech)
2ef168c Merge branch 'dbpedia:main' into feature/file-converter-extension (vaibhav45sktech)
4698136 pipeline (vaibhav45sktech)
3d9ad5c minorchnages (vaibhav45sktech)
3428c92 fixclean (vaibhav45sktech)
89a1a17 addedfeatures (vaibhav45sktech)
54f059f statusfix (vaibhav45sktech)
0c5682e ruffcheck (vaibhav45sktech)
98dac9f cleanup (vaibhav45sktech)
    @@ -1 +1,3 @@
    from .file_converter import FileConverter

    __all__ = ["FileConverter"]
    @@ -0,0 +1,109 @@
    """File format conversion extension for databus-python-client.

    Provides streaming pipeline for file decompression, re-compression,
    and checksum validation during download operations.
    """

    import gzip
    import hashlib
    from typing import BinaryIO, Optional


    class FileConverter:
        """Handles file format conversion with streaming support."""

        CHUNK_SIZE = 8192  # 8KB chunks for streaming

        @staticmethod
        def decompress_gzip_stream(
            input_stream: BinaryIO,
            output_stream: BinaryIO,
            validate_checksum: bool = False,
        ) -> Optional[str]:
            """Decompress gzip stream with optional checksum computation.

            Decompresses *input_stream* into *output_stream*. When
            *validate_checksum* is ``True`` the SHA-256 digest of the
            **decompressed** bytes is computed on-the-fly and returned.

            To validate the checksum of the **compressed** input, use
            :meth:`validate_checksum_stream` on the input stream before
            calling this method.

            Args:
                input_stream: Input gzip compressed stream.
                output_stream: Output decompressed stream.
                validate_checksum: Whether to compute a SHA-256 checksum of
                    the decompressed output.

            Returns:
                Hex-encoded SHA-256 checksum of the decompressed data when
                *validate_checksum* is ``True``, otherwise ``None``.
            """
            hasher = hashlib.sha256() if validate_checksum else None

            with gzip.open(input_stream, 'rb') as gz:
                while True:
                    chunk = gz.read(FileConverter.CHUNK_SIZE)
                    if not chunk:
                        break
                    output_stream.write(chunk)
                    if hasher:
                        hasher.update(chunk)

            return hasher.hexdigest() if hasher else None

        @staticmethod
        def compress_gzip_stream(
            input_stream: BinaryIO,
            output_stream: BinaryIO
        ) -> None:
            """Compress stream to gzip format.

            Args:
                input_stream: Input uncompressed stream
                output_stream: Output gzip compressed stream
            """
            with gzip.open(output_stream, 'wb') as gz:
                while True:
                    chunk = input_stream.read(FileConverter.CHUNK_SIZE)
                    if not chunk:
                        break
                    gz.write(chunk)

        @staticmethod
        def validate_checksum_stream(
            input_stream: BinaryIO,
            expected_checksum: str
        ) -> bool:
            """Validate SHA256 checksum of a stream.

            Args:
                input_stream: Input stream to validate. Must be seekable; the stream
                    is rewound to position 0 both before reading and after a
                    successful validation.
                expected_checksum: Expected SHA256 checksum

            Returns:
                True if checksum matches

            Raises:
                IOError: If checksum validation fails
            """
            hasher = hashlib.sha256()
            input_stream.seek(0)

            while True:
                chunk = input_stream.read(FileConverter.CHUNK_SIZE)
                if not chunk:
                    break
                hasher.update(chunk)

            computed = hasher.hexdigest()
            if computed.lower() != expected_checksum.lower():
                raise IOError(
                    f"Checksum mismatch: expected {expected_checksum}, "
                    f"got {computed}"
                )
            input_stream.seek(0)
            return True
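To illustrate how the compression and decompression methods in this diff fit together, here is a minimal round-trip sketch. Because the package's import path is not shown on this page, the two helpers below are condensed stand-ins written for this example (the names `compress` and `decompress` and the payload are assumptions, not part of the PR); they follow the same chunked read/write loop as `compress_gzip_stream` and `decompress_gzip_stream` with `validate_checksum=True`.

```python
import gzip
import hashlib
import io

CHUNK_SIZE = 8192  # same value as FileConverter.CHUNK_SIZE in the diff


def compress(src, dst):
    """Condensed stand-in for FileConverter.compress_gzip_stream."""
    with gzip.open(dst, "wb") as gz:
        for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
            gz.write(chunk)


def decompress(src, dst):
    """Condensed stand-in for decompress_gzip_stream(validate_checksum=True)."""
    hasher = hashlib.sha256()
    with gzip.open(src, "rb") as gz:
        for chunk in iter(lambda: gz.read(CHUNK_SIZE), b""):
            dst.write(chunk)
            hasher.update(chunk)
    return hasher.hexdigest()


payload = b"databus example payload\n" * 1000  # made-up test data

compressed = io.BytesIO()
compress(io.BytesIO(payload), compressed)
compressed.seek(0)  # rewind before reading the gzip data back

restored = io.BytesIO()
digest = decompress(compressed, restored)

# The digest covers the decompressed bytes, not the gzip container.
round_trip_ok = restored.getvalue() == payload
digest_ok = digest == hashlib.sha256(payload).hexdigest()
```

Note that the caller has to rewind the compressed buffer between the two steps; neither method in the diff seeks on its own, and `gzip.open` on a file object leaves the underlying stream open when its context exits.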
coderabbitai[bot] marked this conversation as resolved.
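The docstring of `decompress_gzip_stream` recommends validating the compressed input with `validate_checksum_stream` first. The sketch below shows that order of operations on an in-memory stream. The function is a condensed local copy written for this example (the import path of the real module is not shown here), and the payload and the all-zero checksum are made up.

```python
import gzip
import hashlib
import io

CHUNK_SIZE = 8192  # mirrors FileConverter.CHUNK_SIZE


def validate_checksum_stream(stream, expected_checksum):
    """Condensed stand-in for FileConverter.validate_checksum_stream."""
    hasher = hashlib.sha256()
    stream.seek(0)
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        hasher.update(chunk)
    computed = hasher.hexdigest()
    if computed.lower() != expected_checksum.lower():
        raise IOError(
            f"Checksum mismatch: expected {expected_checksum}, got {computed}"
        )
    stream.seek(0)  # only reached when the digests match
    return True


compressed_bytes = gzip.compress(b"hello databus\n")  # made-up payload
stream = io.BytesIO(compressed_bytes)
expected = hashlib.sha256(compressed_bytes).hexdigest()

# Upper-case input still matches: the comparison is case-insensitive.
ok = validate_checksum_stream(stream, expected.upper())
position_after_success = stream.tell()

# The stream was rewound, so it can be decompressed without an extra seek.
with gzip.open(stream, "rb") as gz:
    restored = gz.read()

try:
    validate_checksum_stream(stream, "0" * 64)
    mismatch_raised = False
except IOError:
    mismatch_raised = True
position_after_failure = stream.tell()
```

One consequence worth knowing when calling the method: on a mismatch the exception is raised before the final `seek(0)`, so the stream is left at its end and a caller that wants to retry or inspect the data must rewind it manually.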