Skip to content

feat(transport): Destination worker pool#151

Draft
j-g00da wants to merge 1 commit into
mainfrom
j-g00da/transport-worker-pool
Draft

feat(transport): Destination worker pool#151
j-g00da wants to merge 1 commit into
mainfrom
j-g00da/transport-worker-pool

Conversation

@j-g00da

@j-g00da j-g00da commented May 12, 2026

Copy link
Copy Markdown
Collaborator

Implements a per-destination worker pool,
so that connections to the same destination
are not parallelized, but instead queued.
If a queue is full, new messages are immediately
deferred, before mail data is sent from postfix.

Closes: #141

Signed-off-by: Jagoda Ślązak jslazak@jslazak.com

@j-g00da j-g00da force-pushed the j-g00da/transport-worker-pool branch 4 times, most recently from 87b7c85 to b044be9 Compare June 1, 2026 10:11
@j-g00da j-g00da force-pushed the j-g00da/transport-worker-pool branch 2 times, most recently from 16837e5 to f8edba4 Compare June 11, 2026 11:45
@j-g00da j-g00da changed the title WIP: worker pool feat(transport): Destination worker pool Jun 11, 2026
@j-g00da j-g00da requested a review from link2xt June 11, 2026 12:12
@j-g00da

j-g00da commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

@link2xt I'm still in the process of extending the test suite, but I think the non-test parts can be already reviewed.

Comment thread src/transport/worker.rs Outdated
Comment thread src/transport/worker.rs Outdated
Comment thread src/transport.rs
}
#[derive(Debug, Default)]
pub struct TransactionState {
permits: BTreeMap<AddressDomain, Option<OwnedPermit<WorkerMessage>>>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified by removing Option. Worst case we have recipients alice@example.org and bob@example.org and fail to obtain permit for alice@example.org and then successfully obtain it for bob@example.org, but this is very unlikely and is arguably not worse than rejecting bob@example.org. This will also simplify handle_data_start by checking only if permits is empty and other code using permits as well.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downside of that is that we won't have any way to distinguish "tried to get permit but failed" from "didn't try to get permit yet". If we send to a large group, let's say with 100 users on the same relay that stopped responding; then we will call the lock on the WorkerPool as well as worker.tx.clone().try_reserve_owned() hundred times in a row and we only call it once right now. Not sure about the real life performance, but it feels like it can cause trouble.

I can instead change Option to a small, custom, more descriptive type, so we don't have confusing nested options.

@j-g00da j-g00da Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or just use an alias to Result<OwnedPermit<WorkerMessage>, TrySendError<Sender>> that try_reserve_owned() returns instead of calling ok())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the real life performance, but it feels like it can cause trouble.

From performance point of view I think it is cheap to try to acquire permit even 100 times, in the end try_reserve_owned just tries to acquire semaphore permits and when the channel is full failing is essentially running only this code, loading atomic usize and then returning an error because it is full:
https://github.com/tokio-rs/tokio/blob/7892f6020d9c914a41d0c350693fb71937d43c03/tokio/src/sync/batch_semaphore.rs#L273-L283

Even RAM read latency is ~100 ns, less if cached, worst case doing this for 100 recipients you will spend 10 microseconds on it.

Anyway i think it's fine either way, can be just marked as resolved.

Implements a per-destination worker pool,
so that connections to the same destination
are not parallelized, but instead queued.
If a queue is full, new messages are immediately
deferred, before mail data is sent from postfix.

Closes: #141

Signed-off-by: Jagoda Ślązak <jslazak@jslazak.com>
@j-g00da j-g00da force-pushed the j-g00da/transport-worker-pool branch from f8edba4 to 20cd9fa Compare June 12, 2026 09:19
Comment thread Cargo.toml
[dev-dependencies]
rstest = "0.26.1"
testresult = "0.4.1"
dns-mock-server = "0.1.7"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be, I'm expanding the test suite and for that, I need to mock the DNS records, this seems like the easiest way, I already made it possible to create TransportHandler which uses localhost as DNS so I just need to spawn a server with records I want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

filtermail-transport: worker pool, channels and backpressure

2 participants