feat(transport): Destination worker pool#151
Conversation
87b7c85 to
b044be9
Compare
16837e5 to
f8edba4
Compare
|
@link2xt I'm still in the process of extending the test suite, but I think the non-test parts can be already reviewed. |
| } | ||
| #[derive(Debug, Default)] | ||
| pub struct TransactionState { | ||
| permits: BTreeMap<AddressDomain, Option<OwnedPermit<WorkerMessage>>>, |
There was a problem hiding this comment.
This can be simplified by removing Option. Worst case we have recipients alice@example.org and bob@example.org and fail to obtain permit for alice@example.org and then successfully obtain it for bob@example.org, but this is very unlikely and is arguably not worse than rejecting bob@example.org. This will also simplify handle_data_start by checking only if permits is empty and other code using permits as well.
There was a problem hiding this comment.
Downside of that is that we won't have any way to distinguish "tried to get permit but failed" from "didn't try to get permit yet". If we send to a large group, let's say with 100 users on the same relay that stopped responding; then we will call the lock on the WorkerPool as well as worker.tx.clone().try_reserve_owned() hundred times in a row and we only call it once right now. Not sure about the real life performance, but it feels like it can cause trouble.
I can instead change Option to a small, custom, more descriptive type, so we don't have confusing nested options.
There was a problem hiding this comment.
(or just use an alias to Result<OwnedPermit<WorkerMessage>, TrySendError<Sender>> that try_reserve_owned() returns instead of calling ok())
There was a problem hiding this comment.
Not sure about the real life performance, but it feels like it can cause trouble.
From performance point of view I think it is cheap to try to acquire permit even 100 times, in the end try_reserve_owned just tries to acquire semaphore permits and when the channel is full failing is essentially running only this code, loading atomic usize and then returning an error because it is full:
https://github.com/tokio-rs/tokio/blob/7892f6020d9c914a41d0c350693fb71937d43c03/tokio/src/sync/batch_semaphore.rs#L273-L283
Even RAM read latency is ~100 ns, less if cached, worst case doing this for 100 recipients you will spend 10 microseconds on it.
Anyway i think it's fine either way, can be just marked as resolved.
Implements a per-destination worker pool, so that connections to the same destination are not parallelized, but instead queued. If a queue is full, new messages are immediately deferred, before mail data is sent from postfix. Closes: #141 Signed-off-by: Jagoda Ślązak <jslazak@jslazak.com>
f8edba4 to
20cd9fa
Compare
| [dev-dependencies] | ||
| rstest = "0.26.1" | ||
| testresult = "0.4.1" | ||
| dns-mock-server = "0.1.7" |
There was a problem hiding this comment.
It will be, I'm expanding the test suite and for that, I need to mock the DNS records, this seems like the easiest way, I already made it possible to create TransportHandler which uses localhost as DNS so I just need to spawn a server with records I want.
Implements a per-destination worker pool,
so that connections to the same destination
are not parallelized, but instead queued.
If a queue is full, new messages are immediately
deferred, before mail data is sent from postfix.
Closes: #141
Signed-off-by: Jagoda Ślązak jslazak@jslazak.com