Implement binary search on atomic subintervals#107

Open
henryptung wants to merge 2 commits into
AlexandreDecan:masterfrom
henryptung:htung/binary-search-intersect

Conversation

@henryptung

When Interval.overlaps, __and__, and __contains__ iterate the atomic subintervals, they use linear search to find the start point for candidate intersection. But since the subintervals are already disjoint and sorted, binary search can be used instead.

Fixes #106.

@henryptung henryptung force-pushed the htung/binary-search-intersect branch from 5c9676a to 398f6a4 Compare June 12, 2026 23:41
@henryptung

henryptung commented Jun 13, 2026

Author

Hmm, I can probably write up a kludge of some kind for 3.9, but FWIW, 3.9 is EOL https://endoflife.date/python so maybe it can be dropped? Can do it in this PR if that helps. EDIT: Done, can revert if unwanted.

Went EOL on 2025-10-31.
@henryptung

Author

Also... slightly tempted to try using galloping search for some more pathological cases, e.g.

{[0,1), [2,3), ..., [99998, 99999)} & {[0, 1), [99998, 99999)}

but I don't want to make this more complicated/bug-prone than it needs to be right now.
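
For context, a minimal sketch of what galloping (exponential) search could look like, assuming a plain sorted list of keys such as the upper bounds of the atomic intervals. The name `gallop_left` and the `start` parameter are hypothetical and not part of this PR or of `portion`. The search probes at doubling distances from the previous position and then bisects inside the bracket it found, so a target close to `start` is found in a few steps while a distant one still costs only a logarithmic number of probes.

```python
import bisect


def gallop_left(keys, target, start=0):
    """Return the first index i >= start with keys[i] >= target.

    `keys` must be sorted. Hypothetical helper for illustration only.
    """
    n = len(keys)
    if start >= n or keys[start] >= target:
        return min(start, n)
    # Invariant from here on: keys[prev] < target.
    prev, step = start, 1
    while prev + step < n and keys[prev + step] < target:
        prev += step
        step *= 2
    # The answer lies in (prev, min(prev + step, n)]; finish with a bounded bisect.
    hi = min(prev + step, n)
    return bisect.bisect_left(keys, target, prev + 1, hi)


if __name__ == "__main__":
    uppers = list(range(1, 100000, 2))  # upper bounds 1, 3, ..., 99999
    pos = gallop_left(uppers, 1)          # target near the start
    pos = gallop_left(uppers, 99999, pos)  # target far away, resuming from pos
    print(pos)
```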

@henryptung henryptung force-pushed the htung/binary-search-intersect branch from 398f6a4 to ad884bb Compare June 13, 2026 03:18
@AlexandreDecan

Owner

> Hmm, I can probably write up a kludge of some kind for 3.9, but FWIW, 3.9 is EOL https://endoflife.date/python so maybe it can be dropped? Can do it in this PR if that helps. EDIT: Done, can revert if unwanted.

I prefer keeping support for 3.9 as much as possible. What is the problem exactly?

Also, could you do some benchmarks so we can actually check whether the extra "complexity" is worth it? Thanks.

@AlexandreDecan AlexandreDecan added the enhancement New feature or request label Jun 13, 2026
@henryptung

henryptung commented Jun 13, 2026

Author

For 3.9, it's because bisect doesn't take a key callback until 3.10.

Possible workarounds I can think of:

  • Implement own bisect_left
  • Bisect on [i.upper for i in self._intervals] (ehh, trying to avoid the full O(m+n) cost here when n is very small)
  • Condition code to Python 3.10+ (no optimization for 3.9)
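
To make the first workaround concrete, here is a minimal sketch of a key-aware `bisect_left` that would also run on Python 3.9. The name `bisect_left_by_key` is hypothetical and this is not the code from the PR; it only mirrors the documented behaviour of `bisect.bisect_left(..., key=...)`, where the key is applied to the elements but not to the searched value.

```python
def bisect_left_by_key(items, target, key):
    """Return the first index i such that key(items[i]) >= target.

    `items` must be sorted with respect to `key`. Works without the
    `key=` parameter that `bisect` only gained in Python 3.10.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(items[mid]) < target:
            lo = mid + 1  # answer is strictly to the right of mid
        else:
            hi = mid      # mid itself may be the answer
    return lo


if __name__ == "__main__":
    # Closed (lower, upper) tuples standing in for atomic intervals.
    intervals = [(0, 1), (2, 3), (4, 5), (6, 7)]
    print(bisect_left_by_key(intervals, 4, key=lambda iv: iv[1]))
```

The trade-off is a pure-Python loop instead of the C-accelerated `bisect`, which may matter for the small-input overhead discussed in this thread.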

Re: benchmarks, how do you want them added/run? There's only pass/fail unit tests at the moment.

@AlexandreDecan

Owner

Thanks. I'll check whether there are some backports for 3.9, would be good to keep compatibility. Otherwise, an option would be to have two distinct code paths but I tend to dislike this idea :-)

For the benchmarks, I mostly want to be sure that we are not adding any overhead for smaller cases. Informal benchmarks are enough (and if I remember correctly we already have some code living in some issue/pr for that, I'll try to find it and will post here)

@AlexandreDecan

Owner

I can't find it but we can easily redo it or even use copilot for that.

@henryptung

Author

FWIW, as an ad-hoc test:

    import timeit
    import portion as P

    def test_overlap_benchmark(self):
        # 100,000 disjoint atomic intervals: [0, 1], [2, 3], ..., [199998, 199999]
        domain = P.Interval(*(P.closed(2*i, 2*i+1) for i in range(100000)))
        for i in [1, 10, 100, 1000, 10000, 100000]:
            result = timeit.timeit(lambda: domain & P.closed(i, i+100), number=100)
            print(f"{i}: {result}")

Before change:

============================== 1 passed in 8.47s ===============================
PASSED      [100%]
1: 0.029304584008059464
10: 0.030294540993054397
100: 0.03660045799915679
1000: 0.1031846670084633
10000: 0.744428666002932
100000: 7.182451458997093

After change:

============================== 1 passed in 0.52s ===============================
PASSED      [100%]
1: 0.029305957999895327
10: 0.029306875003385358
100: 0.029349834003369324
1000: 0.028838458994869143
10000: 0.028866124994237907
100000: 0.028846333996625617

@AlexandreDecan

Owner

Thanks. I see few convincing options for 3.9; perhaps dropping it is the "best" we can do given it's EOL, as you said.

@AlexandreDecan

Owner

Would it make sense to integrate the early-out optimisation directly in the iterator?

@henryptung

henryptung commented Jun 13, 2026

Author

I can add all-or-nothing optimization for _tail_iterator:

        if value <= self.lower:
            return iter(self)
        if value > self.upper:
            return iter(())

It's tricky though. I think keeping the early-out checks in overlaps, __and__ and __contains__ still makes sense, both because they have atomic codepaths that _tail_iterator doesn't cover, and because relying on the helper alone feels like it couples them too closely to the way this helper method runs.

But, if I leave the early-out optimizations in, I don't think the value > self.upper branch will ever be hit in coverage. Hmmm...need sleep, but will revisit this weekend.

@AlexandreDecan

AlexandreDecan commented Jun 13, 2026

Owner

Indeed, thanks.

I'm likely to merge this PR "soon", I just need to check it on something other than my phone first :-D (edited: likely this Sunday, not sure I'll have time for it today)

Comment thread CHANGELOG.md
- Improve performance of `Interval` creation and union for large disjunctions of overlapping intervals.
- Improve performance of `Interval.__contains__` for values.
- Drop official support for Python 3.9.
- Improve performance of `Interval.overlaps`, `__and__`, and `__contains__` for large, complex intervals when applied to small subintervals (see [#106](https://github.com/AlexandreDecan/portion/issues/106)).

Owner

You may directly refer to this pr instead of the issue, and mention your name if you want (see

- Speed up `repr` and `to_string` for `Interval` instances (see [#76](https://github.com/AlexandreDecan/portion/issues/76), adm271828).
for example)

Development

Successfully merging this pull request may close these issues.

[Perf] Use binary search to find start-point for Interval.overlaps, __and__, __contains__

2 participants