Skip to content

spilled() is very slow #361

@Brogolem35

Description

@Brogolem35

I am the author of the markov_str library and I trying out the SmallVec to speed up the creation of my Markov Chains. But I have encountered this issue while benchmarking with cargo flamegraph: equvalance checks between SmallVecs take too long and almost all the time is spent on spilled().

Example:

MarkovChain type:

pub struct MarkovChain {
	items: HashMap<SmallVec<[Spur; 4]>, ChainItem>,
	state_size: usize,
	regex: Regex,
	cache: Rodeo,
}

benchmarked function:

	/// Adds text as training data. The tokens will be created with the regex of the MarkovChain.
	pub fn add_text(&mut self, text: &str) {
		let tokens: Vec<Spur> = self
			.regex
			.find_iter(text)
			.map(|t| self.cache.get_or_intern(t.as_str()))
			.collect();

		// vec.windows(0) panics for some reason.
		if tokens.is_empty() {
			return;
		}

		// Creating a preallocated buffer and filling and cleaning it instead of creating a new one every loop is way more efficient.
		let mut prevbuf: SmallVec<[Spur; 4]> = SmallVec::with_capacity(self.state_size);
		for win in tokens.windows(tokens.len().min(self.state_size + 1)) {
			let rel = win.last().unwrap();

			for i in 1..win.len() {
				prevbuf.clear();
				for t in win.iter().rev().skip(1).take(i).rev() {
					prevbuf.push(*t);
				}

				match self.items.raw_entry_mut().from_key(&prevbuf) {
					RawEntryMut::Occupied(mut view) => {
						view.get_mut().add(*rel);
					}
					RawEntryMut::Vacant(view) => {
						view.insert(prevbuf.clone(), ChainItem::new(*rel));
					}
				}
			}

Crates used:

hashbrown = "0.15.0"
lasso = {version = "0.7.3", features = ["ahasher", "inline-more"]}
rand = "0.8.5"
regex = "1.11.0"
smallvec = "1.13.2"

Flamegraph output:
image

The code and sample data I used for this benchmark can be found at Brogolem35/markov_str/.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions