Most recommendation engines guess. They take a category — "Indica", "relaxing", "for sleep" — and return whatever happens to rank highest for that tag. Alfy doesn't guess. It computes. Under the hood is a kD-Tree powered by 33-dimensional emotional vectors, a semantic search engine that understands language the way you do, and an AI agent intelligent enough to know which tool to reach for. Here's exactly how it works.
The Database: Thousands of Strains, Scored Across 33 Dimensions
Everything starts with the data. Alfy's database holds thousands of cannabis strains, and for each one, every effect has been scored and normalised — not just named. This isn't a tag list. It's a numerical fingerprint.
Each strain record contains three categories of scored effects, measured as normalised floats between 0 and 100:
Put them together and each strain becomes a point in 33-dimensional space — a coordinate that describes not just what it does, but how much it does it. Blue Dream isn't just "Creative." It's Creative: 87, Happy: 91, Euphoric: 78, Relaxed: 65, Energetic: 72. That precision is what makes the matching meaningful.
The kD-Tree: Finding Your Nearest Neighbour in 33 Dimensions
When you tell Alfy you want to feel "Happy, Creative, and a little Relaxed," you're not giving it a search term — you're defining a coordinate. The system converts your selection into the same kind of 33-dimensional vector that every strain already occupies. Then it asks: which strain is closest?
That's the job of the kD-Tree (k-Dimensional Tree) — a data structure built specifically for finding nearest neighbours in high-dimensional space, fast — a technique whose mathematical foundations were established in Bentley's seminal 1975 paper on multidimensional binary search trees.[1]
"A kD-Tree doesn't search every strain every time. It partitions space into regions, prunes impossible branches, and navigates to the answer in a fraction of the time. Across thousands of strains and 33 dimensions, it's effectively instant."
— How Alfy's RecommendByEffects tool worksBefore the tree is built, every effect score is passed through a MinMaxScaler — normalised to a 0–100 range across the entire dataset. This ensures that a score of 75 on "Euphoric" and a score of 75 on "Pain relief" mean the same thing in geometric terms: 75% of the way from minimum to maximum for that dimension. Without normalisation, dimensions with wider natural ranges would dominate the distance calculation unfairly — a well-documented failure mode in any distance-based model, and the reason scikit-learn's own preprocessing documentation classifies MinMaxScaler as a non-negotiable step before geometric similarity computation.[3]
Here's what that looks like for a real query: "I want to feel Happy, Creative, and Relaxed"
🔍 Query vector vs. Blue Dream's profile
The kD-Tree returns the 5 closest strains by Euclidean distance — not by opinion, not by trending ranking, not by sponsored placement. Pure geometry. Research on dynamic multi-dimensional spatial indexing confirms that balanced kD-Tree structures deliver predictable tail latency precisely because hyperplane pruning eliminates immense volumes of search space before the nearest-neighbour is reached.[2] The strain that lives closest to the emotional coordinate you described wins the recommendation.
And unlike a keyword search, this catches things that a label would miss. A strain that's 88 on Relaxed and 82 on Happy might never appear in results for "chill creative vibes" — but the kD-Tree finds it because its numbers put it right next to yours.
Emotional Literacy as Coordinates
The bridge between how you feel and what the kD-Tree searches is what Alfy calls emotional literacy mapping. Every option you select in the app — "🧘 Meditation", "🧠 Brain Waves", "😴 Zzz" — translates directly into an effect dimension at a specific intensity.
Premium options work inversely — if you select "🍕 Avoid Munchies", the Hungry dimension is set to 0, pulling the results toward strains that score low on that dimension. You're not just describing what you want — you're also specifying what to avoid, and the geometry handles both simultaneously.
Five Specialised Tools — One Intelligent Agent
The kD-Tree is the precision instrument, but it's one of five tools the agent can reach for. Each is designed for a distinct type of question:
- SearchStrainsByMood — natural-language semantic search
- RecommendByEffects — kD-Tree nearest-neighbour on your effect vector
- SearchByMedicalCondition — ranked lookup by medical dimension score
- SearchStrainOnline — live web fallback for strains not in the database
- GetStrainDetails — exact full-profile lookup by strain name
The art is in knowing which tool — or which combination — fits the question. That's the agent's job.
The Agent: Deciding Which Tool to Use
None of these tools matter if they're used at the wrong time. Alfy is built on a LangGraph ReAct agent — a loop that thinks before it acts: Reason → Act → Observe → Repeat. This paradigm — proved in the landmark "ReAct: Synergizing Reasoning and Acting in Language Models" paper (Yao et al., 2022) — dramatically reduces hallucinations by grounding every response in actual tool observations rather than the model's parametric memory.[7]
For each query, the agent reads the conversation, decides which tool fits, calls it, reads the output, and decides whether to act again or synthesise a final answer. It runs this loop up to 20 times per query — deep enough to handle complex multi-condition requests without getting stuck.
🗣️ You say something
A chip selection, a typed message, or both. Your input arrives with 6 turns of conversation history for context.
🧠 Agent reasons
The LangGraph ReAct agent reads the full context and decides: is this a mood query (semantic search), an effects query (kD-Tree), a medical need, or a named strain lookup? It might call multiple tools in sequence.
📐 Tools execute
The chosen tool runs — the kD-Tree computes distances, Chroma finds semantic matches, or Tavily fetches the web. Results come back as structured data.
🔁 Iterate if needed
If the result is incomplete — e.g. a named strain isn't in the database — the agent calls another tool (web search) rather than giving up. Up to 20 tool calls per query.
✨ Synthesise the answer
The agent writes a warm, human response — 2–3 specific strains with explanations grounded in the actual data retrieved. No hallucination: every recommendation came from the database or the web.
Why This Is Better Than a Filter
Most cannabis apps let you filter by type and effect tag. That's useful, but it's brittle. Enterprise vector search analysis has documented how boolean filter logic acts as a binary gatekeeper — dropping matching entities entirely the moment a single criterion isn't met, producing false negatives at scale.[4] Filters work by exact match — if a strain isn't tagged "creative", it won't appear in the creative filter, even if its Creative score is 85. And they can't handle nuance — you can't tell a filter "I want something happy but not too energetic, and I sometimes get headaches."
The kD-Tree handles all of this naturally. "Happy but not too energetic" is just a vector where Happy is high and Energetic is low. "I sometimes get headaches" maps to keeping the Headache negative-effect dimension low. The geometry of the space handles the trade-offs — the nearest neighbour is the strain that best satisfies all your constraints simultaneously, not just the ones that are easy to filter for.
"The difference between a filter and a vector search is the difference between a yes/no question and a distance. Filters exclude. Vectors rank. And ranking always finds something useful — even when nothing is a perfect match."
Vector index platforms have independently demonstrated this property: mapping preferences into a continuous geometric space means results are always ranked by proximity rather than eliminated by rigid Boolean constraints, making the system resilient to partial matches.[5][6]
The semantic search layer adds another dimension: it understands language. "Something for a creative Sunday morning" will find strains whose descriptions and effect profiles cluster around that concept — without you having to know the word "Sativa" or the effect name "Creative." The embedding model has already learned what that phrase means.
The Result: A Recommendation That Earns Its Place
When Alfy recommends 🌿 Blue Dream, it's not because Blue Dream is popular, or because it showed up in the right ad buy. It's because Blue Dream's 33-dimensional vector sat closer to your emotional coordinate than every other strain in the database.
That's a claim that can be verified, challenged, and improved. As the database grows, as the emotional mappings get more refined, as your preferences get captured across sessions — the distances get tighter and the recommendations get sharper.
This is what it looks like when a recommendation engine is built around emotional literacy rather than popularity. Not what people usually want. What you actually need, right now, measured in the most precise language we have: mathematics.
- Bentley, J. L. (1975). "Multidimensional Binary Search Trees Used in Database Applications." DTIC Technical Report. — Foundational proof of kD-Tree nearest-neighbour search efficiency.
- Zhao et al. (2025). Dynamic Multi-Dimensional Spatial Indexing. Parallel Data Laboratory, Carnegie Mellon University. — Documents predictable tail latency via hyperplane pruning in balanced kD-Tree structures.
- scikit-learn. "Preprocessing Data." scikit-learn Documentation. — Establishes MinMaxScaler as the standard normalisation step before Euclidean distance computation in ML pipelines.
- Elasticsearch Labs. "Vector Search Filtering Analysis." Elastic Search Labs Blog. — Breakdown of how hard boolean filters generate false negatives in complex intent matching.
- Qdrant. "Vector Search Filtering." Qdrant Technical Articles. — Documents how vector space proximity ranking outperforms binary filter constraints for nuanced queries.
- Redis. "Vector Search Guide." Redis Blog. — Overview of continuous-spectrum ranking as a superior alternative to rigid tag-based filtering.
- Yao, S., et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv preprint arXiv:2210.03629. — Proves that interleaving reasoning traces with tool actions reduces hallucination and improves task accuracy in LLM agents.
🧠 See It In Action
Tell Alfy how you're feeling. Watch the reasoning unfold in real time — which tools it calls, what it found, why it chose what it chose.
Try Alfy Free →