⚙️ Under the Hood

Thousands of Strains. 33 Dimensions. Zero Guesswork.

✍️ Day ™ Production
🕐 8 min read
🧠 Deep Dive

Most recommendation engines guess. They take a category — "Indica", "relaxing", "for sleep" — and return whatever happens to rank highest for that tag. Alfy doesn't guess. It computes. Under the hood is a kD-Tree powered by 33-dimensional emotional vectors, a semantic search engine that understands language the way you do, and an AI agent intelligent enough to know which tool to reach for. Here's exactly how it works.

Alfy is more than a recommender — it's a botanical intelligence trained across thousands of strains and 33 emotional dimensions.

The Database: Thousands of Strains, Scored Across 33 Dimensions

Everything starts with the data. Alfy's database holds thousands of cannabis strains, and for each one, every effect has been scored and normalised — not just named. This isn't a tag list. It's a numerical fingerprint.

1,000s: Strains in the database
33: Effect dimensions per strain
5: Specialised search tools

Each strain record contains three categories of scored effects, measured as normalised floats between 0 and 100:

✨ 13 Positive Effects
Aroused, Creative, Energetic, Euphoric, Focused, Giggly, Happy, Hungry, Relaxed, Sleepy, Talkative, Tingly, Uplifted
🩹 14 Medical Conditions
Cramps, Depression, Eye Pressure, Fatigue, Headaches, Inflammation, Insomnia, Lack of Appetite, Muscle Spasms, Nausea, Pain, Seizures, Spasticity, Stress
⚠️ 6 Negative Effects
Anxious, Dizzy, Dry Eyes, Dry Mouth, Headache, Paranoid

Put them together and each strain becomes a point in 33-dimensional space — a coordinate that describes not just what it does, but how much it does it. Blue Dream isn't just "Creative." It's Creative: 87, Happy: 91, Euphoric: 78, Relaxed: 65, Energetic: 72. That precision is what makes the matching meaningful.
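As a rough illustration of what "a point in 33-dimensional space" means in code, here is a minimal sketch. The effect names and the Blue Dream scores come from the article; the dimension ordering, the `to_vector` helper, and the choice to fill unlisted effects with 0 are assumptions made for this example, not a description of Alfy's actual storage format.

```python
# Effect names are taken from the article; their ORDER is an assumption.
POSITIVE = ["Aroused", "Creative", "Energetic", "Euphoric", "Focused", "Giggly",
            "Happy", "Hungry", "Relaxed", "Sleepy", "Talkative", "Tingly", "Uplifted"]
MEDICAL = ["Cramps", "Depression", "Eye Pressure", "Fatigue", "Headaches",
           "Inflammation", "Insomnia", "Lack of Appetite", "Muscle Spasms", "Nausea",
           "Pain", "Seizures", "Spasticity", "Stress"]
NEGATIVE = ["Anxious", "Dizzy", "Dry Eyes", "Dry Mouth", "Headache", "Paranoid"]
DIMENSIONS = POSITIVE + MEDICAL + NEGATIVE  # 13 + 14 + 6 = 33


def to_vector(scores, default=0.0):
    """Turn a sparse {effect: score} mapping into a dense 33-element list.

    Hypothetical helper: effects missing from `scores` get `default`.
    """
    unknown = set(scores) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"unknown effect(s): {sorted(unknown)}")
    return [float(scores.get(name, default)) for name in DIMENSIONS]


# The five Blue Dream scores quoted in the article; the rest default to 0 here.
blue_dream = to_vector({"Creative": 87, "Happy": 91, "Euphoric": 78,
                        "Relaxed": 65, "Energetic": 72})
print(len(blue_dream))  # 33
```

Fixing one ordering for the dimensions is what lets every strain and every query be compared position by position.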

Thousands of strains, each a point in 33-dimensional space — plotted by their effect scores across positive effects, medical conditions, and negatives.

The kD-Tree: Finding Your Nearest Neighbour in 33 Dimensions

When you tell Alfy you want to feel "Happy, Creative, and a little Relaxed," you're not giving it a search term — you're defining a coordinate. The system converts your selection into the same kind of 33-dimensional vector that every strain already occupies. Then it asks: which strain is closest?
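The article names Euclidean distance as its measure of closeness. Written out for a query vector and a strain vector (the symbols q and s are introduced here for illustration only), that distance is:

```latex
d(\mathbf{q}, \mathbf{s}) = \sqrt{\sum_{i=1}^{33} \left(q_i - s_i\right)^2}
```

Here q_i is the intensity requested on dimension i and s_i is the strain's normalised score on that same dimension. The "closest" strain is the one with the smallest d.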

That's the job of the kD-Tree (k-Dimensional Tree), a data structure built for fast nearest-neighbour search in multidimensional space. It was introduced in Bentley's seminal 1975 paper on multidimensional binary search trees.[1]

"A kD-Tree doesn't search every strain every time. It partitions space into regions, prunes impossible branches, and navigates to the answer in a fraction of the time. Across thousands of strains and 33 dimensions, it's effectively instant."

— How Alfy's RecommendByEffects tool works

Before the tree is built, every effect score is passed through a MinMaxScaler and normalised to a 0–100 range across the entire dataset. This ensures that a score of 75 on "Euphoric" and a score of 75 on "Pain" mean the same thing in geometric terms: 75% of the way from minimum to maximum for that dimension. Without normalisation, dimensions with wider natural ranges would dominate the distance calculation unfairly. This is a well-documented failure mode in any distance-based model, and it is why scikit-learn's preprocessing documentation presents scalers such as MinMaxScaler as a standard step before distance-based computation.[3]
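The arithmetic behind min-max scaling is small enough to show directly. The sketch below is a plain-Python stand-in, not Alfy's pipeline: the `min_max_scale` helper and the raw values are invented for illustration. In scikit-learn, the equivalent is `MinMaxScaler(feature_range=(0, 100))`.

```python
def min_max_scale(columns, lo=0.0, hi=100.0):
    """Rescale each column independently so its minimum maps to lo and its maximum to hi.

    `columns` maps a dimension name to the raw scores of every strain on that dimension.
    """
    scaled = {}
    for name, values in columns.items():
        vmin, vmax = min(values), max(values)
        span = vmax - vmin
        if span == 0:
            # A constant column carries no information; pin it to the lower bound.
            scaled[name] = [lo for _ in values]
        else:
            scaled[name] = [lo + (v - vmin) / span * (hi - lo) for v in values]
    return scaled


# Illustrative raw scores on two very different natural ranges (made-up numbers).
raw = {"Euphoric": [1.0, 2.0, 3.0], "Pain": [10.0, 40.0, 50.0]}
scaled = min_max_scale(raw)
print(scaled["Euphoric"])  # [0.0, 50.0, 100.0]
print(scaled["Pain"])      # [0.0, 75.0, 100.0]
```

After scaling, both columns span 0 to 100, so neither dominates a Euclidean distance simply because its raw numbers were larger.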

Here's what that looks like for a real query: "I want to feel Happy, Creative, and Relaxed"

🔍 Query vector vs. Blue Dream's profile

Blue Dream's scores: Happy 91, Creative 87, Relaxed 65, Euphoric 78, Energetic 72, Stress 68
Euclidean distance to query: 🌿 Blue Dream, distance 18.4
The kD-Tree partitions space into regions and navigates directly to the nearest neighbour — pruning impossible branches without checking every strain.

The kD-Tree returns the 5 closest strains by Euclidean distance — not by opinion, not by trending ranking, not by sponsored placement. Pure geometry. Research on dynamic multi-dimensional spatial indexing confirms that balanced kD-Tree structures deliver predictable tail latency precisely because hyperplane pruning eliminates immense volumes of search space before the nearest-neighbour is reached.[2] The strain that lives closest to the emotional coordinate you described wins the recommendation.
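To make the partition-and-prune idea concrete, here is a compact, dependency-free kD-Tree with a k-nearest-neighbour query. It is a teaching sketch under stated assumptions: the strain vectors are random placeholders, the function names are invented, and the article does not say which implementation Alfy uses. In practice a library class such as `scipy.spatial.KDTree` or `sklearn.neighbors.KDTree` would be the more likely choice.

```python
import heapq
import math
import random


def build(points, depth=0):
    """Build a kD-Tree from (vector, label) pairs by median split, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    ordered = sorted(points, key=lambda p: p[0][axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build(ordered[:mid], depth + 1),
        "right": build(ordered[mid + 1:], depth + 1),
    }


def k_nearest(root, query, k=5):
    """Return up to k (distance, label) pairs, closest first."""
    best = []  # max-heap on distance, stored as (-distance, label)

    def visit(node):
        if node is None:
            return
        vector, label = node["point"]
        dist = math.dist(vector, query)
        if len(best) < k:
            heapq.heappush(best, (-dist, label))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, label))
        # Descend first into the side of the split that contains the query.
        gap = query[node["axis"]] - vector[node["axis"]]
        near, far = (node["left"], node["right"]) if gap < 0 else (node["right"], node["left"])
        visit(near)
        # Prune the far side unless the splitting plane is closer than the current k-th best.
        if len(best) < k or abs(gap) < -best[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg, label) for neg, label in best)


# Placeholder data: 500 random 33-dimensional "strains" with scores in 0..100.
random.seed(0)
strains = [([random.uniform(0, 100) for _ in range(33)], f"strain-{i}") for i in range(500)]
tree = build(strains)
query = [50.0] * 33
top5 = k_nearest(tree, query, k=5)
for distance, label in top5:
    print(f"{label}: {distance:.1f}")
```

The pruning test is the important line: a whole subtree is skipped whenever the splitting plane is already farther from the query than the current fifth-best match.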

And unlike a keyword search, this catches things that a label would miss. A strain that's 88 on Relaxed and 82 on Happy might never appear in results for "chill creative vibes" — but the kD-Tree finds it because its numbers put it right next to yours.

Emotional Literacy as Coordinates

The bridge between how you feel and what the kD-Tree searches is what Alfy calls emotional literacy mapping. Every option you select in the app — "🧘 Meditation", "🧠 Brain Waves", "😴 Zzz" — translates directly into an effect dimension at a specific intensity.

# How "Meditation" becomes a search vector
MoodOption(
    ui_label="Meditation",
    score_type="EffectScores",
    effect_name="Relaxed",
    on_select=100,  # full intensity
)

# "Brain Waves" maps to:
MoodOption(
    ui_label="Brain Waves",
    score_type="EffectScores",
    effect_name="Creative",
    on_select=100,
)

# Combined: {"Relaxed": 100, "Creative": 100}
# → fed directly into the kD-Tree query vector

Premium options work inversely — if you select "🍕 Avoid Munchies", the Hungry dimension is set to 0, pulling the results toward strains that score low on that dimension. You're not just describing what you want — you're also specifying what to avoid, and the geometry handles both simultaneously.
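A minimal sketch of how chip selections, including an "avoid" chip, could be folded into one query mapping. The `MoodOption` field names follow the article's snippet; the dataclass definition, the `OPTIONS` registry, the `build_query` helper, and the exact encoding of "Avoid Munchies" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoodOption:
    ui_label: str
    score_type: str
    effect_name: str
    on_select: float  # target intensity written into the query when the chip is tapped


# Hypothetical registry keyed by chip label.
OPTIONS = {
    "Meditation": MoodOption("Meditation", "EffectScores", "Relaxed", 100),
    "Brain Waves": MoodOption("Brain Waves", "EffectScores", "Creative", 100),
    # Inverse option: selecting it asks for a LOW score on Hungry.
    "Avoid Munchies": MoodOption("Avoid Munchies", "EffectScores", "Hungry", 0),
}


def build_query(selected_labels):
    """Map each selected chip to its effect dimension and target intensity."""
    query = {}
    for label in selected_labels:
        option = OPTIONS[label]
        query[option.effect_name] = option.on_select
    return query


query = build_query(["Meditation", "Brain Waves", "Avoid Munchies"])
print(query)  # {'Relaxed': 100, 'Creative': 100, 'Hungry': 0}
```

Wanted and unwanted effects end up in the same mapping, which is why a single distance calculation can honour both.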

Each chip you tap becomes a coordinate. By the time you hit send, Alfy has already built your emotional vector and is calculating distances.

Five Specialised Tools — One Intelligent Agent

The kD-Tree is the precision instrument, but it's one of five tools the agent can reach for. The art is in knowing which tool, or which combination, fits the question. That's the agent's job. Each tool is designed for a distinct type of question:

🔍
SearchStrainsByMood
Semantic vector search via Chroma. Every strain's description, effects, and metadata are embedded into a high-dimensional language vector. Your query is embedded too, and the closest strains by cosine similarity are returned.
Used when: you type in natural language — "something chill for a movie night"
📐
RecommendByEffects
The kD-Tree engine. Takes a numeric effect vector — built from your chip selections or the agent's interpretation — and finds the 5 nearest strains by Euclidean distance in 33-dimensional space.
Used when: specific effects or intensities are requested
🩹
SearchByMedicalCondition
Ranks all strains by their normalised score on a specific medical dimension. Returns the top matches by condition score — direct, ranked, no geometry needed.
Used when: a medical need is mentioned, such as pain, insomnia, or stress
🌐
SearchStrainOnline
Tavily web search fallback. When a strain isn't in the local database — a new cultivar, a regional exclusive, something with an unusual name — Alfy reaches out to the web and synthesises what it finds.
Used when: the strain isn't in the local database
🔬
GetStrainDetails
Exact lookup by strain name. Full record: category, rating, description, all effect scores. When you know what you're looking for and just want the full profile.
Used when: you ask about a specific named strain
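Two of the tool behaviours described above reduce to short ranking functions. The sketch below is a plain-Python stand-in under loud assumptions: it does not use Chroma's API, the three-number "embeddings" are toy values rather than real model output, and the strain names, scores, and function names are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_by_mood(query_embedding, strain_embeddings, k=3):
    """Semantic search stand-in: rank strains by cosine similarity to the query embedding."""
    ranked = sorted(strain_embeddings,
                    key=lambda name: cosine_similarity(query_embedding, strain_embeddings[name]),
                    reverse=True)
    return ranked[:k]


def search_by_condition(strain_scores, condition, k=3):
    """Medical search stand-in: rank strains by their score on one condition dimension."""
    ranked = sorted(strain_scores,
                    key=lambda name: strain_scores[name].get(condition, 0.0),
                    reverse=True)
    return ranked[:k]


# Toy data, invented for the example.
embeddings = {"Strain A": [1.0, 0.0, 0.0], "Strain B": [0.9, 0.1, 0.0], "Strain C": [0.0, 1.0, 0.0]}
scores = {"Strain A": {"Pain": 40}, "Strain B": {"Pain": 90}, "Strain C": {"Pain": 10}}

print(search_by_mood([1.0, 0.0, 0.0], embeddings, k=2))  # ['Strain A', 'Strain B']
print(search_by_condition(scores, "Pain", k=2))          # ['Strain B', 'Strain A']
```

The contrast is the point: mood search compares direction in an embedding space, while the medical search is a simple sort on one column.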

The Agent: Deciding Which Tool to Use

None of these tools matter if they're used at the wrong time. Alfy is built on a LangGraph ReAct agent, a loop that thinks before it acts: Reason → Act → Observe → Repeat. This paradigm was introduced in the "ReAct: Synergizing Reasoning and Acting in Language Models" paper (Yao et al., 2022), which reported fewer hallucinations when responses are grounded in actual tool observations rather than in the model's parametric memory alone.[7]

For each query, the agent reads the conversation, decides which tool fits, calls it, reads the output, and decides whether to act again or synthesise a final answer. It runs this loop up to 20 times per query — deep enough to handle complex multi-condition requests without getting stuck.

01. 🗣️ You say something

A chip selection, a typed message, or both. Your input arrives with 6 turns of conversation history for context.

02. 🧠 Agent reasons

The LangGraph ReAct agent reads the full context and decides: is this a mood query (semantic search), an effects query (kD-Tree), a medical need, or a named strain lookup? It might call multiple tools in sequence.

03. 📐 Tools execute

The chosen tool runs — the kD-Tree computes distances, Chroma finds semantic matches, or Tavily fetches the web. Results come back as structured data.

04. 🔁 Iterate if needed

If the result is incomplete — e.g. a named strain isn't in the database — the agent calls another tool (web search) rather than giving up. Up to 20 tool calls per query.

05. ✨ Synthesise the answer

The agent writes a warm, human response: 2–3 specific strains with explanations grounded in the actual data retrieved. Every recommendation traces back to what the tools returned from the database or the web, which leaves far less room for hallucination.

The ReAct agent at the centre — reading your query, choosing the right tool, observing the result, and deciding whether to iterate or answer.
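The five steps above can be sketched as a bounded loop. This is not LangGraph code: the `decide` function is a rule-based stand-in for the language model's reasoning step, the tools are stubs (no network call is made), and every name except the tool names and the 20-step cap is an assumption for illustration.

```python
MAX_STEPS = 20  # the article's cap on tool calls per query

# Stub database holding the two Blue Dream scores quoted in the article.
LOCAL_DB = {"Blue Dream": {"Happy": 91, "Creative": 87}}


def get_strain_details(name):
    """Exact lookup; returns None when the strain is not in the local database."""
    return LOCAL_DB.get(name)


def search_strain_online(name):
    """Placeholder for a web search; returns a canned string instead of calling out."""
    return f"web summary for {name}"


TOOLS = {"GetStrainDetails": get_strain_details, "SearchStrainOnline": search_strain_online}


def decide(query, observations):
    """Stand-in for the model's reasoning: pick the next tool or produce the final answer."""
    if not observations:
        return {"tool": "GetStrainDetails", "input": query}
    last_tool, last_result = observations[-1]
    if last_tool == "GetStrainDetails" and last_result is None:
        return {"tool": "SearchStrainOnline", "input": query}  # fall back to the web
    return {"tool": None, "answer": f"{query}: {last_result}"}


def run_agent(query, tools, decide_fn, max_steps=MAX_STEPS):
    """Reason, act, observe, repeat, until the policy answers or the step cap is hit."""
    observations = []
    for _ in range(max_steps):
        action = decide_fn(query, observations)                # Reason
        if action["tool"] is None:
            return action["answer"], observations
        result = tools[action["tool"]](action["input"])        # Act
        observations.append((action["tool"], result))          # Observe
    return "step limit reached", observations


answer, trace = run_agent("Mystery Haze", TOOLS, decide)
print([tool for tool, _ in trace])  # ['GetStrainDetails', 'SearchStrainOnline']
print(answer)                       # Mystery Haze: web summary for Mystery Haze
```

The trace shows the fallback from step 04: a failed local lookup leads to a second tool call rather than an empty answer.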

Why This Is Better Than a Filter

Most cannabis apps let you filter by type and effect tag. That's useful, but it's brittle. Filters work by exact match: if a strain isn't tagged "creative", it won't appear in the creative filter, even if its Creative score is 85. Enterprise vector search analysis has documented how boolean filter logic acts as a binary gatekeeper, dropping matching entities entirely the moment a single criterion isn't met and producing false negatives at scale.[4] Filters also can't handle nuance: you can't tell a filter "I want something happy but not too energetic, and I sometimes get headaches."

The kD-Tree handles all of this naturally. "Happy but not too energetic" is just a vector where Happy is high and Energetic is low. "I sometimes get headaches" maps to keeping the Headache negative-effect dimension low. The geometry of the space handles the trade-offs — the nearest neighbour is the strain that best satisfies all your constraints simultaneously, not just the ones that are easy to filter for.
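A small side-by-side makes the contrast tangible. Everything in this sketch is hypothetical: the three strains, their tags and scores, the choice of 20 as the target for "not too energetic", and the decision to measure distance only over the dimensions the user mentioned.

```python
import math

# Invented catalogue: each strain has editorial tags plus numeric scores.
STRAINS = {
    "Strain A": {"tags": {"happy"}, "Happy": 90, "Energetic": 85, "Headache": 10},
    "Strain B": {"tags": {"relaxed"}, "Happy": 82, "Energetic": 30, "Headache": 5},
    "Strain C": {"tags": {"happy", "energetic"}, "Happy": 70, "Energetic": 60, "Headache": 40},
}


def tag_filter(strains, required_tag):
    """Boolean filter: a strain is either tagged or it is excluded."""
    return [name for name, data in strains.items() if required_tag in data["tags"]]


def rank_by_distance(strains, target):
    """Vector ranking: order every strain by Euclidean distance to the target scores."""
    def distance(name):
        return math.sqrt(sum((strains[name][dim] - want) ** 2 for dim, want in target.items()))
    return sorted(strains, key=distance)


# "Happy but not too energetic, and I sometimes get headaches."
target = {"Happy": 100, "Energetic": 20, "Headache": 0}

print(tag_filter(STRAINS, "happy"))      # ['Strain A', 'Strain C']
print(rank_by_distance(STRAINS, target))  # ['Strain B', 'Strain C', 'Strain A']
```

Strain B never carries the "happy" tag, so the filter drops it, yet it is the closest match once the scores are compared as numbers.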

"The difference between a filter and a vector search is the difference between a yes/no question and a distance. Filters exclude. Vectors rank. And ranking always finds something useful — even when nothing is a perfect match."

Vector index platforms have independently demonstrated this property: mapping preferences into a continuous geometric space means results are always ranked by proximity rather than eliminated by rigid Boolean constraints, making the system resilient to partial matches.[5][6]

The semantic search layer adds another dimension: it understands language. "Something for a creative Sunday morning" will find strains whose descriptions and effect profiles cluster around that concept — without you having to know the word "Sativa" or the effect name "Creative." The embedding model has already learned what that phrase means.


The Result: A Recommendation That Earns Its Place

When Alfy recommends 🌿 Blue Dream, it's not because Blue Dream is popular, or because it showed up in the right ad buy. It's because Blue Dream's 33-dimensional vector sat closer to your emotional coordinate than every other strain in the database.

That's a claim that can be verified, challenged, and improved. As the database grows, as the emotional mappings get more refined, as your preferences get captured across sessions — the distances get tighter and the recommendations get sharper.

This is what it looks like when a recommendation engine is built around emotional literacy rather than popularity. Not what people usually want. What you actually need, right now, measured in the most precise language we have: mathematics.

References
  1. Bentley, J. L. (1975). "Multidimensional Binary Search Trees Used in Database Applications." DTIC Technical Report. — Foundational proof of kD-Tree nearest-neighbour search efficiency.
  2. Zhao et al. (2025). Dynamic Multi-Dimensional Spatial Indexing. Parallel Data Laboratory, Carnegie Mellon University. — Documents predictable tail latency via hyperplane pruning in balanced kD-Tree structures.
  3. scikit-learn. "Preprocessing Data." scikit-learn Documentation. — Establishes MinMaxScaler as the standard normalisation step before Euclidean distance computation in ML pipelines.
  4. Elasticsearch Labs. "Vector Search Filtering Analysis." Elastic Search Labs Blog. — Breakdown of how hard boolean filters generate false negatives in complex intent matching.
  5. Qdrant. "Vector Search Filtering." Qdrant Technical Articles. — Documents how vector space proximity ranking outperforms binary filter constraints for nuanced queries.
  6. Redis. "Vector Search Guide." Redis Blog. — Overview of continuous-spectrum ranking as a superior alternative to rigid tag-based filtering.
  7. Yao, S., et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv preprint arXiv:2210.03629. — Proves that interleaving reasoning traces with tool actions reduces hallucination and improves task accuracy in LLM agents.

🧠 See It In Action

Tell Alfy how you're feeling. Watch the reasoning unfold in real time — which tools it calls, what it found, why it chose what it chose.
