Semantic Search Overview

GrafitoDB combines the structural power of property graph databases with the semantic understanding of vector embeddings, enabling a new class of intelligent graph applications.

Why Combine Semantic Search with Knowledge Graphs?

Traditional knowledge graphs excel at structural reasoning (finding paths, relationships, patterns), while semantic search excels at understanding meaning. Together, they create a powerful synergy:

Find nodes by meaning, then traverse their relationships:

# Find documents about "machine learning" semantically
results = db.semantic_search("machine learning techniques", k=5)

# Then navigate to related entities
for result in results:
    doc_node = result["node"]
    # Find authors of these documents
    authors = db.get_neighbors(doc_node.id, direction="outgoing", rel_type="AUTHORED_BY")
    # Find cited papers
    citations = db.get_neighbors(doc_node.id, direction="outgoing", rel_type="CITES")

2. Context-Aware Retrieval

Use graph structure to inform semantic search:

# Find papers semantically similar to "neural networks"
papers = db.semantic_search("neural networks", k=10, filter_labels=["Paper"])

# For each paper, get its citation network
for paper_result in papers:
    paper = paper_result["node"]

    # Get papers this paper cites (outgoing edges)
    references = db.get_neighbors(paper.id, direction="outgoing", rel_type="CITES")

    # Get papers that cite this paper (incoming edges)
    cited_by = db.get_neighbors(paper.id, direction="incoming", rel_type="CITES")

    # Find common authors
    authors = db.get_neighbors(paper.id, rel_type="AUTHORED_BY")

3. Multi-Hop Semantic Queries

Combine semantic similarity with graph traversal:

// Find papers semantically similar to a query
CALL db.vector.search('papers_vec', $query_vector, 5)
YIELD node AS paper, score

// Then find co-authors of those papers
MATCH (paper)-[:AUTHORED_BY]->(author)-[:AUTHORED_BY]->(other_paper)
WHERE other_paper <> paper
RETURN paper.title, author.name, collect(other_paper.title) AS coauthor_papers

4. Hybrid Ranking

Combine semantic similarity with graph metrics (PageRank, centrality, citation count):

results = db.semantic_search("deep learning", k=20)

for result in results:
    node = result["node"]
    semantic_score = result["score"]

    # Calculate graph-based importance
    citation_count = len(db.get_neighbors(node.id, direction="incoming", rel_type="CITES"))

    # Hybrid score
    hybrid_score = 0.7 * semantic_score + 0.3 * (citation_count / 100)

    result["hybrid_score"] = hybrid_score

5. Question Answering with Graph Context

Build RAG systems with rich relationship context:

# User question: "Who are the leading researchers in reinforcement learning?"
query_vector = embedder(["reinforcement learning research"])[0]

# Find relevant papers semantically
papers = db.semantic_search(query_vector, k=10, filter_labels=["Paper"])

# Get authors and their collaboration networks
for paper in papers:
    authors = db.get_neighbors(paper["node"].id, rel_type="AUTHORED_BY")
    for author in authors:
        # Get author's other papers
        other_papers = db.get_neighbors(author.id, direction="incoming", rel_type="AUTHORED_BY")
        # Get collaboration network
        collaborators = db.execute("""
            MATCH (a:Author {id: $author_id})-[:AUTHORED_BY]-(p:Paper)-[:AUTHORED_BY]-(coauthor:Author)
            WHERE coauthor <> a
            RETURN coauthor, count(p) AS num_collaborations
            ORDER BY num_collaborations DESC
        """, {"author_id": author.id})

Architecture

How It Works

GrafitoDB's semantic search implementation consists of three main components:

┌─────────────────────────────────────────────────────────────┐
│                       GrafitoDB Database                     │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐      ┌──────────────┐     ┌───────────┐  │
│  │   Nodes      │      │  Embeddings  │     │  Vector   │  │
│  │  (SQLite)    │◄────►│   (SQLite)   │────►│  Index    │  │
│  │              │      │              │     │ (Memory)  │  │
│  │ id | props   │      │ node_id |    │     │           │  │
│  │ 1  | {...}   │      │ vector       │     │  FAISS/   │  │
│  │ 2  | {...}   │      │              │     │  HNSW/    │  │
│  └──────────────┘      └──────────────┘     │  Annoy    │  │
│                                              └───────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │            Embedding Function                        │  │
│  │  (OpenAI / Cohere / HuggingFace / Ollama / etc.)    │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Data Flow

Node Creation: Create nodes with properties (text, metadata)
Embedding Generation: Convert text properties to vectors using embedding functions
Index Insertion: Store vectors in specialized vector indexes for fast similarity search
Query: Convert query text to vector, search index for nearest neighbors
Retrieval: Return nodes with their properties and similarity scores
Graph Traversal: Navigate relationships from retrieved nodes

Key Capabilities

Feature	Description
Multiple Embedding Providers	OpenAI, Cohere, HuggingFace, Ollama, and more
Multiple ANN Backends	FAISS, HNSWlib, Annoy, LEANN, USearch, Voyager
Similarity Metrics	Cosine similarity, L2 distance, and inner product
Property Filtering	Combine semantic search with graph structure filters
Reranking	Improve precision with exact reranking
Cypher Integration	Native `CALL db.vector.search()` procedure
Persistent Storage	Save and load vector indexes

Real-World Applications

Domain	Use Case
Academic Research	Semantic paper discovery + citation networks
E-commerce	Product similarity + purchase patterns and user behavior graphs
Healthcare	Symptom matching + patient history and treatment pathways
Enterprise Knowledge Management	Document similarity + organizational hierarchies
Recommendation Systems	Content similarity + social graphs and interaction patterns
Fraud Detection	Anomaly detection + transaction networks
Chatbots/Assistants	Semantic understanding + knowledge graphs for context

Next Steps

Learn about Vector Search - Core functionality
Explore ANN Backends - Choose the right backend
Understand Hybrid Search - Combine text and vector search
Set up Embeddings - Configure embedding providers