Use the helix-py Python SDK to interact with HelixDB, a high-performance graph-vector database. The SDK provides a simple query interface and PyTorch-like API for defining custom graph queries and vector operations, making it ideal for similarity search, knowledge graphs, and machine learning pipelines.
How do I install the Python SDK?
Install helix-py using your preferred Python package manager:
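pip install helix-py
# or with uv:
uv pip install helix-py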
How do I connect to HelixDB with Python?
Set up a Client to connect to your HelixDB instance:
import helix
# Connect to a local helix instance
db = helix.Client(local=True, verbose=True)
# Note that the query name is case sensitive
db.query('add_user', {"name": "John", "age": 20})
The default port is 6969, but you can change it by passing in the port parameter.
For cloud instances, you can pass in the api_endpoint parameter.
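For example, a sketch using the parameters described above (the endpoint below is a placeholder, and passing local=False alongside api_endpoint is an assumption):
import helix
# Local instance on a non-default port
db = helix.Client(local=True, port=6970)
# Cloud instance (placeholder endpoint)
db = helix.Client(local=False, api_endpoint="https://your-instance.example.com")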
How do I execute queries with the Python SDK?
Define queries in a PyTorch-like manner, similar to neural network forward passes. Use built-in queries for common operations or define custom queries for complex workflows.
PyTorch-like query definition
Match your HelixQL queries with Python classes:
QUERY add_user(name: String, age: I64) =>
    usr <- AddV<User>({name: name, age: age})
    RETURN usr
You can define a matching Python class:
from helix.client import Query
from helix.types import Payload

class add_user(Query):
    def __init__(self, name: str, age: int):
        super().__init__()
        self.name = name
        self.age = age

    def query(self) -> Payload:
        return [{"name": self.name, "age": self.age}]

    def response(self, response):
        return response

db.query(add_user("John", 20))
Make sure that the Query.query method returns a list of objects.
How do I manage HelixDB instances with Python?
Use the Instance class to manage HelixDB lifecycle automatically within your Python scripts:
from helix.instance import Instance
helix_instance = Instance("helixdb-cfg", 6969, verbose=True)
# Deploy & redeploy instance
helix_instance.deploy()
# Start instance
helix_instance.start()
# Stop instance
helix_instance.stop()
# Delete instance
helix_instance.delete()
# Instance status
print(helix_instance.status())
helixdb-cfg is the directory where the configuration files are stored. Once the instance is running, you can interact with it using Client. The instance is automatically stopped when the script exits.
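Putting it together, a script can deploy an instance and then talk to it through Client (a minimal sketch based on the calls shown above):
import helix
from helix.instance import Instance

# Deploy a local instance from the helixdb-cfg directory
helix_instance = Instance("helixdb-cfg", 6969, verbose=True)
helix_instance.deploy()

# Connect to the managed instance and run a query
db = helix.Client(local=True, port=6969)
db.query('add_user', {"name": "John", "age": 20})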
How do I use LLM providers with HelixDB?
Integrate popular LLM providers directly with HelixDB using built-in provider interfaces.
Available providers:
OpenAIProvider
GeminiProvider
AnthropicProvider
Don’t forget to set the OPENAI_API_KEY, GEMINI_API_KEY, or ANTHROPIC_API_KEY environment variable, depending on the provider you are using.
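For example, you can export the key in your shell before running the script, or set it in-process (the key value below is a placeholder):
import os
# Placeholder value; use your real key or export it in your shell instead
os.environ["OPENAI_API_KEY"] = "sk-..."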
All providers expose two methods:
enable_mcps(name: str, url: str=...) -> bool to enable Helix MCP tools
generate(messages, response_model: BaseModel | None=None) -> str | BaseModel
The generate method supports messages in two formats:
- Free-form text: pass a string
- Message lists: pass a list of dicts or provider-specific Message models
It also supports structured outputs by passing a Pydantic model to get validated results.
Example:
from pydantic import BaseModel
# OpenAI
from helix.providers.openai_client import OpenAIProvider
openai_llm = OpenAIProvider(
    name="openai-llm",
    instructions="You are a helpful assistant.",
    model="gpt-5-nano",
    history=True
)
print(openai_llm.generate("Hello!"))

class Person(BaseModel):
    name: str
    age: int
    occupation: str

print(openai_llm.generate([{"role": "user", "content": "Who am I?"}], Person))
To enable MCP tools with a running Helix MCP server (see MCP Feature):
openai_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
gemini_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
anthropic_llm.enable_mcps("helix-mcp", url="https://your-remote-mcp/...")
- OpenAI GPT-5 family models support reasoning, while other models use temperature.
- Anthropic local streamable MCP is not supported; use a URL-based MCP.
How do I generate embeddings with HelixDB?
Use built-in embedder interfaces to generate vector embeddings from popular providers.
Available embedders:
OpenAIEmbedder
GeminiEmbedder
VoyageAIEmbedder
Each embedder implements:
embed(text: str, **kwargs) returns a vector [F64]
embed_batch(texts: List[str], **kwargs) returns a list of vectors [F64]
Examples (see examples/llm_providers/providers.ipynb for more):
from helix.embedding.openai_client import OpenAIEmbedder
openai_embedder = OpenAIEmbedder()  # requires OPENAI_API_KEY
vec = openai_embedder.embed("Hello world")
batch = openai_embedder.embed_batch(["a", "b", "c"])

from helix.embedding.gemini_client import GeminiEmbedder
gemini_embedder = GeminiEmbedder()  # requires GEMINI_API_KEY
vec = gemini_embedder.embed("doc text", task_type="RETRIEVAL_DOCUMENT")

from helix.embedding.voyageai_client import VoyageAIEmbedder
voyage_embedder = VoyageAIEmbedder()  # requires VOYAGE_API_KEY
vec = voyage_embedder.embed("query text", input_type="query")
How do I chunk text for embeddings?
Split text into manageable pieces using Chonkie chunking methods:
from helix import Chunk

text = "Your long document text here..."
chunks = Chunk.token_chunk(text)
semantic_chunks = Chunk.semantic_chunk(text)

code_text = "def hello(): print('world')"
code_chunks = Chunk.code_chunk(code_text, language="python")

texts = ["Document 1...", "Document 2...", "Document 3..."]
batch_chunks = Chunk.sentence_chunk(texts)
You can find all of the chunking examples in the Chunking Feature section.
How do I load data into HelixDB?
The loader supports .parquet, .fvecs, and .csv files. Pass your file path and column names to automatically load data into your queries.
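A minimal sketch (the Loader call, file path, and column name here are assumptions based on the loader description above):
import helix

# Load the given columns from a parquet file; path and column name are placeholders
data = helix.Loader("data/vectors.parquet", cols=["vecs"])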
For more information, check out our examples!