What is Vanna AI?

An open-source (MIT-licensed) RAG framework for Text-to-SQL that generates accurate SQL from natural-language questions. You train it on your database DDL, documentation, and example SQL queries, and it uses Agentic Retrieval to produce high-precision queries. It supports major databases, including PostgreSQL, Snowflake, and BigQuery, as well as leading LLMs from OpenAI, Anthropic, and others.

How to Use

Install

Add Vanna to your Python environment via pip. Depending on your use case, it integrates with Jupyter, Flask, FastAPI, Streamlit, Slack, and more.
Choose an LLM and vector store

Configure the LLM you want to use (OpenAI, Anthropic, Google Gemini, or Ollama for local models) along with a vector store that holds the training data.
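
Because the LLM and the vector store are chosen independently, it can help to picture them as two pluggable interfaces. The sketch below is not Vanna's API: `LLM`, `VectorStore`, `TextToSQL`, `EchoLLM`, and `ListStore` are hypothetical names, and the two stand-in classes return canned data so the example runs without any API key.

```python
from typing import Protocol


class LLM(Protocol):
    """Anything that turns a prompt into text (a hosted or local model client)."""
    def complete(self, prompt: str) -> str: ...


class VectorStore(Protocol):
    """Anything that stores training snippets and returns relevant ones."""
    def add(self, text: str) -> None: ...
    def search(self, query: str, k: int) -> list[str]: ...


class EchoLLM:
    """Hypothetical stand-in for a real model client; always returns the same SQL."""
    def complete(self, prompt: str) -> str:
        return "SELECT 1;"


class ListStore:
    """Hypothetical in-memory store; search ignores the query and returns the first k items."""
    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str, k: int) -> list[str]:
        return self.items[:k]


class TextToSQL:
    """Combines one LLM and one vector store; either can be swapped without touching the other."""
    def __init__(self, llm: LLM, store: VectorStore) -> None:
        self.llm = llm
        self.store = store

    def generate_sql(self, question: str) -> str:
        context = "\n".join(self.store.search(question, k=3))
        return self.llm.complete(f"{context}\n-- Question: {question}\n")


app = TextToSQL(llm=EchoLLM(), store=ListStore())
app.store.add("CREATE TABLE sales (product TEXT, revenue REAL);")
print(app.generate_sql("Total revenue?"))
```

Swapping providers then means passing a different object with the same method names, which is the kind of flexibility the text describes.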
Train the model

Feed it your database DDL (table definitions), business documentation, and question-SQL pairs. This training data becomes the retrieval corpus for RAG, improving SQL generation accuracy.
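
To make the idea of a retrieval corpus concrete, here is a minimal sketch that stores the three kinds of training data and pulls back the items most relevant to a question. It assumes plain keyword overlap in place of the embedding similarity a real vector store would use; `TrainingCorpus`, `tokens`, and `retrieve` are hypothetical names, not Vanna functions.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


@dataclass
class TrainingCorpus:
    """Hypothetical container for DDL, documentation, and question-SQL pairs."""
    ddl: list[str] = field(default_factory=list)
    docs: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)  # (question, sql)

    def retrieve(self, question: str, k: int = 2) -> dict[str, list]:
        q = tokens(question)

        def top(items, key):
            # Score by number of shared tokens and drop items with no overlap.
            scored = [(len(q & tokens(key(it))), it) for it in items]
            scored = [s for s in scored if s[0] > 0]
            scored.sort(key=lambda s: s[0], reverse=True)
            return [it for _, it in scored[:k]]

        return {
            "ddl": top(self.ddl, lambda d: d),
            "docs": top(self.docs, lambda d: d),
            # Example pairs are matched on their question text only.
            "examples": top(self.examples, lambda ex: ex[0]),
        }


corpus = TrainingCorpus()
corpus.ddl.append("CREATE TABLE orders (id INT, product TEXT, revenue REAL, order_date DATE);")
corpus.ddl.append("CREATE TABLE employees (id INT, name TEXT);")
corpus.docs.append("Revenue means the sum of orders.revenue, excluding refunds.")
corpus.examples.append(("Total revenue by product?",
                        "SELECT product, SUM(revenue) FROM orders GROUP BY product;"))
hits = corpus.retrieve("Top 10 products by revenue last month")
print(hits["ddl"])  # only the orders table shares a word ("revenue") with the question
```

Under this tokenizer `products` does not match `product`; handling such near-matches is what embedding-based retrieval is meant for.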
Query in natural language

Ask questions like "What were the top 10 products by revenue last month?" Vanna performs Agentic Retrieval to find relevant context and generates the corresponding SQL.
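
Once context has been retrieved, it is assembled into a prompt for the LLM. The sketch below shows a single retrieve-then-prompt step, a simplification of the Agentic Retrieval described above; the prompt wording, `build_prompt`, and its `dialect` parameter are assumptions for illustration, not Vanna's actual prompt template.

```python
def build_prompt(question: str,
                 ddl: list[str],
                 docs: list[str],
                 examples: list[tuple[str, str]],
                 dialect: str = "PostgreSQL") -> str:
    """Assemble retrieved training data into one text prompt (hypothetical format)."""
    parts = [f"You write {dialect} SQL. Use only the tables below.", "", "-- Schema"]
    parts += ddl
    if docs:
        parts += ["", "-- Business notes"] + [f"-- {d}" for d in docs]
    for q, sql in examples:
        # Few-shot examples: a past question followed by its known-good SQL.
        parts += ["", f"-- Example question: {q}", sql]
    parts += ["", f"-- Question: {question}", "-- Reply with one SQL query only."]
    return "\n".join(parts)


prompt = build_prompt(
    "What were the top 10 products by revenue last month?",
    ddl=["CREATE TABLE orders (product TEXT, revenue REAL, order_date DATE);"],
    docs=["Revenue excludes refunded orders."],
    examples=[("Total revenue by product?",
               "SELECT product, SUM(revenue) FROM orders GROUP BY product;")],
)
print(prompt)
```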
Execute and visualize

Run the generated SQL against your database and retrieve results as tables, charts, or summaries. An embedded web chat component is available for interactive use.
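
A minimal sketch of the execute step, assuming Python's built-in `sqlite3` module as a stand-in for the target database and a plain-text table in place of a chart. The `orders` table, its rows, and the SQL string are made-up sample data rather than Vanna output, and `run_query` and `as_text_table` are hypothetical helpers.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    """Execute SQL and return (column names, rows)."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def as_text_table(columns: list[str], rows: list[tuple]) -> str:
    """Render results as a plain-text table followed by a row count."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    lines.append(f"({len(rows)} rows)")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("A", 120.0), ("B", 80.0), ("A", 30.0)])
sql = ("SELECT product, SUM(revenue) AS total FROM orders "
       "GROUP BY product ORDER BY total DESC LIMIT 10")
cols, rows = run_query(conn, sql)
print(as_text_table(cols, rows))
```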

Features

Natural language to SQL (Text-to-SQL)

Generates accurate SQL for your target database from natural language questions. RAG (Retrieval-Augmented Generation) grounds generation in your trained schema and business knowledge.
RAG training (DDL, documents, SQL examples)

Train on table definitions (DDL), business documentation, and question-SQL pairs to optimize generation for your own data.
Multi-database support

Works with PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, SQLite, Oracle, SQL Server, DuckDB, ClickHouse, and other major databases.
Multi-LLM and vector store support

Swap in OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Mistral, or Ollama (local), and choose from a range of vector store backends.
Multi-turn conversation and visualization

Supports multi-turn dialogue and returns results as tables, charts, or summaries. An embedded web chat component is included.
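
Multi-turn dialogue typically works by feeding earlier questions and their SQL back in as context, so that a follow-up such as "only for product A" can be resolved. The hypothetical `Conversation` class below is one way to keep a bounded history; it illustrates the idea and is not how Vanna stores conversation state.

```python
from collections import deque


class Conversation:
    """Hypothetical bounded history of (question, sql) turns."""

    def __init__(self, max_turns: int = 3) -> None:
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def record(self, question: str, sql: str) -> None:
        self.turns.append((question, sql))

    def context(self, follow_up: str) -> str:
        """Render prior turns plus the new question as prompt text."""
        lines: list[str] = []
        for question, sql in self.turns:
            lines += [f"-- Previous question: {question}", sql]
        lines.append(f"-- Follow-up: {follow_up}")
        return "\n".join(lines)


conv = Conversation(max_turns=2)
conv.record("Total revenue by product?",
            "SELECT product, SUM(revenue) FROM orders GROUP BY product;")
conv.record("Only product A?",
            "SELECT SUM(revenue) FROM orders WHERE product = 'A';")
print(conv.context("And only in May?"))
```

Capping the history keeps prompts short; the trade-off is that references to much older turns can no longer be resolved.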
Access control and auditing (higher tiers)

Row-level security (limiting which rows each user can query), audit logs, and rate limiting for production deployments (Cloud/Enterprise tiers).
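
How the Cloud/Enterprise tiers implement these features is not described here, but the general shape of per-user rate limiting with an audit trail can be sketched in a few lines. `QueryGate` is a hypothetical name, the sliding-window limit is one common technique among several, and row-level security is not shown. The injectable `clock` parameter is an assumption that makes the behavior easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class QueryGate:
    """Hypothetical per-user sliding-window rate limiter with an audit log."""

    def __init__(self, max_queries: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_queries = max_queries
        self.window_s = window_s
        self.clock = clock
        self.history: dict[str, deque] = defaultdict(deque)  # user -> times of allowed queries
        self.audit_log: list[dict] = []

    def allow(self, user: str, sql: str) -> bool:
        now = self.clock()
        recent = self.history[user]
        # Forget timestamps that have left the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        allowed = len(recent) < self.max_queries
        if allowed:
            recent.append(now)
        # Record every attempt, including rejected ones.
        self.audit_log.append({"user": user, "sql": sql, "allowed": allowed, "at": now})
        return allowed


fake_now = [0.0]
gate = QueryGate(max_queries=2, window_s=60.0, clock=lambda: fake_now[0])
print([gate.allow("alice", f"SELECT {i}") for i in range(3)])  # [True, True, False]
```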

Pricing

Pricing as of June 2026. The core framework is MIT-licensed open source and free to self-host (LLM API costs apply separately). Cloud/Enterprise pricing varies; check the official site for the latest details.

Plan	Cost	What's included
OSS (self-hosted)	Free	MIT-licensed framework. Install via pip and run in your own environment (LLM API costs billed separately)
Cloud (hosted)	Contact for pricing	Hosted version with access control, observability, audit logs, and other operational features
Enterprise	Contact for pricing	Custom arrangements for large organizations

Note: The OSS framework itself is free, but LLM API costs (e.g., OpenAI) apply separately. The official GitHub repository was archived (made read-only) on March 29, 2026, so check the official site for current availability before adopting it.

Pros & Cons

Pros

Query internal data in natural language, so even team members unfamiliar with SQL can use it
MIT-licensed OSS; self-hosting is free and integrates into your own environment
Freely combine any major DB, LLM, or vector store
Training on DDL and business knowledge optimizes generation for your specific data

Cons

Setup and operation require basic knowledge of Python and RAG
SQL accuracy depends on training data quality; validation and guardrails are essential
UI and documentation are primarily in English
The official repository is archived; check the official site for future update plans
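
Because SQL accuracy depends on training data, a common safeguard is to refuse anything other than a single read-only query before it reaches the database. The sketch below layers a deliberately strict static check over SQLite's `PRAGMA query_only`; `check_read_only`, `run_guarded`, and `UnsafeSQL` are hypothetical names. Against a server database such as PostgreSQL, the second layer would more naturally be a database role granted only SELECT privileges.

```python
import sqlite3


class UnsafeSQL(ValueError):
    """Raised when generated SQL fails a guardrail check."""


def check_read_only(sql: str) -> str:
    """Static checks: exactly one statement that starts with SELECT or WITH."""
    stmt = sql.strip().rstrip(";").strip()
    # Deliberately strict: also rejects semicolons inside string literals.
    if ";" in stmt:
        raise UnsafeSQL("multiple statements are not allowed")
    first_word = stmt.split(None, 1)[0].upper() if stmt else ""
    if first_word not in {"SELECT", "WITH"}:
        raise UnsafeSQL(f"only SELECT/WITH queries are allowed, got {first_word or 'nothing'}")
    return stmt


def run_guarded(conn: sqlite3.Connection, sql: str, max_rows: int = 1000) -> list[tuple]:
    stmt = check_read_only(sql)
    # Second layer: query_only makes SQLite reject writes the static check missed,
    # e.g. "WITH ... DELETE FROM ...".
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(stmt).fetchmany(max_rows)
    except sqlite3.DatabaseError as exc:
        raise UnsafeSQL(f"query rejected by the database: {exc}") from exc
    finally:
        conn.execute("PRAGMA query_only = OFF")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
print(run_guarded(conn, "SELECT x FROM t;"))  # [(1,)]
```

The `max_rows` cap is another small guardrail: a runaway query returns at most that many rows to the caller.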

Reviews & Reputation

Engineers and data teams praise it: "Once trained on DDL and business knowledge, it generates accurate SQL tailored to our data."
"The flexibility to self-host on OSS and combine with your preferred LLM and DB is a real advantage."
Common caveats: "English-centric," "always validate generated SQL," and "the quality of training data makes or breaks it."

FAQ

Q. Is Vanna AI free to use?

The core framework is MIT-licensed open source and free to self-host. However, LLM API costs (e.g., OpenAI) apply separately. A Cloud/Enterprise version with operational features is also available.

Q. Which databases are supported?

PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, SQLite, Oracle, SQL Server, DuckDB, ClickHouse, and other major databases.

Q. Can I use it without knowing SQL?

Yes. You ask questions in natural language and get back SQL plus results (tables, charts, summaries), so team members unfamiliar with SQL can still analyze internal data. Validating generated SQL is recommended for critical analyses.

Q. Is Japanese supported?

The UI and documentation are primarily in English. A Japanese UI is not provided.

Vanna AI vs. Other Development & Data Tools

Aspect	Vanna AI	General-purpose code generation AI	When to choose Vanna
Focus	Natural language → SQL (accuracy boosted by RAG)	General-purpose code generation	When the goal is querying internal data
Delivery	OSS (MIT) framework + Cloud	Varies by service	When embedding Text-to-SQL into your own database environment
Optimization	Trained on DDL, business knowledge, and SQL examples	General-purpose training	When you need to tune generation to your own data