Zo Datasets is a standardized way to organize any data into a format that Zo deeply understands. Once your data is in a dataset, Zo can answer questions about it, run analyses, generate reports, and build things on top of it.

What is a dataset?

A dataset is a folder with a specific structure that Zo recognizes. It includes:
  • Your data in a queryable format (typically DuckDB)
  • Schema documentation describing every table and column
  • Context explaining what the data represents
This structure means Zo always knows exactly what data is available and how to work with it — no guessing, no confusion about column names or data types.

What can you do with datasets?

Once your data is in a dataset, you can:
  • Ask questions — “What was the average interest rate in 2020?” or “Show me all entries from Q4”
  • Generate reports — Create research summaries, trend analyses, or data-backed documents
  • Build artifacts — Charts, dashboards, or apps that use your data
  • Cross-reference — Combine multiple datasets or link to files in your workspace
Zo reads the schema and documentation to understand your data, then writes and executes queries to answer your questions.
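To make this concrete, here is a minimal sketch of the kind of query that gets run against a dataset, mirroring the "average interest rate in 2020" question above. The `rates` table and its values are made up for illustration; Zo's default store is DuckDB, but the same SQL works against SQLite (shown here because it ships with Python):

```python
import sqlite3

# Hypothetical "rates" table mirroring the example question in the docs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rates (year INTEGER, rate REAL)")
con.executemany("INSERT INTO rates VALUES (?, ?)",
                [(2019, 2.1), (2020, 0.9), (2020, 0.7), (2021, 1.4)])

# "What was the average interest rate in 2020?" becomes one query:
avg_2020 = con.execute(
    "SELECT AVG(rate) FROM rates WHERE year = 2020"
).fetchone()[0]
print(round(avg_2020, 2))  # 0.8
```

Because the schema documents the table and column names up front, the question maps directly onto a query with no guesswork.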

Creating datasets

There are several ways to create a dataset:

From a template

Zo includes templates for common data sources. Select a template when creating a new dataset, and Zo will provide pre-built ingestion scripts and instructions for getting your data. Current templates include personal data exports (Spotify, Google Takeout, iMessage, GitHub, Amazon, and more). More templates for research, financial, and public datasets are coming soon.

From any data files

Have CSV files, JSON exports, or database files? Create a blank dataset, drop your files in, and ask Zo to help you build an ingestion script. Zo can parse most common formats and structure them into a queryable database.
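An ingestion script along these lines is what Zo would help you write. The file name (`orders.csv`) and columns are invented for the sketch, and SQLite stands in for the default DuckDB target; the flow — read raw files from `source/`, load them into a queryable table — is the same either way:

```python
import csv
import pathlib
import sqlite3
import tempfile

# Set up a throwaway dataset folder with a source/ CSV, so the
# sketch is self-contained. In a real dataset, source/ already
# holds your exports.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source"
source.mkdir()
(source / "orders.csv").write_text("order_id,total\n1,19.99\n2,5.00\n")

# Ingest: parse the CSV and load it into a database table.
con = sqlite3.connect(str(workdir / "data.sqlite"))
con.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")
with open(source / "orders.csv", newline="") as f:
    rows = [(int(r["order_id"]), float(r["total"]))
            for r in csv.DictReader(f)]
con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
con.commit()

count, total = con.execute(
    "SELECT COUNT(*), SUM(total) FROM orders").fetchone()
print(count, total)
```

Keeping the script in the dataset (under `ingest/`) means you can re-run it whenever new exports land in `source/`.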

Ask Zo to collect data

Need data you don’t have? Just ask. Zo can search the web, download files, and structure them into a dataset. For example:
  • “Find historical S&P 500 data and create a dataset”
  • “Gather population data by country and set up a dataset”
  • “Download weather data for the past 10 years and organize it”
Zo will find the data sources, download the files, and create a properly structured dataset you can query.

Getting started

To create a new dataset, go to the Sites section and select the Datasets tab. Click New Dataset, give it a name, and optionally select a template. After creating the dataset, Zo will help you:
  1. Get your data into the source/ folder (upload, download, or collect)
  2. Run or create an ingestion script to structure the data
  3. Generate documentation and schema
Datasets are just folders in your workspace, typically under zo-data/. Clicking on a dataset folder opens a dedicated view showing the schema, documentation, and related chats.

Personal data templates

One compelling use of datasets is analyzing your own data exports. Services like Spotify, Google, and Amazon let you export your data (thanks to GDPR and CCPA), but these exports are usually opaque JSON or CSV files that are hard to make sense of. Zo includes templates for these common exports:
  • Spotify: Lifetime listening history via extended streaming history export
  • Google Takeout: Chrome, YouTube, Calendar, Maps, Gmail, and more
  • Apple Health: Health app data exported from iPhone
  • iMessage: Messages database from Mac
  • GitHub: Account data (commits, issues, repos)
  • Amazon: Order history and account data
  • Claude (Anthropic): Conversation history
  • X (Twitter): Tweet archive
  • Letterboxd: Film diary and ratings
  • DoorDash: Order history
  • 23andMe: Genetic data
  • American Express: Transaction history
  • Apple: General Apple account data
Each template includes instructions for how to export your data and pre-built scripts to parse it.

How datasets work

The folder structure

my-dataset/
├── datapackage.json     # Marks this as a dataset (required)
├── data.duckdb          # The queryable database
├── README.md            # Documentation about the data
├── schema.yaml          # Auto-generated table and column reference
├── PROCESS.md           # Workflow guide for the AI
├── generate_schema.py   # Script to extract schema from database
├── ingest/
│   ├── INGEST.prompt.md # AI instructions for ingestion
│   └── ingest.py        # The ETL script
└── source/              # Raw data files (CSVs, JSONs, exports, etc.)
The only required file is datapackage.json — its presence tells Zo that a folder is a dataset. Everything else enhances the experience.

Why this structure?

The key insight is that Zo needs context to work effectively with data:
  • schema.yaml tells Zo exactly what tables and columns exist, with types and descriptions
  • README.md explains what the data represents, its coverage period, and nuances
  • PROCESS.md guides Zo through ingestion and maintenance
With this context, Zo can write accurate queries, understand edge cases, and generate meaningful analyses — not just guess based on column names.
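The folder tree above lists a generate_schema.py; its exact contents aren't specified here, but a script in that role might work roughly like this sketch, which introspects the database and emits a YAML-style table/column reference. SQLite's `PRAGMA table_info` is used for illustration; DuckDB exposes the same information through `information_schema.columns`:

```python
import sqlite3

# Hypothetical table standing in for real dataset contents.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plays (artist TEXT, ms_played INTEGER, ts TEXT)")

# Walk every table and emit a schema.yaml-like listing.
lines = []
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    lines.append(f"{table}:")
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    for _, name, ctype, *_ in con.execute(f"PRAGMA table_info({table})"):
        lines.append(f"  - {name}: {ctype}")
print("\n".join(lines))
```

Regenerating this file after each ingestion run keeps the documented schema in sync with what's actually in the database.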
Zo uses DuckDB by default. DuckDB is fast, portable (your database is just a file), and great for analytical queries. You can also use SQLite or other formats.
You can explore the data directly: ask Zo to run queries, or click on the data.duckdb file to open Zo’s built-in database viewer.
As for input formats, Zo can parse most common ones: CSV, JSON, XML, Parquet, and more. Create a blank dataset, drop your files in source/, and ask Zo to help write an ingestion script.

Example queries

Once your data is in a dataset, just ask questions.
Research data:
  • “What’s the correlation between these two variables?”
  • “Show me the trend over the last 5 years”
  • “Generate a summary report of key findings”
Personal data:
  • “What are my top 10 most-played artists on Spotify?”
  • “How much did I spend on Amazon in 2023?”
  • “Who do I text the most on iMessage?”
Business data:
  • “What were our top-selling products last quarter?”
  • “Show customer acquisition by month”
  • “Which regions had the highest growth?”
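Behind a question like "What are my top 10 most-played artists?", Zo would write a query something like the sketch below. The `plays` table here is hypothetical (the actual Spotify template schema may differ), and SQLite stands in for DuckDB:

```python
import sqlite3

# Hypothetical Spotify-style listening history.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plays (artist TEXT, ms_played INTEGER)")
con.executemany("INSERT INTO plays VALUES (?, ?)", [
    ("Radiohead", 240000), ("Radiohead", 180000),
    ("Nina Simone", 200000), ("Boards of Canada", 150000),
])

# "Top 10 most-played artists" as a grouped aggregate:
top = con.execute("""
    SELECT artist, SUM(ms_played) AS total_ms
    FROM plays
    GROUP BY artist
    ORDER BY total_ms DESC
    LIMIT 10
""").fetchall()
print(top)  # Radiohead first with 420000 ms
```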

Tips

Keep your source files. The source/ folder preserves your raw data. You can re-run ingestion or modify scripts to extract additional fields later.
Check the README. After ingestion, Zo generates documentation explaining what data is available, the time range covered, and example queries.
Schema is your reference. Open schema.yaml to see every table, column, type, and description — useful when asking complex questions.

Privacy

Your data stays on your Zo server. Datasets are stored in your workspace like any other files. When you query your data, Zo reads the schema and runs queries locally — your raw data never leaves your Zo.