Zo Datasets is a standardized way to organize any data into a format that Zo deeply understands. Once your data is in a dataset, Zo can answer questions about it, run analyses, generate reports, and build things on top of it.

What is a dataset?

A dataset is a folder with a specific structure that Zo recognizes. It includes:
  • Your data in a queryable format (typically DuckDB)
  • Schema documentation describing every table and column
  • Context explaining what the data represents
This structure means Zo always knows exactly what data is available and how to work with it — no guessing, no confusion about column names or data types.

What can you do with datasets?

Once your data is in a dataset, you can:
  • Ask questions — “What was the average interest rate in 2020?” or “Show me all entries from Q4”
  • Generate reports — Create research summaries, trend analyses, or data-backed documents
  • Build artifacts — Charts, dashboards, or apps that use your data
  • Cross-reference — Combine multiple datasets or link to files in your workspace
Zo reads the schema and documentation to understand your data, then writes and executes queries to answer your questions.
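To make this concrete, here is a minimal sketch of the kind of query that gets run against a dataset, mirroring the "average interest rate in 2020" question above. The `rates` table and its values are made up for illustration; Zo's default store is DuckDB, but the same SQL works against SQLite (shown here because it ships with Python):

```python
import sqlite3

# Hypothetical "rates" table mirroring the example question in the docs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rates (year INTEGER, rate REAL)")
con.executemany("INSERT INTO rates VALUES (?, ?)",
                [(2019, 2.1), (2020, 0.9), (2020, 0.7), (2021, 1.4)])

# "What was the average interest rate in 2020?" becomes one query:
avg_2020 = con.execute(
    "SELECT AVG(rate) FROM rates WHERE year = 2020"
).fetchone()[0]
print(round(avg_2020, 2))  # 0.8
```

Because the schema documents the table and column names up front, the question maps directly onto a query with no guesswork.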

Creating datasets

There are several ways to create a dataset:

From a template

Zo includes templates for common data sources. Select a template when creating a new dataset, and Zo will provide pre-built ingestion scripts and instructions for getting your data. Current templates include personal data exports (Spotify, Google Takeout, iMessage, GitHub, Amazon, and more). More templates for research, financial, and public datasets are coming soon.

From any data files

Have CSV files, JSON exports, or database files? Create a blank dataset, drop your files in, and ask Zo to help you build an ingestion script. Zo can parse most common formats and structure them into a queryable database.
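An ingestion script along these lines is what Zo would help you write. The file name (`orders.csv`) and columns are invented for the sketch, and SQLite stands in for the default DuckDB target; the flow — read raw files from `source/`, load them into a queryable table — is the same either way:

```python
import csv
import pathlib
import sqlite3
import tempfile

# Set up a throwaway dataset folder with a source/ CSV, so the
# sketch is self-contained. In a real dataset, source/ already
# holds your exports.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source"
source.mkdir()
(source / "orders.csv").write_text("order_id,total\n1,19.99\n2,5.00\n")

# Ingest: parse the CSV and load it into a database table.
con = sqlite3.connect(str(workdir / "data.sqlite"))
con.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")
with open(source / "orders.csv", newline="") as f:
    rows = [(int(r["order_id"]), float(r["total"]))
            for r in csv.DictReader(f)]
con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
con.commit()

count, total = con.execute(
    "SELECT COUNT(*), SUM(total) FROM orders").fetchone()
print(count, total)
```

Keeping the script in the dataset (under `ingest/`) means you can re-run it whenever new exports land in `source/`.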

Ask Zo to collect data

Need data you don’t have? Just ask. Zo can search the web, download files, and structure them into a dataset. For example:
  • “Find historical S&P 500 data and create a dataset”
  • “Gather population data by country and set up a dataset”
  • “Download weather data for the past 10 years and organize it”
Zo will find the data sources, download the files, and create a properly structured dataset you can query.

Getting started

To create a new dataset, go to the Sites section and select the Datasets tab. Click New Dataset, give it a name, and optionally select a template. After creating the dataset, Zo will help you:
  1. Get your data into the source/ folder (upload, download, or collect)
  2. Run or create an ingestion script to structure the data
  3. Generate documentation and schema
Datasets are just folders in your workspace, typically under zo-data/. Clicking on a dataset folder opens a dedicated view showing the schema, documentation, and related chats.

Personal data templates

One compelling use of datasets is analyzing your own data exports. Services like Spotify, Google, and Amazon let you export your data (thanks to GDPR and CCPA), but these exports are usually opaque JSON or CSV files that are hard to make sense of. Zo includes templates for these common exports:
  • Spotify: Lifetime listening history via extended streaming history export
  • Google Takeout: Chrome, YouTube, Calendar, Maps, Gmail, and more
  • Apple Health: Health app data exported from iPhone
  • iMessage: Messages database from Mac
  • GitHub: Account data (commits, issues, repos)
  • Amazon: Order history and account data
  • Claude (Anthropic): Conversation history
  • X (Twitter): Tweet archive
  • Letterboxd: Film diary and ratings
  • DoorDash: Order history
  • 23andMe: Genetic data
  • American Express: Transaction history
  • Apple: General Apple account data
Each template includes instructions for how to export your data and pre-built scripts to parse it.

How datasets work

The folder structure

my-dataset/
├── datapackage.json     # Marks this as a dataset (required)
├── data.duckdb          # The queryable database
├── README.md            # Documentation about the data
├── schema.yaml          # Auto-generated table and column reference
├── PROCESS.md           # Workflow guide for the AI
├── generate_schema.py   # Script to extract schema from database
├── ingest/
│   ├── INGEST.prompt.md # AI instructions for ingestion
│   └── ingest.py        # The ETL script
└── source/              # Raw data files (CSVs, JSONs, exports, etc.)
The only required file is datapackage.json — its presence tells Zo that a folder is a dataset. Everything else enhances the experience.

Why this structure?

The key insight is that Zo needs context to work effectively with data:
  • schema.yaml tells Zo exactly what tables and columns exist, with types and descriptions
  • README.md explains what the data represents, its coverage period, and nuances
  • PROCESS.md guides Zo through ingestion and maintenance
With this context, Zo can write accurate queries, understand edge cases, and generate meaningful analyses — not just guess based on column names.
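The folder tree above lists a generate_schema.py; its exact contents aren't specified here, but a script in that role might work roughly like this sketch, which introspects the database and emits a YAML-style table/column reference. SQLite's `PRAGMA table_info` is used for illustration; DuckDB exposes the same information through `information_schema.columns`:

```python
import sqlite3

# Hypothetical table standing in for real dataset contents.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plays (artist TEXT, ms_played INTEGER, ts TEXT)")

# Walk every table and emit a schema.yaml-like listing.
lines = []
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    lines.append(f"{table}:")
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    for _, name, ctype, *_ in con.execute(f"PRAGMA table_info({table})"):
        lines.append(f"  - {name}: {ctype}")
print("\n".join(lines))
```

Regenerating this file after each ingestion run keeps the documented schema in sync with what's actually in the database.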
Zo uses DuckDB by default. DuckDB is fast, portable (your database is just a file), and great for analytical queries. You can also use SQLite or other formats.
You can explore the data directly: ask Zo to run queries, or click on the data.duckdb file to open Zo’s built-in database viewer.
As for input formats, Zo can parse most common ones: CSV, JSON, XML, Parquet, and more. Create a blank dataset, drop your files in source/, and ask Zo to help write an ingestion script.

Example queries

Once your data is in a dataset, just ask questions.
Research data:
  • “What’s the correlation between these two variables?”
  • “Show me the trend over the last 5 years”
  • “Generate a summary report of key findings”
Personal data:
  • “What are my top 10 most-played artists on Spotify?”
  • “How much did I spend on Amazon in 2023?”
  • “Who do I text the most on iMessage?”
Business data:
  • “What were our top-selling products last quarter?”
  • “Show customer acquisition by month”
  • “Which regions had the highest growth?”
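Behind a question like "What are my top 10 most-played artists?", Zo would write a query something like the sketch below. The `plays` table here is hypothetical (the actual Spotify template schema may differ), and SQLite stands in for DuckDB:

```python
import sqlite3

# Hypothetical Spotify-style listening history.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plays (artist TEXT, ms_played INTEGER)")
con.executemany("INSERT INTO plays VALUES (?, ?)", [
    ("Radiohead", 240000), ("Radiohead", 180000),
    ("Nina Simone", 200000), ("Boards of Canada", 150000),
])

# "Top 10 most-played artists" as a grouped aggregate:
top = con.execute("""
    SELECT artist, SUM(ms_played) AS total_ms
    FROM plays
    GROUP BY artist
    ORDER BY total_ms DESC
    LIMIT 10
""").fetchall()
print(top)  # Radiohead first with 420000 ms
```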

Tips

Keep your source files. The source/ folder preserves your raw data. You can re-run ingestion or modify scripts to extract additional fields later.
Check the README. After ingestion, Zo generates documentation explaining what data is available, the time range covered, and example queries.
Schema is your reference. Open schema.yaml to see every table, column, type, and description — useful when asking complex questions.

Privacy

Your data stays on your Zo server. Datasets are stored in your workspace like any other files. When you query your data, Zo reads the schema and runs queries locally — your raw data never leaves your Zo.