What is a dataset?
A dataset is a folder with a specific structure that Zo recognizes. It includes:
- Your data in a queryable format (typically DuckDB)
- Schema documentation describing every table and column
- Context explaining what the data represents
What can you do with datasets?
Once your data is in a dataset, you can:
- Ask questions: “What was the average interest rate in 2020?” or “Show me all entries from Q4”
- Generate reports: create research summaries, trend analyses, or data-backed documents
- Build artifacts: charts, dashboards, or apps that use your data
- Cross-reference: combine multiple datasets or link to files in your workspace
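Cross-referencing can be as simple as attaching a second database file to the same connection and joining across both. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module (SQLite is one of the supported formats) so it runs without extra packages, and the file names, tables, and columns are hypothetical. DuckDB also has an `ATTACH` statement for the same purpose.

```python
import os
import sqlite3
import tempfile


def build_demo(dirpath: str) -> tuple[str, str]:
    """Create two small, hypothetical dataset databases and return their paths."""
    rates_path = os.path.join(dirpath, "rates.db")
    spend_path = os.path.join(dirpath, "spend.db")

    con = sqlite3.connect(rates_path)
    con.execute("CREATE TABLE rates (year INTEGER, avg_rate REAL)")
    con.executemany("INSERT INTO rates VALUES (?, ?)", [(2020, 3.1), (2021, 2.9)])
    con.commit()
    con.close()

    con = sqlite3.connect(spend_path)
    con.execute("CREATE TABLE orders (year INTEGER, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(2020, 40.0), (2020, 60.0), (2021, 25.0)]
    )
    con.commit()
    con.close()
    return rates_path, spend_path


def cross_reference(rates_path: str, spend_path: str) -> list[tuple]:
    """Join yearly totals from one dataset's database with rates from another's."""
    con = sqlite3.connect(rates_path)
    try:
        # Make the second database visible on this connection under the alias "spend".
        con.execute("ATTACH DATABASE ? AS spend", (spend_path,))
        return con.execute(
            """
            SELECT r.year, r.avg_rate, SUM(o.amount) AS total_spend
            FROM rates AS r
            JOIN spend.orders AS o ON o.year = r.year
            GROUP BY r.year, r.avg_rate
            ORDER BY r.year
            """
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(cross_reference(*build_demo(tmp)))
```

Each dataset keeps its own file; the join happens only on the connection, so neither database is modified.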
Creating datasets
There are several ways to create a dataset:
From a template
Zo includes templates for common data sources. Select a template when creating a new dataset, and Zo will provide pre-built ingestion scripts and instructions for getting your data. Current templates include personal data exports (Spotify, Google Takeout, iMessage, GitHub, Amazon, and more). More templates for research, financial, and public datasets are coming soon.
From any data files
Have CSV files, JSON exports, or database files? Create a blank dataset, drop your files in, and ask Zo to help you build an ingestion script. Zo can parse most common formats and structure them into a queryable database.
Ask Zo to collect data
Need data you don’t have? Just ask. Zo can search the web, download files, and structure them into a dataset. For example:
- “Find historical S&P 500 data and create a dataset”
- “Gather population data by country and set up a dataset”
- “Download weather data for the past 10 years and organize it”
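Whether files are uploaded or collected by Zo, an ingestion script mostly reads what is in the dataset's source/ folder and loads it into tables. A minimal sketch for CSV files, assuming one table per file (named after the file) and every column stored as text; it uses the standard-library `csv` and `sqlite3` modules, and `ingest_csv_folder` is a hypothetical helper, not part of Zo.

```python
import csv
import re
import sqlite3
from pathlib import Path


def safe_identifier(name: str) -> str:
    """Turn an arbitrary file or header name into a quoted SQL identifier."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower() or "col"
    return f'"{cleaned}"'


def ingest_csv_folder(source_dir: Path, db_path: Path) -> dict[str, int]:
    """Load each CSV in source_dir into its own table; return row counts per table."""
    counts: dict[str, int] = {}
    con = sqlite3.connect(db_path)
    try:
        for csv_path in sorted(source_dir.glob("*.csv")):
            with csv_path.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to load
                rows = [row for row in reader if row]  # drop blank lines
            table = safe_identifier(csv_path.stem)
            columns = ", ".join(f"{safe_identifier(h)} TEXT" for h in header)
            placeholders = ", ".join("?" for _ in header)
            # Rebuild each table from scratch so re-running the script is safe.
            con.execute(f"DROP TABLE IF EXISTS {table}")
            con.execute(f"CREATE TABLE {table} ({columns})")
            con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            counts[table.strip('"')] = len(rows)
        con.commit()
    finally:
        con.close()
    return counts
```

Storing everything as text keeps the sketch short; a real script would cast columns to proper types, or, on DuckDB, let `read_csv_auto` infer them.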
Getting started
To create a new dataset, go to the Sites section and select the Datasets tab. Click New Dataset, give it a name, and optionally select a template. After creating the dataset, Zo will help you:
- Get your data into the source/ folder (upload, download, or collect)
- Run or create an ingestion script to structure the data
- Generate documentation and schema
Personal data templates
One compelling use of datasets is analyzing your own data exports. Services like Spotify, Google, and Amazon let you export your data (thanks to GDPR and CCPA), but these exports are usually opaque JSON or CSV files that are hard to make sense of. Zo includes templates for these common exports:
| Source | What’s included |
|---|---|
| Spotify | Lifetime listening history via extended streaming history export |
| Google Takeout | Chrome, YouTube, Calendar, Maps, Gmail, and more |
| Apple Health | Health app data exported from iPhone |
| iMessage | Messages database from Mac |
| GitHub | Account data (commits, issues, repos) |
| Amazon | Order history and account data |
| Claude (Anthropic) | Conversation history |
| X (Twitter) | Tweet archive |
| Letterboxd | Film diary and ratings |
| DoorDash | Order history |
| 23andMe | Genetic data |
| American Express | Transaction history |
| Apple | General Apple account data |
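Turning one of these exports into a queryable table is mostly a matter of pulling fields out of JSON records. A minimal sketch, assuming a Spotify-style streaming-history file that holds a JSON array of objects; the keys `ts`, `ms_played`, `master_metadata_track_name`, and `master_metadata_album_artist_name` are assumptions about that export's format and should be checked against your own files. The `plays` table and `load_streaming_history` helper are hypothetical, not Zo's template code.

```python
import json
import sqlite3
from pathlib import Path

# Assumed export keys (left) mapped to the column names used in the table (right).
FIELDS = {
    "ts": "played_at",
    "ms_played": "ms_played",
    "master_metadata_track_name": "track",
    "master_metadata_album_artist_name": "artist",
}


def load_streaming_history(json_paths: list[Path], con: sqlite3.Connection) -> int:
    """Flatten JSON arrays of play records into a `plays` table; return rows inserted."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS plays "
        "(played_at TEXT, ms_played INTEGER, track TEXT, artist TEXT)"
    )
    insert = (
        f"INSERT INTO plays ({', '.join(FIELDS.values())}) "
        f"VALUES ({', '.join('?' * len(FIELDS))})"
    )
    inserted = 0
    for path in json_paths:
        records = json.loads(path.read_text(encoding="utf-8"))
        rows = [
            tuple(rec.get(key) for key in FIELDS)  # missing keys become NULL
            for rec in records
            if rec.get("master_metadata_track_name")  # skip entries with no track name
        ]
        con.executemany(insert, rows)
        inserted += len(rows)
    con.commit()
    return inserted
```

Keeping the key-to-column mapping in one dictionary makes it easy to adjust when an export's field names differ from the assumed ones.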
How datasets work
The folder structure
The only required file is datapackage.json: its presence tells Zo that a folder is a dataset. Everything else enhances the experience.
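Because a single marker file decides what counts as a dataset, both checking for one and scaffolding one are simple. A sketch under assumptions: the file and folder names come from this page, but what gets written inside them is placeholder content, since the fields Zo expects are not documented here. `is_dataset` and `create_dataset` are hypothetical helpers.

```python
import json
from pathlib import Path

MARKER = "datapackage.json"


def is_dataset(folder: Path) -> bool:
    """A folder counts as a dataset when it contains datapackage.json."""
    return (folder / MARKER).is_file()


def create_dataset(folder: Path, name: str) -> Path:
    """Scaffold a dataset folder; only the marker file is strictly required."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "source").mkdir(exist_ok=True)  # raw files waiting to be ingested

    # Placeholder metadata (assumed shape, loosely following the Frictionless
    # Data Package convention of a name plus a list of resources).
    package = {"name": name, "resources": [{"name": "data", "path": "data.duckdb"}]}
    (folder / MARKER).write_text(json.dumps(package, indent=2) + "\n", encoding="utf-8")

    # Optional context files with placeholder text.
    optional = {
        "schema.yaml": "tables: []\n",
        "README.md": f"# {name}\n\nWhat this data represents and its coverage period.\n",
        "PROCESS.md": "How to ingest and maintain this dataset.\n",
    }
    for filename, text in optional.items():
        path = folder / filename
        if not path.exists():  # never overwrite existing documentation
            path.write_text(text, encoding="utf-8")
    return folder
```

The check looks for the marker file rather than for a database file, so a freshly created folder with no data yet still counts as a dataset.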
Why this structure?
The key insight is that Zo needs context to work effectively with data:
- schema.yaml tells Zo exactly what tables and columns exist, with types and descriptions
- README.md explains what the data represents, its coverage period, and nuances
- PROCESS.md guides Zo through ingestion and maintenance
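A first draft of schema.yaml can be generated by introspecting the database, leaving the descriptions to be filled in by hand or by Zo. The sketch assumes a simple layout (tables, then columns, then type and description) that may not match the format Zo actually uses. It reads column metadata with SQLite's `PRAGMA table_info` (DuckDB exposes the same information through `information_schema.columns`) and formats the YAML by hand to avoid a third-party dependency. `draft_schema_yaml` is a hypothetical helper.

```python
import sqlite3


def draft_schema_yaml(con: sqlite3.Connection) -> str:
    """Return a YAML draft listing every table and column with its declared type."""
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    lines = ["tables:"]
    for table in tables:
        lines.append(f"  {table}:")
        lines.append('    description: ""  # TODO: what one row represents')
        lines.append("    columns:")
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        quoted = table.replace('"', '""')
        for _cid, col, col_type, _notnull, _default, _pk in con.execute(
            f'PRAGMA table_info("{quoted}")'
        ):
            lines.append(f"      {col}:")
            lines.append(f"        type: {col_type or 'ANY'}")
            lines.append('        description: ""')
    return "\n".join(lines) + "\n"
```

The generated types are only as good as the declared column types; the empty descriptions are where the context that README.md and schema.yaml exist to capture still has to be written.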
What database does Zo use?
Zo uses DuckDB by default. DuckDB is fast, portable (your database is just a file), and great for analytical queries. You can also use SQLite or other formats.
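Because the database is just a file, scripts and other tools can open it directly. A sketch, assuming a hypothetical `daily_rates` table with `day` (ISO date text) and `rate` columns; it uses the standard-library `sqlite3` module with a read-only `mode=ro` URI so the query cannot modify the data. With the `duckdb` Python package the analogous call is `duckdb.connect("data.duckdb", read_only=True)`, though date functions differ between the two SQL dialects.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing database file so that any write raises an error."""
    # A file: URI with mode=ro asks SQLite for a read-only connection.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def average_rate_for_year(db_path: Path, year: int) -> Optional[float]:
    """Answer 'What was the average interest rate in <year>?' with plain SQL."""
    con = open_read_only(db_path)
    try:
        (value,) = con.execute(
            "SELECT AVG(rate) FROM daily_rates WHERE strftime('%Y', day) = ?",
            (str(year),),
        ).fetchone()
        return value
    finally:
        con.close()
```

Opening read-only is a cheap safeguard when exploring: a mistyped statement fails instead of altering the dataset.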
Can I explore the database directly?
Yes. You can ask Zo to run queries, or click on the data.duckdb file to open Zo’s built-in database viewer.
What if my data format isn't supported?
Zo can parse most common formats: CSV, JSON, XML, Parquet, and more. Create a blank dataset, drop your files in source/, and ask Zo to help write an ingestion script.
Example queries
Once your data is in a dataset, just ask questions:
Research data:
- “What’s the correlation between these two variables?”
- “Show me the trend over the last 5 years”
- “Generate a summary report of key findings”
Personal data:
- “What are my top 10 most-played artists on Spotify?”
- “How much did I spend on Amazon in 2023?”
- “Who do I text the most on iMessage?”
Business data:
- “What were our top-selling products last quarter?”
- “Show customer acquisition by month”
- “Which regions had the highest growth?”
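Each of these questions ultimately becomes an ordinary SQL query against the dataset's tables. Two illustrative translations, assuming hypothetical `plays` and `orders` tables with the columns shown and a few made-up rows; the SQL Zo writes depends on your actual schema and will differ.

```python
import sqlite3

# "What are my top 10 most-played artists on Spotify?" (hypothetical schema)
TOP_ARTISTS = """
    SELECT artist, COUNT(*) AS play_count
    FROM plays
    GROUP BY artist
    ORDER BY play_count DESC, artist
    LIMIT 10
"""

# "How much did I spend on Amazon in 2023?" (hypothetical schema)
AMAZON_SPEND_2023 = """
    SELECT ROUND(SUM(total), 2)
    FROM orders
    WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few toy rows for illustration."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE plays (artist TEXT, track TEXT)")
    con.executemany(
        "INSERT INTO plays VALUES (?, ?)",
        [("Artist A", "t1"), ("Artist B", "t2"), ("Artist A", "t3")],
    )
    con.execute("CREATE TABLE orders (order_date TEXT, total REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2022-12-31", 10.0), ("2023-03-15", 19.99), ("2023-11-02", 5.01)],
    )
    return con


if __name__ == "__main__":
    con = demo_connection()
    print(con.execute(TOP_ARTISTS).fetchall())
    print(con.execute(AMAZON_SPEND_2023).fetchone()[0])
    con.close()
```

Filtering the year with a half-open range (`>= '2023-01-01' AND < '2024-01-01'`) works for ISO-formatted date strings and avoids applying a function to every row.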