Introduction
What datasets and datapoints are, their types, and how they fit into the Traceway workflow.
A dataset is a named collection of input/output pairs, called datapoints. Datasets serve two purposes in Traceway:
- Regression testing — Run evaluations against a dataset to measure model quality and detect regressions when you change prompts or models.
- Data collection — Build curated sets of examples from production traffic for fine-tuning, analysis, or review.
Dataset structure
A dataset has:
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| name | string | Human-readable name (e.g., "qa-golden-set") |
| description | string | Optional description |
| datapoint_count | number | Number of datapoints in the dataset |
| created_at | string | ISO 8601 timestamp |
| updated_at | string | ISO 8601 timestamp |
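In code, the fields above map to a shape like the following. This is an illustrative TypeScript interface derived from the table, not an official SDK export:

```typescript
// Illustrative only: a shape matching the dataset fields documented above.
// Field names follow the table; this is not an official Traceway SDK type.
interface Dataset {
  id: string;               // Unique identifier
  name: string;             // Human-readable name, e.g. "qa-golden-set"
  description?: string;     // Optional description
  datapoint_count: number;  // Number of datapoints in the dataset
  created_at: string;       // ISO 8601 timestamp
  updated_at: string;       // ISO 8601 timestamp
}

// A sample record with placeholder values.
const example: Dataset = {
  id: "01JEXAMPLE",
  name: "qa-golden-set",
  description: "Curated Q&A test cases",
  datapoint_count: 0,
  created_at: "2024-01-01T00:00:00Z",
  updated_at: "2024-01-01T00:00:00Z",
};
```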
Datapoint kinds
Each datapoint has a kind that determines its schema. There are two kinds:
Generic
A flexible format for any type of input/output pair.
```json
{
  "kind": {
    "Generic": {
      "input": { "question": "What is the capital of France?" },
      "expected_output": { "answer": "Paris" },
      "actual_output": null,
      "score": null,
      "metadata": { "source": "geography-quiz", "difficulty": "easy" }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| input | any JSON | The input to the task |
| expected_output | any JSON | The expected/correct output |
| actual_output | any JSON | Optional. The model's actual output (populated by evals) |
| score | number | Optional. Score from 0.0 to 1.0 (populated by evals) |
| metadata | object | Optional. Arbitrary key-value pairs |
Use Generic datapoints when your inputs and outputs don't follow a chat message format, or when you need the flexibility of arbitrary JSON.
LlmConversation
A structured format specifically for chat-style LLM interactions.
```json
{
  "kind": {
    "LlmConversation": {
      "messages": [
        { "role": "system", "content": "You are a geography expert." },
        { "role": "user", "content": "What is the capital of France?" }
      ],
      "expected": "Paris",
      "metadata": { "topic": "geography" }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| messages | Message[] | Array of { role, content } message objects |
| expected | string | Optional. The expected response text |
| metadata | object | Optional. Arbitrary key-value pairs |
Use LlmConversation datapoints when your data is naturally chat-formatted. Evaluations send the messages array directly to the model, making setup simpler.
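Because `kind` is a tagged object, consuming code can branch on which key is present. A minimal sketch in TypeScript, assuming the schemas documented above (the type names here are illustrative, not official SDK exports):

```typescript
// Illustrative union of the two datapoint kinds documented above.
type DatapointKind =
  | { Generic: { input: unknown; expected_output?: unknown } }
  | {
      LlmConversation: {
        messages: { role: string; content: string }[];
        expected?: string;
      };
    };

// Type guard: narrows a DatapointKind to the LlmConversation variant.
function isLlmConversation(
  kind: DatapointKind
): kind is Extract<DatapointKind, { LlmConversation: unknown }> {
  return "LlmConversation" in kind;
}

const generic: DatapointKind = {
  Generic: {
    input: { question: "What is the capital of France?" },
    expected_output: { answer: "Paris" },
  },
};

const chat: DatapointKind = {
  LlmConversation: {
    messages: [{ role: "user", content: "What is the capital of France?" }],
    expected: "Paris",
  },
};
```

After the guard, TypeScript narrows the type, so `kind.LlmConversation.messages` is accessible without a cast.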
Datapoint sources
Each datapoint tracks how it was created:
| Source | Description |
|---|---|
| manual | Created via the API or dashboard |
| span_export | Exported from a production span |
| file_upload | Imported from a CSV/JSON/JSONL file |
For span_export datapoints, the source_span_id field links back to the original span, so you can trace a datapoint to the production request that inspired it.
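For example, code auditing a dataset might partition datapoints by source and collect the span links. The `source` and `source_span_id` field names follow the docs above; the surrounding shape is a sketch, not an official SDK type:

```typescript
// Illustrative: filter for datapoints exported from production spans.
// Field names (source, source_span_id) follow the documentation above;
// the DatapointMeta shape itself is an assumption for this sketch.
type Source = "manual" | "span_export" | "file_upload";

interface DatapointMeta {
  id: string;
  source: Source;
  source_span_id?: string; // present only for span_export datapoints
}

function spanExported(datapoints: DatapointMeta[]): DatapointMeta[] {
  return datapoints.filter(
    (d) => d.source === "span_export" && d.source_span_id !== undefined
  );
}

const points: DatapointMeta[] = [
  { id: "dp1", source: "manual" },
  { id: "dp2", source: "span_export", source_span_id: "span_abc" },
  { id: "dp3", source: "file_upload" },
];
```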
Creating a dataset
Via the SDK
```typescript
import { Traceway } from 'traceway';

const tw = new Traceway();
const dataset = await tw.createDataset('qa-golden-set', 'Curated Q&A test cases');
console.log(dataset.id); // "01J..."
```

Via the API
```bash
curl -X POST https://api.traceway.ai/api/datasets \
  -H "Authorization: Bearer tw_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "qa-golden-set", "description": "Curated Q&A test cases"}'
```

Via the dashboard
In the Datasets tab, click "New Dataset", then enter a name and an optional description.
Managing datasets
```typescript
// List all datasets
const { datasets } = await tw.listDatasets();

// Update a dataset's name or description
// (use the REST API: PUT /api/datasets/:id)

// Delete a dataset and all its datapoints
await tw.deleteDataset(dataset.id);
```

Deleting a dataset also deletes all its datapoints, queue items, and eval runs. This is irreversible.