Introduction

What datasets and datapoints are, their types, and how they fit into the Traceway workflow.

A dataset is a named collection of input/output pairs, called datapoints. Datasets serve two purposes in Traceway:

  1. Regression testing — Run evaluations against a dataset to measure model quality and detect regressions when you change prompts or models.
  2. Data collection — Build curated sets of examples from production traffic for fine-tuning, analysis, or review.

Dataset structure

A dataset has:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier |
| name | string | Human-readable name (e.g., "qa-golden-set") |
| description | string | Optional description |
| datapoint_count | number | Number of datapoints in the dataset |
| created_at | string | ISO 8601 timestamp |
| updated_at | string | ISO 8601 timestamp |

Datapoint kinds

Each datapoint has a kind that determines its schema. There are two kinds:

Generic

A flexible format for any type of input/output pair.

{
  "kind": {
    "Generic": {
      "input": { "question": "What is the capital of France?" },
      "expected_output": { "answer": "Paris" },
      "actual_output": null,
      "score": null,
      "metadata": { "source": "geography-quiz", "difficulty": "easy" }
    }
  }
}

| Field | Type | Description |
| --- | --- | --- |
| input | any JSON | The input to the task |
| expected_output | any JSON | The expected/correct output |
| actual_output | any JSON | Optional. The model's actual output (populated by evals) |
| score | number | Optional. Score from 0.0 to 1.0 (populated by evals) |
| metadata | object | Optional. Arbitrary key-value pairs |

Use Generic datapoints when your inputs and outputs don't follow a chat message format, or when you need the flexibility of arbitrary JSON.
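As a rough illustration of the Generic shape described above, the fields can be modelled in TypeScript as follows. This is a sketch, not part of the Traceway SDK: the type names `Json`, `GenericDatapoint`, and `GenericEnvelope`, and the helpers `makeGeneric` and `isValidScore`, are hypothetical and introduced here only for illustration.

```typescript
// Hypothetical types mirroring the Generic datapoint fields; not SDK exports.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface GenericDatapoint {
  input: Json;
  expected_output: Json;
  actual_output: Json;          // null until an eval populates it
  score: number | null;         // 0.0 to 1.0, null until an eval populates it
  metadata?: { [key: string]: Json };
}

interface GenericEnvelope {
  kind: { Generic: GenericDatapoint };
}

// Hypothetical helper: builds an unevaluated Generic datapoint.
function makeGeneric(
  input: Json,
  expectedOutput: Json,
  metadata?: { [key: string]: Json },
): GenericEnvelope {
  const datapoint: GenericDatapoint = {
    input,
    expected_output: expectedOutput,
    actual_output: null,
    score: null,
  };
  if (metadata !== undefined) datapoint.metadata = metadata;
  return { kind: { Generic: datapoint } };
}

// Hypothetical check for the documented score range (0.0 to 1.0, or unset).
function isValidScore(score: number | null): boolean {
  return score === null || (Number.isFinite(score) && score >= 0 && score <= 1);
}
```

Keeping `actual_output` and `score` as `null` at creation time matches the JSON example above, where both are filled in later by an evaluation run.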

LlmConversation

A structured format specifically for chat-style LLM interactions.

{
  "kind": {
    "LlmConversation": {
      "messages": [
        { "role": "system", "content": "You are a geography expert." },
        { "role": "user", "content": "What is the capital of France?" }
      ],
      "expected": "Paris",
      "metadata": { "topic": "geography" }
    }
  }
}

| Field | Type | Description |
| --- | --- | --- |
| messages | Message[] | Array of { role, content } message objects |
| expected | string | Optional. The expected response text |
| metadata | object | Optional. Arbitrary key-value pairs |

Use LlmConversation datapoints when your data is naturally chat-formatted. Evaluations send the messages array directly to the model, making setup simpler.
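The LlmConversation shape can be sketched in the same way. The interfaces and the `makeConversation` helper below are hypothetical names chosen for this example, not SDK exports; `role` is typed as a plain string because the set of accepted roles is not specified here.

```typescript
// Hypothetical types mirroring the LlmConversation fields; not SDK exports.
interface Message {
  role: string;     // e.g. "system" or "user", as in the example above
  content: string;
}

interface LlmConversationDatapoint {
  messages: Message[];
  expected?: string;
  metadata?: Record<string, unknown>;
}

interface LlmConversationEnvelope {
  kind: { LlmConversation: LlmConversationDatapoint };
}

// Hypothetical helper: builds a conversation datapoint from an optional
// system prompt, a user prompt, and an optional expected response.
function makeConversation(
  systemPrompt: string | null,
  userPrompt: string,
  expected?: string,
): LlmConversationEnvelope {
  const messages: Message[] = [];
  if (systemPrompt !== null) {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push({ role: "user", content: userPrompt });

  const datapoint: LlmConversationDatapoint = { messages };
  if (expected !== undefined) datapoint.expected = expected;
  return { kind: { LlmConversation: datapoint } };
}
```

Calling `makeConversation("You are a geography expert.", "What is the capital of France?", "Paris")` yields the same structure as the JSON example above, minus the `metadata` field.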

Datapoint sources

Each datapoint tracks how it was created:

| Source | Description |
| --- | --- |
| manual | Created via the API or dashboard |
| span_export | Exported from a production span |
| file_upload | Imported from a CSV/JSON/JSONL file |

For span_export datapoints, the source_span_id field links back to the original span, so you can trace a datapoint to the production request it was exported from.
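One way to work with sources in client code is sketched below. The record shape is an assumption: `source_span_id` is the field named above, but the `source` field name and the `DatapointRecord` type are hypothetical, so check the actual API response before relying on them.

```typescript
// The three documented source values.
type DatapointSource = "manual" | "span_export" | "file_upload";

// Hypothetical record shape. `source` as a field name is an assumption;
// `source_span_id` is the field described in the text.
interface DatapointRecord {
  id: string;
  source: DatapointSource;
  source_span_id?: string | null;
}

// Returns the originating span ID for span_export datapoints, otherwise null.
function originSpanId(dp: DatapointRecord): string | null {
  if (dp.source !== "span_export") return null;
  return dp.source_span_id ?? null;
}

// Counts datapoints per source, e.g. to see how much of a dataset
// came from production traffic versus manual curation.
function countBySource(dps: DatapointRecord[]): Record<DatapointSource, number> {
  const counts: Record<DatapointSource, number> = {
    manual: 0,
    span_export: 0,
    file_upload: 0,
  };
  for (const dp of dps) counts[dp.source] += 1;
  return counts;
}
```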

Creating a dataset

Via the SDK

import { Traceway } from 'traceway';

const tw = new Traceway();

const dataset = await tw.createDataset('qa-golden-set', 'Curated Q&A test cases');
console.log(dataset.id); // "01J..."

Via the API

curl -X POST https://api.traceway.ai/api/datasets \
  -H "Authorization: Bearer tw_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "qa-golden-set", "description": "Curated Q&A test cases"}'

Via the dashboard

In the Datasets tab, click "New Dataset", then enter a name and an optional description.

Managing datasets

// List all datasets
const { datasets } = await tw.listDatasets();

// Update a dataset's name or description
// (use the REST API: PUT /api/datasets/:id)

// Delete a dataset and all its datapoints
await tw.deleteDataset(dataset.id);

Deleting a dataset also deletes all its datapoints, queue items, and eval runs. This is irreversible.
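Updating a dataset goes through the REST API (PUT /api/datasets/:id), as noted in the snippet above. The following is a minimal sketch of preparing that request in TypeScript. It assumes the PUT body accepts the same `name` and `description` fields as dataset creation and uses the same Bearer authentication as the curl example; neither is confirmed here. `buildUpdateDatasetRequest` and its types are hypothetical helpers, not part of the SDK.

```typescript
// Assumed update payload: same fields as dataset creation.
interface DatasetUpdate {
  name?: string;
  description?: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and options for PUT /api/datasets/:id
// without sending anything, so the request can be inspected or tested first.
function buildUpdateDatasetRequest(
  baseUrl: string,
  apiKey: string,
  datasetId: string,
  update: DatasetUpdate,
): PreparedRequest {
  if (update.name === undefined && update.description === undefined) {
    throw new Error("Nothing to update: provide a name or a description");
  }
  return {
    url: `${baseUrl}/api/datasets/${encodeURIComponent(datasetId)}`,
    init: {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // JSON.stringify omits undefined fields, so only provided values are sent.
      body: JSON.stringify(update),
    },
  };
}
```

The result can be passed to `fetch(url, init)` with `https://api.traceway.ai` as the base URL. Separating request construction from sending keeps the helper easy to verify without network access.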
