Use Case Recipes

Recipes are code examples that show how to use Data Designer for specific use cases. Each recipe is self-contained and can be run independently.

New to Data Designer?

Recipes provide working code for specific use cases without detailed explanations. If you're learning Data Designer for the first time, we recommend starting with our tutorial notebooks, which offer step-by-step guidance and explain core concepts. Once you're familiar with the basics, return here for practical, ready-to-use implementations.

Tip

These recipes use the OpenAI model provider by default. Before running a recipe, make sure your OpenAI model provider has been set up with the Data Designer CLI.

  • Text to Python

    Generate a dataset of natural language instructions paired with Python code implementations, with varying complexity levels and industry focuses.


    Demonstrates:

    • Python code generation
    • Python code validation
    • LLM-as-judge

    View Recipe Download Code
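To give a feel for what "Python code validation" can mean, here is a minimal sketch that checks whether a generated snippet parses, using only the standard library `ast` module. It is not the validator the recipe uses; the function name `validate_python` and the shape of the returned report are assumptions made for illustration.

```python
import ast


def validate_python(code: str) -> dict:
    """Return a small validity report for a generated Python snippet.

    Hypothetical helper: this only checks that the code parses. It does not
    run the code, lint it, or check that it matches the instruction.
    """
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return {"is_valid": False, "error": f"line {exc.lineno}: {exc.msg}"}
    return {"is_valid": True, "error": None}


if __name__ == "__main__":
    print(validate_python("def add(a, b):\n    return a + b\n"))
    print(validate_python("def add(a, b:\n    return a + b\n"))
```

A syntax check like this is cheap enough to run on every generated row; stricter checks (linting, execution in a sandbox) would be a separate design decision.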

  • Text to SQL

    Generate a dataset of natural language instructions paired with SQL code implementations, with varying complexity levels and industry focuses.


    Demonstrates:

    • SQL code generation
    • SQL code validation
    • LLM-as-judge

    View Recipe Download Code
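As an illustration of what "SQL code validation" might look like, the sketch below compiles a generated query against an in-memory SQLite database without executing it. This is an assumption-laden stand-in, not the recipe's validator: the helper name `validate_sql`, the use of SQLite, and the example schema are all hypothetical.

```python
import sqlite3


def validate_sql(schema: str, query: str) -> dict:
    """Check that `query` compiles against `schema` in an in-memory SQLite DB.

    Hypothetical helper: EXPLAIN asks SQLite to prepare the statement, which
    surfaces syntax errors and unknown tables or columns without running it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute(f"EXPLAIN {query}")
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return {"is_valid": False, "error": str(exc)}
    finally:
        conn.close()
    return {"is_valid": True, "error": None}


if __name__ == "__main__":
    schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
    print(validate_sql(schema, "SELECT name FROM users WHERE id = 1"))
    print(validate_sql(schema, "SELECT nme FROM users"))
```

Because this relies on SQLite's parser, it only validates the SQLite dialect; queries written for other dialects would need a different checker.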

  • Nemotron Super Text to SQL

    Generate enterprise-grade text-to-SQL training data used for Nemotron Super v3 SFT, with dialect-specific SQL, distractor injection, dirty data, and 5 LLM judges with 15 scoring dimensions.


    Demonstrates:

    • Dialect-specific SQL generation (SQLite, MySQL, PostgreSQL)
    • Distractor table/column and dirty data injection
    • Conditional sampling with SubcategorySamplerParams
    • 5 LLM judges with 15 score extraction columns

    View Recipe Download Code
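The idea behind conditional (subcategory) sampling can be sketched in plain Python: first sample a parent category, then sample a child value only from the options allowed for that parent. The code below does not use `SubcategorySamplerParams` or any Data Designer API; the mapping, the column names, and the function `sample_row` are hypothetical.

```python
import random

# Hypothetical mapping: each SQL dialect has its own pool of dialect-specific features.
FEATURES_BY_DIALECT = {
    "sqlite": ["strftime date handling", "INSERT OR REPLACE"],
    "mysql": ["backtick-quoted identifiers", "ON DUPLICATE KEY UPDATE"],
    "postgresql": ["ILIKE matching", "RETURNING clause"],
}


def sample_row(rng: random.Random) -> dict:
    """Sample a dialect, then a feature conditioned on that dialect."""
    dialect = rng.choice(sorted(FEATURES_BY_DIALECT))
    feature = rng.choice(FEATURES_BY_DIALECT[dialect])
    return {"dialect": dialect, "feature": feature}


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_row(rng))
```

Sampling the child conditioned on the parent guarantees that every row is internally consistent, for example a PostgreSQL-only feature never appears in a row labeled as MySQL.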

  • Product Info QA

    Generate a dataset that contains information about products and associated question/answer pairs.


    Demonstrates:

    • Structured outputs
    • Expression columns
    • LLM-as-judge

    View Recipe Download Code

  • Multi-Turn Chat

    Generate a dataset of multi-turn chat conversations between a user and an AI assistant.


    Demonstrates:

    • Structured outputs
    • Expression columns
    • LLM-as-judge

    View Recipe Download Code

  • Agent Rollout Trace Distillation

    Read agent rollout traces from disk and turn each imported rollout into a structured workflow record inside a Data Designer pipeline.


    Demonstrates:

    • AgentRolloutSeedSource across Claude Code and Codex rollout formats
    • Using normalized trace columns in generation prompts
    • Distilling agent traces into reusable structured records

    View Recipe Download Code

  • Basic MCP Tool Use

    Minimal example of MCP tool calling with Data Designer. Defines a simple MCP server with basic tools and generates data that requires tool calls to complete.


    Demonstrates:

    • MCP tool calling with LocalStdioMCPProvider
    • Simple tool server definition
    • Tool-augmented text generation

    View Recipe Download Code

  • PDF Document QA (MCP + Tool Use)

    Generate grounded Q&A pairs from PDF documents using MCP tool calls and BM25 search.


    Demonstrates:

    • MCP tool calling with LocalStdioMCPProvider
    • BM25 lexical search for retrieval
    • Retrieval-grounded QA generation
    • Per-column trace capture

    View Recipe Download Code
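For readers unfamiliar with BM25, the following is a small pure-Python sketch of Okapi BM25 scoring over a list of text chunks. It is not the recipe's retrieval code: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "quarterly revenue grew by ten percent",
    ]
    scores = bm25_scores("revenue percent", chunks)
    best = max(range(len(chunks)), key=scores.__getitem__)
    print(chunks[best])
```

BM25 is purely lexical, so it needs no embedding model, which keeps a retrieval tool lightweight; the trade-off is that it misses paraphrases that share no terms with the query.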

  • Nemotron Super Search Agent (MCP + Tool Use)

    Generate multi-turn search agent trajectories used for Nemotron Super post-training, with Tavily web search via MCP, Wikidata KG seeding, and BrowseComp-style question generation.


    Demonstrates:

    • MCP tool calling with Tavily web search
    • Wikidata knowledge graph seeding
    • Two-stage question generation (draft + BrowseComp obfuscation)
    • Full trajectory capture with traces
    • Structured output formatting

    View Recipe Download Code

  • Markdown Section Seed Reader

    Define a custom FileSystemSeedReader inline and turn Markdown files into one seed row per heading section.


    Demonstrates:

    • Single-file custom seed reader pattern
    • hydrate_row() fanout from 1 -> N
    • Manifest-based file selection semantics
    • DirectorySeedSource customization without a new seed_type

    View Recipe Download Code
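The 1 -> N fanout described above, one Markdown file becoming one row per heading section, can be sketched independently of Data Designer. The code below does not implement `FileSystemSeedReader` or `hydrate_row()`; the function name `split_markdown_sections` and the row fields are hypothetical, and the sketch ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

# ATX-style headings: one to six '#' characters followed by the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_markdown_sections(source_file: str, text: str) -> list[dict]:
    """Turn one Markdown document into one row per heading section."""
    rows = []
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if current is not None:
                rows.append(current)
            current = {
                "source_file": source_file,
                "level": len(match.group(1)),
                "heading": match.group(2),
                "lines": [],
            }
        elif current is not None:
            current["lines"].append(line)
    if current is not None:
        rows.append(current)
    # Collapse the collected lines into a single body string per section.
    for row in rows:
        row["body"] = "\n".join(row.pop("lines")).strip()
    return rows


if __name__ == "__main__":
    doc = "# Title\nintro\n\n## Setup\nstep one\nstep two\n"
    for row in split_markdown_sections("guide.md", doc):
        print(row)
```

Text that appears before the first heading is dropped in this sketch; a real reader would need to decide whether to keep it as its own row.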