Executions

Track, monitor, and analyze every prompt and chain execution with detailed metrics, costs, and performance data.

What is an Execution?

An Execution is a recorded instance of running a prompt or chain. Every time you execute a prompt or chain, Prompt Forge creates an execution record with comprehensive metrics:

  • Input & Output - What was sent and what was received
  • Status - Success, error, or pending
  • Performance - Latency in milliseconds
  • Token Usage - Input and output tokens consumed
  • Cost - Exact cost in USD
  • Timestamp - When the execution occurred
  • Version - Which prompt/chain version was used
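
Taken together, the fields above form a simple record type. A minimal sketch in Python of what one execution record carries (the class is illustrative only, not an official SDK type; field names mirror the GraphQL fields shown later):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Execution:
    """Illustrative model of one execution record (not an SDK type)."""
    id: str
    status: str            # "pending" | "success" | "error"
    output: Optional[str]  # what was received; None until complete or on error
    error: Optional[str]   # error message when status == "error"
    latency_ms: int        # total execution time in milliseconds
    token_in: int          # input tokens consumed
    token_out: int         # output tokens generated
    cost_usd: str          # exact cost in USD, e.g. "0.0045"
    created_at: str        # ISO 8601 timestamp of the execution
    version: int           # prompt/chain version that was used

record = Execution(
    id="exec_abc123", status="success", output="Hello!", error=None,
    latency_ms=850, token_in=500, token_out=200, cost_usd="0.0045",
    created_at="2025-01-21T10:30:00Z", version=3,
)
```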

Execution Lifecycle

Status Values

pending

Execution has been initiated but not yet completed. Typically only visible for long-running chains.

success

Execution completed successfully. Output is available in the response.

error

Execution failed. Error message and stack trace are available for debugging.
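
The three statuses form a small state machine: pending may transition to success or error, both of which are terminal. A tiny helper like the following (illustrative, not part of any SDK) is handy when polling long-running chains:

```python
# Statuses from which an execution will never transition again.
TERMINAL_STATUSES = {"success", "error"}

def is_terminal(status: str) -> bool:
    """Return True once an execution has finished, successfully or not."""
    return status in TERMINAL_STATUSES
```

A polling loop would re-fetch the execution until `is_terminal(...)` returns True, then inspect `output` or `error`.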

Viewing Executions

Dashboard View

The dashboard provides a comprehensive view of all executions with metrics:

  • Total executions across all prompts and chains
  • Success rate percentage
  • Average latency
  • Total token usage (input + output)
  • Total cost in USD
  • Daily activity charts
  • Most frequently used prompts and chains

Via GraphQL API

Fetch Recent Executions
```graphql
query GetExecutions {
  executions {
    id
    status
    output
    error
    latencyMs
    tokenIn
    tokenOut
    costUsd
    createdAt
    promptVersion {
      version
      prompt {
        id
        name
      }
    }
  }
}
```
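
On the wire, this query is just an HTTP POST with a JSON envelope. A minimal client sketch using only the Python standard library; the endpoint URL and the Bearer-token auth scheme are assumptions for illustration, not documented values:

```python
import json
from urllib import request

# The GetExecutions query from above, condensed.
GET_EXECUTIONS = """
query GetExecutions {
  executions {
    id status output error latencyMs tokenIn tokenOut costUsd createdAt
    promptVersion { version prompt { id name } }
  }
}
"""

def build_payload(query: str) -> bytes:
    """Wrap a GraphQL document in the standard {"query": ...} envelope."""
    return json.dumps({"query": query}).encode("utf-8")

def parse_executions(body: dict) -> list[dict]:
    """Pull the executions list out of a GraphQL response body."""
    return body["data"]["executions"]

def fetch_executions(endpoint: str, api_key: str) -> list[dict]:
    """POST the query and return execution records (endpoint/auth assumed)."""
    req = request.Request(
        endpoint,
        data=build_payload(GET_EXECUTIONS),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
    with request.urlopen(req) as resp:
        return parse_executions(json.load(resp))
```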

Chain Executions

```graphql
query GetChainExecutions {
  chainExecutions {
    id
    status
    output
    latencyMs
    totalCost
    stepCount
    createdAt
    chain {
      id
      name
    }
    steps {
      order
      type
      output
      latency
      cost
    }
  }
}
```
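
Each chain execution exposes per-step metrics, so chain-level totals can also be recomputed client-side from the `steps` array. A sketch, assuming step costs are decimal strings like `costUsd` (the step shape mirrors the query above):

```python
from decimal import Decimal

def chain_totals(steps: list[dict]) -> dict:
    """Aggregate per-step latency and cost into chain-level totals."""
    return {
        "latencyMs": sum(s["latency"] for s in steps),
        "totalCost": str(sum(Decimal(s["cost"]) for s in steps)),
        "stepCount": len(steps),
    }

steps = [
    {"order": 1, "type": "prompt", "latency": 420, "cost": "0.0015"},
    {"order": 2, "type": "prompt", "latency": 610, "cost": "0.0030"},
]
```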

Execution Metrics

Performance Metrics

| Parameter | Type | Description |
| --- | --- | --- |
| `latencyMs` | number | Total execution time in milliseconds (including API calls, model inference, etc.) |
| `tokenIn` | number | Number of tokens in the input (prompt + variables) |
| `tokenOut` | number | Number of tokens generated in the output |
| `costUsd` | string | Exact cost in USD (4 decimal precision) |

All metrics are automatically calculated and stored. You don't need to track them manually.

Cost Calculation

Costs are calculated based on the model used and token consumption:

```text
Claude 3.5 Sonnet Pricing (as of Jan 2025):
Input:  $3.00 / 1M tokens
Output: $15.00 / 1M tokens

Example calculation:
Input tokens:  500 × $3.00 / 1,000,000 = $0.0015
Output tokens: 200 × $15.00 / 1,000,000 = $0.0030
Total cost:    $0.0045
```

Costs vary by model. GPT-4 and Claude Opus are typically more expensive than Haiku or GPT-3.5 Turbo.
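
The calculation above can be reproduced exactly with decimal arithmetic, which is why `costUsd` is stored as a string rather than a float. A sketch (the prices are the Claude 3.5 Sonnet figures from the example; plug in your model's rates):

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1M-token prices from the example above (Claude 3.5 Sonnet, Jan 2025).
INPUT_PRICE = Decimal("3.00")
OUTPUT_PRICE = Decimal("15.00")

def cost_usd(token_in: int, token_out: int,
             input_price: Decimal = INPUT_PRICE,
             output_price: Decimal = OUTPUT_PRICE) -> str:
    """Compute execution cost in USD, rounded to 4 decimal places."""
    raw = (token_in * input_price + token_out * output_price) / Decimal(1_000_000)
    return str(raw.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP))
```

`cost_usd(500, 200)` reproduces the worked example's total of `"0.0045"`.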

Filtering and Analytics

Time-Based Filtering

The dashboard allows filtering executions by time period:

  • Last 7 days
  • Last 30 days
  • All time

Success Rate Analysis

```text
Success Rate = (Successful Executions / Total Executions) × 100

Example:
Total:        1,250 executions
Successful:   1,187 executions
Failed:       63 executions
Success Rate: 94.96%
```
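
The formula translates directly into code; this one-liner (illustrative) reproduces the example:

```python
def success_rate(successful: int, total: int) -> float:
    """Success rate as a percentage, rounded to two decimal places."""
    if total == 0:
        return 0.0
    return round(successful / total * 100, 2)
```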

Cost Optimization

Use execution data to optimize costs:

  1. Review prompts with high token usage
  2. Consider using cheaper models for simple tasks
  3. Optimize prompt templates to reduce input tokens
  4. Set appropriate max_tokens limits
  5. Cache results for repeated queries when possible
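
Point 5 can be as simple as memoizing the execution call on its inputs, which is only safe when outputs are deterministic (e.g. temperature 0). A sketch using `functools.lru_cache`; `execute_prompt` is a hypothetical stand-in for your real execution call, not an SDK function:

```python
from functools import lru_cache

calls = 0  # counts how many real (non-cached) executions happen

@lru_cache(maxsize=256)
def execute_prompt(prompt_id: str, rendered_input: str) -> str:
    """Hypothetical execution call, cached on (prompt, input)."""
    global calls
    calls += 1
    return f"response for {rendered_input}"  # stand-in for the real API call

execute_prompt("prompt_1", "Hello")
execute_prompt("prompt_1", "Hello")  # identical inputs: served from cache
```

Cached hits cost zero tokens, which shows up directly in the dashboard's cost totals.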

Error Handling

When an execution fails, the error details are captured:

```json
{
  "id": "exec_abc123",
  "status": "error",
  "output": null,
  "error": "Invalid API key for provider 'anthropic'",
  "latencyMs": 150,
  "tokenIn": 0,
  "tokenOut": 0,
  "costUsd": "0.0000",
  "createdAt": "2025-01-21T10:30:00Z"
}
```

Common Error Types

Authentication Errors

Invalid API keys, expired credentials, or insufficient permissions

Validation Errors

Missing required variables, invalid input types, or schema mismatches

Rate Limit Errors

Too many requests to the AI provider within a time window

Timeout Errors

Execution exceeded maximum allowed time

Execution History

Every prompt and chain maintains a complete execution history, allowing you to:

  • Track performance over time
  • Identify degradation or improvements
  • Compare different prompt versions
  • Audit usage for billing purposes
  • Debug issues by reviewing past executions
  • Analyze user patterns and common inputs

Use the execution history to A/B test different prompt versions. Compare metrics side-by-side to see which performs better.
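
A side-by-side comparison boils down to grouping executions by version and computing the same metrics per group. A sketch (the execution dicts mirror the GraphQL fields; the helper itself is illustrative):

```python
from collections import defaultdict

def metrics_by_version(executions: list[dict]) -> dict[int, dict]:
    """Group executions by prompt version and compute per-version metrics."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for e in executions:
        groups[e["version"]].append(e)
    out = {}
    for version, execs in groups.items():
        ok = sum(1 for e in execs if e["status"] == "success")
        out[version] = {
            "count": len(execs),
            "successRate": round(ok / len(execs) * 100, 2),
            "avgLatencyMs": round(sum(e["latencyMs"] for e in execs) / len(execs), 1),
        }
    return out

executions = [
    {"version": 1, "status": "success", "latencyMs": 900},
    {"version": 1, "status": "error",   "latencyMs": 150},
    {"version": 2, "status": "success", "latencyMs": 700},
    {"version": 2, "status": "success", "latencyMs": 500},
]
```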

Best Practices

Monitor Regularly - Check your dashboard weekly to catch issues early and optimize costs.
Set Alerts - Watch for sudden spikes in error rates or costs.
Review Failed Executions - Investigate errors to improve prompt robustness and input validation.
Track Version Performance - When you create a new prompt version, compare its metrics to previous versions.

Next Steps

Execution API

Complete API reference for querying executions

API Reference →

Optimize Costs

Learn strategies to reduce token usage and costs

Prompt Best Practices →