Executions

Track, monitor, and analyze every prompt and chain execution with detailed metrics, costs, and performance data.

What is an Execution?

An Execution is a recorded instance of running a prompt or chain. Every time you execute a prompt or chain, Prompt Forge creates an execution record with comprehensive metrics:

  • Input & Output - What was sent and what was received
  • Status - Success, error, or pending
  • Performance - Latency in milliseconds
  • Token Usage - Input and output tokens consumed
  • Cost - Exact cost in USD
  • Timestamp - When the execution occurred
  • Version - Which prompt/chain version was used
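
Taken together, the fields above form a simple record type. A minimal sketch in Python of what one execution record carries (the class is illustrative only, not an official SDK type; field names mirror the GraphQL fields shown later):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Execution:
    """Illustrative model of one execution record (not an SDK type)."""
    id: str
    status: str            # "pending" | "success" | "error"
    output: Optional[str]  # what was received; None until complete or on error
    error: Optional[str]   # error message when status == "error"
    latency_ms: int        # total execution time in milliseconds
    token_in: int          # input tokens consumed
    token_out: int         # output tokens generated
    cost_usd: str          # exact cost in USD, e.g. "0.0045"
    created_at: str        # ISO 8601 timestamp of the execution
    version: int           # prompt/chain version that was used

record = Execution(
    id="exec_abc123", status="success", output="Hello!", error=None,
    latency_ms=850, token_in=500, token_out=200, cost_usd="0.0045",
    created_at="2025-01-21T10:30:00Z", version=3,
)
```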

Execution Lifecycle

Status Values

pending

Execution has been initiated but not yet completed. Typically only visible for long-running chains.

success

Execution completed successfully. Output is available in the response.

error

Execution failed. Error message and stack trace are available for debugging.
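
The three statuses form a small state machine: pending may transition to success or error, both of which are terminal. A tiny helper like the following (illustrative, not part of any SDK) is handy when polling long-running chains:

```python
# Statuses from which an execution will never transition again.
TERMINAL_STATUSES = {"success", "error"}

def is_terminal(status: str) -> bool:
    """Return True once an execution has finished, successfully or not."""
    return status in TERMINAL_STATUSES
```

A polling loop would re-fetch the execution until `is_terminal(...)` returns True, then inspect `output` or `error`.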

Viewing Executions

Dashboard View

The dashboard provides a comprehensive view of all executions with metrics:

  • Total executions across all prompts and chains
  • Success rate percentage
  • Average latency
  • Total token usage (input + output)
  • Total cost in USD
  • Daily activity charts
  • Most frequently used prompts and chains

Via GraphQL API

Fetch Recent Executions
```graphql
query GetExecutions {
  executions {
    id
    status
    output
    error
    latencyMs
    tokenIn
    tokenOut
    costUsd
    createdAt
    promptVersion {
      version
      prompt {
        id
        name
      }
    }
  }
}
```
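
On the wire, this query is just an HTTP POST with a JSON envelope. A minimal client sketch using only the Python standard library; the endpoint URL and the Bearer-token auth scheme are assumptions for illustration, not documented values:

```python
import json
from urllib import request

# The GetExecutions query from above, condensed.
GET_EXECUTIONS = """
query GetExecutions {
  executions {
    id status output error latencyMs tokenIn tokenOut costUsd createdAt
    promptVersion { version prompt { id name } }
  }
}
"""

def build_payload(query: str) -> bytes:
    """Wrap a GraphQL document in the standard {"query": ...} envelope."""
    return json.dumps({"query": query}).encode("utf-8")

def parse_executions(body: dict) -> list[dict]:
    """Pull the executions list out of a GraphQL response body."""
    return body["data"]["executions"]

def fetch_executions(endpoint: str, api_key: str) -> list[dict]:
    """POST the query and return execution records (endpoint/auth assumed)."""
    req = request.Request(
        endpoint,
        data=build_payload(GET_EXECUTIONS),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
    with request.urlopen(req) as resp:
        return parse_executions(json.load(resp))
```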

Chain Executions

```graphql
query GetChainExecutions {
  chainExecutions {
    id
    status
    output
    latencyMs
    totalCost
    stepCount
    createdAt
    chain {
      id
      name
    }
    steps {
      order
      type
      output
      latency
      cost
    }
  }
}
```
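
Each chain execution exposes per-step metrics, so chain-level totals can also be recomputed client-side from the `steps` array. A sketch, assuming step costs are decimal strings like `costUsd` (the step shape mirrors the query above):

```python
from decimal import Decimal

def chain_totals(steps: list[dict]) -> dict:
    """Aggregate per-step latency and cost into chain-level totals."""
    return {
        "latencyMs": sum(s["latency"] for s in steps),
        "totalCost": str(sum(Decimal(s["cost"]) for s in steps)),
        "stepCount": len(steps),
    }

steps = [
    {"order": 1, "type": "prompt", "latency": 420, "cost": "0.0015"},
    {"order": 2, "type": "prompt", "latency": 610, "cost": "0.0030"},
]
```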

Execution Metrics

Performance Metrics

| Parameter | Type | Description |
| --- | --- | --- |
| `latencyMs` | number | Total execution time in milliseconds (including API calls, model inference, etc.) |
| `tokenIn` | number | Number of tokens in the input (prompt + variables) |
| `tokenOut` | number | Number of tokens generated in the output |
| `costUsd` | string | Exact cost in USD (4 decimal precision) |

All metrics are automatically calculated and stored. You don't need to track them manually.

Cost Calculation

Costs are calculated based on the model used and token consumption:

```text
Claude 3.5 Sonnet Pricing (as of Jan 2025):
Input:  $3.00 / 1M tokens
Output: $15.00 / 1M tokens

Example calculation:
Input tokens:  500 × $3.00 / 1,000,000 = $0.0015
Output tokens: 200 × $15.00 / 1,000,000 = $0.0030
Total cost:    $0.0045
```

Costs vary by model. GPT-4 and Claude Opus are typically more expensive than Haiku or GPT-3.5 Turbo.
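
The calculation above can be reproduced exactly with decimal arithmetic, which is why `costUsd` is stored as a string rather than a float. A sketch (the prices are the Claude 3.5 Sonnet figures from the example; plug in your model's rates):

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1M-token prices from the example above (Claude 3.5 Sonnet, Jan 2025).
INPUT_PRICE = Decimal("3.00")
OUTPUT_PRICE = Decimal("15.00")

def cost_usd(token_in: int, token_out: int,
             input_price: Decimal = INPUT_PRICE,
             output_price: Decimal = OUTPUT_PRICE) -> str:
    """Compute execution cost in USD, rounded to 4 decimal places."""
    raw = (token_in * input_price + token_out * output_price) / Decimal(1_000_000)
    return str(raw.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP))
```

`cost_usd(500, 200)` reproduces the worked example's total of `"0.0045"`.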

Filtering and Analytics

Time-Based Filtering

The dashboard allows filtering executions by time period:

  • Last 7 days
  • Last 30 days
  • All time

Success Rate Analysis

```text
Success Rate = (Successful Executions / Total Executions) × 100

Example:
Total:        1,250 executions
Successful:   1,187 executions
Failed:       63 executions
Success Rate: 94.96%
```
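
The formula translates directly into code; this one-liner (illustrative) reproduces the example:

```python
def success_rate(successful: int, total: int) -> float:
    """Success rate as a percentage, rounded to two decimal places."""
    if total == 0:
        return 0.0
    return round(successful / total * 100, 2)
```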

Cost Optimization

Use execution data to optimize costs:

  1. Review prompts with high token usage
  2. Consider using cheaper models for simple tasks
  3. Optimize prompt templates to reduce input tokens
  4. Set appropriate max_tokens limits
  5. Cache results for repeated queries when possible
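
Point 5 can be as simple as memoizing the execution call on its inputs, which is only safe when outputs are deterministic (e.g. temperature 0). A sketch using `functools.lru_cache`; `execute_prompt` is a hypothetical stand-in for your real execution call, not an SDK function:

```python
from functools import lru_cache

calls = 0  # counts how many real (non-cached) executions happen

@lru_cache(maxsize=256)
def execute_prompt(prompt_id: str, rendered_input: str) -> str:
    """Hypothetical execution call, cached on (prompt, input)."""
    global calls
    calls += 1
    return f"response for {rendered_input}"  # stand-in for the real API call

execute_prompt("prompt_1", "Hello")
execute_prompt("prompt_1", "Hello")  # identical inputs: served from cache
```

Cached hits cost zero tokens, which shows up directly in the dashboard's cost totals.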

Error Handling

When an execution fails, the error details are captured:

```json
{
  "id": "exec_abc123",
  "status": "error",
  "output": null,
  "error": "Invalid API key for provider 'anthropic'",
  "latencyMs": 150,
  "tokenIn": 0,
  "tokenOut": 0,
  "costUsd": "0.0000",
  "createdAt": "2025-01-21T10:30:00Z"
}
```

Common Error Types

Authentication Errors

Invalid API keys, expired credentials, or insufficient permissions

Validation Errors

Missing required variables, invalid input types, or schema mismatches

Rate Limit Errors

Too many requests to the AI provider within a time window

Timeout Errors

Execution exceeded maximum allowed time

Execution History

Every prompt and chain maintains a complete execution history, allowing you to:

  • Track performance over time
  • Identify degradation or improvements
  • Compare different prompt versions
  • Audit usage for billing purposes
  • Debug issues by reviewing past executions
  • Analyze user patterns and common inputs

Use the execution history to A/B test different prompt versions. Compare metrics side-by-side to see which performs better.
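
A side-by-side comparison boils down to grouping executions by version and computing the same metrics per group. A sketch (the execution dicts mirror the GraphQL fields; the helper itself is illustrative):

```python
from collections import defaultdict

def metrics_by_version(executions: list[dict]) -> dict[int, dict]:
    """Group executions by prompt version and compute per-version metrics."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for e in executions:
        groups[e["version"]].append(e)
    out = {}
    for version, execs in groups.items():
        ok = sum(1 for e in execs if e["status"] == "success")
        out[version] = {
            "count": len(execs),
            "successRate": round(ok / len(execs) * 100, 2),
            "avgLatencyMs": round(sum(e["latencyMs"] for e in execs) / len(execs), 1),
        }
    return out

executions = [
    {"version": 1, "status": "success", "latencyMs": 900},
    {"version": 1, "status": "error",   "latencyMs": 150},
    {"version": 2, "status": "success", "latencyMs": 700},
    {"version": 2, "status": "success", "latencyMs": 500},
]
```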

Best Practices

Monitor Regularly - Check your dashboard weekly to catch issues early and optimize costs.
Set Alerts - Watch for sudden spikes in error rates or costs.
Review Failed Executions - Investigate errors to improve prompt robustness and input validation.
Track Version Performance - When you create a new prompt version, compare its metrics to previous versions.

Next Steps

Execution API

Complete API reference for querying executions

API Reference →

Optimize Costs

Learn strategies to reduce token usage and costs

Prompt Best Practices →