[
  {
    "id": "code_generate",
    "category": "coding",
    "name": "Python function generation",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 47.56,
    "tps": 16.82,
    "eval_criteria": [
      "correctness",
      "type_hints",
      "docstring",
      "edge_cases"
    ]
  },
  {
    "id": "code_debug",
    "category": "coding",
    "name": "Bug finding & fixing",
    "model": "qwen27b",
    "response": "# Fixed Code\n\n```python\ndef find_duplicates(arr):\n    count = {}\n    for item in arr:\n        count[item] = count.get(item, 0) + 1\n    \n    # Filter to only include items that appear more than once\n    duplicates = {item: count for item, count in count.items() if count > 1}\n    return duplicates\n```\n\n## What Was Fixed\n\n1. **Changed return type**: Now returns a dictionary `{value: count}` instead of a list\n2. **Fixed counting logic**: Counts all occurrences of each item, not just marks as seen\n3. **Handles empty lists**: Returns `{}` instead of crashing\n4. **Removed unnecessary set conversion**: No need to convert to set since we're returning a dict\n\n## Test Cases\n\n```python\nprint(find_duplicates([1, 2, 2, 3, 3, 3]))  # {2: 2, 3: 3}\nprint(find_duplicates([]))                  # {}\nprint(find_duplicates([1, 1, 1]))           # {1: 3}\nprint(find_duplicates([1, 2, 3]))           # {}\nprint(find_duplicates([5]))                 # {}\n```\n\n## Alternative (More Pythonic)\n\n```python\nfrom collections import Counter\n\ndef find_duplicates(arr):\n    count = Counter(arr)\n    return {item: count for item, count in count.items() if count > 1}\n```\n\nThis version uses `Counter` from the `collections`",
    "tokens": 800,
    "time": 48.01,
    "tps": 16.66,
    "eval_criteria": [
      "bug_identified",
      "correct_fix",
      "clean_code"
    ]
  },
  {
    "id": "code_refactor",
    "category": "coding",
    "name": "TypeScript refactoring",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 47.99,
    "tps": 16.67,
    "eval_criteria": [
      "types",
      "error_handling",
      "backoff",
      "production_quality"
    ]
  },
  {
    "id": "korean_schedule",
    "category": "assistant_kr",
    "name": "Korean schedule management",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 47.75,
    "tps": 16.75,
    "eval_criteria": [
      "korean_fluency",
      "schedule_analysis",
      "practical_advice"
    ]
  },
  {
    "id": "korean_email",
    "category": "assistant_kr",
    "name": "Korean email summary",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 48.05,
    "tps": 16.65,
    "eval_criteria": [
      "korean_summary",
      "action_items",
      "conciseness"
    ]
  },
  {
    "id": "tool_calling",
    "category": "tool_use",
    "name": "Function Calling (JSON)",
    "model": "qwen27b",
    "response": "[{\"tool\": \"get_calendar\", \"args\": {\"date\": \"tomorrow\"}}, {\"tool\": \"search_web\", \"args\": {\"query\": \"latest quarterly report\"}}, {\"tool\": \"send_email\", \"args\": {\"to\": \"john@example.com\", \"subject\": \"Quarterly Report Summary\", \"body\": \"Summary of the latest quarterly report attached for your review.\"}}]",
    "tokens": 719,
    "time": 43.06,
    "tps": 16.7,
    "eval_criteria": [
      "correct_sequence",
      "valid_json",
      "complete_args"
    ]
  },
  {
    "id": "structured_output",
    "category": "tool_use",
    "name": "Structured output (JSON)",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 48.01,
    "tps": 16.66,
    "eval_criteria": [
      "correct_parsing",
      "valid_json",
      "completeness"
    ]
  },
  {
    "id": "reasoning",
    "category": "reasoning",
    "name": "Logical reasoning",
    "model": "qwen27b",
    "response": "",
    "tokens": 800,
    "time": 47.67,
    "tps": 16.78,
    "eval_criteria": [
      "correct_answer",
      "clear_steps",
      "math_accuracy"
    ]
  }
]