fix(extension): UTF-8 encoding + noise filter enhancement (v0.5.39)

- http-bridge.ts: add req.setEncoding('utf8') to all POST handlers
  to fix Korean text corruption in pending/chat/dump payloads
- observer-script.ts: add inline pre-strip in cleanLines() for
  Material icon names concatenated by textContent without newlines
- observer-script.ts: apply cleanLines() to codeText extraction
- known-issues: document UTF-8 encoding and noise filter issues
Variet Worker
2026-04-13 12:56:25 +09:00
parent 5a76e30993
commit 2a1ebf1020
7 changed files with 157 additions and 8 deletions


@@ -4,12 +4,30 @@
# Known Issues & Lessons Learned
> **This file is the SSOT (Single Source of Truth).**
> **Always** check this file before debugging or implementing.
> At the end of a session, add any newly discovered issues to this file.
> [!TIP]
> Resolved past issues are archived in [`known-issues-archive.md`](file:///c:/Users/Variet-Worker/Desktop/gravity_control/.agents/references/known-issues-archive.md).
> If a similar problem recurs, search the archive.
---
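### [2026-04-13] [Extension] HTTP Bridge UTF-8 encoding corruption: Korean description lost
- **Symptom**: Korean text in the description field of pending/ files is saved in corrupted form, e.g. `[AI ]`. The approval request body forwarded to Discord is corrupted as well.
- **Cause**: In the Node.js HTTP server, the `req.on('data', chunk)` callback receives each chunk as a Buffer. Concatenating with `body += chunk` converts every chunk to a string on its own through the implicit `chunk.toString()`, so a multi-byte UTF-8 character that straddles a chunk boundary is decoded as replacement characters and lost.
- **Fix**: Added `req.setEncoding('utf8')` to all POST handlers (`/pending`, `/dump-html`, `/chat`, `/deep-inspect-result`, `/test-rpc`) (v0.5.39).
- **Note**: When collecting a POST body as a string in a Node.js HTTP server, always call `req.setEncoding('utf8')`, or gather the Buffers in an array and convert them with `Buffer.concat().toString('utf8')`.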
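
The note above can be reproduced without an HTTP server. The following is a minimal sketch, assuming a payload that happens to be split in the middle of a Korean character; the payload, the split point, and all variable names are hypothetical and only illustrate the mechanism. A streaming `TextDecoder` stands in for the stateful decoding that `req.setEncoding('utf8')` enables on the request stream.

```javascript
// Hypothetical payload: JSON with Korean text, as a /pending request might carry.
const original = '{"description":"한글"}';
const payload = Buffer.from(original, 'utf8');

// '한' occupies 3 bytes in UTF-8. Cut one byte into it so the character
// straddles two chunks, as can happen when a socket delivers data in pieces.
const cut = payload.indexOf(Buffer.from('한', 'utf8')) + 1;
const chunks = [payload.subarray(0, cut), payload.subarray(cut)];

// Problematic pattern: `body += chunk` stringifies each Buffer independently.
let naive = '';
for (const chunk of chunks) naive += chunk;

// Option A: keep the raw Buffers and decode once at the end.
const viaConcat = Buffer.concat(chunks).toString('utf8');

// Option B: decode incrementally with a decoder that remembers partial characters.
const decoder = new TextDecoder('utf-8');
let viaStream = '';
for (const chunk of chunks) viaStream += decoder.decode(chunk, { stream: true });
viaStream += decoder.decode(); // flush any buffered bytes

console.log('naive contains U+FFFD:', naive.includes('\uFFFD'));
console.log('concat round-trips:', viaConcat === original);
console.log('streaming round-trips:', viaStream === original);
```

Both options give the same string; the difference is only whether decoding happens once at the end or incrementally per chunk.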
### [2026-04-13] [Extension] Observer noise filter not working: textContent joins icon text without line breaks
- **Symptom**: Material icon names and UI noise such as `Thought for 1s` and `chevron_right` remain in the pending description.
- **Cause**: DOM `textContent` does not insert newlines between block elements, so the text is fused into a single line such as `[AI 본문 요약]Thought for 1schevron_right[결행 명령]`. The line-based noise filter in `cleanLines()` (`^pattern$`) therefore fails to match. In addition, `cleanLines()` was not applied to the `codeText` extraction at all.
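
The cause above can be shown in isolation. This is a simplified sketch, not the extension's code: `NOISE_RE` here is a two-entry stand-in for the real noise list, and `filterNoiseLines` and `preStrip` are hypothetical helper names. The sketch strips the `Thought for Ns` phrase before the icon name, because `\b` cannot match between `1s` and `chevron_right` while the two are fused.

```javascript
// Simplified stand-in for the observer's noise list; the real one has many more entries.
const NOISE_RE = /^(chevron_right|Thought for \d+s?)$/i;

// Line-based filter: drops only lines that consist entirely of a noise token.
function filterNoiseLines(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !NOISE_RE.test(line))
    .join('\n');
}

// Inline pre-strip (sketch): remove the "Thought for Ns" phrase first, then
// turn the icon name into a line break so the surrounding text is separated.
function preStrip(text) {
  return text
    .replace(/Thought for \d+s?/gi, '')
    .replace(/\b(chevron_right)\b/g, '\n');
}

// Input where every token sits on its own line: the line filter works.
const separated = '[AI 본문 요약]\nThought for 1s\nchevron_right\n[결행 명령]';
// Input as textContent delivers it: everything fused into one line.
const fused = '[AI 본문 요약]Thought for 1schevron_right[결행 명령]';

console.log(filterNoiseLines(separated));
console.log('fused text passes through unchanged:', filterNoiseLines(fused) === fused);
console.log(filterNoiseLines(preStrip(fused)));
```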


@@ -2,4 +2,5 @@
| NNN | HH:MM | Task description | `commit hash` | Status |
|-------|-------|----------|-----------|----------|
| 001 | 09:50 | Observer v8 verification: Extension POLL normal, HTML patch confirmed, V8 cache deleted (24MB), BEACON not received (AG restart required) | none | 🔧 |
| 001 | 09:50 | Observer v8 verification: Extension POLL confirmed, HTML patch confirmed, V8 cache deleted (24MB), BEACON not received (AG restart required) | none | 🔧 |
| 002 | 12:34 | DOM Observer data quality verification + UTF-8 encoding fix + noise filter enhancement (v0.5.39) | `pending` | ✅ |


@@ -0,0 +1,27 @@
# DOM Observer data quality verification + UTF-8/noise fixes
- **Time**: 2026-04-13 12:34~12:52
- **Commit**: `pending`
- **Vikunja**: #619, #620 (in progress)
## Verification results
- DOM Observer v8 **confirmed working**: 45 signals created in `pending/` (`source: "dom_observer"`)
- Button classification OK: `command`(30), `permission`(15)
- Command / conversation_id / button (Allow/Deny/Cancel) extraction OK
- **Korean encoding corruption** found: in the description field, `[AI 본문 요약]` is stored in corrupted form, e.g. `[AI ]`
## Changes (v0.5.39)
### http-bridge.ts
- Added `req.setEncoding('utf8')` to all POST handlers
- Fixes the loss of multi-byte UTF-8 characters caused by implicit per-chunk Buffer→string conversion in the Node.js HTTP server
### observer-script.ts
- Added an inline pre-strip to `cleanLines()`: 20 Material icon names are replaced with `\n` via regex
- Added inline removal of the `Thought for Xs` pattern
- Applied `cleanLines()` to the `codeText` extraction (previously not applied)
## Not yet done
- Verify v0.5.39 after restarting AG (confirm Korean text renders correctly)
- Verify DOM dump extraction
- Verify the Discord relay end to end


@@ -2,7 +2,7 @@
"name": "gravity-bridge",
"displayName": "Gravity Bridge",
"description": "Discord-based unified approval system for Antigravity AI interactions.",
"version": "0.5.38",
"version": "0.5.39",
"publisher": "variet",
"engines": {
"vscode": "^1.100.0"


@@ -128,6 +128,7 @@ export function startHttpBridge(ctx: HttpBridgeContext, sdk: any): Promise<numbe
if (req.method === 'POST' && url.pathname === '/dump-html') {
let dumpBody = '';
req.setEncoding('utf8');
req.on('data', (c: string) => dumpBody += c);
req.on('end', () => {
try {
@@ -145,6 +146,7 @@ export function startHttpBridge(ctx: HttpBridgeContext, sdk: any): Promise<numbe
if (req.method === 'POST' && url.pathname === '/test-rpc') {
let rpcBody = '';
req.setEncoding('utf8');
req.on('data', (c: string) => rpcBody += c);
req.on('end', async () => {
try {
@@ -248,6 +250,7 @@ export function startHttpBridge(ctx: HttpBridgeContext, sdk: any): Promise<numbe
function _handlePending(req: any, res: any, ctx: HttpBridgeContext) {
let body = '';
req.setEncoding('utf8');
req.on('data', (c: string) => body += c);
req.on('end', () => {
try {
@@ -398,6 +401,7 @@ function _handleDeepInspectTrigger(res: any) {
function _handleDeepInspectResult(req: any, res: any, ctx: HttpBridgeContext) {
let body = '';
req.setEncoding('utf8');
req.on('data', (c: string) => body += c);
req.on('end', () => {
try {
@@ -420,6 +424,7 @@ function _handleDeepInspectResult(req: any, res: any, ctx: HttpBridgeContext) {
function _handleChatSnapshot(req: any, res: any, ctx: HttpBridgeContext) {
let body = '';
req.setEncoding('utf8');
req.on('data', (c: string) => body += c);
req.on('end', () => {
try {


@@ -40,7 +40,7 @@ export function generateApprovalObserverScript(_port: number): string {
'arrow_forward|arrow_back|expand_more|expand_less|close|more_horiz|more_vert|' +
'content_copy|content_paste|check|check_circle|error|warning|info|' +
'keyboard_arrow_up|keyboard_arrow_down|keyboard_arrow_left|keyboard_arrow_right|' +
'Thought for \\\\d+|Show more|Show less|Copy|Copied!|Edit|Cancel|' +
'Thought for \\\\d+s?|Thought for a few seconds|Show more|Show less|Copy|Copied!|Edit|Cancel|' +
'Always run|Always allow|Running command|Running \\\\d+ commands?|' +
'Deny|Allow|Allow Once|Allow This Conversation|' +
'Run|Send|Stop|Review Changes|Accept all|Reject all|Accept|Reject' +
@@ -60,6 +60,10 @@ export function generateApprovalObserverScript(_port: number): string {
function cleanLines(text) {
if (!text) return '';
// Pre-strip: inline removal of icon names and UI noise that textContent concatenates without newlines
// Strip the "Thought for ..." phrases first: the word-boundary anchors on the icon pattern cannot match while an icon name is fused to "1s"
text = text.replace(/Thought for \\d+s?/gi, '');
text = text.replace(/Thought for a few seconds/gi, '');
text = text.replace(/\\b(chevron_right|chevron_left|arrow_drop_down|arrow_drop_up|arrow_right|arrow_left|arrow_forward|arrow_back|expand_more|expand_less|more_horiz|more_vert|content_copy|content_paste|keyboard_arrow_up|keyboard_arrow_down|keyboard_arrow_left|keyboard_arrow_right|slow_motion_video|open_in_new)\\b/g, '\\n');
var lines = text.split('\\n');
var clean = [];
for (var i = 0; i < lines.length; i++) {
@@ -130,7 +134,7 @@ export function generateApprovalObserverScript(_port: number): string {
var codeEl = stepEl.querySelector('pre, code');
var codeText = '';
if (codeEl) {
codeText = (codeEl.textContent || '').trim().substring(0, 400);
codeText = cleanLines((codeEl.textContent || '').trim().substring(0, 400));
}
// Try aria-label on button

scratch_patch_verify.js Normal file

@@ -0,0 +1,94 @@
/**
* Verification script for the html-patcher fix
* Simulates the patch against the real workbench.html and the real observer-script output
*/
const fs = require('fs');
const path = require('path');
// 1. Read the real, clean workbench.html
const htmlPath = path.join(
process.env.LOCALAPPDATA,
'Programs', 'Antigravity', 'resources', 'app', 'out',
'vs', 'code', 'electron-browser', 'workbench', 'workbench.html'
);
let html = fs.readFileSync(htmlPath, 'utf8');
console.log(`[1] Clean HTML: ${html.length} chars, ${html.split('\n').length} lines`);
console.log(` Has AG SDK: ${html.includes('AG SDK')}`);
// 2. Simulate the output of the real observer-script.ts (generateApprovalObserverScript)
const observerModule = require('./extension/out/observer-script');
const observerJS = observerModule.generateApprovalObserverScript(34332);
console.log(`[2] Observer JS: ${observerJS.length} chars`);
console.log(` Contains $': ${observerJS.includes("$'")}`);
console.log(` Contains ')$': ${observerJS.includes("')$")}`);
// 3. Patch simulation: before the fix (BUG)
const inlineBlock_buggy = `<!-- AG SDK INLINE [variet-gravity-bridge] -->\n<script>\n${observerJS}\n</script>\n<!-- /AG SDK INLINE [variet-gravity-bridge] -->`;
let html_buggy = html.replace('</body>', `\n${inlineBlock_buggy}\n</body>`);
// 4. Patch simulation: after the fix (FIX)
const inlineBlock = `<!-- AG SDK INLINE [variet-gravity-bridge] -->\n<script>\n${observerJS}\n</script>\n<!-- /AG SDK INLINE [variet-gravity-bridge] -->`;
const safeInlineBlock = inlineBlock.replace(/\$/g, '$$$$');
let html_fixed = html.replace('</body>', `\n${safeInlineBlock}\n</body>`);
console.log(`\n[3] BUGGY result: ${html_buggy.length} chars`);
console.log(`[4] FIXED result: ${html_fixed.length} chars`);
// 5. Extract the JS code and check for SyntaxError
function extractAndCheckJS(patchedHtml, label) {
const match = patchedHtml.match(/<script>\n([\s\S]*?)\n<\/script>/);
if (!match) {
console.log(`[${label}] ERROR: <script> block not found!`);
return false;
}
const jsCode = match[1];
// Check if original HTML structure leaked into JS
const hasStartupComment = jsCode.includes('<!-- Startup');
const hasWorkbenchJS = jsCode.includes('<script src="./workbench.js"');
const hasClosingHtml = jsCode.includes('</html>') && !jsCode.includes("'</html>'");
console.log(`[${label}] JS code: ${jsCode.length} chars`);
console.log(` Leaked <!-- Startup -->: ${hasStartupComment} ${hasStartupComment ? '❌ CORRUPT' : '✅ OK'}`);
console.log(` Leaked <script src=workbench.js>: ${hasWorkbenchJS} ${hasWorkbenchJS ? '❌ CORRUPT' : '✅ OK'}`);
console.log(` Leaked </html>: ${hasClosingHtml} ${hasClosingHtml ? '❌ CORRUPT' : '✅ OK'}`);
// Check NOISE_RE is intact: should contain ')$', 'i'
const hasNoiseRE = jsCode.includes("')$', 'i'");
console.log(` NOISE_RE ')$', 'i' preserved: ${hasNoiseRE} ${hasNoiseRE ? '✅ OK' : '❌ BROKEN'}`);
// Try to parse JS
try {
new Function(jsCode);
console.log(` JS Syntax: ✅ VALID — no SyntaxError`);
return true;
} catch (e) {
console.log(` JS Syntax: ❌ SyntaxError — ${e.message}`);
// Find the problematic line
const lines = jsCode.split('\n');
const lineMatch = e.message.match(/line (\d+)/);
if (lineMatch) {
const lineNum = parseInt(lineMatch[1]);
console.log(` Around line ${lineNum}:`);
for (let i = Math.max(0, lineNum - 3); i < Math.min(lines.length, lineNum + 3); i++) {
console.log(` ${i+1}: ${lines[i].substring(0, 100)}`);
}
}
return false;
}
}
console.log('\n===== BUGGY VERSION (before fix) =====');
const buggyOK = extractAndCheckJS(html_buggy, 'BUGGY');
console.log('\n===== FIXED VERSION (after fix) =====');
const fixedOK = extractAndCheckJS(html_fixed, 'FIXED');
console.log('\n===== VERDICT =====');
if (!buggyOK && fixedOK) {
console.log('✅ FIX CONFIRMED: Buggy version has SyntaxError, fixed version is clean.');
} else if (buggyOK && fixedOK) {
console.log('⚠️ Both versions work — the bug may not reproduce in this environment.');
} else if (!fixedOK) {
console.log('❌ FIX FAILED: Fixed version still has errors!');
}
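
The effect this script checks for comes from the special `$` patterns that `String.prototype.replace` interprets in a replacement string: `$'` expands to the text after the match, and `$$` to a literal `$`. The following is a minimal sketch with a hypothetical template and a hypothetical one-line script standing in for `workbench.html` and the observer output. It also shows a replacer function as an alternative, which is not what the script above uses.

```javascript
// Hypothetical stand-ins for workbench.html and the generated observer script.
const template = '<body><p>page</p></body><!-- tail -->';
const script = "var NOISE_RE = new RegExp('^(a|b)$', 'i');"; // contains the sequence $'
const block = `<script>${script}</script></body>`;

// Unescaped: the $' inside the script is expanded to everything after '</body>'.
const buggy = template.replace('</body>', block);

// Escaped: '$$$$' is itself a replacement string that yields '$$', so every '$'
// becomes '$$', which the second replace() turns back into a single literal '$'.
const escaped = block.replace(/\$/g, '$$$$');
const fixed = template.replace('</body>', escaped);

// Alternative: a replacer function's return value is inserted verbatim.
const viaFunction = template.replace('</body>', () => block);

console.log('buggy keeps script intact:', buggy.includes(script));
console.log('fixed keeps script intact:', fixed.includes(script));
console.log('function form matches fixed:', viaFunction === fixed);
```

In the unescaped result the trailing `<!-- tail -->` text appears twice, once spliced into the middle of the script. This is the same kind of leak the `hasStartupComment` and `hasClosingHtml` checks above look for.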