llama_bin_run\llama-server.exe : ggml_cuda_init: found 2 CUDA devices (Total VRAM: 24575 MiB):
At line:1 char:58
+ ... rt-Sleep 3; llama_bin_run\llama-server.exe --model models\Qwen3.5-35B ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (ggml_cuda_init:...AM: 24575 MiB)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
load_backend: loaded CUDA backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cpu-haswell.dll
system info: n_threads = 6, n_threads_batch = 6, total_threads = 16
system_info: n_threads = 6 (n_threads_batch = 6) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model 'models\Qwen3.5-35B-A3B-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3060):  12287 total,  10746 used,    466 free vs. target of   1024
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3060):  12287 total,  12214 used,  -1006 free vs. target of   1024
llama_params_fit_impl: projected to use 22961 MiB of device memory vs. 22421 MiB of free device memory
llama_params_fit_impl: cannot meet free memory targets on all devices, need to use 2588 MiB less in total
llama_params_fit_impl: context size set by user to 262144 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.54 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) (0000:04:00.0) - 11245 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) (0000:06:00.0) - 11240 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from models\Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture              str = qwen35moe
llama_model_loader: - kv   1: general.type                      str = model
llama_model_loader: - kv   2: general.sampling.top_k            i32 = 20
llama_model_loader: - kv   3: general.sampling.top_p            f32 = 0.950000
llama_model_loader: - kv   4: general.sampling.temp             f32 = 1.000000
llama_model_loader: - kv   5: general.name                      str = Qwen3.5-35B-A3B
llama_model_loader: - kv   6: general.basename                  str = Qwen3.5-35B-A3B
llama_model_loader: - kv   7: general.quantized_by              str = Unsloth
llama_model_loader: - kv   8: general.size_label                str = 35B-A3B
llama_model_loader: - kv   9: general.license                   str = apache-2.0
llama_model_loader: - kv  10: general.license.link              str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv  11: general.repo_url                  str = https://huggingface.co/unsloth
llama_model_loader: - kv  12: general.base_model.count          u32 = 1
llama_model_loader: - kv  13: general.base_model.0.name         str = Qwen3.5 35B A3B
llama_model_loader: - kv  14: general.base_model.0.organization str = Qwen