variet_llm/scripts/boot_log4.txt

llama_bin_run\llama-server.exe : ggml_cuda_init: found 2 CUDA devices (Total VRAM: 24575 MiB):
At line:1 char:58
+ ... rt-Sleep 3; llama_bin_run\llama-server.exe --model models\Qwen3.5-35B ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (ggml_cuda_init:...AM: 24575 MiB)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
load_backend: loaded CUDA backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cpu-haswell.dll
system info: n_threads = 6, n_threads_batch = 6, total_threads = 16

system_info: n_threads = 6 (n_threads_batch = 6) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |

Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model 'models\Qwen3.5-35B-A3B-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3060):  12287 total,  10746 used,    466 free vs. target of   1024
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3060):  12287 total,  12214 used,  -1006 free vs. target of   1024
llama_params_fit_impl: projected to use 22961 MiB of device memory vs. 22421 MiB of free device memory
llama_params_fit_impl: cannot meet free memory targets on all devices, need to use 2588 MiB less in total
llama_params_fit_impl: context size set by user to 262144 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.54 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) (0000:04:00.0) - 11245 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) (0000:06:00.0) - 11240 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from models\Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen35moe
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 20
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   5:                               general.name str              = Qwen3.5-35B-A3B
llama_model_loader: - kv   6:                           general.basename str              = Qwen3.5-35B-A3B
llama_model_loader: - kv   7:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   8:                         general.size_label str              = 35B-A3B
llama_model_loader: - kv   9:                            general.license str              = apache-2.0
llama_model_loader: - kv  10:                       general.license.link str              = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv  11:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Qwen3.5 35B A3B
llama_model_loader: - kv  14:          general.base_model.0.organization str              = Qwen