variet_llm/scripts/boot_log4.txt

llama_bin_run\llama-server.exe : ggml_cuda_init: found 2 CUDA devices (Total VRAM: 24575 MiB):
At line:1 char:58
+ ... rt-Sleep 3; llama_bin_run\llama-server.exe --model models\Qwen3.5-35B ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (ggml_cuda_init:...AM: 24575 MiB)::String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
load_backend: loaded CUDA backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cpu-haswell.dll
system info: n_threads = 6, n_threads_batch = 6, total_threads = 16
system_info: n_threads = 6 (n_threads_batch = 6) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'models\Qwen3.5-35B-A3B-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 3060): 12287 total, 10746 used, 466 free vs. target of 1024
llama_params_fit_impl: - CUDA1 (NVIDIA GeForce RTX 3060): 12287 total, 12214 used, -1006 free vs. target of 1024
llama_params_fit_impl: projected to use 22961 MiB of device memory vs. 22421 MiB of free device memory
llama_params_fit_impl: cannot meet free memory targets on all devices, need to use 2588 MiB less in total
llama_params_fit_impl: context size set by user to 262144 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.54 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) (0000:04:00.0) - 11245 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) (0000:06:00.0) - 11240 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from models\Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen