96 lines
9.0 KiB
Plaintext
llama_bin_run\llama-server.exe : ggml_cuda_init: found 2 CUDA devices (Total VRAM: 24575 MiB):
At line:1 char:58
+ ... rt-Sleep 3; llama_bin_run\llama-server.exe --model models\Qwen3.5-35B ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (ggml_cuda_init:...AM: 24575 MiB)::String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError

Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
load_backend: loaded CUDA backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\Variet-Worker\Desktop\variet-llm\llama_bin_run\ggml-cpu-haswell.dll
system info: n_threads = 6, n_threads_batch = 6, total_threads = 16

system_info: n_threads = 6 (n_threads_batch = 6) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |

Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'models\Qwen3.5-35B-A3B-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 3060): 12287 total, 10746 used, 466 free vs. target of 1024
llama_params_fit_impl: - CUDA1 (NVIDIA GeForce RTX 3060): 12287 total, 12214 used, -1006 free vs. target of 1024
llama_params_fit_impl: projected to use 22961 MiB of device memory vs. 22421 MiB of free device memory
llama_params_fit_impl: cannot meet free memory targets on all devices, need to use 2588 MiB less in total
llama_params_fit_impl: context size set by user to 262144 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.54 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) (0000:04:00.0) - 11245 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) (0000:06:00.0) - 11240 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from models\Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen