|
qwen3-coder-next % ./run --log-file log.txt |
|
ggml_cuda_init: found 1 CUDA devices: |
|
Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes |
|
build: 7939 (b536eb023) with GNU 13.3.0 for Linux x86_64 |
|
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16 |
|
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | |
|
Running without SSL |
|
init: using 15 threads for HTTP server |
|
start: binding port with default address family |
|
main: loading model |
|
srv load_model: loading model './Qwen3-Coder-Next-Q3_K_M.gguf' |
|
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on |
|
llama_params_fit_impl: projected to use 37528 MiB of device memory vs. 15082 MiB of free device memory |
|
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 23470 MiB |
|
llama_params_fit_impl: context size set by user to 32768 -> no change |
|
llama_params_fit_impl: with only dense weights in device memory there is a total surplus of 11494 MiB |
|
llama_params_fit_impl: filling dense-only layers back-to-front: |
|
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 49 layers, 3355 MiB used, 11726 MiB free |
|
llama_params_fit_impl: converting dense-only layers to full layers and filling them front-to-back with overflow to next device/system memory: |
|
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 49 layers (34 overflowing), 13834 MiB used, 1247 MiB free |
|
llama_params_fit: successfully fit params to free device memory |
|
llama_params_fit: fitting params to free memory took 3.91 seconds |
|
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5060 Ti) (0000:01:00.0) - 15158 MiB free |
|
llama_model_loader: loaded meta data with 52 key-value pairs and 843 tensors from ./Qwen3-Coder-Next-Q3_K_M.gguf (version GGUF V3 (latest)) |
|
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. |
|
llama_model_loader: - kv 0: general.architecture str = qwen3next |
|
llama_model_loader: - kv 1: general.type str = model |
|
llama_model_loader: - kv 2: general.sampling.top_k i32 = 40 |
|
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000 |
|
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000 |
|
llama_model_loader: - kv 5: general.name str = Qwen3-Coder-Next |
|
llama_model_loader: - kv 6: general.basename str = Qwen3-Coder-Next |
|
llama_model_loader: - kv 7: general.quantized_by str = Unsloth |
|
llama_model_loader: - kv 8: general.size_label str = 512x2.5B |
|
llama_model_loader: - kv 9: general.license str = apache-2.0 |
|
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod... |
|
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth |
|
llama_model_loader: - kv 12: general.base_model.count u32 = 1 |
|
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3 Coder Next |
|
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen |
|
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-Cod... |
|
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "text-generation"] |
|
llama_model_loader: - kv 17: qwen3next.block_count u32 = 48 |
|
llama_model_loader: - kv 18: qwen3next.context_length u32 = 262144 |
|
llama_model_loader: - kv 19: qwen3next.embedding_length u32 = 2048 |
|
llama_model_loader: - kv 20: qwen3next.feed_forward_length u32 = 5120 |
|
llama_model_loader: - kv 21: qwen3next.attention.head_count u32 = 16 |
|
llama_model_loader: - kv 22: qwen3next.attention.head_count_kv u32 = 2 |
|
llama_model_loader: - kv 23: qwen3next.rope.freq_base f32 = 5000000.000000 |
|
llama_model_loader: - kv 24: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001 |
|
llama_model_loader: - kv 25: qwen3next.expert_used_count u32 = 10 |
|
llama_model_loader: - kv 26: qwen3next.attention.key_length u32 = 256 |
|
llama_model_loader: - kv 27: qwen3next.attention.value_length u32 = 256 |
|
llama_model_loader: - kv 28: qwen3next.expert_count u32 = 512 |
|
llama_model_loader: - kv 29: qwen3next.expert_feed_forward_length u32 = 512 |
|
llama_model_loader: - kv 30: qwen3next.expert_shared_feed_forward_length u32 = 512 |
|
llama_model_loader: - kv 31: qwen3next.ssm.conv_kernel u32 = 4 |
|
llama_model_loader: - kv 32: qwen3next.ssm.state_size u32 = 128 |
|
llama_model_loader: - kv 33: qwen3next.ssm.group_count u32 = 16 |
|
llama_model_loader: - kv 34: qwen3next.ssm.time_step_rank u32 = 32 |
|
llama_model_loader: - kv 35: qwen3next.ssm.inner_size u32 = 4096 |
|
llama_model_loader: - kv 36: qwen3next.rope.dimension_count u32 = 64 |
|
llama_model_loader: - kv 37: tokenizer.ggml.model str = gpt2 |
|
llama_model_loader: - kv 38: tokenizer.ggml.pre str = qwen2 |
|
llama_model_loader: - kv 39: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... |
|
llama_model_loader: - kv 40: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... |
|
llama_model_loader: - kv 41: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... |
|
llama_model_loader: - kv 42: tokenizer.ggml.eos_token_id u32 = 151645 |
|
llama_model_loader: - kv 43: tokenizer.ggml.padding_token_id u32 = 151654 |
|
llama_model_loader: - kv 44: tokenizer.ggml.add_bos_token bool = false |
|
llama_model_loader: - kv 45: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,... |
|
llama_model_loader: - kv 46: general.quantization_version u32 = 2 |
|
llama_model_loader: - kv 47: general.file_type u32 = 12 |
|
llama_model_loader: - kv 48: quantize.imatrix.file str = Qwen3-Coder-Next-GGUF/imatrix_unsloth... |
|
llama_model_loader: - kv 49: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-Coder-Next.txt |
|
llama_model_loader: - kv 50: quantize.imatrix.entries_count u32 = 576 |
|
llama_model_loader: - kv 51: quantize.imatrix.chunks_count u32 = 154 |
|
llama_model_loader: - type f32: 313 tensors |
|
llama_model_loader: - type q3_K: 229 tensors |
|
llama_model_loader: - type q4_K: 245 tensors |
|
llama_model_loader: - type q5_K: 7 tensors |
|
llama_model_loader: - type q6_K: 1 tensors |
|
llama_model_loader: - type bf16: 48 tensors |
|
print_info: file format = GGUF V3 (latest) |
|
print_info: file type = Q3_K - Medium |
|
print_info: file size = 35.65 GiB (3.84 BPW) |
|
load: 0 unused tokens |
|
load: printing all EOG tokens: |
|
load: - 151643 ('<|endoftext|>') |
|
load: - 151645 ('<|im_end|>') |
|
load: - 151662 ('<|fim_pad|>') |
|
load: - 151663 ('<|repo_name|>') |
|
load: - 151664 ('<|file_sep|>') |
|
load: special tokens cache size = 26 |
|
load: token to piece cache size = 0.9311 MB |
|
print_info: arch = qwen3next |
|
print_info: vocab_only = 0 |
|
print_info: no_alloc = 0 |
|
print_info: n_ctx_train = 262144 |
|
print_info: n_embd = 2048 |
|
print_info: n_embd_inp = 2048 |
|
print_info: n_layer = 48 |
|
print_info: n_head = 16 |
|
print_info: n_head_kv = 2 |
|
print_info: n_rot = 64 |
|
print_info: n_swa = 0 |
|
print_info: is_swa_any = 0 |
|
print_info: n_embd_head_k = 256 |
|
print_info: n_embd_head_v = 256 |
|
print_info: n_gqa = 8 |
|
print_info: n_embd_k_gqa = 512 |
|
print_info: n_embd_v_gqa = 512 |
|
print_info: f_norm_eps = 0.0e+00 |
|
print_info: f_norm_rms_eps = 1.0e-06 |
|
print_info: f_clamp_kqv = 0.0e+00 |
|
print_info: f_max_alibi_bias = 0.0e+00 |
|
print_info: f_logit_scale = 0.0e+00 |
|
print_info: f_attn_scale = 0.0e+00 |
|
print_info: n_ff = 5120 |
|
print_info: n_expert = 512 |
|
print_info: n_expert_used = 10 |
|
print_info: n_expert_groups = 0 |
|
print_info: n_group_used = 0 |
|
print_info: causal attn = 1 |
|
print_info: pooling type = 0 |
|
print_info: rope type = 2 |
|
print_info: rope scaling = linear |
|
print_info: freq_base_train = 5000000.0 |
|
print_info: freq_scale_train = 1 |
|
print_info: n_ctx_orig_yarn = 262144 |
|
print_info: rope_yarn_log_mul = 0.0000 |
|
print_info: rope_finetuned = unknown |
|
print_info: ssm_d_conv = 4 |
|
print_info: ssm_d_inner = 4096 |
|
print_info: ssm_d_state = 128 |
|
print_info: ssm_dt_rank = 32 |
|
print_info: ssm_n_group = 16 |
|
print_info: ssm_dt_b_c_rms = 0 |
|
print_info: model type = 80B.A3B |
|
print_info: model params = 79.67 B |
|
print_info: general.name = Qwen3-Coder-Next |
|
print_info: vocab type = BPE |
|
print_info: n_vocab = 151936 |
|
print_info: n_merges = 151387 |
|
print_info: BOS token = 11 ',' |
|
print_info: EOS token = 151645 '<|im_end|>' |
|
print_info: EOT token = 151645 '<|im_end|>' |
|
print_info: PAD token = 151654 '<|vision_pad|>' |
|
print_info: LF token = 198 'Ċ' |
|
print_info: FIM PRE token = 151659 '<|fim_prefix|>' |
|
print_info: FIM SUF token = 151661 '<|fim_suffix|>' |
|
print_info: FIM MID token = 151660 '<|fim_middle|>' |
|
print_info: FIM PAD token = 151662 '<|fim_pad|>' |
|
print_info: FIM REP token = 151663 '<|repo_name|>' |
|
print_info: FIM SEP token = 151664 '<|file_sep|>' |
|
print_info: EOG token = 151643 '<|endoftext|>' |
|
print_info: EOG token = 151645 '<|im_end|>' |
|
print_info: EOG token = 151662 '<|fim_pad|>' |
|
print_info: EOG token = 151663 '<|repo_name|>' |
|
print_info: EOG token = 151664 '<|file_sep|>' |
|
print_info: max token length = 256 |
|
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false) |
|
load_tensors: offloading output layer to GPU |
|
load_tensors: offloading 47 repeating layers to GPU |
|
load_tensors: offloaded 49/49 layers to GPU |
|
load_tensors: CPU_Mapped model buffer size = 36263.77 MiB |
|
load_tensors: CUDA0 model buffer size = 12571.13 MiB |
|
.................................................................................................... |
|
common_init_result: added <|endoftext|> logit bias = -inf |
|
common_init_result: added <|im_end|> logit bias = -inf |
|
common_init_result: added <|fim_pad|> logit bias = -inf |
|
common_init_result: added <|repo_name|> logit bias = -inf |
|
common_init_result: added <|file_sep|> logit bias = -inf |
|
llama_context: constructing llama_context |
|
llama_context: n_seq_max = 1 |
|
llama_context: n_ctx = 32768 |
|
llama_context: n_ctx_seq = 32768 |
|
llama_context: n_batch = 2048 |
|
llama_context: n_ubatch = 512 |
|
llama_context: causal_attn = 1 |
|
llama_context: flash_attn = enabled |
|
llama_context: kv_unified = false |
|
llama_context: freq_base = 5000000.0 |
|
llama_context: freq_scale = 1 |
|
llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized |
|
llama_context: CUDA_Host output buffer size = 0.58 MiB |
|
llama_kv_cache: CUDA0 KV buffer size = 768.00 MiB |
|
llama_kv_cache: size = 768.00 MiB ( 32768 cells, 12 layers, 1/1 seqs), K (f16): 384.00 MiB, V (f16): 384.00 MiB |
|
llama_memory_recurrent: CUDA0 RS buffer size = 75.38 MiB |
|
llama_memory_recurrent: size = 75.38 MiB ( 1 cells, 48 layers, 1 seqs), R (f32): 3.38 MiB, S (f32): 72.00 MiB |
|
sched_reserve: reserving ... |
|
sched_reserve: CUDA0 compute buffer size = 420.06 MiB |
|
sched_reserve: CUDA_Host compute buffer size = 72.01 MiB |
|
sched_reserve: graph nodes = 14666 (with bs=512), 5918 (with bs=1) |
|
sched_reserve: graph splits = 104 (with bs=512), 74 (with bs=1) |
|
sched_reserve: reserve took 587.99 ms, sched copies = 1 |
|
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) |
|
srv load_model: initializing slots, n_slots = 1 |
|
no implementations specified for speculative decoding |
|
slot load_model: id 0 | task -1 | speculative decoding context not initialized |
|
slot load_model: id 0 | task -1 | new slot, n_ctx = 32768 |
|
srv load_model: prompt cache is enabled, size limit: 8192 MiB |
|
srv load_model: use `--cache-ram 0` to disable the prompt cache |
|
srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391 |
|
init: chat template, example_format: '<|im_start|>system |
|
You are a helpful assistant<|im_end|> |
|
<|im_start|>user |
|
Hello<|im_end|> |
|
<|im_start|>assistant |
|
Hi there<|im_end|> |
|
<|im_start|>user |
|
How are you?<|im_end|> |
|
<|im_start|>assistant |
|
' |
|
srv init: init: chat template, thinking = 0 |
|
main: model loaded |
|
main: server is listening on http://127.0.0.1:8080 |
|
main: starting the main loop... |
|
srv update_slots: all slots are idle |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 0 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 158 |
|
slot update_slots: id 0 | task 0 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 94, batch.n_tokens = 94, progress = 0.594937 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot update_slots: id 0 | task 0 | n_tokens = 94, memory_seq_rm [94, end) |
|
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 158, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 0 | prompt done, n_tokens = 158, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 0 | init sampler, took 0.02 ms, tokens: text = 158, total = 158 |
|
slot update_slots: id 0 | task 0 | created context checkpoint 1 of 8 (pos_min = 93, pos_max = 93, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 0 | |
|
prompt eval time = 20427.03 ms / 158 tokens ( 129.29 ms per token, 7.73 tokens per second) |
|
eval time = 852.28 ms / 11 tokens ( 77.48 ms per token, 12.91 tokens per second) |
|
total time = 21279.31 ms / 169 tokens |
|
slot release: id 0 | task 0 | stop processing: n_tokens = 168, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10688094175 |
|
srv get_availabl: updating prompt cache |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv prompt_save: - saving prompt with length 168, total state size = 79.316 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.125, sim = 0.001 |
|
srv update: - cache state: 1 prompts, 154.691 MiB (limits: 8192.000 MiB, 32768 tokens, 32768 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv get_availabl: prompt cache update took 422.29 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 16962 |
|
slot update_slots: id 0 | task 2 | n_past = 21, slot.prompt.tokens.size() = 168, seq_id = 0, pos_min = 167, n_swa = 1 |
|
slot update_slots: id 0 | task 2 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 2 | erased invalidated context checkpoint (pos_min = 93, pos_max = 93, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.120740 |
|
slot update_slots: id 0 | task 2 | n_tokens = 2048, memory_seq_rm [2048, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.241481 |
|
slot update_slots: id 0 | task 2 | n_tokens = 4096, memory_seq_rm [4096, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.362221 |
|
slot update_slots: id 0 | task 2 | n_tokens = 6144, memory_seq_rm [6144, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.482962 |
|
slot update_slots: id 0 | task 2 | n_tokens = 8192, memory_seq_rm [8192, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.603702 |
|
slot update_slots: id 0 | task 2 | n_tokens = 10240, memory_seq_rm [10240, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.724443 |
|
slot update_slots: id 0 | task 2 | n_tokens = 12288, memory_seq_rm [12288, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.845183 |
|
slot update_slots: id 0 | task 2 | n_tokens = 14336, memory_seq_rm [14336, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 2048, progress = 0.965924 |
|
slot update_slots: id 0 | task 2 | n_tokens = 16384, memory_seq_rm [16384, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 16898, batch.n_tokens = 514, progress = 0.996227 |
|
slot update_slots: id 0 | task 2 | n_tokens = 16898, memory_seq_rm [16898, end) |
|
slot update_slots: id 0 | task 2 | prompt processing progress, n_tokens = 16962, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2 | prompt done, n_tokens = 16962, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2 | init sampler, took 1.31 ms, tokens: text = 16962, total = 16962 |
|
slot update_slots: id 0 | task 2 | created context checkpoint 1 of 8 (pos_min = 16897, pos_max = 16897, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2 | |
|
prompt eval time = 110084.95 ms / 16962 tokens ( 6.49 ms per token, 154.08 tokens per second) |
|
eval time = 9879.50 ms / 167 tokens ( 59.16 ms per token, 16.90 tokens per second) |
|
total time = 119964.45 ms / 17129 tokens |
|
slot release: id 0 | task 2 | stop processing: n_tokens = 17128, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10808634380 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 17128, total state size = 477.010 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.002, sim = 0.004 |
|
srv update: - cache state: 2 prompts, 707.077 MiB (limits: 8192.000 MiB, 32768 tokens, 200386 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv get_availabl: prompt cache update took 3980.01 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 190 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 190 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9779 |
|
slot update_slots: id 0 | task 190 | n_past = 40, slot.prompt.tokens.size() = 17128, seq_id = 0, pos_min = 17127, n_swa = 1 |
|
slot update_slots: id 0 | task 190 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 190 | erased invalidated context checkpoint (pos_min = 16897, pos_max = 16897, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 190 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.209428 |
|
slot update_slots: id 0 | task 190 | n_tokens = 2048, memory_seq_rm [2048, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.418857 |
|
slot update_slots: id 0 | task 190 | n_tokens = 4096, memory_seq_rm [4096, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.628285 |
|
slot update_slots: id 0 | task 190 | n_tokens = 6144, memory_seq_rm [6144, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.837713 |
|
slot update_slots: id 0 | task 190 | n_tokens = 8192, memory_seq_rm [8192, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 9715, batch.n_tokens = 1523, progress = 0.993455 |
|
slot update_slots: id 0 | task 190 | n_tokens = 9715, memory_seq_rm [9715, end) |
|
slot update_slots: id 0 | task 190 | prompt processing progress, n_tokens = 9779, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 190 | prompt done, n_tokens = 9779, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 190 | init sampler, took 0.80 ms, tokens: text = 9779, total = 9779 |
|
slot update_slots: id 0 | task 190 | created context checkpoint 1 of 8 (pos_min = 9714, pos_max = 9714, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 190 | |
|
prompt eval time = 43435.43 ms / 9779 tokens ( 4.44 ms per token, 225.14 tokens per second) |
|
eval time = 2675.64 ms / 45 tokens ( 59.46 ms per token, 16.82 tokens per second) |
|
total time = 46111.07 ms / 9824 tokens |
|
slot release: id 0 | task 190 | stop processing: n_tokens = 9823, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10859055595 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 9823, total state size = 305.715 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.002, sim = 0.031 |
|
srv update: - cache state: 3 prompts, 1088.168 MiB (limits: 8192.000 MiB, 32768 tokens, 204158 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e42681ad3f0: 9823 tokens, checkpoints: 1, 381.091 MiB |
|
srv get_availabl: prompt cache update took 1279.15 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 241 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 241 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 639 |
|
slot update_slots: id 0 | task 241 | n_past = 20, slot.prompt.tokens.size() = 9823, seq_id = 0, pos_min = 9822, n_swa = 1 |
|
slot update_slots: id 0 | task 241 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 241 | erased invalidated context checkpoint (pos_min = 9714, pos_max = 9714, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 241 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 241 | prompt processing progress, n_tokens = 575, batch.n_tokens = 575, progress = 0.899844 |
|
slot update_slots: id 0 | task 241 | n_tokens = 575, memory_seq_rm [575, end) |
|
slot update_slots: id 0 | task 241 | prompt processing progress, n_tokens = 639, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 241 | prompt done, n_tokens = 639, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 241 | init sampler, took 0.06 ms, tokens: text = 639, total = 639 |
|
slot update_slots: id 0 | task 241 | created context checkpoint 1 of 8 (pos_min = 574, pos_max = 574, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 241 | |
|
prompt eval time = 5353.33 ms / 639 tokens ( 8.38 ms per token, 119.36 tokens per second) |
|
eval time = 1218.51 ms / 23 tokens ( 52.98 ms per token, 18.88 tokens per second) |
|
total time = 6571.85 ms / 662 tokens |
|
slot release: id 0 | task 241 | stop processing: n_tokens = 661, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10867785632 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 661, total state size = 90.876 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.030, sim = 0.002 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.964 |
|
srv update: - cache state: 3 prompts, 873.329 MiB (limits: 8192.000 MiB, 32768 tokens, 168440 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e4267889560: 661 tokens, checkpoints: 1, 166.252 MiB |
|
srv get_availabl: prompt cache update took 1476.50 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 242 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 242 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10190 |
|
slot update_slots: id 0 | task 242 | n_tokens = 9823, memory_seq_rm [9823, end) |
|
slot update_slots: id 0 | task 242 | prompt processing progress, n_tokens = 10126, batch.n_tokens = 303, progress = 0.993719 |
|
slot update_slots: id 0 | task 242 | n_tokens = 10126, memory_seq_rm [10126, end) |
|
slot update_slots: id 0 | task 242 | prompt processing progress, n_tokens = 10190, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 242 | prompt done, n_tokens = 10190, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 242 | init sampler, took 0.81 ms, tokens: text = 10190, total = 10190 |
|
slot update_slots: id 0 | task 242 | created context checkpoint 2 of 8 (pos_min = 10125, pos_max = 10125, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 242 | |
|
prompt eval time = 3482.72 ms / 367 tokens ( 9.49 ms per token, 105.38 tokens per second) |
|
eval time = 3469.33 ms / 59 tokens ( 58.80 ms per token, 17.01 tokens per second) |
|
total time = 6952.05 ms / 426 tokens |
|
slot release: id 0 | task 242 | stop processing: n_tokens = 10248, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10876215074 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 10248, total state size = 315.681 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.002, sim = 0.035 |
|
srv update: - cache state: 4 prompts, 1339.761 MiB (limits: 8192.000 MiB, 32768 tokens, 172460 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e4267889560: 661 tokens, checkpoints: 1, 166.252 MiB |
|
srv update: - prompt 0x5e42699ac270: 10248 tokens, checkpoints: 2, 466.433 MiB |
|
srv get_availabl: prompt cache update took 1718.98 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 327 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 327 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 573 |
|
slot update_slots: id 0 | task 327 | n_past = 20, slot.prompt.tokens.size() = 10248, seq_id = 0, pos_min = 10247, n_swa = 1 |
|
slot update_slots: id 0 | task 327 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 327 | erased invalidated context checkpoint (pos_min = 9714, pos_max = 9714, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 327 | erased invalidated context checkpoint (pos_min = 10125, pos_max = 10125, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 327 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 327 | prompt processing progress, n_tokens = 509, batch.n_tokens = 509, progress = 0.888307 |
|
slot update_slots: id 0 | task 327 | n_tokens = 509, memory_seq_rm [509, end) |
|
slot update_slots: id 0 | task 327 | prompt processing progress, n_tokens = 573, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 327 | prompt done, n_tokens = 573, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 327 | init sampler, took 0.05 ms, tokens: text = 573, total = 573 |
|
slot update_slots: id 0 | task 327 | created context checkpoint 1 of 8 (pos_min = 508, pos_max = 508, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 327 | |
|
prompt eval time = 4815.77 ms / 573 tokens ( 8.40 ms per token, 118.98 tokens per second) |
|
eval time = 1245.78 ms / 23 tokens ( 54.16 ms per token, 18.46 tokens per second) |
|
total time = 6061.55 ms / 596 tokens |
|
slot release: id 0 | task 327 | stop processing: n_tokens = 595, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10884113885 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 595, total state size = 89.328 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.034, sim = 0.002 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.972 |
|
srv update: - cache state: 4 prompts, 1038.033 MiB (limits: 8192.000 MiB, 32768 tokens, 146409 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e4267889560: 661 tokens, checkpoints: 1, 166.252 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv get_availabl: prompt cache update took 908.73 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 328 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 328 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10538 |
|
slot update_slots: id 0 | task 328 | n_tokens = 10248, memory_seq_rm [10248, end) |
|
slot update_slots: id 0 | task 328 | prompt processing progress, n_tokens = 10474, batch.n_tokens = 226, progress = 0.993927 |
|
slot update_slots: id 0 | task 328 | n_tokens = 10474, memory_seq_rm [10474, end) |
|
slot update_slots: id 0 | task 328 | prompt processing progress, n_tokens = 10538, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 328 | prompt done, n_tokens = 10538, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 328 | init sampler, took 0.82 ms, tokens: text = 10538, total = 10538 |
|
slot update_slots: id 0 | task 328 | created context checkpoint 3 of 8 (pos_min = 10473, pos_max = 10473, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 328 | |
|
prompt eval time = 3459.38 ms / 290 tokens ( 11.93 ms per token, 83.83 tokens per second) |
|
eval time = 2195.82 ms / 39 tokens ( 56.30 ms per token, 17.76 tokens per second) |
|
total time = 5655.20 ms / 329 tokens |
|
slot release: id 0 | task 328 | stop processing: n_tokens = 10576, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 393 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 393 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10631 |
|
slot update_slots: id 0 | task 393 | n_tokens = 10576, memory_seq_rm [10576, end) |
|
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 10631, batch.n_tokens = 55, progress = 1.000000 |
|
slot update_slots: id 0 | task 393 | prompt done, n_tokens = 10631, batch.n_tokens = 55 |
|
slot init_sampler: id 0 | task 393 | init sampler, took 0.86 ms, tokens: text = 10631, total = 10631 |
|
slot update_slots: id 0 | task 393 | created context checkpoint 4 of 8 (pos_min = 10575, pos_max = 10575, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 393 | |
|
prompt eval time = 1321.29 ms / 55 tokens ( 24.02 ms per token, 41.63 tokens per second) |
|
eval time = 1632.40 ms / 30 tokens ( 54.41 ms per token, 18.38 tokens per second) |
|
total time = 2953.69 ms / 85 tokens |
|
slot release: id 0 | task 393 | stop processing: n_tokens = 10660, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.802 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 424 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 424 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13296 |
|
slot update_slots: id 0 | task 424 | n_tokens = 10660, memory_seq_rm [10660, end) |
|
slot update_slots: id 0 | task 424 | prompt processing progress, n_tokens = 12708, batch.n_tokens = 2048, progress = 0.955776 |
|
slot update_slots: id 0 | task 424 | n_tokens = 12708, memory_seq_rm [12708, end) |
|
slot update_slots: id 0 | task 424 | prompt processing progress, n_tokens = 13232, batch.n_tokens = 524, progress = 0.995187 |
|
slot update_slots: id 0 | task 424 | n_tokens = 13232, memory_seq_rm [13232, end) |
|
slot update_slots: id 0 | task 424 | prompt processing progress, n_tokens = 13296, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 424 | prompt done, n_tokens = 13296, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 424 | init sampler, took 1.08 ms, tokens: text = 13296, total = 13296 |
|
slot update_slots: id 0 | task 424 | created context checkpoint 5 of 8 (pos_min = 13231, pos_max = 13231, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 424 | |
|
prompt eval time = 19034.14 ms / 2636 tokens ( 7.22 ms per token, 138.49 tokens per second) |
|
eval time = 1692.73 ms / 32 tokens ( 52.90 ms per token, 18.90 tokens per second) |
|
total time = 20726.87 ms / 2668 tokens |
|
slot release: id 0 | task 424 | stop processing: n_tokens = 13327, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 459 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 459 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13529 |
|
slot update_slots: id 0 | task 459 | n_tokens = 13327, memory_seq_rm [13327, end) |
|
slot update_slots: id 0 | task 459 | prompt processing progress, n_tokens = 13465, batch.n_tokens = 138, progress = 0.995269 |
|
slot update_slots: id 0 | task 459 | n_tokens = 13465, memory_seq_rm [13465, end) |
|
slot update_slots: id 0 | task 459 | prompt processing progress, n_tokens = 13529, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 459 | prompt done, n_tokens = 13529, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 459 | init sampler, took 1.10 ms, tokens: text = 13529, total = 13529 |
|
slot update_slots: id 0 | task 459 | created context checkpoint 6 of 8 (pos_min = 13464, pos_max = 13464, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 459 | |
|
prompt eval time = 3199.59 ms / 202 tokens ( 15.84 ms per token, 63.13 tokens per second) |
|
eval time = 3666.16 ms / 62 tokens ( 59.13 ms per token, 16.91 tokens per second) |
|
total time = 6865.74 ms / 264 tokens |
|
slot release: id 0 | task 459 | stop processing: n_tokens = 13590, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 523 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 523 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13668 |
|
slot update_slots: id 0 | task 523 | n_tokens = 13590, memory_seq_rm [13590, end) |
|
slot update_slots: id 0 | task 523 | prompt processing progress, n_tokens = 13604, batch.n_tokens = 14, progress = 0.995318 |
|
slot update_slots: id 0 | task 523 | n_tokens = 13604, memory_seq_rm [13604, end) |
|
slot update_slots: id 0 | task 523 | prompt processing progress, n_tokens = 13668, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 523 | prompt done, n_tokens = 13668, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 523 | init sampler, took 1.31 ms, tokens: text = 13668, total = 13668 |
|
slot update_slots: id 0 | task 523 | created context checkpoint 7 of 8 (pos_min = 13603, pos_max = 13603, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 523 | |
|
prompt eval time = 1302.44 ms / 78 tokens ( 16.70 ms per token, 59.89 tokens per second) |
|
eval time = 3741.01 ms / 67 tokens ( 55.84 ms per token, 17.91 tokens per second) |
|
total time = 5043.44 ms / 145 tokens |
|
slot release: id 0 | task 523 | stop processing: n_tokens = 13734, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 592 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 592 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13823 |
|
slot update_slots: id 0 | task 592 | n_tokens = 13734, memory_seq_rm [13734, end) |
|
slot update_slots: id 0 | task 592 | prompt processing progress, n_tokens = 13759, batch.n_tokens = 25, progress = 0.995370 |
|
slot update_slots: id 0 | task 592 | n_tokens = 13759, memory_seq_rm [13759, end) |
|
slot update_slots: id 0 | task 592 | prompt processing progress, n_tokens = 13823, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 592 | prompt done, n_tokens = 13823, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 592 | init sampler, took 1.36 ms, tokens: text = 13823, total = 13823 |
|
slot update_slots: id 0 | task 592 | created context checkpoint 8 of 8 (pos_min = 13758, pos_max = 13758, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 592 | |
|
prompt eval time = 2094.38 ms / 89 tokens ( 23.53 ms per token, 42.49 tokens per second) |
|
eval time = 4097.58 ms / 73 tokens ( 56.13 ms per token, 17.82 tokens per second) |
|
total time = 6191.97 ms / 162 tokens |
|
slot release: id 0 | task 592 | stop processing: n_tokens = 13895, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 667 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 667 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13976 |
|
slot update_slots: id 0 | task 667 | n_tokens = 13895, memory_seq_rm [13895, end) |
|
slot update_slots: id 0 | task 667 | prompt processing progress, n_tokens = 13912, batch.n_tokens = 17, progress = 0.995421 |
|
slot update_slots: id 0 | task 667 | n_tokens = 13912, memory_seq_rm [13912, end) |
|
slot update_slots: id 0 | task 667 | prompt processing progress, n_tokens = 13976, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 667 | prompt done, n_tokens = 13976, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 667 | init sampler, took 1.12 ms, tokens: text = 13976, total = 13976 |
|
slot update_slots: id 0 | task 667 | erasing old context checkpoint (pos_min = 9714, pos_max = 9714, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 667 | created context checkpoint 8 of 8 (pos_min = 13911, pos_max = 13911, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 667 | |
|
prompt eval time = 1541.54 ms / 81 tokens ( 19.03 ms per token, 52.54 tokens per second) |
|
eval time = 2121.52 ms / 37 tokens ( 57.34 ms per token, 17.44 tokens per second) |
|
total time = 3663.06 ms / 118 tokens |
|
slot release: id 0 | task 667 | stop processing: n_tokens = 14012, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 706 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 706 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14137 |
|
slot update_slots: id 0 | task 706 | n_tokens = 14012, memory_seq_rm [14012, end) |
|
slot update_slots: id 0 | task 706 | prompt processing progress, n_tokens = 14073, batch.n_tokens = 61, progress = 0.995473 |
|
slot update_slots: id 0 | task 706 | n_tokens = 14073, memory_seq_rm [14073, end) |
|
slot update_slots: id 0 | task 706 | prompt processing progress, n_tokens = 14137, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 706 | prompt done, n_tokens = 14137, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 706 | init sampler, took 1.06 ms, tokens: text = 14137, total = 14137 |
|
slot update_slots: id 0 | task 706 | erasing old context checkpoint (pos_min = 10125, pos_max = 10125, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 706 | created context checkpoint 8 of 8 (pos_min = 14072, pos_max = 14072, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 706 | |
|
prompt eval time = 2541.72 ms / 125 tokens ( 20.33 ms per token, 49.18 tokens per second) |
|
eval time = 1790.02 ms / 31 tokens ( 57.74 ms per token, 17.32 tokens per second) |
|
total time = 4331.74 ms / 156 tokens |
|
slot release: id 0 | task 706 | stop processing: n_tokens = 14167, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 739 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 739 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14349 |
|
slot update_slots: id 0 | task 739 | n_tokens = 14167, memory_seq_rm [14167, end) |
|
slot update_slots: id 0 | task 739 | prompt processing progress, n_tokens = 14285, batch.n_tokens = 118, progress = 0.995540 |
|
slot update_slots: id 0 | task 739 | n_tokens = 14285, memory_seq_rm [14285, end) |
|
slot update_slots: id 0 | task 739 | prompt processing progress, n_tokens = 14349, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 739 | prompt done, n_tokens = 14349, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 739 | init sampler, took 1.27 ms, tokens: text = 14349, total = 14349 |
|
slot update_slots: id 0 | task 739 | erasing old context checkpoint (pos_min = 10473, pos_max = 10473, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 739 | created context checkpoint 8 of 8 (pos_min = 14284, pos_max = 14284, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 739 | |
|
prompt eval time = 2861.86 ms / 182 tokens ( 15.72 ms per token, 63.59 tokens per second) |
|
eval time = 1670.05 ms / 31 tokens ( 53.87 ms per token, 18.56 tokens per second) |
|
total time = 4531.91 ms / 213 tokens |
|
slot release: id 0 | task 739 | stop processing: n_tokens = 14379, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 772 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 772 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14573 |
|
slot update_slots: id 0 | task 772 | n_tokens = 14379, memory_seq_rm [14379, end) |
|
slot update_slots: id 0 | task 772 | prompt processing progress, n_tokens = 14509, batch.n_tokens = 130, progress = 0.995608 |
|
slot update_slots: id 0 | task 772 | n_tokens = 14509, memory_seq_rm [14509, end) |
|
slot update_slots: id 0 | task 772 | prompt processing progress, n_tokens = 14573, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 772 | prompt done, n_tokens = 14573, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 772 | init sampler, took 1.18 ms, tokens: text = 14573, total = 14573 |
|
slot update_slots: id 0 | task 772 | erasing old context checkpoint (pos_min = 10575, pos_max = 10575, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 772 | created context checkpoint 8 of 8 (pos_min = 14508, pos_max = 14508, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 772 | |
|
prompt eval time = 2477.22 ms / 194 tokens ( 12.77 ms per token, 78.31 tokens per second) |
|
eval time = 2595.48 ms / 48 tokens ( 54.07 ms per token, 18.49 tokens per second) |
|
total time = 5072.70 ms / 242 tokens |
|
slot release: id 0 | task 772 | stop processing: n_tokens = 14620, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10957331969 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 14620, total state size = 418.200 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.061 |
|
srv load: - found better prompt with f_keep = 0.451, sim = 0.903 |
|
srv update: - cache state: 4 prompts, 1892.987 MiB (limits: 8192.000 MiB, 32768 tokens, 140693 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e424eb36820: 14620 tokens, checkpoints: 8, 1021.206 MiB |
|
srv get_availabl: prompt cache update took 7377.53 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 822 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 822 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 330 |
|
slot update_slots: id 0 | task 822 | n_past = 298, slot.prompt.tokens.size() = 661, seq_id = 0, pos_min = 660, n_swa = 1 |
|
slot update_slots: id 0 | task 822 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 822 | erased invalidated context checkpoint (pos_min = 574, pos_max = 574, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 822 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 822 | prompt processing progress, n_tokens = 266, batch.n_tokens = 266, progress = 0.806061 |
|
slot update_slots: id 0 | task 822 | n_tokens = 266, memory_seq_rm [266, end) |
|
slot update_slots: id 0 | task 822 | prompt processing progress, n_tokens = 330, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 822 | prompt done, n_tokens = 330, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 822 | init sampler, took 0.03 ms, tokens: text = 330, total = 330 |
|
slot update_slots: id 0 | task 822 | created context checkpoint 1 of 8 (pos_min = 265, pos_max = 265, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 822 | |
|
prompt eval time = 4005.07 ms / 330 tokens ( 12.14 ms per token, 82.40 tokens per second) |
|
eval time = 1264.52 ms / 23 tokens ( 54.98 ms per token, 18.19 tokens per second) |
|
total time = 5269.59 ms / 353 tokens |
|
slot release: id 0 | task 822 | stop processing: n_tokens = 352, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 10970068296 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 352, total state size = 83.630 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.057, sim = 0.001 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.996 |
|
srv update: - cache state: 4 prompts, 1030.787 MiB (limits: 8192.000 MiB, 32768 tokens, 144983 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e42624d83a0: 17128 tokens, checkpoints: 1, 552.385 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv get_availabl: prompt cache update took 283.02 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 823 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 823 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14674 |
|
slot update_slots: id 0 | task 823 | n_tokens = 14620, memory_seq_rm [14620, end) |
|
slot update_slots: id 0 | task 823 | prompt processing progress, n_tokens = 14674, batch.n_tokens = 54, progress = 1.000000 |
|
slot update_slots: id 0 | task 823 | prompt done, n_tokens = 14674, batch.n_tokens = 54 |
|
slot init_sampler: id 0 | task 823 | init sampler, took 1.45 ms, tokens: text = 14674, total = 14674 |
|
slot update_slots: id 0 | task 823 | erasing old context checkpoint (pos_min = 13231, pos_max = 13231, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 823 | created context checkpoint 8 of 8 (pos_min = 14619, pos_max = 14619, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 823 | |
|
prompt eval time = 1103.75 ms / 54 tokens ( 20.44 ms per token, 48.92 tokens per second) |
|
eval time = 37767.49 ms / 662 tokens ( 57.05 ms per token, 17.53 tokens per second) |
|
total time = 38871.24 ms / 716 tokens |
|
slot release: id 0 | task 823 | stop processing: n_tokens = 15335, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 11009224197 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 15335, total state size = 434.966 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.003, sim = 0.002 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.961 |
|
srv update: - cache state: 4 prompts, 1516.374 MiB (limits: 8192.000 MiB, 32768 tokens, 88868 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426a0950c0: 15335 tokens, checkpoints: 8, 1037.972 MiB |
|
srv get_availabl: prompt cache update took 6211.05 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 1510 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 1510 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 17825 |
|
slot update_slots: id 0 | task 1510 | n_tokens = 17128, memory_seq_rm [17128, end) |
|
slot update_slots: id 0 | task 1510 | prompt processing progress, n_tokens = 17761, batch.n_tokens = 633, progress = 0.996410 |
|
slot update_slots: id 0 | task 1510 | n_tokens = 17761, memory_seq_rm [17761, end) |
|
slot update_slots: id 0 | task 1510 | prompt processing progress, n_tokens = 17825, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 1510 | prompt done, n_tokens = 17825, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 1510 | init sampler, took 1.38 ms, tokens: text = 17825, total = 17825 |
|
slot update_slots: id 0 | task 1510 | created context checkpoint 2 of 8 (pos_min = 17760, pos_max = 17760, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 1510 | |
|
prompt eval time = 8561.37 ms / 697 tokens ( 12.28 ms per token, 81.41 tokens per second) |
|
eval time = 2397.15 ms / 44 tokens ( 54.48 ms per token, 18.36 tokens per second) |
|
total time = 10958.52 ms / 741 tokens |
|
slot release: id 0 | task 1510 | stop processing: n_tokens = 17868, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.873 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 1556 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 1556 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 20474 |
|
slot update_slots: id 0 | task 1556 | n_tokens = 17868, memory_seq_rm [17868, end) |
|
slot update_slots: id 0 | task 1556 | prompt processing progress, n_tokens = 19916, batch.n_tokens = 2048, progress = 0.972746 |
|
slot update_slots: id 0 | task 1556 | n_tokens = 19916, memory_seq_rm [19916, end) |
|
slot update_slots: id 0 | task 1556 | prompt processing progress, n_tokens = 20410, batch.n_tokens = 494, progress = 0.996874 |
|
slot update_slots: id 0 | task 1556 | n_tokens = 20410, memory_seq_rm [20410, end) |
|
slot update_slots: id 0 | task 1556 | prompt processing progress, n_tokens = 20474, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 1556 | prompt done, n_tokens = 20474, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 1556 | init sampler, took 1.64 ms, tokens: text = 20474, total = 20474 |
|
slot update_slots: id 0 | task 1556 | created context checkpoint 3 of 8 (pos_min = 20409, pos_max = 20409, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 1556 | |
|
prompt eval time = 22336.34 ms / 2606 tokens ( 8.57 ms per token, 116.67 tokens per second) |
|
eval time = 1963.70 ms / 34 tokens ( 57.76 ms per token, 17.31 tokens per second) |
|
total time = 24300.04 ms / 2640 tokens |
|
slot release: id 0 | task 1556 | stop processing: n_tokens = 20507, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.951 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 1593 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 1593 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21552 |
|
slot update_slots: id 0 | task 1593 | n_past = 20503, slot.prompt.tokens.size() = 20507, seq_id = 0, pos_min = 20506, n_swa = 1 |
|
slot update_slots: id 0 | task 1593 | restored context checkpoint (pos_min = 20409, pos_max = 20409, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 1593 | n_tokens = 20410, memory_seq_rm [20410, end) |
|
slot update_slots: id 0 | task 1593 | prompt processing progress, n_tokens = 21488, batch.n_tokens = 1078, progress = 0.997030 |
|
slot update_slots: id 0 | task 1593 | n_tokens = 21488, memory_seq_rm [21488, end) |
|
slot update_slots: id 0 | task 1593 | prompt processing progress, n_tokens = 21552, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 1593 | prompt done, n_tokens = 21552, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 1593 | init sampler, took 1.66 ms, tokens: text = 21552, total = 21552 |
|
slot update_slots: id 0 | task 1593 | created context checkpoint 4 of 8 (pos_min = 21487, pos_max = 21487, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 1593 | |
|
prompt eval time = 6580.21 ms / 1142 tokens ( 5.76 ms per token, 173.55 tokens per second) |
|
eval time = 20194.20 ms / 380 tokens ( 53.14 ms per token, 18.82 tokens per second) |
|
total time = 26774.40 ms / 1522 tokens |
|
slot release: id 0 | task 1593 | stop processing: n_tokens = 21931, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 11078044457 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 21931, total state size = 589.635 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.002, sim = 0.004 |
|
srv update: - cache state: 5 prompts, 2407.512 MiB (limits: 8192.000 MiB, 32768 tokens, 130598 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426a0950c0: 15335 tokens, checkpoints: 8, 1037.972 MiB |
|
srv update: - prompt 0x5e426bba13c0: 21931 tokens, checkpoints: 4, 891.138 MiB |
|
srv get_availabl: prompt cache update took 3427.35 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 1975 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 1975 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10107 |
|
slot update_slots: id 0 | task 1975 | n_past = 40, slot.prompt.tokens.size() = 21931, seq_id = 0, pos_min = 21930, n_swa = 1 |
|
slot update_slots: id 0 | task 1975 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 1975 | erased invalidated context checkpoint (pos_min = 16897, pos_max = 16897, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 1975 | erased invalidated context checkpoint (pos_min = 17760, pos_max = 17760, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 1975 | erased invalidated context checkpoint (pos_min = 20409, pos_max = 20409, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 1975 | erased invalidated context checkpoint (pos_min = 21487, pos_max = 21487, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 1975 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.202632 |
|
slot update_slots: id 0 | task 1975 | n_tokens = 2048, memory_seq_rm [2048, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.405264 |
|
slot update_slots: id 0 | task 1975 | n_tokens = 4096, memory_seq_rm [4096, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.607895 |
|
slot update_slots: id 0 | task 1975 | n_tokens = 6144, memory_seq_rm [6144, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.810527 |
|
slot update_slots: id 0 | task 1975 | n_tokens = 8192, memory_seq_rm [8192, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 10043, batch.n_tokens = 1851, progress = 0.993668 |
|
slot update_slots: id 0 | task 1975 | n_tokens = 10043, memory_seq_rm [10043, end) |
|
slot update_slots: id 0 | task 1975 | prompt processing progress, n_tokens = 10107, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 1975 | prompt done, n_tokens = 10107, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 1975 | init sampler, took 0.81 ms, tokens: text = 10107, total = 10107 |
|
slot update_slots: id 0 | task 1975 | created context checkpoint 1 of 8 (pos_min = 10042, pos_max = 10042, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 1975 | |
|
prompt eval time = 48887.82 ms / 10107 tokens ( 4.84 ms per token, 206.74 tokens per second) |
|
eval time = 2005.96 ms / 37 tokens ( 54.22 ms per token, 18.45 tokens per second) |
|
total time = 50893.78 ms / 10144 tokens |
|
slot release: id 0 | task 1975 | stop processing: n_tokens = 10143, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.967 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2018 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2018 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10493 |
|
slot update_slots: id 0 | task 2018 | n_tokens = 10143, memory_seq_rm [10143, end) |
|
slot update_slots: id 0 | task 2018 | prompt processing progress, n_tokens = 10429, batch.n_tokens = 286, progress = 0.993901 |
|
slot update_slots: id 0 | task 2018 | n_tokens = 10429, memory_seq_rm [10429, end) |
|
slot update_slots: id 0 | task 2018 | prompt processing progress, n_tokens = 10493, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2018 | prompt done, n_tokens = 10493, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2018 | init sampler, took 0.82 ms, tokens: text = 10493, total = 10493 |
|
slot update_slots: id 0 | task 2018 | created context checkpoint 2 of 8 (pos_min = 10428, pos_max = 10428, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2018 | |
|
prompt eval time = 4169.57 ms / 350 tokens ( 11.91 ms per token, 83.94 tokens per second) |
|
eval time = 2379.56 ms / 43 tokens ( 55.34 ms per token, 18.07 tokens per second) |
|
total time = 6549.12 ms / 393 tokens |
|
slot release: id 0 | task 2018 | stop processing: n_tokens = 10535, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.800 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2063 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2063 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13171 |
|
slot update_slots: id 0 | task 2063 | n_tokens = 10535, memory_seq_rm [10535, end) |
|
slot update_slots: id 0 | task 2063 | prompt processing progress, n_tokens = 12583, batch.n_tokens = 2048, progress = 0.955356 |
|
slot update_slots: id 0 | task 2063 | n_tokens = 12583, memory_seq_rm [12583, end) |
|
slot update_slots: id 0 | task 2063 | prompt processing progress, n_tokens = 13107, batch.n_tokens = 524, progress = 0.995141 |
|
slot update_slots: id 0 | task 2063 | n_tokens = 13107, memory_seq_rm [13107, end) |
|
slot update_slots: id 0 | task 2063 | prompt processing progress, n_tokens = 13171, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2063 | prompt done, n_tokens = 13171, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2063 | init sampler, took 1.07 ms, tokens: text = 13171, total = 13171 |
|
slot update_slots: id 0 | task 2063 | created context checkpoint 3 of 8 (pos_min = 13106, pos_max = 13106, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2063 | |
|
prompt eval time = 15695.50 ms / 2636 tokens ( 5.95 ms per token, 167.95 tokens per second) |
|
eval time = 1647.42 ms / 32 tokens ( 51.48 ms per token, 19.42 tokens per second) |
|
total time = 17342.92 ms / 2668 tokens |
|
slot release: id 0 | task 2063 | stop processing: n_tokens = 13202, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2098 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2098 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13404 |
|
slot update_slots: id 0 | task 2098 | n_tokens = 13202, memory_seq_rm [13202, end) |
|
slot update_slots: id 0 | task 2098 | prompt processing progress, n_tokens = 13340, batch.n_tokens = 138, progress = 0.995225 |
|
slot update_slots: id 0 | task 2098 | n_tokens = 13340, memory_seq_rm [13340, end) |
|
slot update_slots: id 0 | task 2098 | prompt processing progress, n_tokens = 13404, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2098 | prompt done, n_tokens = 13404, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2098 | init sampler, took 1.07 ms, tokens: text = 13404, total = 13404 |
|
slot update_slots: id 0 | task 2098 | created context checkpoint 4 of 8 (pos_min = 13339, pos_max = 13339, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2098 | |
|
prompt eval time = 2317.17 ms / 202 tokens ( 11.47 ms per token, 87.18 tokens per second) |
|
eval time = 2776.01 ms / 49 tokens ( 56.65 ms per token, 17.65 tokens per second) |
|
total time = 5093.19 ms / 251 tokens |
|
slot release: id 0 | task 2098 | stop processing: n_tokens = 13452, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.978 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2149 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2149 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13753 |
|
slot update_slots: id 0 | task 2149 | n_tokens = 13452, memory_seq_rm [13452, end) |
|
slot update_slots: id 0 | task 2149 | prompt processing progress, n_tokens = 13689, batch.n_tokens = 237, progress = 0.995346 |
|
slot update_slots: id 0 | task 2149 | n_tokens = 13689, memory_seq_rm [13689, end) |
|
slot update_slots: id 0 | task 2149 | prompt processing progress, n_tokens = 13753, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2149 | prompt done, n_tokens = 13753, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2149 | init sampler, took 1.14 ms, tokens: text = 13753, total = 13753 |
|
slot update_slots: id 0 | task 2149 | created context checkpoint 5 of 8 (pos_min = 13688, pos_max = 13688, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2149 | |
|
prompt eval time = 3427.18 ms / 301 tokens ( 11.39 ms per token, 87.83 tokens per second) |
|
eval time = 2794.28 ms / 46 tokens ( 60.75 ms per token, 16.46 tokens per second) |
|
total time = 6221.46 ms / 347 tokens |
|
slot release: id 0 | task 2149 | stop processing: n_tokens = 13798, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 11171476651 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 13798, total state size = 398.925 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.063 |
|
srv update: - cache state: 6 prompts, 3183.316 MiB (limits: 8192.000 MiB, 32768 tokens, 134278 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426a0950c0: 15335 tokens, checkpoints: 8, 1037.972 MiB |
|
srv update: - prompt 0x5e426bba13c0: 21931 tokens, checkpoints: 4, 891.138 MiB |
|
srv update: - prompt 0x5e42692b4330: 13798 tokens, checkpoints: 5, 775.804 MiB |
|
srv get_availabl: prompt cache update took 2569.31 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2197 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2197 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 319 |
|
slot update_slots: id 0 | task 2197 | n_past = 20, slot.prompt.tokens.size() = 13798, seq_id = 0, pos_min = 13797, n_swa = 1 |
|
slot update_slots: id 0 | task 2197 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 2197 | erased invalidated context checkpoint (pos_min = 10042, pos_max = 10042, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2197 | erased invalidated context checkpoint (pos_min = 10428, pos_max = 10428, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2197 | erased invalidated context checkpoint (pos_min = 13106, pos_max = 13106, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2197 | erased invalidated context checkpoint (pos_min = 13339, pos_max = 13339, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2197 | erased invalidated context checkpoint (pos_min = 13688, pos_max = 13688, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 2197 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 2197 | prompt processing progress, n_tokens = 255, batch.n_tokens = 255, progress = 0.799373 |
|
slot update_slots: id 0 | task 2197 | n_tokens = 255, memory_seq_rm [255, end) |
|
slot update_slots: id 0 | task 2197 | prompt processing progress, n_tokens = 319, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2197 | prompt done, n_tokens = 319, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2197 | init sampler, took 0.03 ms, tokens: text = 319, total = 319 |
|
slot update_slots: id 0 | task 2197 | created context checkpoint 1 of 8 (pos_min = 254, pos_max = 254, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2197 | |
|
prompt eval time = 6916.62 ms / 319 tokens ( 21.68 ms per token, 46.12 tokens per second) |
|
eval time = 2074.63 ms / 34 tokens ( 61.02 ms per token, 16.39 tokens per second) |
|
total time = 8991.25 ms / 353 tokens |
|
slot release: id 0 | task 2197 | stop processing: n_tokens = 352, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 11183297025 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 352, total state size = 83.630 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.057, sim = 0.001 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.997 |
|
srv update: - cache state: 6 prompts, 2566.518 MiB (limits: 8192.000 MiB, 32768 tokens, 123630 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426a0950c0: 15335 tokens, checkpoints: 8, 1037.972 MiB |
|
srv update: - prompt 0x5e426bba13c0: 21931 tokens, checkpoints: 4, 891.138 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv get_availabl: prompt cache update took 589.95 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2198 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2198 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13843 |
|
slot update_slots: id 0 | task 2198 | n_tokens = 13798, memory_seq_rm [13798, end) |
|
slot update_slots: id 0 | task 2198 | prompt processing progress, n_tokens = 13843, batch.n_tokens = 45, progress = 1.000000 |
|
slot update_slots: id 0 | task 2198 | prompt done, n_tokens = 13843, batch.n_tokens = 45 |
|
slot init_sampler: id 0 | task 2198 | init sampler, took 1.12 ms, tokens: text = 13843, total = 13843 |
|
slot update_slots: id 0 | task 2198 | created context checkpoint 6 of 8 (pos_min = 13797, pos_max = 13797, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2198 | |
|
prompt eval time = 1294.61 ms / 45 tokens ( 28.77 ms per token, 34.76 tokens per second) |
|
eval time = 1716.28 ms / 31 tokens ( 55.36 ms per token, 18.06 tokens per second) |
|
total time = 3010.89 ms / 76 tokens |
|
slot release: id 0 | task 2198 | stop processing: n_tokens = 13873, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 2265 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 2265 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14067 |
|
slot update_slots: id 0 | task 2265 | n_tokens = 13873, memory_seq_rm [13873, end) |
|
slot update_slots: id 0 | task 2265 | prompt processing progress, n_tokens = 14003, batch.n_tokens = 130, progress = 0.995450 |
|
slot update_slots: id 0 | task 2265 | n_tokens = 14003, memory_seq_rm [14003, end) |
|
slot update_slots: id 0 | task 2265 | prompt processing progress, n_tokens = 14067, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 2265 | prompt done, n_tokens = 14067, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 2265 | init sampler, took 1.34 ms, tokens: text = 14067, total = 14067 |
|
slot update_slots: id 0 | task 2265 | created context checkpoint 7 of 8 (pos_min = 14002, pos_max = 14002, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 2265 | |
|
prompt eval time = 3538.61 ms / 194 tokens ( 18.24 ms per token, 54.82 tokens per second) |
|
eval time = 98183.48 ms / 1680 tokens ( 58.44 ms per token, 17.11 tokens per second) |
|
total time = 101722.09 ms / 1874 tokens |
|
slot release: id 0 | task 2265 | stop processing: n_tokens = 15746, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 11288798023 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 15746, total state size = 444.603 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.003, sim = 0.002 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.928 |
|
srv update: - cache state: 6 prompts, 2647.614 MiB (limits: 8192.000 MiB, 32768 tokens, 100706 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426a0950c0: 15335 tokens, checkpoints: 8, 1037.972 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv get_availabl: prompt cache update took 11613.77 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 3947 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 3947 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23642 |
|
slot update_slots: id 0 | task 3947 | n_tokens = 21931, memory_seq_rm [21931, end) |
|
slot update_slots: id 0 | task 3947 | prompt processing progress, n_tokens = 23578, batch.n_tokens = 1647, progress = 0.997293 |
|
slot update_slots: id 0 | task 3947 | n_tokens = 23578, memory_seq_rm [23578, end) |
|
slot update_slots: id 0 | task 3947 | prompt processing progress, n_tokens = 23642, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 3947 | prompt done, n_tokens = 23642, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 3947 | init sampler, took 2.13 ms, tokens: text = 23642, total = 23642 |
|
slot update_slots: id 0 | task 3947 | created context checkpoint 5 of 8 (pos_min = 23577, pos_max = 23577, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 3947 | |
|
prompt eval time = 15749.75 ms / 1711 tokens ( 9.21 ms per token, 108.64 tokens per second) |
|
eval time = 11339.72 ms / 196 tokens ( 57.86 ms per token, 17.28 tokens per second) |
|
total time = 27089.48 ms / 1907 tokens |
|
slot release: id 0 | task 3947 | stop processing: n_tokens = 23837, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 4145 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 4145 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23894 |
|
slot update_slots: id 0 | task 4145 | n_tokens = 23837, memory_seq_rm [23837, end) |
|
slot update_slots: id 0 | task 4145 | prompt processing progress, n_tokens = 23894, batch.n_tokens = 57, progress = 1.000000 |
|
slot update_slots: id 0 | task 4145 | prompt done, n_tokens = 23894, batch.n_tokens = 57 |
|
slot init_sampler: id 0 | task 4145 | init sampler, took 1.93 ms, tokens: text = 23894, total = 23894 |
|
slot update_slots: id 0 | task 4145 | created context checkpoint 6 of 8 (pos_min = 23836, pos_max = 23836, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 4145 | |
|
prompt eval time = 8031.84 ms / 57 tokens ( 140.91 ms per token, 7.10 tokens per second) |
|
eval time = 53588.53 ms / 802 tokens ( 66.82 ms per token, 14.97 tokens per second) |
|
total time = 61620.37 ms / 859 tokens |
|
slot release: id 0 | task 4145 | stop processing: n_tokens = 24695, truncated = 0 |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv update_slots: all slots are idle |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 4948 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 4948 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 24838 |
|
slot update_slots: id 0 | task 4948 | n_tokens = 24695, memory_seq_rm [24695, end) |
|
slot update_slots: id 0 | task 4948 | prompt processing progress, n_tokens = 24774, batch.n_tokens = 79, progress = 0.997423 |
|
slot update_slots: id 0 | task 4948 | n_tokens = 24774, memory_seq_rm [24774, end) |
|
slot update_slots: id 0 | task 4948 | prompt processing progress, n_tokens = 24838, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 4948 | prompt done, n_tokens = 24838, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 4948 | init sampler, took 2.04 ms, tokens: text = 24838, total = 24838 |
|
slot update_slots: id 0 | task 4948 | created context checkpoint 7 of 8 (pos_min = 24773, pos_max = 24773, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 4948 | |
|
prompt eval time = 5046.80 ms / 143 tokens ( 35.29 ms per token, 28.33 tokens per second) |
|
eval time = 841.27 ms / 14 tokens ( 60.09 ms per token, 16.64 tokens per second) |
|
total time = 5888.07 ms / 157 tokens |
|
slot release: id 0 | task 4948 | stop processing: n_tokens = 24851, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.952 (> 0.100 thold), f_keep = 0.681 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 4964 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 4964 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 17765 |
|
slot update_slots: id 0 | task 4964 | n_past = 16917, slot.prompt.tokens.size() = 24851, seq_id = 0, pos_min = 24850, n_swa = 1 |
|
slot update_slots: id 0 | task 4964 | restored context checkpoint (pos_min = 16897, pos_max = 16897, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 17760, pos_max = 17760, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 20409, pos_max = 20409, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 21487, pos_max = 21487, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 23577, pos_max = 23577, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 23836, pos_max = 23836, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | erased invalidated context checkpoint (pos_min = 24773, pos_max = 24773, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 4964 | n_tokens = 16898, memory_seq_rm [16898, end) |
|
slot update_slots: id 0 | task 4964 | prompt processing progress, n_tokens = 17701, batch.n_tokens = 803, progress = 0.996397 |
|
slot update_slots: id 0 | task 4964 | n_tokens = 17701, memory_seq_rm [17701, end) |
|
slot update_slots: id 0 | task 4964 | prompt processing progress, n_tokens = 17765, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 4964 | prompt done, n_tokens = 17765, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 4964 | init sampler, took 1.44 ms, tokens: text = 17765, total = 17765 |
|
slot update_slots: id 0 | task 4964 | created context checkpoint 2 of 8 (pos_min = 17700, pos_max = 17700, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 4964 | |
|
prompt eval time = 14860.42 ms / 867 tokens ( 17.14 ms per token, 58.34 tokens per second) |
|
eval time = 8429.19 ms / 144 tokens ( 58.54 ms per token, 17.08 tokens per second) |
|
total time = 23289.60 ms / 1011 tokens |
|
slot release: id 0 | task 4964 | stop processing: n_tokens = 17908, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 12109752620 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 17908, total state size = 495.300 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.002, sim = 0.004 |
|
srv load: - found better prompt with f_keep = 0.631, sim = 0.992 |
|
srv update: - cache state: 6 prompts, 2255.693 MiB (limits: 8192.000 MiB, 32768 tokens, 127548 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426baf03e0: 17908 tokens, checkpoints: 2, 646.052 MiB |
|
srv get_availabl: prompt cache update took 5872.87 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5110 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5110 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9752 |
|
slot update_slots: id 0 | task 5110 | n_past = 9678, slot.prompt.tokens.size() = 15335, seq_id = 0, pos_min = 15334, n_swa = 1 |
|
slot update_slots: id 0 | task 5110 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 13464, pos_max = 13464, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 13603, pos_max = 13603, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 13758, pos_max = 13758, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 13911, pos_max = 13911, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 14072, pos_max = 14072, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 14284, pos_max = 14284, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 14508, pos_max = 14508, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | erased invalidated context checkpoint (pos_min = 14619, pos_max = 14619, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5110 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.210008 |
|
slot update_slots: id 0 | task 5110 | n_tokens = 2048, memory_seq_rm [2048, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.420016 |
|
slot update_slots: id 0 | task 5110 | n_tokens = 4096, memory_seq_rm [4096, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.630025 |
|
slot update_slots: id 0 | task 5110 | n_tokens = 6144, memory_seq_rm [6144, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.840033 |
|
slot update_slots: id 0 | task 5110 | n_tokens = 8192, memory_seq_rm [8192, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 9688, batch.n_tokens = 1496, progress = 0.993437 |
|
slot update_slots: id 0 | task 5110 | n_tokens = 9688, memory_seq_rm [9688, end) |
|
slot update_slots: id 0 | task 5110 | prompt processing progress, n_tokens = 9752, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5110 | prompt done, n_tokens = 9752, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5110 | init sampler, took 0.78 ms, tokens: text = 9752, total = 9752 |
|
slot update_slots: id 0 | task 5110 | created context checkpoint 1 of 8 (pos_min = 9687, pos_max = 9687, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5110 | |
|
prompt eval time = 73465.76 ms / 9752 tokens ( 7.53 ms per token, 132.74 tokens per second) |
|
eval time = 2107.18 ms / 37 tokens ( 56.95 ms per token, 17.56 tokens per second) |
|
total time = 75572.94 ms / 9789 tokens |
|
slot release: id 0 | task 5110 | stop processing: n_tokens = 9788, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5153 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5153 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9913 |
|
slot update_slots: id 0 | task 5153 | n_tokens = 9788, memory_seq_rm [9788, end) |
|
slot update_slots: id 0 | task 5153 | prompt processing progress, n_tokens = 9849, batch.n_tokens = 61, progress = 0.993544 |
|
slot update_slots: id 0 | task 5153 | n_tokens = 9849, memory_seq_rm [9849, end) |
|
slot update_slots: id 0 | task 5153 | prompt processing progress, n_tokens = 9913, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5153 | prompt done, n_tokens = 9913, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5153 | init sampler, took 0.75 ms, tokens: text = 9913, total = 9913 |
|
slot update_slots: id 0 | task 5153 | created context checkpoint 2 of 8 (pos_min = 9848, pos_max = 9848, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5153 | |
|
prompt eval time = 2714.39 ms / 125 tokens ( 21.72 ms per token, 46.05 tokens per second) |
|
eval time = 1830.79 ms / 30 tokens ( 61.03 ms per token, 16.39 tokens per second) |
|
total time = 4545.19 ms / 155 tokens |
|
slot release: id 0 | task 5153 | stop processing: n_tokens = 9942, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.790 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5185 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5185 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12578 |
|
slot update_slots: id 0 | task 5185 | n_tokens = 9942, memory_seq_rm [9942, end) |
|
slot update_slots: id 0 | task 5185 | prompt processing progress, n_tokens = 11990, batch.n_tokens = 2048, progress = 0.953252 |
|
slot update_slots: id 0 | task 5185 | n_tokens = 11990, memory_seq_rm [11990, end) |
|
slot update_slots: id 0 | task 5185 | prompt processing progress, n_tokens = 12514, batch.n_tokens = 524, progress = 0.994912 |
|
slot update_slots: id 0 | task 5185 | n_tokens = 12514, memory_seq_rm [12514, end) |
|
slot update_slots: id 0 | task 5185 | prompt processing progress, n_tokens = 12578, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5185 | prompt done, n_tokens = 12578, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5185 | init sampler, took 1.07 ms, tokens: text = 12578, total = 12578 |
|
slot update_slots: id 0 | task 5185 | created context checkpoint 3 of 8 (pos_min = 12513, pos_max = 12513, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5185 | |
|
prompt eval time = 22554.48 ms / 2636 tokens ( 8.56 ms per token, 116.87 tokens per second) |
|
eval time = 1803.50 ms / 31 tokens ( 58.18 ms per token, 17.19 tokens per second) |
|
total time = 24357.98 ms / 2667 tokens |
|
slot release: id 0 | task 5185 | stop processing: n_tokens = 12608, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5219 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5219 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12684 |
|
slot update_slots: id 0 | task 5219 | n_tokens = 12608, memory_seq_rm [12608, end) |
|
slot update_slots: id 0 | task 5219 | prompt processing progress, n_tokens = 12620, batch.n_tokens = 12, progress = 0.994954 |
|
slot update_slots: id 0 | task 5219 | n_tokens = 12620, memory_seq_rm [12620, end) |
|
slot update_slots: id 0 | task 5219 | prompt processing progress, n_tokens = 12684, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5219 | prompt done, n_tokens = 12684, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5219 | init sampler, took 1.01 ms, tokens: text = 12684, total = 12684 |
|
slot update_slots: id 0 | task 5219 | created context checkpoint 4 of 8 (pos_min = 12619, pos_max = 12619, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5219 | |
|
prompt eval time = 1333.55 ms / 76 tokens ( 17.55 ms per token, 56.99 tokens per second) |
|
eval time = 1785.36 ms / 32 tokens ( 55.79 ms per token, 17.92 tokens per second) |
|
total time = 3118.91 ms / 108 tokens |
|
slot release: id 0 | task 5219 | stop processing: n_tokens = 12715, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.984 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5253 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5253 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12917 |
|
slot update_slots: id 0 | task 5253 | n_tokens = 12715, memory_seq_rm [12715, end) |
|
slot update_slots: id 0 | task 5253 | prompt processing progress, n_tokens = 12853, batch.n_tokens = 138, progress = 0.995045 |
|
slot update_slots: id 0 | task 5253 | n_tokens = 12853, memory_seq_rm [12853, end) |
|
slot update_slots: id 0 | task 5253 | prompt processing progress, n_tokens = 12917, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5253 | prompt done, n_tokens = 12917, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5253 | init sampler, took 1.08 ms, tokens: text = 12917, total = 12917 |
|
slot update_slots: id 0 | task 5253 | created context checkpoint 5 of 8 (pos_min = 12852, pos_max = 12852, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5253 | |
|
prompt eval time = 2505.56 ms / 202 tokens ( 12.40 ms per token, 80.62 tokens per second) |
|
eval time = 2117.64 ms / 37 tokens ( 57.23 ms per token, 17.47 tokens per second) |
|
total time = 4623.20 ms / 239 tokens |
|
slot release: id 0 | task 5253 | stop processing: n_tokens = 12953, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.990 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5292 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5292 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13078 |
|
slot update_slots: id 0 | task 5292 | n_tokens = 12953, memory_seq_rm [12953, end) |
|
slot update_slots: id 0 | task 5292 | prompt processing progress, n_tokens = 13014, batch.n_tokens = 61, progress = 0.995106 |
|
slot update_slots: id 0 | task 5292 | n_tokens = 13014, memory_seq_rm [13014, end) |
|
slot update_slots: id 0 | task 5292 | prompt processing progress, n_tokens = 13078, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5292 | prompt done, n_tokens = 13078, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5292 | init sampler, took 1.23 ms, tokens: text = 13078, total = 13078 |
|
slot update_slots: id 0 | task 5292 | created context checkpoint 6 of 8 (pos_min = 13013, pos_max = 13013, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5292 | |
|
prompt eval time = 1997.29 ms / 125 tokens ( 15.98 ms per token, 62.58 tokens per second) |
|
eval time = 31247.08 ms / 538 tokens ( 58.08 ms per token, 17.22 tokens per second) |
|
total time = 33244.36 ms / 663 tokens |
|
slot release: id 0 | task 5292 | stop processing: n_tokens = 13615, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 12266760647 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 13615, total state size = 394.634 MiB |
|
srv load: - looking for better prompt, base f_keep = 0.003, sim = 0.002 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.969 |
|
srv update: - cache state: 6 prompts, 2456.530 MiB (limits: 8192.000 MiB, 32768 tokens, 102804 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426816e6d0: 13615 tokens, checkpoints: 6, 846.889 MiB |
|
srv get_availabl: prompt cache update took 7365.40 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5832 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5832 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 18481 |
|
slot update_slots: id 0 | task 5832 | n_tokens = 17908, memory_seq_rm [17908, end) |
|
slot update_slots: id 0 | task 5832 | prompt processing progress, n_tokens = 18417, batch.n_tokens = 509, progress = 0.996537 |
|
slot update_slots: id 0 | task 5832 | n_tokens = 18417, memory_seq_rm [18417, end) |
|
slot update_slots: id 0 | task 5832 | prompt processing progress, n_tokens = 18481, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5832 | prompt done, n_tokens = 18481, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5832 | init sampler, took 1.45 ms, tokens: text = 18481, total = 18481 |
|
slot update_slots: id 0 | task 5832 | created context checkpoint 3 of 8 (pos_min = 18416, pos_max = 18416, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5832 | |
|
prompt eval time = 7734.73 ms / 573 tokens ( 13.50 ms per token, 74.08 tokens per second) |
|
eval time = 2791.53 ms / 44 tokens ( 63.44 ms per token, 15.76 tokens per second) |
|
total time = 10526.26 ms / 617 tokens |
|
slot release: id 0 | task 5832 | stop processing: n_tokens = 18524, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.877 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5878 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5878 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21130 |
|
slot update_slots: id 0 | task 5878 | n_tokens = 18524, memory_seq_rm [18524, end) |
|
slot update_slots: id 0 | task 5878 | prompt processing progress, n_tokens = 20572, batch.n_tokens = 2048, progress = 0.973592 |
|
slot update_slots: id 0 | task 5878 | n_tokens = 20572, memory_seq_rm [20572, end) |
|
slot update_slots: id 0 | task 5878 | prompt processing progress, n_tokens = 21066, batch.n_tokens = 494, progress = 0.996971 |
|
slot update_slots: id 0 | task 5878 | n_tokens = 21066, memory_seq_rm [21066, end) |
|
slot update_slots: id 0 | task 5878 | prompt processing progress, n_tokens = 21130, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5878 | prompt done, n_tokens = 21130, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5878 | init sampler, took 1.72 ms, tokens: text = 21130, total = 21130 |
|
slot update_slots: id 0 | task 5878 | created context checkpoint 4 of 8 (pos_min = 21065, pos_max = 21065, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5878 | |
|
prompt eval time = 21446.58 ms / 2606 tokens ( 8.23 ms per token, 121.51 tokens per second) |
|
eval time = 1975.53 ms / 33 tokens ( 59.86 ms per token, 16.70 tokens per second) |
|
total time = 23422.12 ms / 2639 tokens |
|
slot release: id 0 | task 5878 | stop processing: n_tokens = 21162, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5914 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5914 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21178 |
|
slot update_slots: id 0 | task 5914 | n_past = 21157, slot.prompt.tokens.size() = 21162, seq_id = 0, pos_min = 21161, n_swa = 1 |
|
slot update_slots: id 0 | task 5914 | restored context checkpoint (pos_min = 21065, pos_max = 21065, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 5914 | n_tokens = 21066, memory_seq_rm [21066, end) |
|
slot update_slots: id 0 | task 5914 | prompt processing progress, n_tokens = 21114, batch.n_tokens = 48, progress = 0.996978 |
|
slot update_slots: id 0 | task 5914 | n_tokens = 21114, memory_seq_rm [21114, end) |
|
slot update_slots: id 0 | task 5914 | prompt processing progress, n_tokens = 21178, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 5914 | prompt done, n_tokens = 21178, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 5914 | init sampler, took 1.64 ms, tokens: text = 21178, total = 21178 |
|
slot print_timing: id 0 | task 5914 | |
|
prompt eval time = 1830.46 ms / 112 tokens ( 16.34 ms per token, 61.19 tokens per second) |
|
eval time = 3386.87 ms / 58 tokens ( 58.39 ms per token, 17.12 tokens per second) |
|
total time = 5217.32 ms / 170 tokens |
|
slot release: id 0 | task 5914 | stop processing: n_tokens = 21235, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 5974 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 5974 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21259 |
|
slot update_slots: id 0 | task 5974 | n_tokens = 21235, memory_seq_rm [21235, end) |
|
slot update_slots: id 0 | task 5974 | prompt processing progress, n_tokens = 21259, batch.n_tokens = 24, progress = 1.000000 |
|
slot update_slots: id 0 | task 5974 | prompt done, n_tokens = 21259, batch.n_tokens = 24 |
|
slot init_sampler: id 0 | task 5974 | init sampler, took 1.72 ms, tokens: text = 21259, total = 21259 |
|
slot update_slots: id 0 | task 5974 | created context checkpoint 5 of 8 (pos_min = 21234, pos_max = 21234, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 5974 | |
|
prompt eval time = 375.50 ms / 24 tokens ( 15.65 ms per token, 63.91 tokens per second) |
|
eval time = 4046.66 ms / 67 tokens ( 60.40 ms per token, 16.56 tokens per second) |
|
total time = 4422.16 ms / 91 tokens |
|
slot release: id 0 | task 5974 | stop processing: n_tokens = 21325, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6042 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6042 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21353 |
|
slot update_slots: id 0 | task 6042 | n_tokens = 21325, memory_seq_rm [21325, end) |
|
slot update_slots: id 0 | task 6042 | prompt processing progress, n_tokens = 21353, batch.n_tokens = 28, progress = 1.000000 |
|
slot update_slots: id 0 | task 6042 | prompt done, n_tokens = 21353, batch.n_tokens = 28 |
|
slot init_sampler: id 0 | task 6042 | init sampler, took 1.75 ms, tokens: text = 21353, total = 21353 |
|
slot update_slots: id 0 | task 6042 | created context checkpoint 6 of 8 (pos_min = 21324, pos_max = 21324, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6042 | |
|
prompt eval time = 435.48 ms / 28 tokens ( 15.55 ms per token, 64.30 tokens per second) |
|
eval time = 3561.99 ms / 62 tokens ( 57.45 ms per token, 17.41 tokens per second) |
|
total time = 3997.47 ms / 90 tokens |
|
slot release: id 0 | task 6042 | stop processing: n_tokens = 21414, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6105 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6105 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21441 |
|
slot update_slots: id 0 | task 6105 | n_tokens = 21414, memory_seq_rm [21414, end) |
|
slot update_slots: id 0 | task 6105 | prompt processing progress, n_tokens = 21441, batch.n_tokens = 27, progress = 1.000000 |
|
slot update_slots: id 0 | task 6105 | prompt done, n_tokens = 21441, batch.n_tokens = 27 |
|
slot init_sampler: id 0 | task 6105 | init sampler, took 1.71 ms, tokens: text = 21441, total = 21441 |
|
slot update_slots: id 0 | task 6105 | created context checkpoint 7 of 8 (pos_min = 21413, pos_max = 21413, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6105 | |
|
prompt eval time = 409.14 ms / 27 tokens ( 15.15 ms per token, 65.99 tokens per second) |
|
eval time = 3569.16 ms / 61 tokens ( 58.51 ms per token, 17.09 tokens per second) |
|
total time = 3978.29 ms / 88 tokens |
|
slot release: id 0 | task 6105 | stop processing: n_tokens = 21501, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6167 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6167 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21527 |
|
slot update_slots: id 0 | task 6167 | n_tokens = 21501, memory_seq_rm [21501, end) |
|
slot update_slots: id 0 | task 6167 | prompt processing progress, n_tokens = 21527, batch.n_tokens = 26, progress = 1.000000 |
|
slot update_slots: id 0 | task 6167 | prompt done, n_tokens = 21527, batch.n_tokens = 26 |
|
slot init_sampler: id 0 | task 6167 | init sampler, took 1.91 ms, tokens: text = 21527, total = 21527 |
|
slot update_slots: id 0 | task 6167 | created context checkpoint 8 of 8 (pos_min = 21500, pos_max = 21500, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6167 | |
|
prompt eval time = 390.64 ms / 26 tokens ( 15.02 ms per token, 66.56 tokens per second) |
|
eval time = 3449.90 ms / 59 tokens ( 58.47 ms per token, 17.10 tokens per second) |
|
total time = 3840.54 ms / 85 tokens |
|
slot release: id 0 | task 6167 | stop processing: n_tokens = 21585, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6227 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6227 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21610 |
|
slot update_slots: id 0 | task 6227 | n_tokens = 21585, memory_seq_rm [21585, end) |
|
slot update_slots: id 0 | task 6227 | prompt processing progress, n_tokens = 21610, batch.n_tokens = 25, progress = 1.000000 |
|
slot update_slots: id 0 | task 6227 | prompt done, n_tokens = 21610, batch.n_tokens = 25 |
|
slot init_sampler: id 0 | task 6227 | init sampler, took 1.74 ms, tokens: text = 21610, total = 21610 |
|
slot update_slots: id 0 | task 6227 | erasing old context checkpoint (pos_min = 16897, pos_max = 16897, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6227 | created context checkpoint 8 of 8 (pos_min = 21584, pos_max = 21584, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6227 | |
|
prompt eval time = 621.28 ms / 25 tokens ( 24.85 ms per token, 40.24 tokens per second) |
|
eval time = 3198.08 ms / 56 tokens ( 57.11 ms per token, 17.51 tokens per second) |
|
total time = 3819.36 ms / 81 tokens |
|
slot release: id 0 | task 6227 | stop processing: n_tokens = 21665, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6284 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6284 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21690 |
|
slot update_slots: id 0 | task 6284 | n_tokens = 21665, memory_seq_rm [21665, end) |
|
slot update_slots: id 0 | task 6284 | prompt processing progress, n_tokens = 21690, batch.n_tokens = 25, progress = 1.000000 |
|
slot update_slots: id 0 | task 6284 | prompt done, n_tokens = 21690, batch.n_tokens = 25 |
|
slot init_sampler: id 0 | task 6284 | init sampler, took 1.72 ms, tokens: text = 21690, total = 21690 |
|
slot update_slots: id 0 | task 6284 | erasing old context checkpoint (pos_min = 17700, pos_max = 17700, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6284 | created context checkpoint 8 of 8 (pos_min = 21664, pos_max = 21664, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6284 | |
|
prompt eval time = 750.75 ms / 25 tokens ( 30.03 ms per token, 33.30 tokens per second) |
|
eval time = 3213.78 ms / 55 tokens ( 58.43 ms per token, 17.11 tokens per second) |
|
total time = 3964.53 ms / 80 tokens |
|
slot release: id 0 | task 6284 | stop processing: n_tokens = 21744, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6340 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6340 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21768 |
|
slot update_slots: id 0 | task 6340 | n_tokens = 21744, memory_seq_rm [21744, end) |
|
slot update_slots: id 0 | task 6340 | prompt processing progress, n_tokens = 21768, batch.n_tokens = 24, progress = 1.000000 |
|
slot update_slots: id 0 | task 6340 | prompt done, n_tokens = 21768, batch.n_tokens = 24 |
|
slot init_sampler: id 0 | task 6340 | init sampler, took 1.77 ms, tokens: text = 21768, total = 21768 |
|
slot update_slots: id 0 | task 6340 | erasing old context checkpoint (pos_min = 18416, pos_max = 18416, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6340 | created context checkpoint 8 of 8 (pos_min = 21743, pos_max = 21743, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6340 | |
|
prompt eval time = 363.60 ms / 24 tokens ( 15.15 ms per token, 66.01 tokens per second) |
|
eval time = 3446.97 ms / 56 tokens ( 61.55 ms per token, 16.25 tokens per second) |
|
total time = 3810.56 ms / 80 tokens |
|
slot release: id 0 | task 6340 | stop processing: n_tokens = 21823, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6397 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6397 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21847 |
|
slot update_slots: id 0 | task 6397 | n_tokens = 21823, memory_seq_rm [21823, end) |
|
slot update_slots: id 0 | task 6397 | prompt processing progress, n_tokens = 21847, batch.n_tokens = 24, progress = 1.000000 |
|
slot update_slots: id 0 | task 6397 | prompt done, n_tokens = 21847, batch.n_tokens = 24 |
|
slot init_sampler: id 0 | task 6397 | init sampler, took 1.80 ms, tokens: text = 21847, total = 21847 |
|
slot update_slots: id 0 | task 6397 | erasing old context checkpoint (pos_min = 21065, pos_max = 21065, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6397 | created context checkpoint 8 of 8 (pos_min = 21822, pos_max = 21822, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6397 | |
|
prompt eval time = 374.63 ms / 24 tokens ( 15.61 ms per token, 64.06 tokens per second) |
|
eval time = 3437.24 ms / 58 tokens ( 59.26 ms per token, 16.87 tokens per second) |
|
total time = 3811.86 ms / 82 tokens |
|
slot release: id 0 | task 6397 | stop processing: n_tokens = 21904, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6456 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6456 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21928 |
|
slot update_slots: id 0 | task 6456 | n_tokens = 21904, memory_seq_rm [21904, end) |
|
slot update_slots: id 0 | task 6456 | prompt processing progress, n_tokens = 21928, batch.n_tokens = 24, progress = 1.000000 |
|
slot update_slots: id 0 | task 6456 | prompt done, n_tokens = 21928, batch.n_tokens = 24 |
|
slot init_sampler: id 0 | task 6456 | init sampler, took 1.68 ms, tokens: text = 21928, total = 21928 |
|
slot update_slots: id 0 | task 6456 | erasing old context checkpoint (pos_min = 21234, pos_max = 21234, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6456 | created context checkpoint 8 of 8 (pos_min = 21903, pos_max = 21903, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6456 | |
|
prompt eval time = 422.02 ms / 24 tokens ( 17.58 ms per token, 56.87 tokens per second) |
|
eval time = 3360.58 ms / 58 tokens ( 57.94 ms per token, 17.26 tokens per second) |
|
total time = 3782.60 ms / 82 tokens |
|
slot release: id 0 | task 6456 | stop processing: n_tokens = 21985, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6515 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6515 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22010 |
|
slot update_slots: id 0 | task 6515 | n_tokens = 21985, memory_seq_rm [21985, end) |
|
slot update_slots: id 0 | task 6515 | prompt processing progress, n_tokens = 22010, batch.n_tokens = 25, progress = 1.000000 |
|
slot update_slots: id 0 | task 6515 | prompt done, n_tokens = 22010, batch.n_tokens = 25 |
|
slot init_sampler: id 0 | task 6515 | init sampler, took 1.79 ms, tokens: text = 22010, total = 22010 |
|
slot update_slots: id 0 | task 6515 | erasing old context checkpoint (pos_min = 21324, pos_max = 21324, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6515 | created context checkpoint 8 of 8 (pos_min = 21984, pos_max = 21984, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6515 | |
|
prompt eval time = 383.56 ms / 25 tokens ( 15.34 ms per token, 65.18 tokens per second) |
|
eval time = 3391.38 ms / 57 tokens ( 59.50 ms per token, 16.81 tokens per second) |
|
total time = 3774.94 ms / 82 tokens |
|
slot release: id 0 | task 6515 | stop processing: n_tokens = 22066, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6573 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6573 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22092 |
|
slot update_slots: id 0 | task 6573 | n_tokens = 22066, memory_seq_rm [22066, end) |
|
slot update_slots: id 0 | task 6573 | prompt processing progress, n_tokens = 22092, batch.n_tokens = 26, progress = 1.000000 |
|
slot update_slots: id 0 | task 6573 | prompt done, n_tokens = 22092, batch.n_tokens = 26 |
|
slot init_sampler: id 0 | task 6573 | init sampler, took 1.82 ms, tokens: text = 22092, total = 22092 |
|
slot update_slots: id 0 | task 6573 | erasing old context checkpoint (pos_min = 21413, pos_max = 21413, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6573 | created context checkpoint 8 of 8 (pos_min = 22065, pos_max = 22065, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6573 | |
|
prompt eval time = 413.38 ms / 26 tokens ( 15.90 ms per token, 62.90 tokens per second) |
|
eval time = 3271.08 ms / 54 tokens ( 60.58 ms per token, 16.51 tokens per second) |
|
total time = 3684.46 ms / 80 tokens |
|
slot release: id 0 | task 6573 | stop processing: n_tokens = 22145, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6628 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6628 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22171 |
|
slot update_slots: id 0 | task 6628 | n_tokens = 22145, memory_seq_rm [22145, end) |
|
slot update_slots: id 0 | task 6628 | prompt processing progress, n_tokens = 22171, batch.n_tokens = 26, progress = 1.000000 |
|
slot update_slots: id 0 | task 6628 | prompt done, n_tokens = 22171, batch.n_tokens = 26 |
|
slot init_sampler: id 0 | task 6628 | init sampler, took 1.83 ms, tokens: text = 22171, total = 22171 |
|
slot update_slots: id 0 | task 6628 | erasing old context checkpoint (pos_min = 21500, pos_max = 21500, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6628 | created context checkpoint 8 of 8 (pos_min = 22144, pos_max = 22144, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6628 | |
|
prompt eval time = 417.88 ms / 26 tokens ( 16.07 ms per token, 62.22 tokens per second) |
|
eval time = 2850.32 ms / 49 tokens ( 58.17 ms per token, 17.19 tokens per second) |
|
total time = 3268.19 ms / 75 tokens |
|
slot release: id 0 | task 6628 | stop processing: n_tokens = 22219, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6678 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6678 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22272 |
|
slot update_slots: id 0 | task 6678 | n_past = 22209, slot.prompt.tokens.size() = 22219, seq_id = 0, pos_min = 22218, n_swa = 1 |
|
slot update_slots: id 0 | task 6678 | restored context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6678 | n_tokens = 22145, memory_seq_rm [22145, end) |
|
slot update_slots: id 0 | task 6678 | prompt processing progress, n_tokens = 22208, batch.n_tokens = 63, progress = 0.997126 |
|
slot update_slots: id 0 | task 6678 | n_tokens = 22208, memory_seq_rm [22208, end) |
|
slot update_slots: id 0 | task 6678 | prompt processing progress, n_tokens = 22272, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 6678 | prompt done, n_tokens = 22272, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 6678 | init sampler, took 1.71 ms, tokens: text = 22272, total = 22272 |
|
slot print_timing: id 0 | task 6678 | |
|
prompt eval time = 2156.74 ms / 127 tokens ( 16.98 ms per token, 58.89 tokens per second) |
|
eval time = 1729.32 ms / 33 tokens ( 52.40 ms per token, 19.08 tokens per second) |
|
total time = 3886.06 ms / 160 tokens |
|
slot release: id 0 | task 6678 | stop processing: n_tokens = 22304, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6713 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6713 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22357 |
|
slot update_slots: id 0 | task 6713 | n_past = 22294, slot.prompt.tokens.size() = 22304, seq_id = 0, pos_min = 22303, n_swa = 1 |
|
slot update_slots: id 0 | task 6713 | restored context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6713 | n_tokens = 22145, memory_seq_rm [22145, end) |
|
slot update_slots: id 0 | task 6713 | prompt processing progress, n_tokens = 22293, batch.n_tokens = 148, progress = 0.997137 |
|
slot update_slots: id 0 | task 6713 | n_tokens = 22293, memory_seq_rm [22293, end) |
|
slot update_slots: id 0 | task 6713 | prompt processing progress, n_tokens = 22357, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 6713 | prompt done, n_tokens = 22357, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 6713 | init sampler, took 1.81 ms, tokens: text = 22357, total = 22357 |
|
slot update_slots: id 0 | task 6713 | erasing old context checkpoint (pos_min = 21584, pos_max = 21584, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6713 | created context checkpoint 8 of 8 (pos_min = 22292, pos_max = 22292, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6713 | |
|
prompt eval time = 2757.89 ms / 212 tokens ( 13.01 ms per token, 76.87 tokens per second) |
|
eval time = 2032.58 ms / 34 tokens ( 59.78 ms per token, 16.73 tokens per second) |
|
total time = 4790.47 ms / 246 tokens |
|
slot release: id 0 | task 6713 | stop processing: n_tokens = 22390, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6749 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6749 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22443 |
|
slot update_slots: id 0 | task 6749 | n_tokens = 22390, memory_seq_rm [22390, end) |
|
slot update_slots: id 0 | task 6749 | prompt processing progress, n_tokens = 22443, batch.n_tokens = 53, progress = 1.000000 |
|
slot update_slots: id 0 | task 6749 | prompt done, n_tokens = 22443, batch.n_tokens = 53 |
|
slot init_sampler: id 0 | task 6749 | init sampler, took 1.76 ms, tokens: text = 22443, total = 22443 |
|
slot update_slots: id 0 | task 6749 | erasing old context checkpoint (pos_min = 21664, pos_max = 21664, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6749 | created context checkpoint 8 of 8 (pos_min = 22389, pos_max = 22389, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6749 | |
|
prompt eval time = 1048.08 ms / 53 tokens ( 19.78 ms per token, 50.57 tokens per second) |
|
eval time = 5528.37 ms / 88 tokens ( 62.82 ms per token, 15.92 tokens per second) |
|
total time = 6576.45 ms / 141 tokens |
|
slot release: id 0 | task 6749 | stop processing: n_tokens = 22530, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 12378895599 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 22530, total state size = 603.681 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.057 |
|
srv update: - cache state: 7 prompts, 3663.218 MiB (limits: 8192.000 MiB, 32768 tokens, 119323 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426816e6d0: 13615 tokens, checkpoints: 6, 846.889 MiB |
|
srv update: - prompt 0x5e426850d0e0: 22530 tokens, checkpoints: 8, 1206.688 MiB |
|
srv get_availabl: prompt cache update took 6022.52 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6838 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6838 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 350 |
|
slot update_slots: id 0 | task 6838 | n_past = 20, slot.prompt.tokens.size() = 22530, seq_id = 0, pos_min = 22529, n_swa = 1 |
|
slot update_slots: id 0 | task 6838 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 21743, pos_max = 21743, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 21822, pos_max = 21822, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 21903, pos_max = 21903, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 21984, pos_max = 21984, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 22065, pos_max = 22065, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 22144, pos_max = 22144, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 22292, pos_max = 22292, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | erased invalidated context checkpoint (pos_min = 22389, pos_max = 22389, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6838 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 6838 | prompt processing progress, n_tokens = 286, batch.n_tokens = 286, progress = 0.817143 |
|
slot update_slots: id 0 | task 6838 | n_tokens = 286, memory_seq_rm [286, end) |
|
slot update_slots: id 0 | task 6838 | prompt processing progress, n_tokens = 350, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 6838 | prompt done, n_tokens = 350, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 6838 | init sampler, took 0.03 ms, tokens: text = 350, total = 350 |
|
slot update_slots: id 0 | task 6838 | created context checkpoint 1 of 8 (pos_min = 285, pos_max = 285, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6838 | |
|
prompt eval time = 11793.55 ms / 350 tokens ( 33.70 ms per token, 29.68 tokens per second) |
|
eval time = 1458.87 ms / 23 tokens ( 63.43 ms per token, 15.77 tokens per second) |
|
total time = 13252.42 ms / 373 tokens |
|
slot release: id 0 | task 6838 | stop processing: n_tokens = 372, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 12398394801 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 372, total state size = 84.099 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.054, sim = 0.001 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 0.999 |
|
srv update: - cache state: 7 prompts, 2616.005 MiB (limits: 8192.000 MiB, 32768 tokens, 97702 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426816e6d0: 13615 tokens, checkpoints: 6, 846.889 MiB |
|
srv update: - prompt 0x5e424eb0ef50: 372 tokens, checkpoints: 1, 159.475 MiB |
|
srv get_availabl: prompt cache update took 846.78 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6839 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6839 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22543 |
|
slot update_slots: id 0 | task 6839 | n_tokens = 22530, memory_seq_rm [22530, end) |
|
slot update_slots: id 0 | task 6839 | prompt processing progress, n_tokens = 22543, batch.n_tokens = 13, progress = 1.000000 |
|
slot update_slots: id 0 | task 6839 | prompt done, n_tokens = 22543, batch.n_tokens = 13 |
|
slot init_sampler: id 0 | task 6839 | init sampler, took 1.71 ms, tokens: text = 22543, total = 22543 |
|
slot update_slots: id 0 | task 6839 | erasing old context checkpoint (pos_min = 21743, pos_max = 21743, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6839 | created context checkpoint 8 of 8 (pos_min = 22529, pos_max = 22529, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6839 | |
|
prompt eval time = 480.65 ms / 13 tokens ( 36.97 ms per token, 27.05 tokens per second) |
|
eval time = 5347.05 ms / 81 tokens ( 66.01 ms per token, 15.15 tokens per second) |
|
total time = 5827.69 ms / 94 tokens |
|
slot release: id 0 | task 6839 | stop processing: n_tokens = 22623, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.998 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6945 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6945 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22688 |
|
slot update_slots: id 0 | task 6945 | n_past = 22580, slot.prompt.tokens.size() = 22623, seq_id = 0, pos_min = 22622, n_swa = 1 |
|
slot update_slots: id 0 | task 6945 | restored context checkpoint (pos_min = 22529, pos_max = 22529, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6945 | n_tokens = 22530, memory_seq_rm [22530, end) |
|
slot update_slots: id 0 | task 6945 | prompt processing progress, n_tokens = 22624, batch.n_tokens = 94, progress = 0.997179 |
|
slot update_slots: id 0 | task 6945 | n_tokens = 22624, memory_seq_rm [22624, end) |
|
slot update_slots: id 0 | task 6945 | prompt processing progress, n_tokens = 22688, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 6945 | prompt done, n_tokens = 22688, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 6945 | init sampler, took 1.76 ms, tokens: text = 22688, total = 22688 |
|
slot update_slots: id 0 | task 6945 | erasing old context checkpoint (pos_min = 21822, pos_max = 21822, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6945 | created context checkpoint 8 of 8 (pos_min = 22623, pos_max = 22623, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6945 | |
|
prompt eval time = 2796.67 ms / 158 tokens ( 17.70 ms per token, 56.50 tokens per second) |
|
eval time = 2716.46 ms / 46 tokens ( 59.05 ms per token, 16.93 tokens per second) |
|
total time = 5513.13 ms / 204 tokens |
|
slot release: id 0 | task 6945 | stop processing: n_tokens = 22733, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 6993 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 6993 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22766 |
|
slot update_slots: id 0 | task 6993 | n_tokens = 22733, memory_seq_rm [22733, end) |
|
slot update_slots: id 0 | task 6993 | prompt processing progress, n_tokens = 22766, batch.n_tokens = 33, progress = 1.000000 |
|
slot update_slots: id 0 | task 6993 | prompt done, n_tokens = 22766, batch.n_tokens = 33 |
|
slot init_sampler: id 0 | task 6993 | init sampler, took 1.88 ms, tokens: text = 22766, total = 22766 |
|
slot update_slots: id 0 | task 6993 | erasing old context checkpoint (pos_min = 21903, pos_max = 21903, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 6993 | created context checkpoint 8 of 8 (pos_min = 22732, pos_max = 22732, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 6993 | |
|
prompt eval time = 1262.12 ms / 33 tokens ( 38.25 ms per token, 26.15 tokens per second) |
|
eval time = 55410.56 ms / 902 tokens ( 61.43 ms per token, 16.28 tokens per second) |
|
total time = 56672.67 ms / 935 tokens |
|
slot release: id 0 | task 6993 | stop processing: n_tokens = 23667, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 0.998 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 7896 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 7896 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23730 |
|
slot update_slots: id 0 | task 7896 | n_past = 23625, slot.prompt.tokens.size() = 23667, seq_id = 0, pos_min = 23666, n_swa = 1 |
|
slot update_slots: id 0 | task 7896 | restored context checkpoint (pos_min = 22732, pos_max = 22732, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 7896 | n_tokens = 22733, memory_seq_rm [22733, end) |
|
slot update_slots: id 0 | task 7896 | prompt processing progress, n_tokens = 23666, batch.n_tokens = 933, progress = 0.997303 |
|
slot update_slots: id 0 | task 7896 | n_tokens = 23666, memory_seq_rm [23666, end) |
|
slot update_slots: id 0 | task 7896 | prompt processing progress, n_tokens = 23730, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 7896 | prompt done, n_tokens = 23730, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 7896 | init sampler, took 1.85 ms, tokens: text = 23730, total = 23730 |
|
slot update_slots: id 0 | task 7896 | erasing old context checkpoint (pos_min = 21984, pos_max = 21984, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 7896 | created context checkpoint 8 of 8 (pos_min = 23665, pos_max = 23665, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 7896 | |
|
prompt eval time = 5967.17 ms / 997 tokens ( 5.99 ms per token, 167.08 tokens per second) |
|
eval time = 11417.55 ms / 196 tokens ( 58.25 ms per token, 17.17 tokens per second) |
|
total time = 17384.72 ms / 1193 tokens |
|
slot release: id 0 | task 7896 | stop processing: n_tokens = 23925, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 8094 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 8094 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23957 |
|
slot update_slots: id 0 | task 8094 | n_tokens = 23925, memory_seq_rm [23925, end) |
|
slot update_slots: id 0 | task 8094 | prompt processing progress, n_tokens = 23957, batch.n_tokens = 32, progress = 1.000000 |
|
slot update_slots: id 0 | task 8094 | prompt done, n_tokens = 23957, batch.n_tokens = 32 |
|
slot init_sampler: id 0 | task 8094 | init sampler, took 1.89 ms, tokens: text = 23957, total = 23957 |
|
slot update_slots: id 0 | task 8094 | erasing old context checkpoint (pos_min = 22065, pos_max = 22065, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 8094 | created context checkpoint 8 of 8 (pos_min = 23924, pos_max = 23924, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 8094 | |
|
prompt eval time = 811.30 ms / 32 tokens ( 25.35 ms per token, 39.44 tokens per second) |
|
eval time = 77493.44 ms / 1302 tokens ( 59.52 ms per token, 16.80 tokens per second) |
|
total time = 78304.74 ms / 1334 tokens |
|
slot release: id 0 | task 8094 | stop processing: n_tokens = 25258, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 9397 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 9397 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 25289 |
|
slot update_slots: id 0 | task 9397 | n_tokens = 25258, memory_seq_rm [25258, end) |
|
slot update_slots: id 0 | task 9397 | prompt processing progress, n_tokens = 25289, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 9397 | prompt done, n_tokens = 25289, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 9397 | init sampler, took 2.03 ms, tokens: text = 25289, total = 25289 |
|
slot update_slots: id 0 | task 9397 | erasing old context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 9397 | created context checkpoint 8 of 8 (pos_min = 25257, pos_max = 25257, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 9397 | |
|
prompt eval time = 454.77 ms / 31 tokens ( 14.67 ms per token, 68.17 tokens per second) |
|
eval time = 17713.42 ms / 309 tokens ( 57.32 ms per token, 17.44 tokens per second) |
|
total time = 18168.19 ms / 340 tokens |
|
slot release: id 0 | task 9397 | stop processing: n_tokens = 25597, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 9707 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 9707 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 25887 |
|
slot update_slots: id 0 | task 9707 | n_tokens = 25597, memory_seq_rm [25597, end) |
|
slot update_slots: id 0 | task 9707 | prompt processing progress, n_tokens = 25823, batch.n_tokens = 226, progress = 0.997528 |
|
slot update_slots: id 0 | task 9707 | n_tokens = 25823, memory_seq_rm [25823, end) |
|
slot update_slots: id 0 | task 9707 | prompt processing progress, n_tokens = 25887, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 9707 | prompt done, n_tokens = 25887, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 9707 | init sampler, took 2.05 ms, tokens: text = 25887, total = 25887 |
|
slot update_slots: id 0 | task 9707 | erasing old context checkpoint (pos_min = 22292, pos_max = 22292, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 9707 | created context checkpoint 8 of 8 (pos_min = 25822, pos_max = 25822, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 9707 | |
|
prompt eval time = 3431.64 ms / 290 tokens ( 11.83 ms per token, 84.51 tokens per second) |
|
eval time = 5636.36 ms / 92 tokens ( 61.26 ms per token, 16.32 tokens per second) |
|
total time = 9068.00 ms / 382 tokens |
|
slot release: id 0 | task 9707 | stop processing: n_tokens = 25978, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 9801 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 9801 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 26009 |
|
slot update_slots: id 0 | task 9801 | n_tokens = 25978, memory_seq_rm [25978, end) |
|
slot update_slots: id 0 | task 9801 | prompt processing progress, n_tokens = 26009, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 9801 | prompt done, n_tokens = 26009, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 9801 | init sampler, took 2.08 ms, tokens: text = 26009, total = 26009 |
|
slot update_slots: id 0 | task 9801 | erasing old context checkpoint (pos_min = 22389, pos_max = 22389, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 9801 | created context checkpoint 8 of 8 (pos_min = 25977, pos_max = 25977, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 9801 | |
|
prompt eval time = 374.53 ms / 31 tokens ( 12.08 ms per token, 82.77 tokens per second) |
|
eval time = 42089.66 ms / 709 tokens ( 59.36 ms per token, 16.84 tokens per second) |
|
total time = 42464.19 ms / 740 tokens |
|
slot release: id 0 | task 9801 | stop processing: n_tokens = 26717, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 10511 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 10511 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 26747 |
|
slot update_slots: id 0 | task 10511 | n_tokens = 26717, memory_seq_rm [26717, end) |
|
slot update_slots: id 0 | task 10511 | prompt processing progress, n_tokens = 26747, batch.n_tokens = 30, progress = 1.000000 |
|
slot update_slots: id 0 | task 10511 | prompt done, n_tokens = 26747, batch.n_tokens = 30 |
|
slot init_sampler: id 0 | task 10511 | init sampler, took 2.25 ms, tokens: text = 26747, total = 26747 |
|
slot update_slots: id 0 | task 10511 | erasing old context checkpoint (pos_min = 22529, pos_max = 22529, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 10511 | created context checkpoint 8 of 8 (pos_min = 26716, pos_max = 26716, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 10511 | |
|
prompt eval time = 623.52 ms / 30 tokens ( 20.78 ms per token, 48.11 tokens per second) |
|
eval time = 102561.43 ms / 1737 tokens ( 59.05 ms per token, 16.94 tokens per second) |
|
total time = 103184.96 ms / 1767 tokens |
|
slot release: id 0 | task 10511 | stop processing: n_tokens = 28483, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 12249 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 12249 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 28514 |
|
slot update_slots: id 0 | task 12249 | n_tokens = 28483, memory_seq_rm [28483, end) |
|
slot update_slots: id 0 | task 12249 | prompt processing progress, n_tokens = 28514, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 12249 | prompt done, n_tokens = 28514, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 12249 | init sampler, took 2.26 ms, tokens: text = 28514, total = 28514 |
|
slot update_slots: id 0 | task 12249 | erasing old context checkpoint (pos_min = 22623, pos_max = 22623, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 12249 | created context checkpoint 8 of 8 (pos_min = 28482, pos_max = 28482, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 12249 | |
|
prompt eval time = 566.94 ms / 31 tokens ( 18.29 ms per token, 54.68 tokens per second) |
|
eval time = 74037.91 ms / 1242 tokens ( 59.61 ms per token, 16.78 tokens per second) |
|
total time = 74604.85 ms / 1273 tokens |
|
slot release: id 0 | task 12249 | stop processing: n_tokens = 29755, truncated = 0 |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv update_slots: all slots are idle |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 13492 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 13492 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 29786 |
|
slot update_slots: id 0 | task 13492 | n_tokens = 29755, memory_seq_rm [29755, end) |
|
slot update_slots: id 0 | task 13492 | prompt processing progress, n_tokens = 29786, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 13492 | prompt done, n_tokens = 29786, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 13492 | init sampler, took 2.40 ms, tokens: text = 29786, total = 29786 |
|
slot update_slots: id 0 | task 13492 | erasing old context checkpoint (pos_min = 22732, pos_max = 22732, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 13492 | created context checkpoint 8 of 8 (pos_min = 29754, pos_max = 29754, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 13492 | |
|
prompt eval time = 739.31 ms / 31 tokens ( 23.85 ms per token, 41.93 tokens per second) |
|
eval time = 69617.41 ms / 1153 tokens ( 60.38 ms per token, 16.56 tokens per second) |
|
total time = 70356.72 ms / 1184 tokens |
|
slot release: id 0 | task 13492 | stop processing: n_tokens = 30938, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 14646 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 14646 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 30969 |
|
slot update_slots: id 0 | task 14646 | n_tokens = 30938, memory_seq_rm [30938, end) |
|
slot update_slots: id 0 | task 14646 | prompt processing progress, n_tokens = 30969, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 14646 | prompt done, n_tokens = 30969, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 14646 | init sampler, took 2.55 ms, tokens: text = 30969, total = 30969 |
|
slot update_slots: id 0 | task 14646 | erasing old context checkpoint (pos_min = 23665, pos_max = 23665, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 14646 | created context checkpoint 8 of 8 (pos_min = 30937, pos_max = 30937, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 14646 | |
|
prompt eval time = 869.43 ms / 31 tokens ( 28.05 ms per token, 35.66 tokens per second) |
|
eval time = 91165.07 ms / 1465 tokens ( 62.23 ms per token, 16.07 tokens per second) |
|
total time = 92034.51 ms / 1496 tokens |
|
slot release: id 0 | task 14646 | stop processing: n_tokens = 32433, truncated = 0 |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv update_slots: all slots are idle |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 16112 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 16112 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32464 |
|
slot update_slots: id 0 | task 16112 | n_tokens = 32433, memory_seq_rm [32433, end) |
|
slot update_slots: id 0 | task 16112 | prompt processing progress, n_tokens = 32464, batch.n_tokens = 31, progress = 1.000000 |
|
slot update_slots: id 0 | task 16112 | prompt done, n_tokens = 32464, batch.n_tokens = 31 |
|
slot init_sampler: id 0 | task 16112 | init sampler, took 2.57 ms, tokens: text = 32464, total = 32464 |
|
slot update_slots: id 0 | task 16112 | erasing old context checkpoint (pos_min = 23924, pos_max = 23924, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16112 | created context checkpoint 8 of 8 (pos_min = 32432, pos_max = 32432, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 16112 | |
|
prompt eval time = 1005.12 ms / 31 tokens ( 32.42 ms per token, 30.84 tokens per second) |
|
eval time = 4757.12 ms / 71 tokens ( 67.00 ms per token, 14.92 tokens per second) |
|
total time = 5762.25 ms / 102 tokens |
|
slot release: id 0 | task 16112 | stop processing: n_tokens = 32534, truncated = 0 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 12983208127 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 32534, total state size = 838.264 MiB |
|
srv params_from_: Chat format: Qwen3 Coder |
|
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.061 |
|
srv update: - cache state: 8 prompts, 4057.276 MiB (limits: 8192.000 MiB, 32768 tokens, 128684 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426816e6d0: 13615 tokens, checkpoints: 6, 846.889 MiB |
|
srv update: - prompt 0x5e424eb0ef50: 372 tokens, checkpoints: 1, 159.475 MiB |
|
srv update: - prompt 0x5e426af39220: 32534 tokens, checkpoints: 8, 1441.271 MiB |
|
srv get_availabl: prompt cache update took 13719.02 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 16184 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 16184 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 329 |
|
slot update_slots: id 0 | task 16184 | n_past = 20, slot.prompt.tokens.size() = 32534, seq_id = 0, pos_min = 32533, n_swa = 1 |
|
slot update_slots: id 0 | task 16184 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 25257, pos_max = 25257, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 25822, pos_max = 25822, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 25977, pos_max = 25977, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 26716, pos_max = 26716, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 28482, pos_max = 28482, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 29754, pos_max = 29754, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 30937, pos_max = 30937, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | erased invalidated context checkpoint (pos_min = 32432, pos_max = 32432, n_swa = 1, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16184 | n_tokens = 0, memory_seq_rm [0, end) |
|
slot update_slots: id 0 | task 16184 | prompt processing progress, n_tokens = 265, batch.n_tokens = 265, progress = 0.805471 |
|
slot update_slots: id 0 | task 16184 | n_tokens = 265, memory_seq_rm [265, end) |
|
slot update_slots: id 0 | task 16184 | prompt processing progress, n_tokens = 329, batch.n_tokens = 64, progress = 1.000000 |
|
slot update_slots: id 0 | task 16184 | prompt done, n_tokens = 329, batch.n_tokens = 64 |
|
slot init_sampler: id 0 | task 16184 | init sampler, took 0.03 ms, tokens: text = 329, total = 329 |
|
slot update_slots: id 0 | task 16184 | created context checkpoint 1 of 8 (pos_min = 264, pos_max = 264, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 16184 | |
|
prompt eval time = 23115.51 ms / 329 tokens ( 70.26 ms per token, 14.23 tokens per second) |
|
eval time = 1521.45 ms / 23 tokens ( 66.15 ms per token, 15.12 tokens per second) |
|
total time = 24636.95 ms / 352 tokens |
|
slot release: id 0 | task 16184 | stop processing: n_tokens = 351, truncated = 0 |
|
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 13021897998 |
|
srv get_availabl: updating prompt cache |
|
srv prompt_save: - saving prompt with length 351, total state size = 83.607 MiB |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv load: - looking for better prompt, base f_keep = 0.057, sim = 0.001 |
|
srv load: - found better prompt with f_keep = 1.000, sim = 1.000 |
|
srv update: - cache state: 8 prompts, 2774.988 MiB (limits: 8192.000 MiB, 32768 tokens, 93141 est) |
|
srv update: - prompt 0x5e426789a450: 168 tokens, checkpoints: 1, 154.691 MiB |
|
srv update: - prompt 0x5e4268394340: 595 tokens, checkpoints: 1, 164.704 MiB |
|
srv update: - prompt 0x5e426ad8a3f0: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e426ad9d890: 352 tokens, checkpoints: 1, 159.006 MiB |
|
srv update: - prompt 0x5e424eb6dbb0: 15746 tokens, checkpoints: 7, 972.234 MiB |
|
srv update: - prompt 0x5e426816e6d0: 13615 tokens, checkpoints: 6, 846.889 MiB |
|
srv update: - prompt 0x5e424eb0ef50: 372 tokens, checkpoints: 1, 159.475 MiB |
|
srv update: - prompt 0x5e426af2f1c0: 351 tokens, checkpoints: 1, 158.983 MiB |
|
srv get_availabl: prompt cache update took 1311.45 ms |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 16185 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 16185 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32547 |
|
slot update_slots: id 0 | task 16185 | n_tokens = 32534, memory_seq_rm [32534, end) |
|
slot update_slots: id 0 | task 16185 | prompt processing progress, n_tokens = 32547, batch.n_tokens = 13, progress = 1.000000 |
|
slot update_slots: id 0 | task 16185 | prompt done, n_tokens = 32547, batch.n_tokens = 13 |
|
slot init_sampler: id 0 | task 16185 | init sampler, took 2.61 ms, tokens: text = 32547, total = 32547 |
|
slot update_slots: id 0 | task 16185 | erasing old context checkpoint (pos_min = 25257, pos_max = 25257, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16185 | created context checkpoint 8 of 8 (pos_min = 32533, pos_max = 32533, size = 75.376 MiB) |
|
slot print_timing: id 0 | task 16185 | |
|
prompt eval time = 519.86 ms / 13 tokens ( 39.99 ms per token, 25.01 tokens per second) |
|
eval time = 14544.86 ms / 221 tokens ( 65.81 ms per token, 15.19 tokens per second) |
|
total time = 15064.72 ms / 234 tokens |
|
slot release: id 0 | task 16185 | stop processing: n_tokens = 32767, truncated = 1 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv stop: cancel task, id_task = 16185 |
|
srv update_slots: all slots are idle |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.993 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 16432 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 16432 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32547 |
|
slot update_slots: id 0 | task 16432 | n_past = 32547, slot.prompt.tokens.size() = 32767, seq_id = 0, pos_min = 32766, n_swa = 1 |
|
slot update_slots: id 0 | task 16432 | restored context checkpoint (pos_min = 32533, pos_max = 32533, size = 75.376 MiB) |
|
slot update_slots: id 0 | task 16432 | n_tokens = 32534, memory_seq_rm [32534, end) |
|
slot update_slots: id 0 | task 16432 | prompt processing progress, n_tokens = 32547, batch.n_tokens = 13, progress = 1.000000 |
|
slot update_slots: id 0 | task 16432 | prompt done, n_tokens = 32547, batch.n_tokens = 13 |
|
slot init_sampler: id 0 | task 16432 | init sampler, took 2.51 ms, tokens: text = 32547, total = 32547 |
|
slot print_timing: id 0 | task 16432 | |
|
prompt eval time = 240.75 ms / 13 tokens ( 18.52 ms per token, 54.00 tokens per second) |
|
eval time = 13605.62 ms / 221 tokens ( 61.56 ms per token, 16.24 tokens per second) |
|
total time = 13846.37 ms / 234 tokens |
|
slot release: id 0 | task 16432 | stop processing: n_tokens = 32767, truncated = 1 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 |
|
srv params_from_: Chat format: Qwen3 Coder |
|
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 1.000 |
|
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist |
|
slot launch_slot_: id 0 | task 16654 | processing task, is_child = 0 |
|
slot update_slots: id 0 | task 16654 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 33042 |
|
srv send_error: task id = 16654, error: request (33042 tokens) exceeds the available context size (32768 tokens), try increasing it |
|
slot release: id 0 | task 16654 | stop processing: n_tokens = 32767, truncated = 0 |
|
srv update_slots: no tokens to decode |
|
srv update_slots: all slots are idle |
|
srv stop: cancel task, id_task = 16654 |
|
srv update_slots: all slots are idle |
|
srv log_server_r: done request: POST /v1/messages 127.0.0.1 400 |