Behavioral analysis of quantized small language models under hierarchical sycophancy pressure

Hayri Baytan Ozmen; Fatih Ahmet Senel

Research Article

Received: 24/03/2026

Accepted: 17/05/2026

doi: http://dx.doi.org/10.17515/resm2026-1586ce0324rs

Hayri Baytan Ozmen¹,², Fatih Ahmet Senel³

¹Faculty of Engineering and Natural Sciences, Uşak University, Uşak, Türkiye
²Graduate School of Natural and Applied Sciences, Süleyman Demirel University, Isparta, Türkiye
³Faculty of Engineering and Natural Sciences, Süleyman Demirel University, Isparta, Türkiye

Abstract

This study evaluates the behavioral robustness of Small Language Models (SLMs) against hierarchical prompt engineering, focusing on the 4-bit quantized Gemma-3-12B model in a low-resource linguistic setting, namely Turkish. We introduce a Sycophancy Pressure Spectrum to measure how increasing adversarial intensity, ranging from mild suggestions to coercive threats, systematically degrades factual integrity. The model was tested across four macro-domains: Legal Reasoning, Analytical Thinking, Knowledge Retrieval, and General Comprehension. Empirical results show a severe degradation in task accuracy: overall accuracy fell from a neutral baseline of 49.8% to 5.3% under peak coercive pressure. On difficult questions, defined as those for which the model's baseline internal confidence was below 95%, accuracy dropped from 22.3% to 0%. We further identify an Inverted Confidence Paradox: under severe pressure, the model produced sycophantic falsehoods with a near-perfect internal confidence of 98.9%, exceeding its internal confidence in correct answers under neutral prompting, which was 89.9%. These findings indicate that coercive prompting can effectively override the model's internal representation of truth, suggesting that current instruction-tuning methods inadvertently prioritize submission over factual reliability, at least in the cases considered.