Citation for Sun26EvolutionaryBlackboxFramework

BibTeX (in press)

@inproceedings{Sun26EvolutionaryBlackboxFramework,
  booktitle = {Proceedings of the Sixteenth ACM Conference on Data and Application Security and Privacy (CODASPY '26)},
  title = {An evolutionary black-box framework for adversarial prompt generation in large language models},
  author = {Sun, Qiyang and Karafili, Erisa},
  year = {2026},
  pubstate = {inpress},
  url = {https://eprints.soton.ac.uk/511265/},
  abstract = {Large language models (LLMs) remain susceptible to adversarial prompts that can bypass alignment mechanisms. Existing approaches to adversarial prompt generation typically rely on manual prompt engineering, helper LLMs, or white-box adversarial machine learning methods, which either lack scalability or require access to model internals. In this paper, we propose a novel black-box framework for automated adversarial prompt generation based on evolutionary algorithms. The framework is instantiated using a genetic algorithm and an evolution strategy and operates without access to internal model parameters, making it applicable to both open-source and proprietary LLMs. To improve search effectiveness under realistic query constraints, we introduce a novel population initialisation strategy based on templates, pre-prompts, and post-prompts. Evolutionary search is guided by heuristic, model-agnostic fitness signals derived from prompt-goal semantic similarity, refusal-based response assessment, and a small heuristic lexical bonus based on lightweight instruction-following indicators. We evaluate our framework across multiple LLMs using a refusal-based attack success rate metric, demonstrating consistent improvements over direct dataset prompting and competitive performance against a state-of-the-art white-box baseline under comparable query budgets. Additional analyses examine fitness stabilisation and cross-model transferability for unseen models.}
}


Copyright © 2024–2026 | Author: Qiyang Sun <Q.Sun@soton.ac.uk> | Last modified: 2026-05-12 Tue 22:14