Notes from the Wired
This is a website where I write articles on various topics that interest me, carving out a bit of cyberspace for myself.
You shouldn't believe anything I talk about — I use words entirely recreationally.
Most Recent
May. 08
PRISONBREAK: Jailbreaking Large Language Models With at Most Twenty-Five Targeted Bit-flips
Paper Title: PRISONBREAK: Jailbreaking Large Language Models With at Most Twenty-Five Targeted Bit-flips
Link to Paper: https://arxiv.org/abs/2412.07192
Date: 10. Dec. 2024
Paper Type: LLM, Attack-Paper, Jailbreaking
Short Abstract: This paper is about jailbreaking LLMs, that is, circumventing their safeguards against producing harmful content. The attack needs to flip only 5 to 25 bits, which is reported as 40× fewer bits than prior attacks require, and it runs 20× faster than previous methods. The authors evaluate it on 10 open-source LLMs and achieve an attack success rate of 80–98%, with minimal loss in the models' utility and performance.
May. 08
Resilience Assessment of Large Language Models under Transient Hardware Faults
Paper Title: Resilience Assessment of Large Language Models under Transient Hardware Faults
Link to Paper: https://ieeexplore.ieee.org/document/10301253
Date: 31. Jan. 2023
Paper Type: LLM, Testing, Errors
Short Abstract: The paper investigates how resilient Large Language Models (LLMs) are to transient hardware faults (also called soft errors), such as random bit flips caused by cosmic rays, power fluctuations, or electromagnetic interference. This matters because LLMs are increasingly used in safety-critical systems. A transient fault may corrupt a model's output without crashing the system; such an undetected error is called a Silent Data Corruption (SDC).
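To make the idea of a bit flip concrete, here is a minimal sketch of what a single flipped bit does to one 32-bit floating-point value. It is not taken from the paper; the helper name `flip_bit` and the example weight of 0.5 are assumptions chosen purely for illustration.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as an IEEE 754 float32, with one bit flipped.

    Hypothetical helper for illustration only. Bit 0 is the lowest
    mantissa bit, bits 23-30 are the exponent, bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

weight = 0.5  # assumed example value, not a real model weight

print(flip_bit(weight, 0))   # lowest mantissa bit: a tiny change
print(flip_bit(weight, 31))  # sign bit: the value is negated
print(flip_bit(weight, 30))  # highest exponent bit: the value explodes
```

No flip raises an error, which is why this kind of fault can stay silent. How much damage it does depends heavily on which bit is hit: a mantissa flip barely moves the value, while an exponent flip turns 0.5 into a number on the order of 10^38.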
May. 07
Philosophical Ramblings #13: What is existence?
What is existence? The meaning of words arises through their usage. We use words and learn what they mean not through strict dictionary definitions, but more through associations, so that each word carries a network of related meanings and experiences. From this, meaning emerges. But this meaning is varied; there is no single strict definition of existence. Rather, we often use different senses of the word in different contexts.
When I say that the table exists in front of me, I associate existence with something I can touch, clearly see, and physically interact with. When I say that love exists, I mean something different: not a physical object, but a lived reality expressed through feeling, action, relation, and interpretation. When I say that a unicorn exists, I mean that there are stories, symbols, and cultural imaginaries in which unicorns appear.