Showing posts with label artificial intelligence. Show all posts

Tuesday, June 2, 2026

Fortifying the Digital Vault: A Wee Guide to AI Privacy

In a nutshell (TL;DR)...

The widespread use of generative AI tools introduces major security risks for private and confidential company information. Sensitive data can leak when prompts are retained for logging/training, employees paste data into unmanaged "Shadow AI" accounts (the "Copy/Paste Blind Spot"), or malicious "Prompt Injections" trick the model. Consequences are severe, including regulatory fines (GDPR/HIPAA), data breaches, and loss of competitive advantage. To stay secure, organizations must:

  • Anonymize sensitive data (PII) before using external LLMs.

  • Prioritize vendors offering Zero Data Retention (ZDR).

  • Banish "Shadow AI" by enforcing Single Sign-On (SSO).

  • Upgrade to action-centric Data Loss Prevention (DLP) that monitors copy/paste actions.

  • Apply the principle of least privilege and keep a human in the loop for critical actions.

The AI Privacy Guide: How to Keep Your Confidential Data Safe in the Age of LLMs

The company I work for has drummed into me the perils of letting slip any confidential information when working with AI applications, but just how important is it? My employer specifically lists the AI applications we are allowed to use when working with confidential information, so it’s a really important thing to bear in mind. Let’s have a look at what the problems are and how we can protect ourselves, our customers and our employers…

Everyone is officially living in the era of Artificial Intelligence. From drafting emails to analyzing complex datasets, generative AI and Large Language Models (LLMs) have seamlessly integrated into our daily workflows. In fact, nearly half of all enterprise employees are already using these tools. But amid all this newfound productivity, there is a crucial conversation we need to have: how are we protecting our private data and confidential company information?

While AI assistants are incredibly helpful, treating them like a private diary or a secure company vault can lead to serious risks. Let’s break down exactly how sensitive information can slip through the cracks, what the consequences are, and the best practices you should adopt to stay secure.

How Does Confidential Information Actually Go Public?

When you type a prompt into an external LLM, that data is processed by a third-party provider. If you aren't careful, sensitive information can be exposed in a few common ways:

Logging and Training Contamination

Many AI providers retain user prompts for a certain period to monitor for abuse, debug their systems, or even train future versions of their models. If you paste confidential data into a prompt, it could end up stored on the provider's servers or, worse, be reproduced in the outputs of a future model.

The Copy/Paste Blind Spot

A staggering 77% of employees paste data directly into generative AI tools, and the vast majority of this activity happens on unmanaged personal accounts. Because this bypasses official corporate channels, IT and security teams have no visibility into what is being shared, creating a massive "Shadow AI" blind spot.

Prompt Injections

Malicious actors can use "prompt injections" (carefully crafted inputs designed to manipulate the AI's behavior) to trick the model into revealing sensitive information. This can lead to the AI accidentally exposing personally identifiable information (PII), confidential business strategies, or even system credentials. I’ve made a note to dig deeper on this subject for a later post…

The Uncomfortable Consequences of Data Leaks

The fallout from exposing sensitive data to an LLM is rarely a minor hiccup. When PII or corporate secrets leak, the consequences can be severe.

Regulatory Penalties

Mishandling personal data can violate strict data protection regulations like GDPR and HIPAA. Failing to comply with these laws can result in massive legal and financial penalties.

Data Breaches and Loss of Trust

If a customer service chatbot or an internal AI tool inadvertently reveals private user details or passwords, it can lead to full-scale data breaches. This erodes user trust and severely damages your organization's reputation.

Loss of Competitive Advantage

Exposing proprietary business data or intellectual property can directly result in a loss of your competitive edge in the market.

Best Practices for Handling Sensitive Information with AI

Fortunately, you don't have to abandon AI to keep your data safe. By implementing a few strategic best practices, you can enjoy the benefits of LLMs while minimizing your risk.

1. Anonymize Before You Analyze

Before sending a prompt containing sensitive data to an external LLM, scrub the text of any PII. You can use automated tools to detect and replace names, emails, and phone numbers with generic placeholders (e.g., swapping a real name for [PERSON] or [EMAIL]). This allows the AI to understand the context of your prompt without ever seeing the raw, sensitive data.
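To make the anonymization step concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not a production tool: the regex patterns, the `anonymize`/`restore` helper names and the numbered placeholder format are all hypothetical, and real PII detection usually relies on named-entity recognition rather than a hand-supplied list of names.

```python
import re

# Hypothetical patterns for illustration only; real PII tools use trained
# named-entity recognizers and much more robust rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Swap PII for numbered placeholders; return the text and a restore map."""
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}

    def placeholder(label: str, value: str) -> str:
        # Reuse the existing placeholder if this exact value was already seen.
        for key, original in mapping.items():
            if original == value:
                return key
        counters[label] = counters.get(label, 0) + 1
        key = f"[{label}_{counters[label]}]"
        mapping[key] = value
        return key

    # Names are hard to catch with regex, so this sketch takes a supplied list.
    for name in known_names:
        text = re.sub(re.escape(name), lambda m: placeholder("PERSON", m.group(0)), text)
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(lambda m, label=label: placeholder(label, m.group(0)), text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the AI's reply, locally."""
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text


safe, mapping = anonymize(
    "Email Jane Doe at jane.doe@example.com or call +44 7700 900123.", ["Jane Doe"]
)
print(safe)  # Email [PERSON_1] at [EMAIL_1] or call [PHONE_1].
```

The mapping never leaves your machine, so only the placeholder version goes to the external LLM, and `restore()` puts the real values back into the reply once it comes back.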

2. Demand "Zero Data Retention" (ZDR)

If you are procuring AI tools for your company, prioritize vendors that offer "Zero Data Retention" agreements. Under a ZDR policy, the AI provider processes your prompt and immediately returns the response without writing your data to any persistent storage, logs, or training queues. This ensures your data exists only in memory for the duration of the request. I think this is what my employer might have in place for the applications I am allowed to use.

3. Banish "Shadow AI" and Enforce SSO

Employees often use unmanaged personal accounts to access AI tools, completely bypassing enterprise security. To regain control, organizations must restrict the use of personal accounts for business-critical apps and enforce Single Sign-On (SSO) across all corporate logins.

4. Upgrade Your Data Loss Prevention (DLP)

Traditional Data Loss Prevention tools are heavily focused on file uploads, but today's sensitive data usually leaks when employees copy and paste text directly into AI prompts. Organizations need to shift to "action-centric" DLP policies that monitor file-less data transfers and enforce controls directly at the web browser level.
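What does "action-centric" look like in practice? Below is a hypothetical Python sketch of the decision a browser-level DLP control might make for a single paste. The `PasteEvent` type, the allowlisted domain and the detection patterns are all invented for illustration; commercial DLP products have their own policy languages and far richer detectors.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlist: AI tools the organization has approved (for example,
# ones covered by a Zero Data Retention agreement). Placeholder domain.
SANCTIONED_AI_DOMAINS = {"ai.internal.example.com"}

# Hypothetical detectors for text that should not leave via a paste.
SENSITIVE_PATTERNS = {
    "email_address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "password_assignment": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
    "confidential_marking": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


@dataclass
class PasteEvent:
    """A file-less transfer: text pasted into a web page, as a browser control sees it."""
    destination_domain: str
    text: str
    user: str


def evaluate_paste(event: PasteEvent) -> tuple[str, list[str]]:
    """Return ("allow" or "block", names of the patterns that matched)."""
    findings = [name for name, pattern in SENSITIVE_PATTERNS.items()
                if pattern.search(event.text)]
    if not findings:
        return "allow", []
    if event.destination_domain in SANCTIONED_AI_DOMAINS:
        # Approved tool: let it through, but return the findings for audit logging.
        return "allow", findings
    return "block", findings


print(evaluate_paste(PasteEvent("chat.example.org", "CONFIDENTIAL: Q3 plan", "alice")))
```

The design choice that matters is that the policy inspects the action (what is being pasted, and where to) rather than a file, which is exactly the blind spot traditional DLP misses.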

5. Keep a Human in the Loop and Limit Privileges

Finally, never give an AI unchecked autonomy. Apply the principle of "least privilege" by ensuring your AI applications only have access to the specific data sources they absolutely need. For high-impact actions, like modifying files or handling highly sensitive records, always require human approval before the AI can proceed.
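Here is a rough Python sketch of how those two rules might be enforced in front of an AI agent's tools. The tool names, scopes and the `approve` callback are hypothetical; in a real system the approval step would be a ticket, a chat prompt or a button in a review queue.

```python
from typing import Callable

# Hypothetical tools an AI agent could call. Each declares the data sources it
# needs and whether it is high impact (and therefore needs human sign-off).
TOOLS = {
    "read_faq": {"sources": {"public_faq"}, "high_impact": False},
    "update_customer_record": {"sources": {"crm"}, "high_impact": True},
    "export_payroll": {"sources": {"payroll"}, "high_impact": True},
}


def run_tool(agent_scopes: set[str], tool_name: str,
             approve: Callable[[str], bool]) -> str:
    """Apply least privilege first, then human approval for high-impact tools."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return "denied: unknown tool"
    # Least privilege: the agent may only touch the sources it was granted.
    if not tool["sources"] <= agent_scopes:
        return "denied: outside agent scope"
    # Human in the loop: high-impact actions wait for an explicit yes.
    if tool["high_impact"] and not approve(tool_name):
        return "denied: human reviewer rejected"
    # A real system would now perform the action; this sketch just reports it.
    return f"executed: {tool_name}"


support_agent = {"public_faq", "crm"}
print(run_tool(support_agent, "export_payroll", approve=lambda name: True))
# denied: outside agent scope
```

Checking scope before asking for approval means a human is never asked to rubber-stamp an action the agent should not have been able to attempt in the first place.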

AI is a powerful collaborator, but it is ultimately up to us to set the boundaries. By treating generative AI platforms with the same security rigor as any other enterprise tool, we can innovate quickly without putting our most valuable data on the line.


Next week let’s take a shifty at this “prompt injection” malarky and see how we can protect ourselves from that…


Tuesday, May 12, 2026

Beyond the Prompt: Context Engineering

 

TL;DR

Context Engineering is the new discipline replacing traditional prompt engineering. Instead of massive, static prompts that lead to "context rot" and high costs, Context Engineering architects dynamic systems to feed Large Language Models (LLMs) only the necessary information at the right time. This is achieved through techniques like Query Rewriting, Active Memory Management (for key facts), and standardized tools like the Model Context Protocol (MCP) for connecting to external APIs. The focus shifts from talking to a model to building the world it lives in.

Apologies for the absence of a post last week; the day job and family holidays got in the way! In my previous post a couple of weeks ago, I waffled on about Vibe Coding, which is only one aspect of AI that seems to be making “prompt engineering” a thing of the past. If vibe coding is how we interact with the output of AI, Context Engineering is how we manage the input.

Context Engineering is the discipline of designing the architecture that feeds an LLM the right information at the right time. It is not about changing the model itself, but about building the bridges that connect it to the outside world: retrieving external data, connecting it to live tools, and giving it a memory.

From Prompts to Context

I’ve heard it mentioned in a few articles on this matter that "if your prompt is a recipe, the model is your kitchen".

In traditional prompt engineering, you tried to cram everything into the recipe. You would write a massive prompt containing the persona, the task, the rules, and all the reference text. But models have a limited "context window" (i.e. their working memory). Overloading this window increases costs, slows down response times, and causes models to suffer from "context rot," where they forget important instructions.

Context engineering solves this by treating the prompt as a dynamic, living ecosystem. It acts like the mise en place for a chef, gathering only the exact ingredients and tools needed for the immediate task before cooking.

A Real Example

The Old Way (Static Prompting)

From yesteryear, as far back as 2024! We used a workflow that tried to solve the AI's lack of knowledge by cramming everything into a single, massive text box.

  • The Process: You build a 5,000-word system prompt that includes the persona instructions, the entire 50-page company return policy, and the complete transcript of the user's last 20 messages.

  • The Bottleneck: This approach relies on a static "retrieve, then generate" pipeline. As the conversation grows, the "context window" (the AI's active working memory) becomes overloaded. The model suffers from "context rot" or "context distraction": it begins to forget instructions buried in the middle of the prompt, hallucinations increase, and your API costs skyrocket because you are paying to process thousands of irrelevant tokens on every single turn.
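To put some rough numbers on why costs climb, here is a small Python calculation. The figures are assumptions for illustration only (about 1.3 tokens per English word, so roughly 6,500 tokens for the 5,000-word prompt; 150-token messages; a 20-turn chat), and it ignores the AI's own replies, which would make the static case even worse.

```python
def cumulative_input_tokens(turns: int, fixed_tokens: int,
                            tokens_per_message: int, resend_history: bool) -> int:
    """Total prompt tokens processed across a whole conversation."""
    total = 0
    for turn in range(1, turns + 1):
        # Static prompting re-sends every earlier message on every turn;
        # an engineered context sends only the current message.
        history = turn * tokens_per_message if resend_history else tokens_per_message
        total += fixed_tokens + history
    return total


# Assumed: ~6,500-token static system prompt, full history re-sent each turn.
static = cumulative_input_tokens(20, 6_500, 150, resend_history=True)
# Assumed: a lean ~800-token prompt (persona plus a few retrieved facts).
engineered = cumulative_input_tokens(20, 800, 150, resend_history=False)
print(static, engineered, round(static / engineered, 1))
```

With these assumptions the script prints `161500 19000 8.5`. Because the full history is re-sent, the static prompt's bill grows roughly quadratically with the number of turns, ending up about eight and a half times the engineered version over this one conversation.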

The New Way (Context Engineering Ecosystems)

In this new workflow, instead of a single prompt, we architect a dynamic ecosystem:

  • Query Rewriting: A frustrated user types, "How do I make this work when it keeps failing?" Instead of feeding this vague complaint to your main AI, a background "Query Rewriter" agent intercepts it. It analyzes the session and rewrites the hidden search to: "API call failure, troubleshooting authentication headers, rate limiting". This ensures the database retrieves the exact technical manual needed.

  • Active Memory Management: Instead of passing the entire chat history back to the model, an automated "Memory Manager" runs an ETL (Extract, Transform, Load) pipeline in the background. It extracts key facts (e.g., the fact {"shoe_size": 10} from a long conversation), consolidates them by deleting the user's old size 9 preference to avoid conflicting data, and stores the result in a Vector Database. On the next turn, the system injects only that single relevant fact into the prompt.

  • Standardized Tools (MCP): Instead of writing custom integration code for every API your agent needs to touch, you use the Model Context Protocol (MCP). Dubbed the "USB-C for AI," MCP allows your agent to seamlessly connect to standardized servers. The agent uses a tool like process_refund(order_id) by outputting structured JSON, observing the result, and adjusting its plan without human intervention.
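The three pieces above can be sketched in a few lines of Python. Treat this as a toy model under clearly stated assumptions: the query rewriter is a keyword lookup standing in for a second LLM call, a plain dictionary stands in for the vector database, and `process_refund` is a hypothetical function. The tool-call message only loosely follows the shape of MCP's `tools/call` request (a tool `name` plus `arguments`); it is not a real MCP client or server.

```python
import json

# --- Query rewriting ---------------------------------------------------------
# Hypothetical stand-in: a real rewriter would be a separate LLM call that
# reads the session. A keyword lookup keeps this sketch self-contained.
SESSION_HINTS = {"401": "authentication headers", "429": "rate limiting"}


def rewrite_query(user_text: str, session_log: list[str]) -> str:
    terms: list[str] = []
    if "fail" in user_text.lower():
        terms.append("API call failure")
    for status, topic in SESSION_HINTS.items():
        if any(status in line for line in session_log):
            terms.append(f"troubleshooting {topic}")
    return ", ".join(terms)


# --- Active memory management ------------------------------------------------
class MemoryStore:
    """A dict standing in for the vector database of extracted facts."""

    def __init__(self) -> None:
        self.facts: dict[str, object] = {}

    def consolidate(self, extracted: dict[str, object]) -> None:
        # Writing to an existing key replaces the stale value (size 9 -> 10),
        # so conflicting facts never coexist.
        self.facts.update(extracted)

    def relevant(self, keys: list[str]) -> dict[str, object]:
        # Only these facts get injected into the next prompt.
        return {k: self.facts[k] for k in keys if k in self.facts}


# --- Tool calls ---------------------------------------------------------------
def process_refund(order_id: str) -> str:
    """Hypothetical business function exposed to the agent as a tool."""
    return f"refund issued for {order_id}"


TOOLS = {"process_refund": process_refund}


def handle_tool_call(raw: str) -> str:
    # Loosely follows MCP's "tools/call" shape: a tool name plus a JSON
    # object of arguments. Not a real MCP implementation.
    params = json.loads(raw)["params"]
    return TOOLS[params["name"]](**params["arguments"])


log = ["POST /orders -> 401 Unauthorized", "POST /orders -> 429 Too Many Requests"]
print(rewrite_query("How do I make this work when it keeps failing?", log))

memory = MemoryStore()
memory.consolidate({"shoe_size": 9})
memory.consolidate({"shoe_size": 10})
print(memory.relevant(["shoe_size"]))  # {'shoe_size': 10}

call = ('{"jsonrpc": "2.0", "id": 1, "method": "tools/call", '
        '"params": {"name": "process_refund", "arguments": {"order_id": "A123"}}}')
print(handle_tool_call(call))  # refund issued for A123
```

Each stage shrinks what the main model has to read: a precise search query, one consolidated fact instead of a chat transcript, and a structured tool result instead of free text.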

In summary…

Prompt engineering hasn't disappeared; it has just been absorbed into something much bigger.

We have transitioned from being "prompters" who talk to a model, to architects who build the world the model lives in. Whether you are vibe coding a new application into existence with natural language, or context engineering a sophisticated retrieval pipeline for an enterprise AI agent, the focus is no longer on hacking the AI with clever words. It is about orchestrating intent, memory, and data to create truly autonomous systems.


Tuesday, April 28, 2026

Beyond the Prompt: Vibe Coding

Previously, I explored a provocative reality: the era of manual, meticulous "prompt engineering" is coming to an end. The days of cobbling together the perfect combination of adjectives, persona tricks, and "let's think step by step" commands now seem to be regarded as a thing of the past. But if we are no longer prompt engineers, what exactly are we doing?

TL;DR


Vibe Coding is replacing manual "prompt engineering" as the new discipline for interacting with AI in 2026, representing a fundamental shift from writing instructions to curating intent.

  • What it is: Coined by Andrej Karpathy, Vibe Coding means providing a high-level "vibe" (intended functionality) and letting the AI autonomously generate, compile, and execute the complete software system.

  • Viability: It is highly effective for prototyping, MVPs, and internal tools, allowing rapid development (e.g., building a CRM in moments). However, it has low viability for Enterprise Production due to technical debt, security vulnerabilities, and the lack of architectural oversight.

  • The Trust Gap: Despite massive adoption (92% of US developers use AI tools daily), developer trust in AI-generated code accuracy is low (29%), and roughly 45% of it fails modern security benchmarks.

  • Best Practices: Successful Vibe Coders practice Human Orchestration (reviewing code for security holes), Strategic Decomposition (breaking down requests), and using the "Karpathy Move" (pasting the entire stack trace back to the AI for debugging).

Conclusion: Vibe coding is a "power tool" best utilized by senior engineers who can steer the AI toward stable, secure code.

Monday, April 6, 2026

Architecting Logic: The Chain-of-Thought Prompting Guide

Building on our exploration of Zero-Shot and Few-Shot techniques, Chain-of-Thought (CoT) prompting is the natural next step for tackling tasks that require deep logic. While standard few-shot prompting is excellent at teaching a model what format to output, it often fails at teaching the model how to process a complex problem.

Here is a detailed overview of Chain-of-Thought prompting, why it is critical, and how to implement it effectively.

What is Chain-of-Thought Prompting?

Chain-of-Thought prompting is a technique designed to improve the reasoning capabilities of Large Language Models (LLMs) by forcing them to generate intermediate reasoning steps before providing a final answer. Instead of treating the AI as a "black box" that spits out an immediate conclusion, CoT guides the model to mimic human problem-solving by "talking through" the problem step-by-step.

Why is it Important?

CoT is vital for several reasons:

  • Overcoming Few-Shot Limitations: Standard few-shot prompting provides the final answer but skips the underlying logic, meaning the model fails to learn the process. CoT bridges this gap, allowing models to succeed on complex arithmetic, common-sense, and symbolic reasoning tasks.

  • High Impact, Low Effort: It is a highly effective way to boost performance using off-the-shelf LLMs without the need for expensive fine-tuning.

  • Interpretability: Because the model outputs its reasoning, you gain transparency into its "thought process." If the model hallucinates or makes a mistake, you can read the steps to identify exactly where the logic broke down.

  • Robustness: CoT prompts tend to experience less performance drift when migrating between different LLM versions compared to standard prompts.

Solid Examples of CoT in Action

To understand CoT, let's look at a classic mathematical failure and how CoT fixes it.

Standard Prompt (The Failure):

Prompt: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner?


Output: 63 years old.

(The model blindly multiplies without reasoning through the age gap).

1. Zero-Shot Chain-of-Thought

You can trigger CoT without providing any examples simply by appending a specific instruction, most commonly "Let's think step by step".


Prompt: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let's think step by step.


Output:

  1. When I was 3 years old, my partner was 3 * 3 = 9 years old.

  2. That is an age difference of 6 years.

  3. Now I am 20 years old, so my partner is 20 + 6 = 26 years old.

  4. The answer is 26.
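Zero-shot CoT is easy to wrap in code. The Python sketch below (the helper names are my own, hypothetical ones) appends the trigger phrase and then pulls the final number out of the step-by-step reply, since the reasoning text is usually not what your application needs to store. The regex assumes the model finishes with a phrase like "The answer is 26", which is not guaranteed.

```python
import re
from typing import Optional

COT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Append the zero-shot CoT trigger phrase to a plain question."""
    return f"{question.rstrip()} {COT_TRIGGER}"


def extract_final_answer(reasoning: str) -> Optional[int]:
    """Find the number in a closing phrase like 'The answer is 26'.

    Returns None if the model did not use that phrasing.
    """
    match = re.search(r"answer is\s*(-?\d+)", reasoning, re.IGNORECASE)
    return int(match.group(1)) if match else None


reply = """1. When I was 3 years old, my partner was 3 * 3 = 9 years old.
2. That is an age difference of 6 years.
3. Now I am 20 years old, so my partner is 20 + 6 = 26 years old.
4. The answer is 26."""
print(extract_final_answer(reply))  # 26
```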

2. Few-Shot Chain-of-Thought

For even better results, you can combine CoT with few-shot learning by providing examples where the reasoning process itself is mapped out in the demonstration.


Prompt:

Q: When my brother was 2 years old, I was double his age. Now I am 40 years old. How old is my brother? Let's think step by step.

A: When my brother was 2, I was 2 * 2 = 4. That's an age difference of 2 years. Now I am 40, so my brother is 40 - 2 = 38.

Q: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let's think step by step.

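A:

Output:

When I was 3, my partner was 3 * 3 = 9. That's an age difference of 6 years. Now I am 20, so my partner is 20 + 6 = 26.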
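Assembling a few-shot CoT prompt by hand gets fiddly once you have several demonstrations. Here is a minimal Python sketch (the `Exemplar` type and `few_shot_cot` function are hypothetical, not from any library) that keeps the worked reasoning in each demonstration and leaves the final "A:" open for the model to fill in.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # worked steps that end in the answer


def few_shot_cot(exemplars: list[Exemplar], question: str,
                 trigger: str = "Let's think step by step.") -> str:
    """Build a prompt whose demonstrations show the reasoning, not just answers."""
    blocks = [f"Q: {ex.question} {trigger}\nA: {ex.reasoning}" for ex in exemplars]
    # The final "A:" is left open so the model continues in the same style.
    blocks.append(f"Q: {question} {trigger}\nA:")
    return "\n\n".join(blocks)


brother = Exemplar(
    "When my brother was 2 years old, I was double his age. Now I am 40 years old. "
    "How old is my brother?",
    "When my brother was 2, I was 2 * 2 = 4. That's an age difference of 2 years. "
    "Now I am 40, so my brother is 40 - 2 = 38.",
)
print(few_shot_cot([brother], "When I was 3 years old, my partner was 3 times my age. "
                              "Now, I am 20 years old. How old is my partner?"))
```

Keeping the reasoning in the demonstrations is the whole point: an exemplar containing only "A: 38" would teach the format but not the method.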

Use Cases for Implementation

Generally, any task that a human would solve by "talking it through" is a great candidate for CoT.

Specific use cases include:

  • Mathematical and Logical Reasoning: Solving complex word problems, physics questions, or symbolic logic puzzles where jumping straight to the answer causes hallucinations.

  • Code Generation and Debugging: Breaking a software request down into functional steps before mapping those steps to specific lines of code.

  • Synthetic Data Generation: Guiding a model to systematically think through the assumptions and target audience of a product before writing a description for it.

Are There Downsides to This Technique?

While powerful, CoT is not a silver bullet and comes with several notable downsides:

Increased Cost and Latency

Because the model must generate the intermediate reasoning text before delivering the final answer, it consumes significantly more output tokens. This means your predictions will cost more money and take longer to generate.

Strict Temperature Requirements

Because a reasoning problem usually has one correct answer, CoT is normally paired with "greedy decoding", where the model always picks the most probable next token. In practice this means setting the model's temperature to 0 (or very low), which limits its use in creative tasks.
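Why does a low temperature suit CoT? Temperature rescales the model's raw next-token scores before they are turned into probabilities. The toy Python sketch below uses made-up scores for three candidate tokens, and it treats temperature 0 as pure greedy decoding; that is how this sketch handles it, not a guarantee about any particular provider's API.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into probabilities at a given temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: all mass on the top token.
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens in a reasoning step.
logits = [2.0, 1.0, 0.5]
for t in (0, 0.2, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature falls, nearly all the probability lands on the top-scoring token, so each reasoning step follows the most likely path instead of wandering.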

Diminishing Returns on the Newest Models

Recent research indicates that highly advanced foundation models (like Qwen2.5 or DeepSeek-R1) have been exposed to so much CoT data during training that they have internalized these reasoning patterns. For these extremely strong models, adding traditional CoT exemplars often fails to improve reasoning ability beyond standard zero-shot prompting, as the models simply ignore the examples and rely on their internal knowledge.

API Policy Restrictions

For newer, dedicated "reasoning models" (like OpenAI's o-series), the models handle the chain of thought internally. Attempting to manually extract or force CoT reasoning through prompts is often unsupported and can even violate Acceptable Use Policies.

Chain-of-thought (CoT) prompting remains a cornerstone of LLM interaction, but its role has shifted from a "magic trick" that fixes everything to a specialized tool that must be used strategically.

Relevancy and usefulness of CoT today

In the current landscape of 2026, the relevance of CoT depends largely on whether you are using a Reasoning Model (like OpenAI’s o1/o3 or Gemini 2.5 Flash) or a Standard Model (like GPT-4o or Claude 3.5 Sonnet).

1. For Standard Models (GPT-4o, Claude 3.5, Gemini 1.5 Pro)

CoT is still highly useful but inconsistent. Recent studies show that "thinking step-by-step" provides a significant boost on complex logic, but can actually degrade performance on simple tasks.

  • The "Thinking" Tax: Using CoT increases latency by 35% to 600% and scales token costs proportionally.

  • Performance Gains: Models like Claude 3.5 Sonnet still see accuracy improvements of roughly 10–12% on complex reasoning tasks when prompted with CoT.

  • The Inconsistency Risk: Paradoxically, Gemini 1.5 Pro and GPT-4o sometimes perform worse (-17% in some benchmarks) when forced to use CoT on "easy" questions they would have otherwise answered correctly via intuition.

2. For Reasoning Models (OpenAI o1/o3, Gemini 2.0 Flash Thinking/2.5)

CoT prompting is becoming redundant. These models have "Internal CoT" trained into them: they reason before they answer by default.

  • Diminishing Returns: Explicitly adding "think step by step" to a model that is already designed to think (like o3-mini) yields marginal gains (often <3%) while significantly increasing the time-to-first-token.

  • Conflict of Logic: In some cases, forcing an external chain of thought can interfere with the model’s internal reinforcement-learned reasoning paths, leading to "overthinking" errors.

Comparison: When to Use CoT

Each entry gives the task type, how useful CoT is for it, and the recommended approach:

  • Simple Extraction/Q&A (❌ Harmful): Ask for a direct answer to save cost and avoid "hallucinated" logic.

  • Complex Math/Coding (✅ Critical): Use CoT or, better yet, use a dedicated Reasoning Model.

  • Creative Writing (⚠️ Mixed): CoT can make the output feel "formulaic" or robotic.

  • Policy/Compliance (✅ High): Use CoT for explainability; it’s useful for auditing why a model made a decision.

The Modern "Best Practice"

Instead of the generic "Let's think step by step," the current trend is Structured CoT. Rather than letting the model wander, you define the "steps" you want it to take:

  1. Analyze the user's intent.

  2. Identify relevant variables/constraints.

  3. Draft a logical solution.

  4. Verify the solution against the constraints.
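A structured CoT prompt is really just a template. Here is one possible Python sketch that numbers the four steps above and asks for a clearly marked final answer; the instruction wording and the "Final answer:" convention are my own assumptions, so adapt them to your model and task.

```python
from typing import Sequence

STRUCTURED_STEPS = (
    "Analyze the user's intent.",
    "Identify relevant variables/constraints.",
    "Draft a logical solution.",
    "Verify the solution against the constraints.",
)


def structured_cot_prompt(task: str, steps: Sequence[str] = STRUCTURED_STEPS) -> str:
    """Spell out the reasoning steps instead of a generic 'think step by step'."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Task: {task}\n\n"
        "Work through the following steps in order, labelling each one:\n"
        f"{numbered}\n\n"
        "Finish with a line of the form 'Final answer: <answer>'."
    )


print(structured_cot_prompt("A train leaves at 09:40 and arrives at 11:05. "
                            "How long is the trip?"))
```

The fixed "Final answer:" line also makes the reply easy to parse, which matters once the output feeds another system.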

Summary

CoT is not a universal "better" button anymore. It is a precision tool. If you are using the latest reasoning models, you can likely retire the "step-by-step" prompt entirely. If you are using standard models for complex logic, it remains your best defense against "hallucinated" shortcuts—just be prepared to pay for it in latency and tokens.

