Showing posts with label AI. Show all posts

Tuesday, April 28, 2026

Beyond the Prompt: Vibe Coding

Previously, I explored a provocative reality: the era of manual, meticulous "prompt engineering" is coming to an end. The days of cobbling together the perfect combination of adjectives, persona tricks, and "let's think step by step" commands are now regarded as a thing of the past. But if we are no longer prompt engineers, what exactly are we doing?

TL;DR


Vibe Coding is replacing manual "prompt engineering" as the new discipline for interacting with AI in 2026, representing a fundamental shift from writing instructions to curating intent.

  • What it is: Coined by Andrej Karpathy, Vibe Coding means providing a high-level "vibe" (intended functionality) and letting the AI autonomously generate, compile, and execute the complete software system.

  • Viability: It is highly effective for prototyping, MVPs, and internal tools, allowing rapid development (e.g., building a CRM in moments). However, it has low viability for Enterprise Production due to technical debt, security vulnerabilities, and the lack of architectural oversight.

  • The Trust Gap: Despite massive adoption (92% of US developers use AI tools daily), developer trust in AI-generated code accuracy is low (29%), and roughly 45% of it fails modern security benchmarks.

  • Best Practices: Successful Vibe Coders practice Human Orchestration (reviewing code for security holes), Strategic Decomposition (breaking down requests), and using the "Karpathy Move" (pasting the entire stack trace back to the AI for debugging).
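The "Karpathy Move" above is easy to automate. Here is a minimal Python sketch of the idea, using the standard `traceback` module; `build_debug_prompt` is a hypothetical helper name, and sending the prompt to an actual model is left out:

```python
import traceback

def build_debug_prompt(exc: BaseException) -> str:
    """Wrap a caught exception's full stack trace in a prompt for an AI
    assistant (the 'paste the whole stack trace back' debugging workflow)."""
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return (
        "The following error occurred when running the code you generated.\n"
        "Explain the root cause and propose a fix:\n\n" + trace
    )

try:
    {}["missing_key"]  # deliberately raise a KeyError for the demo
except KeyError as exc:
    prompt = build_debug_prompt(exc)
```

The resulting `prompt` contains the full traceback, which gives the model far more to work with than a paraphrased error description.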

Conclusion: Vibe coding is a "power tool" best utilized by senior engineers who can steer the AI toward stable, secure code.

Monday, April 6, 2026

Architecting Logic: The Chain-of-Thought Prompting Guide

Building on our exploration of Zero-Shot and Few-Shot techniques, Chain-of-Thought (CoT) prompting is the natural next step for tackling tasks that require deep logic. While standard few-shot prompting is excellent at teaching a model what format to output, it often fails at teaching the model how to process a complex problem.

Here is a detailed overview of Chain-of-Thought prompting, why it is critical, and how to implement it effectively.

What is Chain-of-Thought Prompting?

Chain-of-Thought prompting is a technique designed to improve the reasoning capabilities of Large Language Models (LLMs) by forcing them to generate intermediate reasoning steps before providing a final answer. Instead of treating the AI as a "black box" that spits out an immediate conclusion, CoT guides the model to mimic human problem-solving by "talking through" the problem step-by-step.

Why is it Important?

CoT is vital for several reasons:

  • Overcoming Few-Shot Limitations: Standard few-shot prompting provides the final answer but skips the underlying logic, meaning the model fails to learn the process. CoT bridges this gap, allowing models to succeed on complex arithmetic, common-sense, and symbolic reasoning tasks.

  • High Impact, Low Effort: It is a highly effective way to boost performance using off-the-shelf LLMs without the need for expensive fine-tuning.

  • Interpretability: Because the model outputs its reasoning, you gain transparency into its "thought process." If the model hallucinates or makes a mistake, you can read the steps to identify exactly where the logic broke down.

  • Robustness: CoT prompts tend to experience less performance drift when migrating between different LLM versions compared to standard prompts.

Solid Examples of CoT in Action

To understand CoT, let's look at a classic mathematical failure and how CoT fixes it.

Standard Prompt (The Failure):

Prompt : When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner?


Output: 63 years old.

(The model blindly multiplies without reasoning through the age gap).

1. Zero-Shot Chain-of-Thought

You can trigger CoT without providing any examples simply by appending a specific instruction, most commonly "Let's think step by step".


Prompt : When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let's think step by step.


Output :

  1. When I was 3 years old, my partner was 3 * 3 = 9 years old.

  2. That is an age difference of 6 years.

  3. Now I am 20 years old, so my partner is 20 + 6 = 26 years old.

  4. The answer is 26.
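The zero-shot trigger is trivial to apply programmatically. A minimal sketch (the helper name is illustrative, not a library API):

```python
COT_SUFFIX = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Turn a plain question into a zero-shot CoT prompt by appending
    the standard trigger phrase."""
    return f"{question.strip()} {COT_SUFFIX}"

prompt = zero_shot_cot(
    "When I was 3 years old, my partner was 3 times my age. "
    "Now, I am 20 years old. How old is my partner?"
)
```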

2. Few-Shot Chain-of-Thought

For even better results, you can combine CoT with few-shot learning by providing examples where the reasoning process itself is mapped out in the demonstration.


Prompt :

Q: When my brother was 2 years old, I was double his age. Now I am 40 years old. How old is my brother? Let's think step by step.

A: When my brother was 2, I was 2 * 2 = 4. That's an age difference of 2 years. Now I am 40, so my brother is 40 - 2 = 38.

Q: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let's think step by step.

A: When I was 3, my partner was 3 * 3 = 9. That's an age difference of 6 years. Now I am 20, so my partner is 20 + 6 = 26.
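Assembling these worked demonstrations by hand gets tedious, so it helps to build the prompt from (question, reasoning) pairs. A minimal sketch, assuming a simple Q/A text format:

```python
def few_shot_cot(examples: list[tuple[str, str]], question: str) -> str:
    """Assemble a few-shot CoT prompt: each demonstration shows the
    reasoning steps, not just the final answer."""
    blocks = [
        f"Q: {q} Let's think step by step.\nA: {a}" for q, a in examples
    ]
    # End with the new question and an open "A:" for the model to complete.
    blocks.append(f"Q: {question} Let's think step by step.\nA:")
    return "\n\n".join(blocks)

prompt = few_shot_cot(
    [(
        "When my brother was 2 years old, I was double his age. "
        "Now I am 40 years old. How old is my brother?",
        "When my brother was 2, I was 2 * 2 = 4. That's an age difference "
        "of 2 years. Now I am 40, so my brother is 40 - 2 = 38.",
    )],
    "When I was 3 years old, my partner was 3 times my age. "
    "Now, I am 20 years old. How old is my partner?",
)
```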

Use Cases for Implementation

Generally, any task that a human would solve by "talking it through" is a great candidate for CoT.

Specific use cases include:

  • Mathematical and Logical Reasoning: Solving complex word problems, physics questions, or symbolic logic puzzles where jumping straight to the answer causes hallucinations.

  • Code Generation and Debugging: Breaking a software request down into functional steps before mapping those steps to specific lines of code.

  • Synthetic Data Generation: Guiding a model to systematically think through the assumptions and target audience of a product before writing a description for it.

Are There Downsides to This Technique?

While powerful, CoT is not a silver bullet and comes with several notable downsides:

Increased Cost and Latency

Because the model must generate the intermediate reasoning text before delivering the final answer, it consumes significantly more output tokens. This means your predictions will cost more money and take longer to generate.

Strict Temperature Requirements

CoT relies on "greedy decoding"—predicting the most logically probable next word. To use CoT effectively, you must set the model's temperature to 0 (or very low), which limits its use in creative tasks.
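In practice this means pinning the decoding parameters. A sketch of settings commonly recommended for CoT, assuming an OpenAI-style chat-completions API (parameter names may differ between providers):

```python
# Decoding settings commonly recommended for CoT prompts.
# These names follow OpenAI-style APIs; other providers may differ.
cot_params = {
    "temperature": 0,    # greedy decoding: always pick the most probable token
    "top_p": 1,          # no nucleus-sampling truncation
    "max_tokens": 1024,  # leave room for the reasoning steps before the answer
}
```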

Diminishing Returns on the Newest Models

Recent research indicates that highly advanced foundation models (like Qwen2.5 or DeepSeek-R1) have been exposed to so much CoT data during training that they have internalized these reasoning patterns. For these extremely strong models, adding traditional CoT exemplars often fails to improve reasoning ability beyond standard zero-shot prompting, as the models simply ignore the examples and rely on their internal knowledge.

API Policy Restrictions

For newer, dedicated "reasoning models" (like OpenAI's o-series), the models handle the chain of thought internally. Attempting to manually extract or force CoT reasoning through prompts is often unsupported and can even violate Acceptable Use Policies.

Chain-of-thought (CoT) prompting remains a cornerstone of LLM interaction, but its role has shifted from a "magic trick" that fixes everything to a specialized tool that must be used strategically.

Relevancy and usefulness of CoT today

In the current landscape of 2026, the relevance of CoT depends entirely on whether you are using a Reasoning Model (like OpenAI’s o1/o3 or Gemini 2.5 Flash) or a Standard Model (like GPT-4o or Claude 3.5 Sonnet).

1. For Standard Models (GPT-4o, Claude 3.5, Gemini 1.5 Pro)

CoT is still highly useful but inconsistent. Recent studies show that "thinking step-by-step" provides a significant boost on complex logic, but can actually degrade performance on simple tasks.

  • The "Thinking" Tax: Using CoT increases latency by 35% to 600% and scales token costs proportionally.

  • Performance Gains: Models like Claude 3.5 Sonnet still see accuracy improvements of roughly 10–12% on complex reasoning tasks when prompted with CoT.

  • The Inconsistency Risk: Paradoxically, Gemini 1.5 Pro and GPT-4o sometimes perform worse (-17% in some benchmarks) when forced to use CoT on "easy" questions they would have otherwise answered correctly via intuition.

2. For Reasoning Models (OpenAI o1/o3, Gemini 2.0/2.5)

CoT prompting is becoming redundant. These models have "Internal CoT" baked into their architecture—they reason before they speak by default.

  • Diminishing Returns: Explicitly adding "think step by step" to a model that is already designed to think (like o3-mini) yields marginal gains (often <3%) while significantly increasing the time-to-first-token.

  • Conflict of Logic: In some cases, forcing an external chain of thought can interfere with the model’s internal reinforcement-learned reasoning paths, leading to "overthinking" errors.

Comparison: When to Use CoT

| Task Type | Utility | Recommended Approach |
| --- | --- | --- |
| Simple Extraction/Q&A | ❌ Harmful | Ask for a direct answer to save cost and avoid "hallucinated" logic. |
| Complex Math/Coding | ✅ Critical | Use CoT or, better yet, use a dedicated Reasoning Model. |
| Creative Writing | ⚠️ Mixed | CoT can make the output feel "formulaic" or robotic. |
| Policy/Compliance | ✅ High | Use CoT for explainability; it’s useful for auditing why a model made a decision. |

The Modern "Best Practice"

Instead of the generic "Let's think step by step," the current trend is Structured CoT. Rather than letting the model wander, you define the "steps" you want it to take:

  1. Analyze the user's intent.

  2. Identify relevant variables/constraints.

  3. Draft a logical solution.

  4. Verify the solution against the constraints.
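The four steps above can be turned into a reusable prompt template. A minimal sketch (the function name is illustrative; the step wording is taken directly from the list above):

```python
STEPS = [
    "Analyze the user's intent.",
    "Identify relevant variables and constraints.",
    "Draft a logical solution.",
    "Verify the solution against the constraints.",
]

def structured_cot(task: str) -> str:
    """Build a Structured CoT prompt: the model is told exactly which
    reasoning steps to take, instead of a generic 'think step by step'."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(STEPS, 1))
    return f"{task}\n\nWork through this in exactly these steps:\n{numbered}"

prompt = structured_cot("Estimate the monthly cost of running this service.")
```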

Summary

CoT is not a universal "better" button anymore. It is a precision tool. If you are using the latest reasoning models, you can likely retire the "step-by-step" prompt entirely. If you are using standard models for complex logic, it remains your best defense against "hallucinated" shortcuts—just be prepared to pay for it in latency and tokens.


Tuesday, March 31, 2026

Stop Guessing, Start Directing: From Zero-Shot to Few-Shot Guide for AI Precision

When I started using AI almost a year ago, I found the whole thing utterly amazing. It took me quite a while to realise that the answer wasn’t just something it had found on the web; it had actually generated that answer for me. I was using it like Google Search, and it’s much more powerful than that.

After issuing a command, five seconds later my screen is filled with a response that is… technically correct, but completely useless. It’s too wordy, the tone is wrong, it hallucinates facts, and worst of all, it sounds like a robot trying too hard to be human. There are so many tell-tale signs of AI-generated text: those long “em dashes”, and nearly every paragraph summarized with bullet points. Don’t worry, I get rid of all my bullet points before posting my blogs ;-)

It took me a while to realise that continually asking AI to ‘regenerate’ an answer was effectively asking it to roll the dice again and again. Then I stumbled across Zero-Shot, One-Shot and Few-Shot Prompting…

Learning to make use of these techniques will be the single most powerful shift you can make to your workflow: understanding when to use Zero-Shot Prompting (your quick-and-dirty command) and One-Shot Prompting (The Goldilocks technique) and when to switch to Few-Shot Prompting (giving the AI a template to follow).

If you want the AI to stop guessing and start mimicking your brand’s voice, your logic, or your formatting, you need to master these techniques and understand when each one is most appropriate. In 2026, the era of treating AI like a magic 8-ball is over. We are now in the era of structured prompting.

So let’s try to understand these techniques a little better…

Zero-Shot : The "Quick & Dirty" Method

Zero-shot is the "Google Search" that I was unwittingly using when I started working with AI, and I’m sure everyone also started here. It’s built for speed, intuition, and broad strokes. You aren't teaching the AI; you are tapping into its existing massive library of patterns.

If you’re interested in the “sciency” bit, Zero-shot relies on “Global Probability”. When you ask for a "legal summary," the AI looks at the trillions of words it was trained on and predicts what a "standard" legal summary looks like. It’s essentially playing a high-stakes game of "predict the next word" based on general consensus.

What is it good for?…

After spending an initial period of asking it to create a poem about pirates lost in a garden centre and writing a story about a grubby bear with a gambling addiction, I really found it useful for brainstorming and providing me with a list of ideas, such as:

  • Suggesting 10 titles for a blog article.

  • Summarize this 40-page PDF into 5 bullet points.

  • Broad Fact-Finding such as "What were the three primary causes of the French Revolution?". These types of prompts lead to those Google Search AI Overviews which provide a deeper, more direct answer than sifting through loads of websites and Wikipedia articles.

  • Translate this menu into conversational Italian.

The Danger Zone

Once you’ve been around the block reading AI-generated text, you’ll start spotting it instantly. The Zero-Shot method is the worst offender for generic, clichéd blocks of text, mostly due to this "Global Probability" mechanism.

For the same reason, it is more likely to hallucinate a "plausible-sounding" answer when it doesn't know the fact, especially without examples to anchor it.

Lastly, if you need the data in a specific format like JSON or CSV, Zero-Shot will almost always prepend "Here is your data!" text that breaks your code.
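If you do get stuck with that chatty preamble, you can strip it before parsing. A minimal defensive sketch (the helper name is illustrative, and the regex assumes a single flat JSON object in the reply):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Strip the conversational preamble that zero-shot responses often
    wrap around structured output, then parse the first JSON object."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

data = extract_json('Sure, here is your data!\n{"size": "small", "count": 2}')
```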

One-Shot : The "Goldilocks" Technique

So, onwards and upwards. Sometimes you don’t need a whole training set; you just need to clear up the confusion. One-shot is providing exactly one example. It’s the most efficient way to define a "style" or "format" without cluttering your context window.

Note : A context window refers to the amount of text (measured in tokens) that a Large Language Model (LLM) can process or "remember" at one time.

One-shot acts as a structural anchor. While Zero-shot leaves the AI guessing about your preferred format, a single example removes 90% of that ambiguity. It’s particularly effective for high-performing 2026 models like Gemini 3 Flash or GPT-5, which are now sensitive enough to pivot their entire behavior based on a single data point.

What is it good for?…

  • If you want the output in a specific JSON structure or bullet-point style, you can define the exact format you want just by providing a verbatim example.

  • Provide one previous email you wrote so the AI can mimic your specific tone and level of formality.

  • Or just to be a little more obscure, maybe you’re translating English to "Legal-Speak" where one example shows the level of complexity that you are trying to achieve.

Few-Shot : The "Pattern-Match" Powerhouse

If Zero-shot is “suck it and see”, Few-shot is a 1-on-1 coaching session. You are providing a "mini-dataset" within the prompt, forcing the AI to ignore its global averages and follow your specific logic.

What is it good for?...

In 2026, models now have massive "context windows" (their short-term memory). Few-shot works because the AI prioritizes the patterns found inside the prompt over the patterns it learned during training. You are essentially creating a temporary "Custom GPT" for that single chat.

Why "Three" is the Magic Number

  • One Example is a suggestion (the AI might think it's a fluke).

  • Two Examples create a line (a basic direction).

  • Three Examples create a pattern. Once the AI sees a pattern repeated three times, its mathematical confidence in mimicking that pattern skyrockets.

Pro-Tip: "Diverse Few-Shotting"

Don’t just give three identical examples. Give three different versions of a success.

  • Example 1: Short sentence success.

  • Example 2: Long, complex paragraph success.

  • Example 3: Success with an "edge case" (like a negative or a question).
    This teaches the AI the boundaries of your request, not just the middle.
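The diverse-few-shotting idea above can be sketched as a small prompt builder. This is an illustrative helper, not a library API; the sentiment task is just a stand-in example:

```python
def few_shot(instruction: str, examples: list[tuple[str, str]],
             new_input: str) -> str:
    """Assemble a few-shot prompt from input/output demonstrations,
    ending with an open slot for the model to complete."""
    demos = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{instruction}\n\n{demos}\n\nInput: {new_input}\nOutput:"

# Three *diverse* demonstrations: short, long/nuanced, and an edge case.
prompt = few_shot(
    "Classify the sentiment as positive or negative.",
    [
        ("Great!", "positive"),                                   # short
        ("The plot dragged on, but the ending redeemed it.",
         "positive"),                                             # long, nuanced
        ("Is this supposed to be funny? It isn't.", "negative"),  # edge case: a question
    ],
    "I'd buy it again.",
)
```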

The "Shot" Summary

Here’s a wee summary to maybe give you a rule of thumb:

| Technique | Method | Accuracy | Token Cost | Best For... |
| --- | --- | --- | --- | --- |
| Zero-Shot | Just a command. | ⭐⭐ | 🟢 Lowest | General knowledge, brainstorming |
| One-Shot | Command + 1 Example. | ⭐⭐⭐ | 🟡 Low | Setting a specific format or tone |
| Few-Shot | Command + 3-5 Examples. | ⭐⭐⭐⭐⭐ | 🔴 Higher | Logic, complex classification, data clean-up |


Some Examples (The "Secret Sauce")

Now I will attempt to summarize with some solid examples to make it less abstract and academic…

Zero-shot prompting

This involves giving the model a direct instruction to perform a task without providing any examples.


Example : Translate this sentence from French to English: 'Bonjour le monde'.


Where it succeeds : Zero-shot succeeds because it is highly efficient for simple, well-understood tasks that the model has frequently encountered during its training, such as straightforward translations like this.

Where it fails : It often falls short when a task requires a specific output structure or when the prompt involves ambiguity, as the model is left guessing the desired format without a pattern to follow.

One-Shot Prompting

One-shot prompting enhances the zero-shot approach by providing exactly one input-output example before presenting the actual request.


Example : Translate the following sentence. Example: 'Salut' → 'Hello'. Now translate: 'Bonjour' → ?.


Where it succeeds : This technique is ideal when the model needs a specific format or context to understand a fairly simple task, giving it a basic starting point to imitate.

Where it fails : One-shot prompting struggles with nuanced tasks because a single example cannot fully capture the range of possible edge cases or complex formatting rules.

Few-Shot Prompting

Few-shot prompting provides multiple examples (typically two to five) to help the model recognize patterns and learn in-context.


Example : Parse a customer's pizza order into valid JSON

EXAMPLE 1 : I want a small pizza with cheese, tomato sauce, and pepperoni.

JSON Response: { "size": "small", "type": "normal", "ingredients": [["cheese", "tomato sauce", "pepperoni"]] }

EXAMPLE 2 : Can I get a large pizza with tomato sauce, basil and mozzarella?

JSON Response: { "size": "large", "type": "normal", "ingredients": [["tomato sauce", "basil", "mozzarella"]] }

EXAMPLE 3 : Now, I would like a large pizza, with the first half cheese and mozzarella, and the other half tomato sauce, ham and pineapple.

JSON Response: { "size": "large", "type": "half-half", "one-half-ingredients": [["cheese", "mozzarella"]], "second-half-ingredients": [["tomato sauce", "ham", "pineapple"]] }


Where it succeeds : Few-shot prompting dramatically succeeds where zero-shot and one-shot fail by enforcing strict structural patterns (like generating JSON, YAML, or bulleted lists) and teaching the model how to handle varied, nuanced inputs. It allows the model to learn entirely new concepts in-context, such as successfully using a made-up word in a sentence after seeing a few examples of how it is done.

Where it fails : Few-shot prompting hits its limits when dealing with complex, multi-step reasoning or arithmetic tasks. For instance, providing multiple examples of whether a group of odd numbers adds up to an even number might still result in the model returning an incorrect answer for a new list of numbers. Because standard few-shot prompting only shows the final answer rather than the process of getting there, the model fails to learn the underlying logic. To succeed where few-shot fails, you must transition to Chain-of-Thought (CoT) prompting, which provides examples that break the problem down into intermediate reasoning steps. I will delve into CoT prompting in a future post.
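When you wire a few-shot prompt like this into an automated pipeline, it pays to validate the model's JSON before using it. A minimal sketch; the schema checks are assumptions inferred from the pizza examples above, not a fixed spec:

```python
import json

def validate_order(raw: str) -> dict:
    """Parse a pizza-order JSON string and sanity-check the fields the
    few-shot examples establish. (Schema is assumed for illustration.)"""
    order = json.loads(raw)
    if order.get("size") not in {"small", "medium", "large"}:
        raise ValueError(f"unexpected size: {order.get('size')}")
    if not any(key.endswith("ingredients") for key in order):
        raise ValueError("no ingredients field found")
    return order

order = validate_order(
    '{ "size": "small", "type": "normal", '
    '"ingredients": [["cheese", "tomato sauce", "pepperoni"]] }'
)
```

A failed parse or a failed check is your signal to re-prompt rather than pass malformed data downstream.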

Conclusion (and a wee challenge)

The difference between basic AI usage and true mastery often comes down to context.

Use Zero-Shot when you are exploring, brainstorming, or doing a task so simple that it’s almost impossible to mess up. It’s built for speed.

But when reliability, predictability, and precise formatting matter—especially if you are automating workflows—you must use Few-Shot. By providing just three curated examples, you anchor the model's logic, eliminate "AI-isms," and ensure consistent results.

A wee challenge for this week:

  1. Take the last prompt you wrote that gave you a generic, frustrating result.

  2. Structure that same task as a Few-Shot prompt, providing the AI with three examples of what a perfect response looks like.

  3. Compare the outputs.
