My name is Roger Oriol and I am a Software Architect based in Barcelona, Spain. I hold an MSc in Big Data Management, Technologies and Analytics. This blog is my vehicle for sharing and discussing topics on web development, data architecture, software architecture and much more.
Recent posts
-
[Link] GPT-5
OpenAI has finally released its GPT-5 model and, as we were already expecting, it's a hybrid reasoning model. The model itself now chooses how much to think about each task, although you can also force a specific reasoning effort. This probably means the end of the o series of reasoning models from OpenAI, as the regular language models and the reasoning models are now unified.
Of course, the benchmarks look good but saturated. What stands out to me is the announced score of 74.9 on SWE-bench (with high reasoning effort), just a tad over the 74.5 scored by Claude Opus 4.1, announced this very same week.
With the GPT-5 iteration come four new models: GPT-5, GPT-5-mini, GPT-5-nano and GPT-5 Chat. Free users will be allowed to use GPT-5, although once they hit the maximum quota they will fall back to GPT-5-mini.
GPT-5 lets you set the reasoning effort using the "reasoning.effort" parameter, although you can also nudge it by telling the model to "Think hard about this". These new models introduce a new reasoning tier called "minimal", which produces as few reasoning tokens as possible before answering. The length of the output can also be tuned with the new "verbosity" parameter, which didn't exist for past models and can be set to "high", "medium" or "low".
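Here is a minimal sketch of how those two knobs look in practice with the OpenAI Python SDK's Responses API; the prompt is made up, and the exact parameter shapes are my reading of the GPT-5 announcement, so double-check them against the official docs.

```python
from openai import OpenAI

client = OpenAI()

# Minimal reasoning effort and a terse answer for a simple task.
response = client.responses.create(
    model="gpt-5",
    input="Summarize the difference between TCP and UDP in two sentences.",
    reasoning={"effort": "minimal"},  # "minimal", "low", "medium" or "high"
    text={"verbosity": "low"},        # "low", "medium" or "high"
)

print(response.output_text)
```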
The new models also bring some new quality of life improvements for tool calling:
- Tool choice: While the models can choose to call zero, one or multiple tools, you can now set "tool_choice" to "forced" to force the invocation of at least one tool. You can also require a specific function by passing {"type": "function", "name": "function name"} as the "tool_choice" parameter. Finally, "tool_choice" can also take a list of allowed tools, drawn from the tools provided to the model: {"type": "allowed_tools", "mode": "auto", "tools": []} (see the sketch after this list).
- Tool preambles: A new feature that makes the models explain the rationale behind why they are invoking a function. This provides transparency and a better understanding of the model's process. The feature is not enabled by default; to enable it, you have to include a system message like "Before you call a tool, explain why you are calling it.".
- Custom tools: This feature lets you define tools that accept unstructured, free-form text as input, which frees the model from having to produce a structured JSON object to call the tool. This might improve the model's ability to call these tools, and it becomes even more powerful when paired with Context-Free Grammar.
- Context-Free Grammar: This feature lets you constrain the free-form text of a custom tool so it follows a set of grammar rules. You can define these rules using Lark or a regular expression.
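As a rough illustration of the tool-choice options above, here is a sketch using the OpenAI Python SDK. The get_weather tool is hypothetical, and the payload shapes follow the snippets quoted in this post rather than the full official reference, so treat them as indicative.

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical function tool, used only for illustration.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Require this specific function to be called...
forced = client.responses.create(
    model="gpt-5",
    input="What's the weather in Barcelona?",
    tools=tools,
    tool_choice={"type": "function", "name": "get_weather"},
)

# ...or restrict the model to a subset of the provided tools and let it decide.
restricted = client.responses.create(
    model="gpt-5",
    input="What's the weather in Barcelona?",
    tools=tools,
    tool_choice={
        "type": "allowed_tools",
        "mode": "auto",
        "tools": [{"type": "function", "name": "get_weather"}],
    },
)

print(forced.output, restricted.output)
```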
The GPT-5 models are now available both in ChatGPT and in the OpenAI API, give them a try!
-
[Quote] GPT-5 variants
It's not at all straightforward to understand the variants of the GPT-5 model released today. The API docs describe four models: gpt-5, gpt-5-mini, gpt-5-nano and gpt-5-chat. However, the system card describes six models that replace older models, and none of the names match the API:
It can be helpful to think of the GPT-5 models as successors to previous models:
Table 1: Model progressions
| Previous model | GPT-5 model |
| --- | --- |
| GPT-4o | gpt-5-main |
| GPT-4o-mini | gpt-5-main-mini |
| OpenAI o3 | gpt-5-thinking |
| OpenAI o4-mini | gpt-5-thinking-mini |
| GPT-4.1-nano | gpt-5-thinking-nano |
| OpenAI o3 Pro | gpt-5-thinking-pro |
The answer is that the gpt-5 model is composed of the gpt-5-main model, the gpt-5-thinking model and a router that selects which model to send the prompt to:

GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use...
The same applies to the mini model: gpt-5-mini is made of a gpt-5-main-mini model, a gpt-5-thinking-mini model and a router. The nano model only seems to have a thinking variant, not a main one, but this makes sense, as a single model without a router will be faster. This leaves only the gpt-5-thinking-pro model, which cannot be used via the API, only via ChatGPT with a Pro subscription:
In the API, we provide direct access to the thinking model, its mini version, and an even smaller and faster nano version of the thinking model, made for developers (gpt-5-thinking-nano). In ChatGPT, we also provide access to gpt-5-thinking using a setting that makes use of parallel test time compute; we refer to this as gpt-5-thinking-pro.
-
[Link] GPT-OSS
Just like Sam Altman hinted a while ago, OpenAI has released two open-weight models, trying to address the common criticism that a company with "Open" in its name hasn't released any open language models in a long while (since GPT-2!).
The new open-weight models (not open-source, as the name might seem to imply) are mixture-of-experts models:
- gpt-oss-120b: 116.83 billion parameters with 5.13 billion active parameters. It has 128 experts and activates 4 experts for each token.
- gpt-oss-20b: 20.91 billion parameters with 3.61 billion active parameters. It has 32 experts and activates 4 experts for each token.
Both models are reasoning models, and therefore OpenAI compares them to its own o3 and o4 models. It seems the 120b version is comparable to o4-mini and the 20b version to o3-mini. The new models have been thoroughly trained for agentic tasks: in the post-training stage, they were trained specifically to use a browser tool and a Python code execution tool, as well as other generic tools.
OpenAI has also introduced a new tokenizer called harmony, built specifically for these new models. What sets it apart from other tokenizers is that it introduces a "channels" concept that lets the model separate its output into user-facing text and internal-facing outputs. Another interesting concept it introduces is the "system message", which differs from the already familiar "system prompt". The system message allows configuring dates, for example "Knowledge cutoff: 2024-06" and "Current date: 2025-06-28". It also allows setting the reasoning effort with "Reasoning: high". Finally, it allows configuring the channels, what they are used for, and the tools the model can use.
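To make that more concrete, below is a rough sketch of such a system message assembled as a Python string. The field lines come from the examples in this post, but the special tokens and overall layout are my approximation of the harmony format, not the official spec.

```python
# Rough, illustrative harmony-style system message. The <|start|>/<|message|>/<|end|>
# tokens and the channel list are an approximation, not the official harmony spec.
system_message = (
    "<|start|>system<|message|>"
    "Knowledge cutoff: 2024-06\n"
    "Current date: 2025-06-28\n\n"
    "Reasoning: high\n\n"
    "# Valid channels: analysis, commentary, final"
    "<|end|>"
)

print(system_message)
```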
A great feature of these models is that OpenAI has optimized them to fit easily on a single H100 80GB GPU for the larger model and on a 16GB consumer GPU for the smaller one. This was achieved with MXFP4 quantization to 4.25 bits per parameter, which very significantly reduces the model size. While it is possible to train models natively in this quantization to reduce quality degradation, it looks like in this case the quantization was applied after training.
You can easily start using these models locally with Ollama. I recommend downloading the 20b model, which fits on a consumer GPU. It runs really fast on my MacBook!
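If you would rather call the local model from code than from the interactive CLI, a minimal sketch with the ollama Python package could look like the following; it assumes the model tag is gpt-oss:20b and that the model has already been pulled.

```python
import ollama  # pip install ollama; requires a running Ollama server

# Assumes the model was pulled beforehand, e.g. with `ollama pull gpt-oss:20b`.
response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}],
)

print(response["message"]["content"])
```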
-
[Quote] How we built our multi-agent research system
While reading Anthropic's great article "How we built our multi-agent research system", I stumbled upon this quote, in which Anthropic researchers present results showing that multi-agent systems outperform single agents on complex tasks:
For example, when asked to identify all the board members of the companies in the Information Technology S&P 500, the multi-agent system found the correct answers by decomposing this into tasks for subagents, while the single agent system failed to find the answer with slow, sequential searches.
This makes a ton of sense to me. We know that LLMs do their best when the scope of the task they are given is as narrow as possible and when they have as much relevant context as possible. By using an orchestrator agent to decompose tasks and hand them to sub-agents, we are effectively narrowing the scope of each task and trimming away the context that is not relevant to the specific subtask each sub-agent will handle.
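To make the pattern concrete, here is a toy sketch: an orchestrator splits a broad question into narrow subtasks and fans them out to sub-agents running in parallel, each with only the context it needs. The run_agent helper is hypothetical and stands in for whatever LLM call your stack uses; this is not Anthropic's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str, context: str) -> str:
    """Hypothetical stand-in for a single LLM agent call (e.g. one API request)."""
    raise NotImplementedError

def orchestrate(question: str) -> str:
    # 1. The orchestrator decomposes the broad question into independent subtasks.
    plan = run_agent(
        task="Split this research question into independent subtasks, one per line.",
        context=question,
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Each sub-agent gets only its own subtask, keeping scope and context narrow.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda sub: run_agent(task=sub, context=question), subtasks))

    # 3. The orchestrator synthesizes the parallel findings into one answer.
    return run_agent(task="Combine these findings into one answer.", context="\n\n".join(findings))
```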
Another interesting finding from the article is Anthropic's claim that 80% of the variance in results on the BrowseComp benchmark can be explained by token usage alone:
In our analysis, three factors explained 95% of the performance variance in the BrowseComp evaluation (which tests the ability of browsing agents to locate hard-to-find information). We found that token usage by itself explains 80% of the variance, with the number of tool calls and the model choice as the two other explanatory factors.
This also plays in favor of multi-agent setups: they can spend more tokens (because they do so in parallel) and spend them more efficiently (because each agent, working on its own slice of the context, is less likely to hit the point where a crowded context window starts to degrade performance). Then again, it is also in Anthropic's best interest that you burn tokens at 15x the usual rate (their own figure) with multi-agent architectures, so they get paid more. Take this with a grain of salt.
I encourage you to read the whole article, as there are many very interesting tips for designing multi-agent applications.
-
[Link] Artificial Intelligence 3E: Foundations of computational agents
An agent is something that acts in an environment; it does something. Agents include worms, dogs, thermostats, airplanes, robots, humans, companies, and countries.
(Artificial Intelligence: Foundations of Computational Agents, 3rd edition, by David L. Poole and Alan K. Mackworth, Cambridge University Press, 2023)
This is the definition I personally like best for what agents are in the context of AI. Since the overuse of the word has left some of us confused about what it actually means, I would say that any application that uses AI and has the ability to act on its environment, through tools or function calling, is an agent.
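Under that definition, the core of an agent is just a loop in which a model decides whether and how to act on its environment through a tool. A minimal sketch, with a hypothetical model_decide function standing in for the LLM call and two toy tools:

```python
import os

def model_decide(goal: str, observations: list[str]) -> dict:
    """Hypothetical LLM call: returns {"action": ..., "input": ...} or {"answer": ...}."""
    raise NotImplementedError

# Tools are the agent's way of acting on its environment.
tools = {
    "read_file": lambda path: open(path).read(),
    "list_dir": lambda path: "\n".join(os.listdir(path)),
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = model_decide(goal, observations)
        if "answer" in decision:          # the model considers the goal achieved
            return decision["answer"]
        tool = tools[decision["action"]]  # otherwise, act on the environment...
        observations.append(str(tool(decision["input"])))  # ...and feed back the result
    return "Stopped after reaching the step limit."
```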
-
[Link] AGI is not multimodal
A true AGI must be general across all domains. Any complete definition must at least include the ability to solve problems that originate in physical reality, e.g. repairing a car, untying a knot, preparing food, etc.
In this excellent article, Benjamin Spiegel argues that our current approach to building LLMs cannot lead to AGI. While the current next-token-prediction approach is really good at reflecting human understanding of the world, not everything in this world can be expressed with language, and not all valid language constructs are consistent with the world. Therefore, LLMs are not actually learning world models, only the minimal language patterns that are useful in our written media.
Multimodal models may look like a solution to this problem, since they unite multiple ways of seeing the world in a single embedding space. However, in multimodal models the different modalities are unnaturally separated during training. Instead of learning about something by interacting with it through different modalities, separate models are trained for each modality and then artificially stitched together in the same embedding space.
Instead of pre-supposing structure in individual modalities, we should design a setting in which modality-specific processing emerges naturally.
In conclusion, while LLMs are still getting more capable, those gains are already diminishing and might hit a wall soon. To build a general model that is not constrained by the limitations of human language we should go back to the drawing board and come up with a perception system that can seamlessly unite all modalities.
This article has also made me think about the AI capabilities that are thriving today precisely because they might not need to unite multiple modalities to form an understanding of the world. Programming, for example: software is built and executed in a digital environment with a ruleset that can be easily encoded into plain text. I'm genuinely curious whether you need to know anything about how the world works, beyond how programming languages are used (and maybe computer and network architecture), to be a good programmer.
-
[Quote] Hype Coding - Steve Krouse
There's a new kind of coding I call "hype coding" where you fully give into the hype, and what's coming right around the corner, that you lose sight of whats' possible today. Everything is changing so fast that nobody has time to learn any tool, but we should aim to use as many as possible. Any limitation in the technology can be chalked up to a 'skill issue' or that it'll be solved in the next AI release next week. Thinking is dead. Turn off your brain and let the computer think for you. Scroll on tiktok while the armies of agents code for you. If it isn't right, tell it to try again. Don't read. Feed outputs back in until it works. If you can't get it to work, wait for the next model or tool release. Maybe you didn't use enough MCP servers? Don't forget to add to the hype cycle by aggrandizing all your successes. Don't read this whole tweet, because it's too long. Get an AI to summarize it for you. Then call it "cope". Most importantly, immediately mischaracterize "hype coding" to mean something different than this definition. Oh the irony! The people who don't care about details don't read the details about not reading the details
I would summarize this sarcastic piece by Steve Krouse by reminding everyone that, while it's fun to try new technologies, it's important not to fall victim to the hype and feel obliged to use the latest, shiniest new thing for everything. Instead of choosing a tool based on the hype around it, and on what people say it can do or will be able to do, assess the tool objectively in your own workflow. If it makes YOU more productive, by all means use it. If it doesn't, don't worry, the fad will die down eventually.
-
[Link] OpenAI Codex CLI
Together with the launch of the o3 and o4-mini reasoning models, OpenAI has released a coding assistant for the terminal: Codex.
Codex is meant to be used with OpenAI models. You can use it to create new projects, make changes to existing projects or ask the model to explain code to you, all from the terminal. It accepts multimodal input (e.g. screenshots) and sandboxes your development environment to keep your computer secure. It also supports context files: `~/.codex/instructions.md` for global instructions for Codex and `codex.md` in the project root for project-specific context.

In `Full-auto` mode, Codex can not only read and write files, but also run shell commands in an environment confined to the current directory and with network access disabled. However, OpenAI suggests that in the future you will be able to whitelist some shell commands to run with network enabled, once they have polished some security concerns.

You can install Codex via npm:

```
npm i -g @openai/codex
```
-
[Link] GPT 4.1
After the unimpressive release of GPT-4.5 a month and a half ago, OpenAI is now releasing a new version, going backwards in the numbering. Today they released three new models, exclusive to the API: `GPT-4.1`, `GPT-4.1 mini` and `GPT-4.1 nano`. In the benchmarks, GPT-4.1 easily beats GPT-4.5 at a lower price and higher speed. For this reason, OpenAI has said it will deprecate GPT-4.5 in three months' time.

While this is a good step forward for OpenAI, they are still a bit behind Claude and Gemini in some key benchmarks. On SWE-bench, GPT-4.1 gets 55%, against 70% for Claude 3.7 Sonnet and 64% for Gemini 2.5 Pro. On Aider Polyglot, GPT-4.1 gets 53%, while Claude 3.7 Sonnet gets 65% and Gemini 2.5 Pro gets 69%.
On the other hand, GPT-4.1 nano offers similar price and latency to Gemini 2.0 Flash. If the performance of this small model is comparable to Gemini Flash, it can be a great option for simple tasks.
-
[Link] The Agent2Agent Protocol
Right in the middle of the year of agents, Google has released two great tools for building them: the Agent2Agent (A2A) protocol and the Agent Development Kit (ADK).
The Agent2Agent Protocol is based on JSON-RPC and works both over plain HTTP and over SSE. It is also built with security in mind: it implements the OpenAPI Authentication Specification.
Agents published using this protocol advertise themselves to other agents via the Agent Card, which by default can be found at the path `https://agent_url/.well-known/agent.json`. The Agent Card includes information about the agent's capabilities and requirements, which helps other agents decide whether or not to ask it for help.

The specification also defines the concepts agents use to exchange work between themselves: Task, Artifact, Message, Part and Push Notification.
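As a quick illustration, discovering another agent under A2A comes down to fetching its Agent Card from that well-known path and inspecting what it advertises. The snippet below is a minimal sketch; the "name" and "capabilities" fields are illustrative, so check the A2A spec for the exact card schema.

```python
import requests

def fetch_agent_card(agent_base_url: str) -> dict:
    """Fetch an agent's A2A Agent Card from the default well-known path."""
    response = requests.get(f"{agent_base_url}/.well-known/agent.json", timeout=10)
    response.raise_for_status()
    return response.json()

# Placeholder base URL, mirroring the path mentioned in the post.
card = fetch_agent_card("https://agent_url")
print(card.get("name"), card.get("capabilities"))  # illustrative fields
```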
This new protocol is not meant to replace Anthropic's Model Context Protocol. They are actually meant to work together. While MCP allows agents to have access to external tools and data sources, A2A allows agents to communicate and work together.