Hexo Labs opens SIA source code and lets an AI agent keep teaching itself
Hexo Labs has presented an open source AI agent framework called SIA, short for Self Improving Agent. It does more than polish prompts and workflows. Its central claim is more ambitious: the agent analyses its own attempts, changes its working methods and, when needed, starts internal model fine tuning so that it handles the next cycle better.
A self improving agent no longer means just a better prompt
AI agent development has so far followed two main routes. In one, a developer or meta agent improves the system from the outside: prompts, tool choices, retry logic and search strategy. In the other, a team fine tunes the model itself using task feedback. SIA combines these two approaches in one loop.
The key term here, internal model fine tuning, means changing the neural network’s parameters during training. Put more simply, the agent does not only receive better instructions. The model itself learns from repeated mistakes and results. That separates SIA from agents that simply rewrite a prompt or add another step to the workflow.
Three agents drive the improvement loop
According to the official GitHub description, SIA works through three main components. The Meta Agent reads the task description and creates the first target agent. The Task Specific Agent solves the task and records its actions and results. The Feedback Agent reviews the logs, identifies weaknesses and decides whether to improve the agent scaffold or trigger internal model fine tuning.
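The loop described above can be sketched in a few lines of Python. This is a toy illustration, not SIA's code: every name here (`TargetAgent`, `AttemptLog`, `meta_agent`, `task_specific_agent`, `feedback_agent`, `run_loop`) is hypothetical, the scores are invented integers, and the decision rule (fine tune only when the score stops rising) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TargetAgent:
    scaffold_version: int = 0  # how often the scaffold was revised
    model_version: int = 0     # how often the model was fine tuned


@dataclass
class AttemptLog:
    cycle: int
    score: int  # toy score in percentage points, not a real benchmark


def meta_agent(task_description: str) -> TargetAgent:
    """Hypothetical Meta Agent: builds the first target agent for the task."""
    return TargetAgent()


def task_specific_agent(agent: TargetAgent, cycle: int) -> AttemptLog:
    """Hypothetical Task Specific Agent: 'solves' the task and logs a score.

    Assumed toy behaviour: scaffold changes stop helping after two
    revisions, while each fine tuning round adds ten points.
    """
    score = 50 + 5 * min(agent.scaffold_version, 2) + 10 * agent.model_version
    return AttemptLog(cycle=cycle, score=score)


def feedback_agent(logs: List[AttemptLog]) -> str:
    """Hypothetical Feedback Agent: picks which lever to pull next."""
    if len(logs) < 2:
        return "scaffold"
    gain = logs[-1].score - logs[-2].score
    # Assumed rule: keep editing the scaffold while it pays off,
    # switch to fine tuning once the score has plateaued.
    return "scaffold" if gain > 0 else "fine_tune"


def run_loop(task_description: str, cycles: int) -> Tuple[TargetAgent, List[AttemptLog], List[str]]:
    agent = meta_agent(task_description)
    logs: List[AttemptLog] = []
    decisions: List[str] = []
    for cycle in range(cycles):
        logs.append(task_specific_agent(agent, cycle))
        decision = feedback_agent(logs)
        decisions.append(decision)
        if decision == "fine_tune":
            agent.model_version += 1
        else:
            agent.scaffold_version += 1
    return agent, logs, decisions
```

In this toy setup, `run_loop("classify charges", 5)` revises the scaffold three times, sees the score stall and only then triggers one fine tuning round. The point is the division of labour between the three roles, not the numbers.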
That distinction matters technically. The agent scaffold shapes how the system searches for a solution, uses tools and checks results. Internal model fine tuning is meant to add domain judgement that a prompt alone cannot write into the model. In the arXiv paper, SIA’s authors report that using both levers together beat scaffold only improvement in all three tested domains.
The results look strong, but they need sober measurement
The SIA technical paper evaluated the framework on three very different tasks: classifying criminal charges from Chinese legal texts, low level GPU kernel optimisation and denoising single cell RNA data. That selection gives the system a broader test than a standard chatbot benchmark, because the tasks require different kinds of precision, experimentation and measurable output.
On paper, the results are striking. According to the authors, the combination of scaffold improvement and internal model fine tuning beat the previous state of the art by 25.1 percent on LawBench. The GPU kernel ran 12.4 percent faster than the previous state of the art, while RNA data denoising improved by 20.4 percent. The GitHub repository also lists absolute scores and a baseline comparison, which should not be confused with those margins over the state of the art: 70.1 percent Top 1 accuracy on LawBench, a 14 times speed up for the TriMul kernel against the baseline and an MSE_norm score of 0.289 in the single cell RNA sequencing task.
The “350 times” claim is a press line, not an independent verdict
Hexo Labs says in its press release that SIA speeds the path towards superintelligence by 350 times. That deserves caution. The company links the claim to OpenAI’s MLE-bench, but a link to a benchmark is not independent proof that SIA actually moves AI towards superintelligence. The public material currently shows something more grounded: SIA offers a strong experimental framework for measurable development tasks.
OpenAI describes MLE-bench as a benchmark for testing whether AI agents can perform machine learning engineering work: training models, preparing data and running experiments. It contains 75 Kaggle machine learning competitions and gives agents a much more practical target than ordinary question and answer benchmarks.
From a European perspective, transparency matters most
For European developers and research institutions, SIA’s main value lies in openness. The GitHub repository shows that the framework uses an MIT licence, is written in Python and can be used with custom tasks where public inputs, hidden evaluation data and the evaluator are defined separately. That helps researchers repeat experiments and check what the agent actually changes.
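To make the separation of public inputs, hidden evaluation data and evaluator concrete, here is a minimal sketch of what such a task definition could look like. It does not reproduce SIA's actual task format: the `CustomTask` class, its fields and the `top1_accuracy` helper are assumptions invented for illustration, and the sample texts and labels are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def top1_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Share of predictions that exactly match the hidden label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)


@dataclass(frozen=True)
class CustomTask:
    """Hypothetical task definition with three separate parts."""
    description: str
    public_inputs: Sequence[str]   # what the agent is allowed to see
    hidden_labels: Sequence[str]   # held back for evaluation only
    evaluator: Callable[[Sequence[str], Sequence[str]], float]

    def agent_view(self) -> Dict[str, object]:
        # Only the description and the public inputs leave this object.
        return {"description": self.description,
                "inputs": list(self.public_inputs)}

    def score(self, predictions: List[str]) -> float:
        if len(predictions) != len(self.hidden_labels):
            raise ValueError("one prediction per input is required")
        return self.evaluator(predictions, self.hidden_labels)


# Placeholder data, not taken from LawBench.
task = CustomTask(
    description="Assign one charge label to each case text",
    public_inputs=["case text A", "case text B", "case text C", "case text D"],
    hidden_labels=["theft", "fraud", "theft", "arson"],
    evaluator=top1_accuracy,
)
result = task.score(["theft", "fraud", "fraud", "arson"])  # three of four correct
```

Keeping the labels and the evaluator out of `agent_view()` is the design point: the agent can be rerun and modified freely, while the score it is judged on comes from data it never reads directly.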
At the same time, internal model fine tuning brings greater responsibility. When an agent can alter its behaviour more deeply, the evaluation framework must also reveal whether the system is becoming generally better or merely learning to exploit one particular metric. For that reason, SIA currently fits best in scientific and engineering tasks with clear measures, not in autonomous decision making with open ended goals.
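One simple way to watch for metric exploitation is to track a second, held out measure alongside the metric the agent optimises, and to flag cycles where the two diverge. The sketch below is a generic illustration under that assumption, not a feature documented for SIA; the function name `flag_metric_gaming`, the `tolerance` parameter and the sample numbers are all hypothetical.

```python
from typing import List, Sequence


def flag_metric_gaming(optimised: Sequence[float],
                       held_out: Sequence[float],
                       tolerance: float = 0.0) -> List[int]:
    """Return the cycles where the optimised metric rose while the
    held out metric fell by more than `tolerance`."""
    if len(optimised) != len(held_out):
        raise ValueError("both series need one value per cycle")
    flagged = []
    for cycle in range(1, len(optimised)):
        gain_optimised = optimised[cycle] - optimised[cycle - 1]
        gain_held_out = held_out[cycle] - held_out[cycle - 1]
        if gain_optimised > 0 and gain_held_out < -tolerance:
            flagged.append(cycle)
    return flagged


# Invented example values: the target metric keeps climbing,
# the held out check starts to slip from cycle 2 onwards.
optimised_scores = [0.60, 0.65, 0.72, 0.80]
held_out_scores = [0.58, 0.62, 0.61, 0.55]

strict = flag_metric_gaming(optimised_scores, held_out_scores)
lenient = flag_metric_gaming(optimised_scores, held_out_scores, tolerance=0.02)
```

With these numbers the strict check flags cycles 2 and 3, while a tolerance of 0.02 treats the small dip in cycle 2 as noise and flags only cycle 3. A real evaluation would need a principled choice of held out data and tolerance.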
Technical snapshot
SIA combines agent scaffold improvement and internal model fine tuning in one self improvement loop.
The system uses a Meta Agent, a Task Specific Agent and a Feedback Agent.
The tests covered legal texts, GPU kernel optimisation and single cell RNA data denoising.
According to the GitHub repository, SIA is open source, MIT licensed and built for Python 3.11+.
The large “350 times” claim comes from Hexo Labs’ press release, not from an independent audit.
SIA may not be a shortcut to superintelligence. What it does offer is more practical and, for now, more useful: a way to make AI agents show their work, learn from it and face the next test with something more than a better pep talk.