Most of what we call biomedical AI today is a story about borrowing.
A model built to predict the next word in a sentence is adapted to read clinical notes. A model built to recognise objects in photographs is adapted to read pathology slides. A model built to autocomplete protein databases is adapted to propose new molecules. There is something slightly absurd in all this borrowing: the architecture behind chatbots, image generators, and protein-design tools is, under the hood, close to the same machine pointed at different data. The borrowing works often enough that it has become the default move, and for good reason: pretrained representations transfer, and biomedical datasets are usually too small to train large models from scratch.
This is real progress. It would be a mistake to dismiss it, and HASU does not.
But there is a quieter cost in all this borrowing, and it is the thing HASU exists to study. When you hand a general-purpose model a genome, a patient record, or a molecule, you have already made a decision about what kind of object it is. You have decided it is a string, or an image, or a table. That decision is usually implicit. It is also, in many cases, wrong in a way that matters.
Messy is not the same as structureless
Biomedical data is often described as messy, and the description is partly fair. Clinical records mix free text with missing values and irregular timestamps. Genomes are large and repetitive. Wearable sensors produce long, noisy time series. Imaging varies by scanner and operator.
But messiness is not structurelessness. Each of these data types is messy in a specific way, and the specificity is the signal, not the noise.
The irregular timing in a clinical record is irregular because illness does not run on a schedule. The long-range correlations in a genome reflect evolutionary history. The spatial patterns in a histology slide reflect how tissue is organised. The fluctuations in a wearable signal sit on top of circadian rhythms driven by biology.
A genome is not a text document. A clinical trajectory is not an image. A molecule is not a time series. When we treat these objects as interchangeable inputs to a sequence model or a vision backbone, we are flattening them into a shared shape, and the flattening can discard exactly the structure that makes the data scientifically meaningful.
Some concrete cases make this less abstract.
A genome encodes the outcome of mutation, recombination, selection, and drift. At any position, the sampled copies share an ancestral history that can be drawn as a tree, and because of recombination, that history changes as you move along the chromosome. Recombination is the reshuffling that happens when chromosomes are passed on, so any chromosome you carry is a patchwork stitched from different ancestors. Walk along it and you cross seams where the ancestry changes. The string of bases is one view. The changing local genealogies are another, and for questions like demographic inference or detecting selection, the genealogical view may be the more natural one.
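To make the genealogical view concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the `LocalTree` class, the node names, the 1,000-base toy chromosome, and the single breakpoint at position 600 are invented for this note and do not describe any real dataset or HASU system. It shows only that the same pair of sampled copies can have a different most recent common ancestor on either side of a recombination seam.

```python
from dataclasses import dataclass


@dataclass
class LocalTree:
    left: int     # first base covered by this tree (inclusive)
    right: int    # end of the covered interval (exclusive)
    parent: dict  # child node -> parent node; the root has no entry


# Hypothetical toy data: three sampled copies (A, B, C) of a 1,000-base
# chromosome, with one recombination breakpoint at position 600.
trees = [
    LocalTree(0, 600, {"A": "n1", "B": "n1", "n1": "n2", "C": "n2"}),
    LocalTree(600, 1000, {"B": "n3", "C": "n3", "n3": "n4", "A": "n4"}),
]


def tree_at(position):
    """Return the local genealogy that covers a given base position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"position {position} is outside the chromosome")


def ancestors(tree, node):
    """List a node followed by its ancestors, up to the root."""
    path = [node]
    while path[-1] in tree.parent:
        path.append(tree.parent[path[-1]])
    return path


def mrca(tree, a, b):
    """Most recent common ancestor of two nodes in one local tree."""
    seen = set(ancestors(tree, a))
    for node in ancestors(tree, b):
        if node in seen:
            return node
    raise ValueError("nodes do not share an ancestor in this tree")


# Same two copies, different history on either side of the seam.
print(mrca(tree_at(100), "A", "B"))  # n1: A and B are closest relatives here
print(mrca(tree_at(800), "A", "B"))  # n4: here they meet only at the root
```

A string of bases carries none of this explicitly. A model given only the letters would have to infer the seam and both trees from patterns of shared variation.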
A clinical trajectory is not a bag of diagnosis codes. Symptoms appear, tests are ordered, diagnoses are revised, treatments change, outcomes follow. The order and timing carry causal meaning. Two patients with the same set of codes, in different sequences, can have very different clinical pictures, and a model that embeds them identically has lost that distinction. Choosing a representation is also choosing which invariances to impose. A model that ignores the order of events in a clinical timeline is treating time as irrelevant. That is a hypothesis about the data, not a free assumption.
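The point about order can be shown in a few lines of Python. This is a sketch under stated assumptions: the event labels are hypothetical placeholders and not real clinical codes, and the two functions are the simplest order-blind and order-aware views we could write down, not a proposed model.

```python
from collections import Counter

# Two hypothetical patients with the same events in a different order.
patient_a = ["anticoagulant_started", "gi_bleed", "anticoagulant_stopped"]
patient_b = ["gi_bleed", "anticoagulant_stopped", "anticoagulant_started"]


def bag_of_codes(events):
    """Order-blind view: which codes occur, and how often."""
    return Counter(events)


def ordered_pairs(events):
    """Order-aware view: every (earlier, later) pair of events."""
    return {(a, b) for i, a in enumerate(events) for b in events[i + 1:]}


same_bag = bag_of_codes(patient_a) == bag_of_codes(patient_b)
same_order = ordered_pairs(patient_a) == ordered_pairs(patient_b)

print(same_bag)    # True: the bag view cannot tell the patients apart
print(same_order)  # False: the order-aware view can
```

The bag representation is invariant to permutation by construction, which is exactly the hypothesis about time that the paragraph above describes.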
A protein is a one-dimensional string of amino acids, but its function depends on a three-dimensional shape: which residues sit near each other in space, which surfaces are exposed, how a drug fits into a pocket. A model that sees only the sequence has to rediscover that geometry from the statistics of co-evolution. It can do this, partly. It pays for it.
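A small Python sketch of the gap between the two views follows. The five-residue chain, its coordinates, the 6.0 angstrom cutoff, and the minimum sequence separation of three are all assumptions chosen for illustration, not values taken from any real structure or method. The sketch lists which residues are close in space despite being far apart along the chain, which is the information a sequence-only model never sees directly.

```python
import math

# Hypothetical five-residue chain folded back on itself. The coordinates
# are made up: one 3D point per residue, in angstroms.
sequence = "MKTAY"
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]


def contacts(coords, cutoff=6.0, min_separation=3):
    """Pairs of residues close in space but not adjacent in sequence.

    cutoff and min_separation are illustrative assumptions.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


for i, j in contacts(coords):
    print(f"{sequence[i]}{i} is near {sequence[j]}{j}")
# M0 is near Y4
# K1 is near Y4
```

The string "MKTAY" alone does not say that the first and last residues touch. That fact lives in the geometry.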
The pattern repeats for tissue architecture, for wearable physiology, and, as we will argue below, for the scientific literature itself.
Why borrowed architectures take us only so far
None of this means the borrowed models are wrong or obsolete. They are useful, and in many settings they are the right tool. The point is narrower: a model handed a flattened object must recover the discarded structure indirectly, from the statistics of the flattening, and that recovery is not free. It costs data, it costs interpretability, and it sometimes costs robustness when the data shifts.
There is a deeper version of the same point. Most of the recent excitement in foundation models has been about scale: more parameters, more tokens, more compute, applied to the same basic tokenisation of the same basic object. That has produced a great deal. But scaling a tokenisation is not the only axis available. In some cases the more useful move may be to change the primitive object being represented in the first place, so that the structure a model needs is present in its inputs rather than something it has to reconstruct.
This is the representational question that sits underneath HASU’s work. Before asking how large a model should be, or which architecture to use, we think it is worth asking a more basic question: what scientific object is the model actually trying to represent, and have we given it that object or a flattened shadow of it?
Introducing HASU
HASU is a public research initiative working on structure-aware biomedical AI.
We are interested in how biomedical AI systems should represent scientific objects before they model them. That covers genomes and evolutionary histories, clinical trajectories, molecular geometries, tissue organisation, wearable signals, and the structure of scientific evidence itself. In each case the underlying conviction is the same: the structure of the object is a claim about the world, and building it into a representation is a scientific decision, not just an engineering one.
HASU is not a product and not a company announcement. It is a programme of research conducted in public, around a cluster of connected questions: representation, structure, evidence, biomedical foundation models, and scientific knowledge systems.
Our working standard is demanding, and we want to state it plainly so it can be held against us. A structure-native representation has to earn its place. It is not enough to gesture at biological realism. The test is whether representing the object faithfully produces models and methods that are more transferable across related tasks, more interpretable, more robust when inputs are perturbed or partly missing, and more scientifically legible, meaning their internal features connect to concepts a domain expert can inspect, challenge, and refine.
That standard cuts both ways. Sometimes the structured representation will win. Sometimes a strong sequence-only or feature-only baseline will be just as good, and the extra machinery will not be worth it. We are committed to reporting both outcomes.
Where we are starting
A research programme is easier to describe than to begin, so it is worth being specific about what we are actually building first.
Our starting point is a project called HASU Claim.
HASU Claim is concerned with the structure of scientific evidence over time: how biomedical claims, the evidence behind them, and the knowledge structures they form can be represented, audited, and studied as they change. It is a named project inside the wider HASU programme, not a thesis statement about HASU, and we will introduce it properly in its own note rather than trying to compress it here.
The motivation is something most researchers feel intuitively. Scientific papers do not simply deposit isolated facts into a shared pool. They transmit claims, modify them, qualify them, contradict them, and recombine them. A claim such as “sleep irregularity contributes to depression severity” might begin as a statistical association in one literature, become a predictive feature in a digital-phenotyping study, get reframed as a mechanistic hypothesis about circadian disruption, and later be narrowed by population, measurement instrument, or study design.
Drawing family trees for texts is older than genomics. Scholars have reconstructed how medieval manuscripts were copied by treating scribal errors like mutations, including a published phylogenetic tree of Chaucer’s Canterbury Tales.
Ordinary literature review can summarise that trajectory. What it does not usually do is treat the trajectory itself as a structured object worth modelling, one with provenance, with directions of borrowing between fields, and with explicit distinctions between an association, a prediction, a mechanism, and an unsupported causal flourish. A pile of PDFs flattens all of that. The evidence base has a shape, and HASU Claim is an attempt to take that shape seriously.
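As a rough illustration of what "a structured object" could mean here, the sketch below encodes the sleep-irregularity example as linked records in Python. It is not HASU Claim's schema, which has not been described yet. The class names, the fields, the placeholder source labels, and the field-of-study labels are all hypothetical, chosen only to show provenance and claim type travelling with each claim.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClaimKind(Enum):
    ASSOCIATION = "association"
    PREDICTION = "prediction"
    MECHANISM = "mechanism"
    UNSUPPORTED_CAUSAL = "unsupported causal"


@dataclass
class Claim:
    claim_id: str
    text: str
    kind: ClaimKind
    source: str          # provenance: a placeholder label, not a real paper
    field_of_study: str  # hypothetical label for the originating literature
    derived_from: list = field(default_factory=list)  # parent claim ids


claims = {c.claim_id: c for c in [
    Claim("c1", "Sleep irregularity is associated with depression severity.",
          ClaimKind.ASSOCIATION, "placeholder-paper-1", "epidemiology"),
    Claim("c2", "Sleep irregularity features predict depression severity.",
          ClaimKind.PREDICTION, "placeholder-paper-2",
          "digital phenotyping", ["c1"]),
    Claim("c3", "Circadian disruption contributes to depression severity.",
          ClaimKind.MECHANISM, "placeholder-paper-3",
          "chronobiology", ["c2"]),
]}


def lineage(claim_id):
    """Trace a claim back to its origin, oldest first.

    Follows only the first parent of each claim, a simplification
    made for this sketch.
    """
    chain = [claims[claim_id]]
    while chain[-1].derived_from:
        chain.append(claims[chain[-1].derived_from[0]])
    return list(reversed(chain))


for claim in lineage("c3"):
    print(f"{claim.kind.value:12} {claim.field_of_study:20} {claim.source}")
```

Even this toy version makes one thing inspectable that a pile of PDFs does not: the point at which an association became a mechanism, and which source made that move.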
This is the same instinct that runs through everything else here. A genome flattened into a string, a patient flattened into a code list, and a literature flattened into a bag of documents are three versions of one mistake.
After the HASU Claim note, the next thing we expect to share is early output from the system itself: examples of extracted and reconstructed claim structures, with their provenance and their uncertainty attached. We want to be careful about how those are read. They will be preliminary results (early evidence and system behaviour on real material), not settled benchmark numbers. Showing where the system is plausible and where it visibly breaks is part of the point, because a tool for auditing evidence has to be auditable itself.
A later direction: evolutionary foundation models
Further along, the same way of thinking leads somewhere more speculative, and we want to introduce it carefully, as a question we think is worth asking, not as a system we have built.
A genome, as we noted earlier, is usually written as a string, and that view is genuinely useful: it underpins much of modern genomics. But a genome is also a record of copying, mutation, recombination, and shared ancestry. The letters we read today are traces left by events that happened long before anyone sampled the DNA. A genome is both a sequence and a historical document. “History” here does not mean a story we know perfectly. It means a process whose traces can be modelled, inferred, and sometimes misread.
Most genomic AI learns from the letters. Genomic language models hide a base and predict it, much as a text model predicts a missing word, and this has produced useful work on regulatory prediction, variant interpretation, and sequence design. It is, in the terms we opened with, borrowing again. The recipe that works on text, pointed at DNA, has carried the field a long way. The language analogy is productive, but DNA has no words, no grammar, and no speakers. What it has is biochemical function, evolutionary constraint, and population history. Evolution, not grammar, generated the training data.
So the question we keep returning to is not whether sequence models are wrong. They are not. It is narrower, and we think more interesting: do some problems become easier when a model represents the process that produced a sequence, rather than rediscovering that process from the surface letters alone? Taken seriously, that question points toward what we have started calling evolutionary foundation models: the idea of a foundation model for genomes grounded in evolutionary history rather than surface sequence. We want to hold the phrase loosely. For now it names an ambition and a research question, not a system we have built or a design we are ready to describe.
We are wary of the phrase precisely because it is the kind of thing that is easy to say and hard to earn. It does not imply that a larger model will solve population genetics, and it carries real difficulties we have no wish to wave away, not least that the history behind a sequence is itself something you have to infer, imperfectly, from the same data. Anything built on that footing could as easily learn the quirks of an inference procedure as anything about biology.
Which is why, if we pursue it seriously, it will have to be an evaluation question before it is anything else. The interesting outcome is not a single improved benchmark; it is whether a representation of this kind transfers across related problems, holds up when its inputs are perturbed, and earns its complexity against strong, simpler baselines. And if it does not, that is a result too, and an informative one. Perhaps the idea turns out too noisy, or the simpler tools already capture the useful signal. We will say so when we know.
How HASU works in public
A research initiative is partly a set of questions and partly a set of habits. Ours is built around working where it can be checked.
We will publish project notes that set up problems before we have solved them, technical claims with their reasoning exposed, and preliminary results that are clearly labelled as early rather than final. We will also publish negative results and failures, because in this kind of work a clean negative (this representation did not help, here is why) is often more informative than another marginal gain on a saturated benchmark. Most fields publish their wins far more readily than their dead ends, a habit known as the “file-drawer problem.” There is a small irony in HASU Claim studying exactly that distortion in the literature, so the least we can do is not add to it.
That means strong baselines reported honestly, uncertainty quantified rather than hidden, and provenance kept attached to outputs so a reader can trace where a result came from. None of this is glamorous. It is the difference between scientific AI that is merely accurate and scientific AI that is legible enough to build on.
We are early, and we would rather be early in public than polished in private. We opened by calling biomedical AI a story about borrowing; we meant it as description rather than complaint, and HASU borrows too. What we are after is narrower: to make the borrowing a choice rather than a reflex, and to notice the places where the science deserves its own object instead of a borrowed shadow of one. The sequence of work ahead starts with HASU Claim and the evidence it produces, and reaches toward the harder genomic questions over time. If even part of it holds, it points toward biomedical models whose representations connect to reality in ways that can be inspected, argued with, and improved, which is the only kind we think is worth trusting with science and, eventually, with patients.
The first project note follows shortly. We would rather show you the work than describe it, so that is where we will go next.
