LLMs are changing programming, and may also be changing maths research forever. The underlying mechanism is similar in both.
Here's a question I've found enticing: What is the analogous vision for mechanistic models of natural processes?
This would be a big deal, because at the core of our scientific understanding of the world there's often a mechanistic model of reality. Scientific theories that lack a good mechanistic model often feel incomplete. Examples: thermodynamics before Boltzmann; evolution before evolutionary dynamics; quantum mechanics... now.
I think one third of the picture is already there. I have been a researcher building computational and numerical models for a few years now, and the coding agents I have tried are already good at writing custom PDE solvers, statistical models, etc. Granted, they probably do not have as much training data on these topics as they do on, e.g., frontend dev or database queries, and their guesses at numerical models may not be as good yet. But there's no reason why future LLMs could not become essentially perfect at this.
Similarly, given some data, there's no reason why they won't be able to pattern-match and generate interesting enough hypotheses about the underlying mechanisms. They are definitely able to retrieve from the literature the mechanisms that have been proposed for a given phenomenon, e.g., how to connect drought conditions to plant stomatal closure. Given the shape of some data, they could even make an educated guess as to whether its cause may be a first-order phase transition, a Lotka-Volterra type of equation, or a type of cellular automaton.
So, we can conclude that the generative, pattern-matching part is or will be more or less solved.
The remaining two thirds is where things get interesting. What is the analogue to programming languages and proof assistants? How can we give LLMs a hard check for their (possibly wild) guesses? How do we stop thinking in a paradigm of model scarcity, where researchers need to invest lots of time in developing siloed models, and begin thinking in terms of model abundance?
I think it will be useful to make the following distinction. On the one hand, in programming and math there are strict validity checks. These are things like software tests (in programming), agreement with the existing body of mathematical theorems (in math), and the opinion of the driving human (in both). In science, the analogue is clearly checking the model results against the data. The nicest fit, I think, would be to use Bayesian methods to compute, given some data, the posterior probability of each mechanistic model.
For this we would need to build a general-purpose, automated, end-to-end Bayesian framework. This is, I think, within our reach, and it would be a great improvement even if we never use LLMs again because we discover that they were actually exploiting underpaid graduates who typed really fast. For us scientific modelers, this would be upgrading our lab facilities: instead of each one of us glassblowing our own statistical Erlenmeyers for each experiment, we could take one off the shelf and test our model. The process should be as easy as "here are the inputs and outputs of my model, with these parameter estimates, and here's the data. How well does it fit?", without, crucially, shooting ourselves in the foot with some blatant mistake. Software like PyMC and NumPyro brings this vision a bit closer, but there's quite a long way to go yet.
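To make the "how well does it fit?" question a bit more concrete, here is a minimal sketch of the posterior comparison described above, using only the Python standard library. Everything in it is an assumption made for illustration: the two candidate mechanisms (exponential versus linear decay), the made-up data points, the known Gaussian noise level `sigma`, the uniform prior on the rate `k`, and the equal prior weight on each model. A real framework would use proper samplers rather than a grid average.

```python
import math

def log_likelihood(model, k, times, observed, sigma):
    """Gaussian log-likelihood of the data under `model` with rate `k`."""
    total = 0.0
    for t, y in zip(times, observed):
        resid = y - model(t, k)
        total += -0.5 * (resid / sigma) ** 2
        total -= math.log(sigma * math.sqrt(2.0 * math.pi))
    return total

def log_evidence(model, k_grid, times, observed, sigma):
    """Log marginal likelihood under a uniform prior on k (grid average)."""
    logs = [log_likelihood(model, k, times, observed, sigma) for k in k_grid]
    peak = max(logs)  # subtract the peak to avoid underflow
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / len(logs))

def posterior_model_probs(models, k_grid, times, observed, sigma):
    """Posterior probability of each model, assuming equal prior weights."""
    evidences = {
        name: log_evidence(f, k_grid, times, observed, sigma)
        for name, f in models.items()
    }
    peak = max(evidences.values())
    weights = {name: math.exp(v - peak) for name, v in evidences.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

# Two hypothetical candidate mechanisms sharing a single rate parameter k.
models = {
    "exponential_decay": lambda t, k: math.exp(-k * t),
    "linear_decay": lambda t, k: 1.0 - k * t,
}

# Made-up observations, for illustration only.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
observed = [1.02, 0.77, 0.62, 0.45, 0.38, 0.21]

k_grid = [0.01 * i for i in range(1, 201)]  # uniform prior on (0, 2]
probs = posterior_model_probs(models, k_grid, times, observed, sigma=0.05)

for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```

The point of the sketch is the interface rather than the numerics: the modeler hands over candidate models, priors, and data, and gets back one probability per model.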
On the other hand, we have the formal encoding part, which is a bit more subtle. Mathematical statements and programming tasks (the former probably more so than the latter) have a direct encoding into a custom, formal language in that domain. Mechanistic models, however, only have it indirectly, through code. Here's what I mean by this. Lean is a language to "speak" mathematics; C is a language to "speak" algorithms. But where is the language to speak mechanistic models? Sure, I can write a mechanistic model in C, but the relevant design questions are buried in the numerical implementation. I cannot explicitly specify initial and boundary conditions, conserved quantities, units, timesteps, spatial dimensions, or the causal graph between different variables. The language I would like the LLM to speak should be able to encode all this information. The main reason for this ties back to the previous point about automatic evaluation, because that is the level of abstraction where the strictness lives in mechanistic models. If units are incompatible, timesteps mismatched, conservation laws broken, or causality violated, that mechanistic model is wrong, not just statistically unlikely.
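As a toy illustration of what such a language might check, here is a minimal sketch of a declarative model specification with two hard checks: unit compatibility and mass conservation. It is not an existing tool. The names `ModelSpec`, `Flux`, and `per_time`, the encoding of units as exponents of (mass, length, time), and the soil and plant water example are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

# Dimensions as exponents of (mass, length, time); e.g. kg/s is (1, 0, -1).
MASS = (1, 0, 0)
MASS_PER_TIME = (1, 0, -1)
LENGTH_PER_TIME = (0, 1, -1)

def per_time(dim):
    """Dimension of the rate of change of a quantity with dimension `dim`."""
    return (dim[0], dim[1], dim[2] - 1)

@dataclass(frozen=True)
class Flux:
    """A flow between two stocks; None marks the outside world."""
    name: str
    source: Optional[str]
    sink: Optional[str]
    dim: tuple

@dataclass
class ModelSpec:
    stocks: dict  # stock name -> dimension
    fluxes: list = field(default_factory=list)

    def errors(self, conserved=True):
        """Hard failures: the model is wrong, not merely unlikely."""
        found = []
        for flux in self.fluxes:
            for end in (flux.source, flux.sink):
                if end is None:
                    if conserved:
                        found.append(f"{flux.name}: open boundary breaks conservation")
                elif end not in self.stocks:
                    found.append(f"{flux.name}: unknown stock {end!r}")
                elif per_time(self.stocks[end]) != flux.dim:
                    found.append(f"{flux.name}: units incompatible with {end!r}")
        return found

# A closed two-stock model: water moves from soil to plant.
good = ModelSpec(
    stocks={"soil_water": MASS, "plant_water": MASS},
    fluxes=[Flux("uptake", "soil_water", "plant_water", MASS_PER_TIME)],
)

# Same model plus a transpiration flux with the wrong units that also
# leaves the system, although the spec is declared as conserving mass.
bad = ModelSpec(
    stocks={"soil_water": MASS, "plant_water": MASS},
    fluxes=[
        Flux("uptake", "soil_water", "plant_water", MASS_PER_TIME),
        Flux("transpiration", "plant_water", None, LENGTH_PER_TIME),
    ],
)

print(good.errors())
print(bad.errors())
```

Both failures are caught before any numerical solver runs, which is the sense in which the strictness lives at the level of the specification rather than the implementation. The source and sink of each flux also give the causal graph between variables for free.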
Reading the above, I notice that I might come across as more optimistic about the future than I really am. The truth is that I am not convinced that automating iterations on mechanistic models will bring us closer to better scientific theories. Let me stress: it feels impossible for the "industrialized mechanistic science" that I described to be able to deduce general relativity from mere datapoints of the precession of Mercury's perihelion. Also, thinkers like David Deutsch are opposed to the idea that we can put scientific theories in a Bayesian framework and assign them a probability distribution, which would invalidate my structure above.
I am no scientific epistemologist, and I don't intend to play one on the internet. Instead, let's talk about a more pedestrian area of science, which happens to be my area of study, and see how fragments of my vision could move the needle there.
-- TO BE CONTINUED --