Why is it difficult to detect bugs in agent-based models?
Rob Axtell, in his 2000 paper "Why agents? On the Varied Motivations for Agent Computing in the Social Sciences," attributes the existence of what he calls "artifacts" (program behavior that is not a part of the model being created, but a byproduct of a coding decision which was intended only to implement the model, but actually did something else as well) "partially" to the fact that, in agent models, a small amount of source code controls a large amount of "execution" code. As an example, he offers a system where millions of agents may be created and which might occupy up to a gigabyte of memory, even though the source code for the program is only hundreds of lines long.
But this explanation cannot be right, because the causal factor he is talking about does not exist. In any reasonable programming language, only the data for each object will be copied as you create multiple instances of a class. The functions in the agent-object are not copied around again and again: they sit in one place where each agent "knows" how to get to them. What causes the huge expansion in memory usage from the program as it sits on disk to the program running in RAM is the large amount of data involved with these millions of agents: each one has to maintain its own state: its goal, its resources, its age: whatever is relevant to the model being executed.
So what we really have is a small amount of code controlling a large amount of data. But that situation exists in all sorts of conventional data-processing applications: A program to email a special promotional offer to everyone in a customer database who has purchased over four items in the last year may control gigabytes of data while consisting of only a few lines of source code. So this fact cannot be the source of any additional frequency of artifacts in agent-based models.
So what is really going on here? (And I have no doubt that something is going on, since Axtell's basic point that we have to take special care to watch for these artifacts in agent-based models is surely correct.) I have done both traditional IT-type coding and agent-based modeling, and here is what I think is the main difference between the two in terms of the production of these artifacts: artifacts in both cases are the result of programming errors, but in the latter case, when you don't know what your output should be, it is very hard to distinguish them from interesting and valid results.
In most traditional data processing, it is easy to say just what the result should be: they are what your user told you they should be. (This "user", of course, may be a multitude of users, or even an imagined multitude of users you hope your new product will appeal to.) If you were asked to write the above program, that will email a special offer to customers who purchased over four items in the last year, it is easy to tell if your program is working: did those customers, and only those customers, receive the promotion, and only the promotion? Although you were writing the program to save going across the million customer database records by hand and generating the emails, you can easily select a small portion of the database and check your program by hand against that portion. If it is working for that portion, and it is a representative sample, you can assume it will work across all records. Or you can automate that process itself with a test suite, which contains a certain number of cases with known correct output, that your program's results can be checked against. (Of course, even this testing does not ensure the absence of bugs: there may be special cases in the full database that we omitted from our test data. Perhaps, for instance, for some customers, multiple orders were rolled into one for shipping purposes. The intention of the marketing department might be to still send them the special offer, but if we missed putting any such customers in our test cases, we may not detect that our code fails in these instances.)
But at least for Axtell's third type of agent-based model, the very reason we are writing the program is that what we don't know what the results of running it ought to be. We are using the program to explore the implications of the model, in a case where we don't know beforehand what those implications will be. This is not fundamentally different from what Ricardo did with his model farm (see Mary Morgan, The World in the Model, on this point), but while Ricardo was limited to using a limited number of simple cases where he could do all the necessary calculations by hand, by using a computer, we can easily test millions of more complicated cases.
We hope our code implements our model, and only our model. But we can easily make a mistake through a seemingly innocuous coding decision: for instance, as Axtell notes, the order in which agents act can be important in many models. If we write a simple loop proceeding from agent 1 agent N, we may give the agents earlier in our list a decided edge in something like grabbing resources for their own use. We might have to randomize the order in which agents act in every "period" of the test run to truly capture the model. If we fail to account properly for this fact, we might mistakenly think that these agents had some superior resource-capturing feature, instead of realizing that they are only "rich" because we (arbitrarily) stuck them early on in a list of agents.
If I am correct about the main source of these artifacts, then what are we to do about the problem? Although I have just begun to think about this problem, I do have one suggestion already: we can do something similar to what is done in more traditional IT programming: examine small test cases. But since we don't know the "correct" output, the procedure to do so will be somewhat different. In our early runs of our system, we can use a very small number of agents, and proceed step-by-step through our run, with lots of debugging information available to us. This allows us to get an intuitive feel for how the agents are interacting, and perhaps spot artifacts of our coding choices early on.
But while this is a help, it falls far short of the kind of systematic checking of our code that we can achieve with test suites for more typical IT problems. Is it possible to create a more automated method of detecting artifacts? Well, at this point, all I can say is that I am thinking about it.