The Formal Model article format: justifying modelling intent and a critical review of data foundations through publication

Documenting complex models has long been a problem. Models are currently developed, implemented, and applied before they are reviewed. Combined, this leads to details being hidden in appendices or to methods sections with too little detail to be reproducible. Modellers involve reviewers too late in the process, so reviewers can only flag issues, and suggest redesigns and reruns, after the analysis is complete. We propose splitting the model documentation, before analysis, into three steps: the Formal Model, Implementation Documentation, and Evaluation and Testing. Researchers can then use the well-built model for analysis. We introduce the first of these, the Formal Model, as a peer-reviewed paper format that lays out the intentions for the model. The Formal Model includes reviewed literature that identifies the components of the model, and it lays out the theoretical framework, modelling approaches and externalities. It sets out plans to implement each process, with equations, descriptions, state variables and scales. Finally, the Formal Model gives the model’s strengths, weaknesses, exclusions, and place in the literature. We provide a flexible template for a Formal Model to aid in establishing a new common format. The Formal Model aims to improve transparency and provide a formal approach to documentation. Reviewers can help improve the model by identifying problems early. The Formal Model contains the details needed to allow for reproducibility.


Introduction
The Food and Ecological Systems Modelling Journal (FESMJ) aims to provide opportunities for modellers to showcase their work (Filter et al. 2019). The journal therefore offers innovative new publication formats. One of these formats is the Formal Model, which is normally the first formal step in the development of research and application models. Until now, this first formal step has been included in implementation, documentation, testing and evaluation, and/or model application articles. Typically, a model will be developed, implemented, and applied, then published in a research article with each step of the process covered within the article or its associated appendices (e.g., Chetcuti et al. 2021). This approach works well for small models, but quickly becomes unwieldy as models become larger, more complicated, and inter-connected. With the aim of improving the transparency and functionality of the documentation process, we introduce the idea of the Formal Model. Our article presents this format, provides a rationale for its use, and offers some guidance on its structure.
We see the Formal Model as a mechanism to achieve three main goals. The first is a formal approach to describing a modeller's intention for a model. Reaching this point is a considerable task, requiring extensive literature review, careful design, and ingenuity. This work often goes uncredited, with potential impacts on rigour and scientific quality (Holcombe 2019, Siepel 2019, Jombart 2021). Hence, the second aim is to give credit to modellers for their hard work and creativity. The third goal is to improve the scientific process through peer review of modelling intentions before the model is used in research applications. This will allow reviewers to catch problems early in the process and provide valuable feedback to the modellers.
As such, we propose a three-step publication process:
1. Formal Model.
2. Implementation and documentation.
3. Evaluation and testing.
The Formal Model is the first step in developing a systems model following a standard modelling cycle (Railsback and Grimm 2011). The next step is implementation and documentation, which is also the subject of a new publication format in FESMJ. This is also peer-reviewed documentation, describing how the modeller has implemented the Formal Model in code and including the full model description. The modeller can then write a third article to evaluate and test the model as implemented, describing any resulting changes to the Formal Model and providing the final overall model documentation. Following this, the modeller or other researchers may apply the model to a real case or use it theoretically for research in future articles, citing the earlier steps rather than placing them, as is typical, in appendices.
Thus, the Formal Model occupies the first quarter of the first iteration of the modelling cycle (Fig. 1). The Formal Model is a reasoned statement of intent and a model description, separated from the model implementation in code. It should provide the first point of peer review, allowing a discussion with reviewers before the final implementation. As such, it should also help to limit potential criticisms of the final model later in the development process. Any changes to the model required during implementation or testing would need to be documented clearly at the time.
Since FESMJ is the only journal that offers this particular set of paper formats for models, it is in a position to publish all three, subject to quality control. However, publication of all three would not be mandatory. Where an author proposes a Formal Model but does not follow it up, others could have the chance to develop the model, and this might even be indicated in the Formal Model paper if known in advance.
Figure 1. The modelling cycle as it is often applied to ecological models. The Formal Model as described in this article occupies the top-right quadrant.

Scope and peer review
The Formal Model should document the breadth and depth of information used to create the planned model. This documentation may include information that was considered but discarded for a particular reason. The overall scope of the document is to reveal the intentions of the modeller and the information used to realise those intentions. This should then be opened to peer review before substantial time and resources are invested in progressing the work further.
Currently, in modelling studies, the normal process is to construct a model, calibrate and test it, and then apply it. Only after the last step is a peer-reviewed publication made. This often leads to reviewers questioning modelling decisions, and even suggesting redesigns and reruns of models, after a great deal of work has already been done. In addition, science has been criticised for a tendency to pass off post-diction as prediction, thereby invalidating hypothesis testing, with analyses being redesigned on the fly or hypotheses created to fit results (Nosek et al. 2018). This has led to the proposal and design of "preregistration" (Nosek et al. 2018) and registered reports (Montoya et al. 2021). Preregistration provides accountability and a living document that the modeller can update, but without peer review. The scientist registers the plans for an experiment and generates a Digital Object Identifier (DOI), which is finally referenced in the analysis paper to make clear what is prediction and what is post-diction. Without peer review, there is no chance to identify problems with the experimental design. Registered reports with a journal allow these plans to be peer-reviewed and improved through the process (Crüwell and Evans 2021, Montoya et al. 2021, Wiseman et al. 2019). They can also encourage the publication of negative results, as the journal in which the report was reviewed agrees in principle to publish the results (Parker et al. 2019). This works because the design and the experiment are tightly constrained (Crüwell and Evans 2021).
Preregistration has been adapted for modelling (Crüwell and Evans 2021, DeHaven et al. 2020), but the format would not work for a complex model design. The modeller may design the model, then iteratively implement and calibrate it, and then use it for a host of applications and scientific analyses. Only at this final stage of scientific analysis, where a tightly constrained model is used to set up and test a hypothesis, potentially repeatedly, would either preregistration or a registered report be applicable. However, it is possible to break the modelling cycle down into different steps: a review of the concepts for the model and their implementation, followed by validation, and then further steps in which the model is used. The Formal Model we propose is the first step in this process: a paper that describes the concepts, refined through peer review.

Other approaches
The problem of documenting model building is not new, and others have suggested various approaches to overcome it. Initially, well-commented code was a good starting point, and this has developed into detailed documentation, e.g., using Doxygen (as for JSBSim; Vogeltanz and Jašek 2015). However, this tackles the implementation of the model, not the model structure and aims. The Overview, Design concepts, and Details (ODD) protocol gives a means of describing an individual-based model (Grimm et al. 2006), and this original version was separated from the implementation. ODdox (Topping et al. 2010) links the overview and details of ODD to the implementation code. The later version of ODD (Grimm et al. 2020) also moves towards specifying some code implementation. The use of electronic notebooks for the development process of models has also been suggested (Ayllón et al. 2021). Within psychology, preregistration has been adapted for mathematical and small computational cognitive models (Crüwell and Evans 2021). However, all these approaches cover a large part of the modelling process, which, for larger models, cannot comfortably be accommodated in a single document. Here, we focus only on the Formal Model, without connections to the model implementation, and seek to represent the processes and variables driving the system under consideration.
Different scientific and business domains define formal models in different ways. The Formal Methods Model in software engineering is a precisely defined description of the components and relationships in a complex piece of software, giving an overview for planning development. This spread to business in the form of formal process modelling (Minkowitz 1993). In the mathematical sciences, a formal model is a mathematical proof that is precisely defined (and communicated) and gives replicable results. It is a precise statement of components and the relationships among those components. Versions of these definitions are found in all sciences, from the social sciences to engineering. What links them all is that a formal model is a formalised definition of the components of the system to be modelled, which can be used to evaluate, design or build the actual model of the system. The EFSA Panel on Plant Protection Products and their Residues (PPR) (2014) defined formal models as those in which "model variables and parameters are defined and linked together into mathematical equations or algorithms". However, this definition builds on two earlier steps: the problem definition, which sets the scene for the use of the model, and the conceptual model, which gives a qualitative general description of the system to be modelled. In our definition of the Formal Model, we include all three of the steps described by EFSA but formalise them, with the intent of providing a standard yet flexible way to describe models across the range of disciplines contributing to FESMJ.

Proposed method
We propose the use of a Formal Model paper. The Formal Model will state the intent of the model, giving its aims and purpose. It includes a review of the literature to identify the key components needed for the model, including any theoretical framework, modelling approaches and externalities. The Formal Model gives an overview of the processes that will be in the model and describes them in terms of the current state of knowledge. It demonstrates how the modeller plans to represent each process, giving equations, descriptions, state variables and scales. Finally, the format includes a discussion of the model's strengths and weaknesses and places the model in a scientific narrative. This includes explaining how things not included within the current scope could affect the model or be incorporated in a future version.

When to create a Formal Model
The modeller should create a Formal Model as part of the modelling process. Once the purpose of the model has been defined, it forms the basis for stating the intent of the model, and the Formal Model becomes a living document, to be added to and refined throughout the process. The modeller should complete the Formal Model before creating a finalised, documented and versioned model. Completion should come before final calibration and testing, so that the modeller can incorporate changes arising from the review process. This three-tier approach (Formal Model, Implementation, and Evaluation and Testing) is best suited to models that cannot be explained succinctly within the normal methods section of a paper. Such models may include many decisions and assumptions that could be, and often are, disputed during the final review of a model application. These decisions could also affect the state of knowledge, policy or practice informed by the model. Conversely, models that can be explained succinctly, or where the step from formal model to implementation is small, are not suitable for this format and should probably combine the formal model and implementation in a single article. The Formal Model can be used for a diverse range of models, for example, social, agent-based, sub-population, behavioural, or food model processes. In all cases, the underlying reasons for the Formal Model are the same: to avoid bias and to communicate the model structure, processes and background knowledge used to construct the model.
For example, agent-based models are simulations designed from the perspective of an entity, i.e., an agent (Macal 2017), individual, or super-individual (Scheffer et al. 1995). The models combine the actions and interactions of the entities and processes, from which emergent patterns then appear. These models can range from the microscale, for example molecular (Maestri et al. 2022), to the macroscale. What they all have in common is that even the simplest has many processes, parameters, and assumptions that the modeller chooses in designing and building the simulation. Thus, the biases of the designer can end up dictating the outputs. It is with models like these that the Formal Model can be particularly helpful, by questioning design decisions and suggesting alternatives.

What a Formal Model is not
Having stated what a Formal Model is, it is also important to state what it is not. The Formal Model is not suitable for very simple models. The model should have a volume and complexity that would make describing it inside model application articles difficult, or it might be a substantial sub-model that otherwise would not be described in detail. To allow for feedback from peers and reviewers, the Formal Model is an early formulation of the concepts before implementation, model evaluation, and testing. The Formal Model, therefore, does not include these later stages of model development. It is the springboard for developing the code implementation before moving on to further stages of the modelling cycle. Thus, the Formal Model cannot include the experimental outputs and eventual documentation and evaluation, as a single article describing a simple model could. However, the implementation, evaluation and testing of models are also specific foci within the scope of FESMJ.

Structure of the Formal Model
What format should the Formal Model take? By defining a template, much as standard scientific papers are structured into introduction, methods, results, and discussion (IMRAD), we allow those examining a Formal Model to become familiar with what to expect (a strategy in common with other model description formats). This will make reading a Formal Model easier and aid in finding information when dipping in and out of it. This prescription is necessary to give a common format, but there also needs to be a degree of flexibility to enable the Formal Model to cope with a variety of model types. We therefore propose the following structure and give a brief description of the content of each section (Fig. 2):

Introduction
This is an introduction as in any standard paper, with the exception that it does not lay out problems and hypotheses per se. Instead, it lays out the reasons for creating the model in question, the theory that supports the model and the modelling approach. It would, of course, also lay out the salient literature defining the model overview and the theoretical framework.
Figure 2. An overview of the template that provides a standard structure to the document.

Aim and purpose
This is, in effect, the problem formulation, explaining the aim of creating the model, why the modeller chose this model and approach, and how the model could be used.

Theoretical framework and modelling approach
Theoretical frameworks are taken from the relevant scientific domain and describe the perspective of the research as defined by the theories on which the work is based (Collins and Stockton 2018). For example, within ecology, this might mean stating that the model subscribes to "foraging theory" or the "metabolic theory of ecology", or to "metacommunity theory" for subpopulation modelling (Scheiner and Willig 2007). The idea is to lay out the discipline, the perspective of the model, and any theories that are fundamental to the uppermost levels of the model. Finally, the author should define the approach that the model will use. For example, the simulation may use an agent-based methodology. In that case, a description of what is considered to be an agent-based methodology would be needed, together with why this approach is the best method to use and an explanation of how the current model fits this framework.

Framing the model
This section gives an overview of external influences on the model and its results that are not explicitly included in the model by the modeller, but which could potentially affect the model outcome. It takes the form of a narrative explaining what was knowingly left out of the model. Framing the model in this way is considered a means of avoiding false-inclusion and false-exclusion errors (Topping et al. 2015) and is based on a 'modest' approach to modelling that avoids making strong claims (Cilliers 2005). The aim of this section is to be explicit about model limitations outside the scope of a normal uncertainty analysis, taking a much broader context into account. This section may be largely redundant for some models that describe detailed processes and act as sub-models for larger simulations.

Overview of the components and the connections
This section of the Formal Model would introduce the main components to be explained in detail under the subheadings that follow it. This overview describes the interconnections between the components from a high-level perspective. This will typically include an overall diagram of processes and connections and should thus serve as a roadmap to the details presented in the following sections. For larger models this section may be quite long and include multiple diagrams as necessary to provide the overview of the model from a structural and process point of view.
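Because the overview acts as a roadmap, its consistency with the process descriptions that follow can be checked mechanically. The sketch below is a hypothetical authoring aid, not part of the proposed format: it assumes the overview is recorded as a Python dictionary mapping each component to the components it feeds into, and all component names are invented for illustration. It reports components named in the overview that have no process description, and process descriptions for components missing from the overview.

```python
# Hypothetical overview: each component maps to the components it feeds into.
# Component names are invented for illustration and belong to no real model.
OVERVIEW = {
    "weather": ["foraging", "thermoregulation"],
    "foraging": ["resource_store"],
    "resource_store": ["metabolism"],
    "metabolism": ["thermoregulation"],
    "thermoregulation": ["metabolism"],  # feedback loops are allowed
}

# Components for which a process-description section has been drafted.
DESCRIBED = {"foraging", "resource_store", "metabolism", "thermoregulation", "brood_care"}


def check_roadmap(overview, described):
    """Compare the components named in the overview with those described."""
    mentioned = set(overview)
    for targets in overview.values():
        mentioned.update(targets)
    return {
        # Named in the overview, but no process description yet.
        "missing_description": sorted(mentioned - described),
        # Described in detail, but absent from the overview roadmap.
        "missing_from_overview": sorted(described - mentioned),
    }


print(check_roadmap(OVERVIEW, DESCRIBED))
# {'missing_description': ['weather'], 'missing_from_overview': ['brood_care']}
```

A flagged component is not necessarily an error: an external driver such as the weather in this example might deliberately be covered when framing the model rather than as a process of its own.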

Process description
This section will be needed for each of the processes described in the overview. Here, each component is described in detail, including all relevant knowledge, state variables, and scale information necessary to understand the implementation of the process. It is not necessary to follow the headings below precisely, or even to include them as headings, but the information they call for should be present:

Review and describe the current state of knowledge for each component
This section can be quite long and lists all the important references leading to the planned implementation. Typically, it will include tables and diagrams from the literature used to develop the concepts applied in the model. In most cases it will include suggested starting values for parameters and process descriptions based on the literature, although in some cases the model may be a set of equations without the need to specify parameter values.

Planned implementation of each component with a formal representation
The method of implementing the process in the model is described here in the Formal Model. This may take the form of a short text with equations, or a flow chart. For example, in the agent-based model ApisRAM (Duan et al. 2022, p. 16), metabolic activity was described thus: "Every bee consumes resources and generates heat according to its metabolic rate q, in units of kcal s$^{-1}$. Each class of bee has a metabolic rate determined by its activity. The temperature increase is defined as $\Delta T = Q/s$, where Q is the heat produced by burning q∆t of nectar, and s is the heat capacity of the bee."
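The quoted relation can be checked with a few lines of arithmetic. This minimal sketch assumes, as the quotation implies, that all energy released by metabolising nectar appears as heat, so that Q = q∆t and ∆T = Q/s. The activity classes, their rates, the heat capacity and its units (kcal K^-1) are all assumptions: arbitrary numbers chosen to make the arithmetic easy to follow, not ApisRAM parameters.

```python
# Hypothetical metabolic rates (kcal s^-1) per activity class; arbitrary
# numbers chosen so the arithmetic is easy to check, not ApisRAM values.
METABOLIC_RATE = {"resting": 0.25, "foraging": 0.5}


def temperature_increase(q, dt, s):
    """Return the temperature rise (K) of a bee over one time step.

    q  : metabolic rate (kcal s^-1)
    dt : length of the time step (s)
    s  : heat capacity of the bee (kcal K^-1, assumed units)
    """
    if s <= 0:
        raise ValueError("heat capacity s must be positive")
    heat = q * dt  # Q: heat released by burning q*dt of nectar (kcal)
    return heat / s  # Delta T = Q / s


for activity, q in METABOLIC_RATE.items():
    print(activity, temperature_increase(q, dt=4.0, s=0.25))
# resting 4.0
# foraging 8.0
```

Writing the planned representation this way makes the units of each symbol explicit, which is exactly the information the following subsection asks the modeller to record.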

State variables, spatial and temporal scales (with units)
This covers the state variables associated with the process. State variables describe the structure or quality of an entity or process, for example, age, size, growth rate, or parameter values. State variables like these should be described, with their units.
Similarly, the temporal and spatial scales over which the processes operate should be described as appropriate for the process. In the above example, the metabolic rate q has units of kcal per second, giving its temporal scale. In larger or more complex examples it is often useful to tabulate the parameter values. Since these parameters will typically be referenced in equations using symbols, each parameter's symbol, units, any predetermined value, and a short description of its meaning could all usefully be included in a table.
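A parameter table of this kind can be kept as structured data and rendered for the manuscript, so that symbols, units and values live in one place. The following is a minimal sketch under stated assumptions: the `Parameter` record and the `parameter_table` function are hypothetical names, the heat-capacity units and the 1 s time step are illustrative assumptions, and no ApisRAM values are implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    symbol: str             # symbol used in the equations, e.g. "q"
    units: str              # physical units, e.g. "kcal s^-1"
    value: Optional[float]  # predetermined value, or None if not yet set
    description: str        # short description of the parameter's meaning


def parameter_table(params):
    """Render parameters as a fixed-width plain-text table."""
    header = ("Symbol", "Units", "Value", "Description")
    rows = [
        (p.symbol, p.units, "-" if p.value is None else f"{p.value:g}", p.description)
        for p in params
    ]
    table = [header, *rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


# Hypothetical entries, loosely following the metabolic example above.
PARAMS = [
    Parameter("q", "kcal s^-1", None, "metabolic rate, set per activity class"),
    Parameter("s", "kcal K^-1", None, "heat capacity of the bee"),
    Parameter("dt", "s", 1.0, "model time step (illustrative)"),
]

print(parameter_table(PARAMS))
```

Leaving `value` as `None` marks parameters whose values are still to come from the literature review or calibration, which suits a document written before implementation.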

Model Properties and Behaviour
This section is optional but may be useful, especially when the model implementation is, or would be, simple, or when the model forms a sub-component of a larger model. It would not be a full implementation with calibration and sensitivity analysis, but would serve to explore the properties of the model's behaviour and help elucidate its functioning. It could also include trials of ideas in a simplified form, or expected properties and behaviours. This section might include examples of output under controlled conditions, used to demonstrate properties of the model or some of its components.

Discussion
This section will discuss any aspects of the model or model development that might be of interest to the reader, including lessons learned from the modelling process. Here, it might be relevant to discuss the coverage of information needed for the model development, highlighting gaps or inconsistencies. This section must cover the strengths and weaknesses of the model.
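Because the template is meant to be flexible, a checklist is more helpful than strict enforcement. The sketch below is a hypothetical authoring aid, not an FESMJ tool: it records the top-level sections described above, marks Model Properties and Behaviour as optional, and lists required sections that have no matching heading in a draft. Since each process gets its own description, a heading is assumed to match when it starts with the section name, so "Process description: metabolism" counts.

```python
# Top-level sections of the proposed template, in order.
# The flag is True for the section the text describes as optional.
TEMPLATE = [
    ("Introduction", False),
    ("Aim and purpose", False),
    ("Theoretical framework and modelling approach", False),
    ("Framing the model", False),
    ("Overview of the components and the connections", False),
    ("Process description", False),
    ("Model Properties and Behaviour", True),
    ("Discussion", False),
]


def missing_sections(draft_headings):
    """Return required template sections with no matching heading in a draft.

    A heading matches a section if it starts with the section name, ignoring
    case, so "Process description: metabolism" counts as a process description.
    """
    headings = [h.strip().lower() for h in draft_headings]
    return [
        name
        for name, optional in TEMPLATE
        if not optional and not any(h.startswith(name.lower()) for h in headings)
    ]


draft = ["Introduction", "Aim and purpose", "Process description: metabolism", "Discussion"]
print(missing_sections(draft))
# ['Theoretical framework and modelling approach', 'Framing the model',
#  'Overview of the components and the connections']
```

The output is best read as a prompt to the author rather than a rule, since, for example, the framing section may be largely redundant for detailed sub-models.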


Discussion
An imperfect solution is better than no solution. The proposed Formal Model format is not a silver bullet and has drawbacks as well as advantages. The advantages are that it clearly defines the modelling intention and provides a relatively compact, but still substantial, article for evaluating it. It gives credit to the modeller for the work needed to craft the design, and it provides a review of the existing knowledge for model creation. It also provides a chance to catch design problems earlier in the process than waiting until peer review of the model application, at least if review feedback is rapid. This is a major advantage over the current 'all-in-one' approach used in most journal articles, where overview, design and implementation are combined in a single step. It is of particular importance for more complex simulations, which otherwise drown in detail, resulting in voluminous and rarely read model descriptions.
Table 1. Example of a
The disadvantages include the fact that it is another burden for the modeller in documenting their work, in addition to normal documentation and user guides. Although the document is smaller than one combining description, implementation, and testing, the Formal Model can still be substantial (e.g., the ApisRAM formal model (Duan et al. 2022), although not in the new Formal Model format, is 58 pages long). This requires a certain level of commitment to produce. However, this downside is likely to be counter-balanced by the reviewed publication status of the Formal Model and by model improvements through dialogue with reviewers. If writing it is included as part of the design process, the Formal Model should require only a little extra effort to complete. Another problem might be that nobody reads it afterwards. Of course, as noted above, this applies equally to any other model documentation and is not a problem inherent to the Formal Model. Similarly, we may face low adoption, and thus no long-term improvement in the process. Again, the status as a separate article, and the fact that this article will be cited whenever the model is used, should increase visibility and impact. This should encourage not only people to read the article, but also modellers to prepare it.
There is also the problem that many of the larger models that this approach is targeted at will undergo several model cycles as new information becomes available or new applications are needed. As such, a reviewed and static document does not help and can even confuse the issue. This is where versioning
Another drawback of the documentation process, also shared with all other forms, is that journals and journal referees often prefer not to rely on secondary literature in model application articles. This leads to a need to repeat information. A brief synopsis of the documentation will probably be included in related papers for readability, while details are deferred to the Formal Model by reference. Ideally, wider use of the Formal Model and the other two model documentation formats we suggest will overcome this issue, as referees learn to use them. Preregistration and registered reports are gradually being accepted (Crüwell and Evans 2021, Montoya et al. 2021, Nosek et al. 2018), aided by journals that support open science. Opportunities for modellers to showcase their work and implement Formal Models are initially being supported by the Food and Ecological Systems Modelling Journal (FESMJ), but once established will, we hope, be supported by journals in other fields.
A key advantage of the format suggested is that it embraces the 'modest approach' to model construction (Cilliers 2005, Topping et al. 2015). This gives the modeller the option of defining the externalities and their potential influence. It also allows the modeller to argue for the level of detail chosen in the model design, avoiding problems of false inclusions or exclusions. In this way, the Formal Model suggested here expands on existing formal model frameworks, moving model documentation into a new, broader scope.
In conclusion, complex models can result in publications with poor transparency. Modellers can leave information out or lose it in the supplementary material. Even when modellers have used a structured approach, they must decide what to include and how to implement a model, and reviewers often question these decisions only after the model has been used and the analysis completed. The Formal Model approach we propose aims to address these issues. The proposed format will improve transparency, provide the opportunity for review and give the modeller credit for crafting models, all while improving the modeller's approach. We hope that, through use, this format can improve and be improved by the modelling community, ideally beginning a journey towards better models through improved documentation.