ICFP 2024
Mon 2 - Sat 7 September 2024 Milan, Italy

Authors with a paper conditionally accepted at ICFP 2024 are invited to submit an artifact that supports the conclusions of the paper. The Artifact Evaluation Committee will read the paper, explore the artifact, and provide feedback on how easy it would be for future researchers to build on. The ultimate goal of artifact evaluation is to support future researchers in their ability to reproduce and build on today’s work.

If you have a conditionally accepted paper at ICFP 2024, please see the Call for Artifacts for instructions on submitting an artifact for evaluation.

Call for Artifacts

The Artifact Evaluation Committee (AEC) invites authors of conditionally accepted papers to submit an artifact that supports the conclusions of the paper. The committee will read the paper, explore the artifact, and provide feedback on how easy it would be for future researchers to build on. The ultimate goal of artifact evaluation is to support future researchers in their ability to reproduce and build on today’s work.

The submission of artifacts for review is voluntary and will not influence the final decision of whether the paper itself is accepted. Papers with successfully reviewed artifacts will receive a seal of approval printed on the first page of the paper in the ICFP proceedings. Authors of papers with successfully reviewed artifacts are encouraged to make the artifact publicly available upon publication of the proceedings, by including them as “source materials” in the ACM Digital Library.

The artifact evaluation process starts when a paper is conditionally accepted. If a paper is rejected in the second round of reviews, the corresponding artifact will not be evaluated further.

Types of Artifacts

An artifact that supports the paper’s conclusions can take many forms, including:

  • a working copy of the software and its dependencies, including benchmarks, examples and/or case studies
  • experimental data sets
  • a mechanized proof

Paper proofs are not accepted as artifacts for evaluation.

Selection Criteria

Artifacts have two broad purposes: facilitating reproduction and reuse of the work by future scientists. Reuse goes beyond reproduction by allowing future scientists to, for example, extend a tool with new features or to inspect the exact definitions used in a formal proof.

To facilitate reproduction and reuse, an artifact should be:

  • consistent with the claims of the paper and the results it presents;
  • as complete as possible, supporting all claims of the paper;
  • well-documented;
  • future-proof;
  • easy to extend and modify.

Artifacts that satisfy these criteria will be awarded at least one of the ACM “Available”, “Functional” and “Reusable” badges. For more details on the badges and the evaluation criteria, see the Evaluation Guidelines.

We expect that most artifacts submitted for review at ICFP will have a few common forms: compilers, interpreters, proof scripts,/ and so on. We have codified common forms of artifacts on a separate page. If you are considering submitting an artifact that does not have one of these forms, we encourage you to contact the Artifact Evaluation chairs before the submission deadline to discuss what is expected.

Submission Process

The evaluation process uses an optional, lightweight double-blind system. Of course, authors will not know the names of reviewers. Authors are also encouraged (but not required) to take any reasonably easy steps to anonymize their submissions, and reviewers will be discouraged from trying to learn the names of authors. We do not intend to impose a lot of extra work here: anonymizing artifact submissions is an opportunity to help ensure the reviewing process is fair, but is in no way required, especially if the anonymization would require a lot of work or compromise the integrity of the artifact. There will be a mechanism for easy, double-blind communication between reviewers and authors, so that small technical problems can be overcome during the reviewing process. Authors may iteratively improve their artifacts during the process to overcome small technical problems, but may not submit new material that would require substantially more reviewing effort.

We expect that most artifact submissions include BOTH:

  1. Software installed into a virtual machine (VM) image provided by the committee. See this page for details.

  2. A separate source tarball that includes just the source files.

The intention is that reviewers who are familiar with certain tools (e.g. Agda or OCaml) can inspect the artifact sources directly, while reviewers who are less familiar can still execute the artifact without needing to install new software on their own machines. The VM image also ensures that future researchers, say in 5 years’ time, do not need to worry about version incompatibilities between old tool versions and new operating systems.

/The detailed submission process is as follows:

  1. Read the Submission Guidelines page for details on artifact preparation.
  2. Register your intent to submit an artifact on the separate artifact-only HotCRP site (to be linked).
  3. Prepare your artifact, building upon the base VM image.
  4. Upload your artifact to Zenodo (recommended), or otherwise make it available via a stable URL (i.e. the URL should not change if you later make updates to the artifact; and ideally, the URL has a good chance of continuing to exist well into the future).
  1. Finalize your submission on HotCRP with a link to your artifact. You should also upload a preprint of your paper, and any additional materials the reviewers may find helpful (e.g., appendices).

Timeline

The schedule for artifact evaluation is fairly compressed because it needs to fit between the notification of conditional acceptance and the camera-ready deadline. These are the key dates (all dates are in the Anywhere on Earth (AOE / UTC-12) timezone):

Event Date
Notification of paper acceptance (R1) Mon May 20
Artifact registration Thu May 30
Artifact submission Wed Jun 5
Start of review phase Sun Jun 9
Notification of paper acceptance (R2) Tue Jun 18
Further clarifications until Thu Jul 4
Final decisions sent to authors Fri Jul 5

Ideally, artifact evaluation is a collaborative process during which authors and evaluators work together to improve the artifact. We therefore recommend that at least one author of each artifact is available to respond to evaluators’ comments throughout the review period.

More Information

For additional information, clarification, or answers to questions, please contact the ICFP Artifact Evaluation co-chairs:

Most artifacts that are submitted for review at ICFP have one of a few common forms, so we provide some advice for authors about how to prepare them. This material should be taken as highly suggestive, but not prescriptive. If you have questions about what is expected, or if your artifact does not fit into any of the categories below, please contact the AEC co-chairs as early as possible.

Instructions for All Artifacts

Artifact Registration

Artifacts must be registered via a separate artifact-only HotCRP instance (to be linked) by May 30th.

For the registration, please write a short abstract of the artifact. This allows us to detect in advance artifacts that may have special requirements (e.g., specific hardware).

The submission of the full artifact is due on June 5th.

Sources and VM Image

Artifacts should generally consist of two components:

  1. A source tarball.
  2. A virtual machine image containing the same sources, with the artifact’s dependencies already installed.

The VM image primarily facilitates reproduction: it allows future scientists to reproduce the artifact’s results without having to deal with incompatible dependencies, changes to operating system interfaces, etc. The source tarball primarily facilitates reuse.

Both components must contain a Readme.md file that gives the name of the paper, the and step-by-step instructions for how to execute the artifact. For the source tarball, these instructions should include how to install the artifact’s dependencies.

If the artifact requires particular hardware or software, e.g., a specific platform (x86, ARM, RISCV, a specific GPU…), or a large amount of memory, or a specific operating system (Windows, BSD, …), these requirements should be written explicitly in the Readme.md file and the authors should also indicate these requirements in their submission on HotCRP.

The VM image may be produced by taking our base image, unpacking the source tarball into the VM and executing a prefix of the source tarball’s instructions.

Try to avoid installing graphical environments in the VM unless this is truly necessary. Graphical environments in VMs are sometimes slow and unstable.

Readme

In most cases the step-by-step instructions in your Readme.md should be a list of commands to build and test the artifact on the examples described in the paper, and to reproduce any graphs and benchmarking results. The instructions should call out particular features of the results, such as “this produces the graph in Fig. 5 that shows our algorithm runs in linear time”. Try to keep the instructions clear enough, so that reviewers can work through them in under 30 minutes. Consider providing a top-level Makefile, so that the commands to be executed are just make targets that automatically build their prerequisites.

If the build process emits warning messages, perhaps when building libraries that are not under the author’s control, include a note in the instructions that this is the case. Without a note the reviewers may assume something is wrong with the artifact itself.

Separately from the step-by-step instructions, provide other details about what a reviewer should look at. For example, “our artifact extends existing system X and our extension is the code located in file Y”.

Upload to Zenodo

Once you have prepared your artifact, upload it to Zenodo to ensure that it will remain publicly accessible in perpetuity. Similar publicly accessible, long-term archives are also acceptable (but not GitHub, Docker Hub or your personal website).

Anonymization

We use an optional, lightweight double-blind review process. This means you may, but are not required to, anonymize your artifact to improve the fairness of the reviewing process. See here for how to upload an anonymized artifact to Zenodo. We will ask reviewers to refrain from trying to find out artifact authors’ names.

Revised Papers

Artifacts should, if possible, be evaluated against the revised version of the paper. To facilitate this, you can upload the revised version of the paper, or a partially revised draft, when you submit the artifact. Please also add a note to the artifact’s README alerting reviewers to the revisions.

Instructions for Common Types of Artifacts

Command-line Tools

Unix command-line tools should have standard --help style command-line help pages. It is not acceptable for an executable to throw uninformative exceptions when executed with no flags, or with the wrong flags.

Compilers and Interpreters

It should be obvious how to run the tool on new examples that the reviewers write themselves. Do not just hard-code the examples described in the paper.

If your tool consumes expressions in a custom DSL, then we recommend supplying a grammar for the concrete syntax, so that reviewers can try the tool on new examples. Papers that describe such languages often give just an abstract syntax, and it is often not clear what the full concrete syntax is from the paper alone.

Proof Scripts

In most cases, the artifact VM should contain an installation of the proof checker and specify a single command (preferably make) to re-check the proof. It is fine to leave the VM itself command-line only, and require reviewers to browse the proof script locally on their own machines. It should not be necessary to have an IDE (e.g. CoqIDE, Emacs or VSCode) installed into the VM, unless the paper is particularly about IDE functionality.

Include comments in the proof scripts that highlight the main theorems described in the paper. Use comments like “This is Theorem 1.2: Soundness described on page 6 of the paper”. Proof scripts written in “apply style” are typically unreadable without loading them into an IDE, but reviewers will still want to find the main lemmas and understand how they relate.

Reviewers often complain about lack of comments in proof scripts. To authors, the logical statements of the lemmas themselves may be quite readable, but reviewers typically want English prose that repeats the same information.

Common Problems

This section discusses common problems with artifacts. If your artifact has any special requirements, please contact the AEC co-chairs well in advance. We will then discuss how the artifact can be best reviewed. The advice below has been distilled from past experience at a variety of events and does not describe specific papers, artifacts or authors.

Proprietary Software

It is reasonable for artifacts to depend on commercially licensed tools (e.g. MATLAB or some commercial SMT solver) if the paper’s audience would generally have access to these tools. In such cases, we will try to match the artifact with reviewers who also have access to these tools. If this is not possible, we will ask authors to provide an anonymously accessible environment in which the required tools are installed.

If parts of the artifact depend on a proprietary tool or proprietary data which cannot be made available to the reviewers at all, please contact the AEC co-chairs to discuss whether and how the artifact can still be reviewed.

Special Hardware Requirements

Some artifacts require special hardware, e.g., GPUs or a compute cluster. If the hardware is relatively common, e.g., an NVIDIA GPU, we will try to find reviewers who have access to the hardware. If not, it is usually still possible to (partially) evaluate the artifact:

  • Authors may provide a scaled-down version of the artifact (e.g., a benchmark) which can be executed on a commodity laptop.
  • Authors may provide anonymous access to an environment which includes the needed hardware (e.g. a compute cluster).
  • Authors may provide access to a simulator, e.g., for a specific FPGA.

Long-Running Computations

Some artifacts require extensive computations (on the order of days, not hours). In such cases, please provide a scaled-down version of the artifact which can be evaluated within a reasonable amount of time. If possible, make sure that the computation can be paused and resumed.

Unstable or Dangerous Software

Reviewers may be reluctant to install software that may destabilize their systems (e.g., kernel drivers) or that intentionally performs dangerous actions (e.g., proof-of-concept exploits). In such cases, document very clearly any possible effects on the host system and how to reverse them. If possible, prepare the artifact in such a way that the dangerous software is isolated from the reviewer’s system.

Web Interfaces

If your artifact has a web interface, try to get the server running locally inside the VM and allow the reviewer to connect to it via a web browser running natively on their host machine. Graphical environments installed into VMs are sometimes laggy and unstable, and standard web protocols are stable enough that such artifacts should be usable with any recent browser.

Programs that Generate Images

If the artifact produces an image file (e.g., a graph), then expect the reviewers to use scp or some such to copy it out to the host machine and view it. Authors should test that the connection to the VM works, so that this is possible.

Since 2022 at ICFP, artifact evaluation does not only make a binary decision between ‘artifact accepted’ and ‘artifact not accepted’, but also distinguishes between two levels of artifact quality. To facilitate this, we believe it is important to lay out in some detail what these levels mean, to guide reviewers, to ensure fairness and to help authors prepare their submissions.

However, this document cannot possibly capture all the nuances of what makes a good artifact. Any prescriptive statement should be read as a guideline, not a fixed rule. The ultimate aim is to produce artifacts that are useful to the research community and if that aim is better served by deviating from these guidelines, authors and reviewers should do so (if the deviation is substantial, though, please contact the AEC co-chairs to discuss it, the earlier the better). Our guidelines are also not necessarily complete: if an artifact satisfies all the criteria below but has other major issues, it may still be denied the corresponding badge.

Badges

Artifacts can earn three badges (familiar from other conferences):

  • Artifact functional
  • Artifact reusable
  • Artifact available

Functional and Reusable Badges

We award two badges for quality: functional and reusable. To be deemed reusable, an artifact must also fulfill all the criteria for functional artifacts. The difference between the two badges is, roughly, the difference between reproducibility and reusability:

  • Functional artifacts allow future scientists to confirm that any claims from the paper which are supposed to be supported by the artifact are in fact supported by the artifact. For example, if the artifact is a program, reviewers must be able to build it, to run it and to confirm that it yields the expected results.

  • Reusable artifacts go beyond reproduction by enabling future scientists to build upon the artifact. For example, programs should run on modern operating systems and have up to date dependencies.

In the following, we spell out our expectations for the two badges in more detail.

Functional Badge

Consistency and Completeness

The artifact should directly implement or support the technical content of the paper (consistency). It should validate any claims made in the paper about the artifact or, if there are no explicit claims in the paper, any claims that one would expect to be validated (completeness).

  • For programs, the program should work as described in the paper. The program may be an extended version of the one described in the paper, but all examples discussed in the paper should run with at most minimal and clearly documented changes.
  • For benchmarks, results obtained by running the benchmarks should be consistent (within the expected variance) with the results reported in the paper. All graphs, tables, etc. should be reproducible. Exceptions are possible, e.g,. when a benchmark takes a very long time to run. We expect such exceptions to be clearly documented.
  • For formal proofs, the proved statements should match those from the papers. Axioms or incomplete proofs are acceptable if they are documented in the paper.

Exercisability

It should be possible to reproduce the artifact’s contribution in any commonly used environment. For executable artifacts, this requirement is satisfied if the artifact is packaged as a VM image containing all relevant software and data sets (as described in the submission guidelines). If external data are required, it should be clear how to access them.

This requirement does not apply to artifacts which necessarily require a non-standard environment, e.g., special hardware or large amounts of computing power. But if at all possible, the artifact should still allow reviewers to partially verify the artifact, for example by

  • providing simulators for special hardware;
  • providing anonymous remote access to special hardware or compute clusters;
  • providing downscaled versions of the artifact which can be run on standard hardware.

Authors of such artifacts should contact us already before the submission to discuss these issues.

Documentation

The artifact should contain sufficient documentation for reviewers to perform the activities mentioned above.

  • For programs, it should be clear how to build the program and how to run it on the examples provided in the paper.
  • For benchmarks, it should be clear how to run the benchmark and how to interpret the resulting data.
  • For formal proofs, it should be clear:
  1. how to check that the proofs are axiom-free;
  2. which parts of the formal proof correspond to which theorem in the paper;
  3. how the notation and definitions used for the formal proof correspond to those used in the paper.

Reusable Badge

Reusable artifacts should satisfy the following requirements in addition to the functional ones. The reusable requirements are necessarily fuzzier than the functional ones. We rely on reviewers’ good judgment to determine whether an artifact can, in fact, be reused by future scientists. However, we ask reviewers to be lenient with regards to the more work-intensive requirements. It is generally easy to find fault with the documentation and code quality of research projects, but artifacts should be evaluated against the state of the art, not against a theoretical ideal of perfect software.

Exercisability

The artifact should work not only inside the VM image, but also in other standard environments. This means:

  • The artifact’s dependencies should be reasonably up to date. The artifact should not unnecessarily depend on specific drivers, instructions sets, etc.
  • The artifact should not depend on undocumented changes to, for example, the operating system or dependencies.
  • The artifact’s packaging should facilitate reuse (e.g., a Coq library could be packaged as an opam package).
  • The artifact should give reasonable error messages if it cannot be executed in some particular environment.

Documentation

The artifact should be documented in a way that facilitates reuse. This means:

  • There should be install instructions for all supported operating systems. Dependencies should be clearly documented.
  • For programs, it should be clear how to run the program on inputs other than those from the paper. For example, for a compiler, the concrete syntax of the input language should be documented. Any options to the program should be documented. The main parts of the implementation should be documented to a reasonable degree. It should be clear how to run the test suite (if any).
  • For benchmarks, it should be clear how to run the benchmark on inputs other than those from the paper and how to prepare such inputs.
  • For formal proofs, the main parts of the proof (key lemmas and definitions) should be documented, especially if the notation differs from that used in the paper.

Quality

The artifact should be of sufficiently high quality that future scientists could reuse it without major changes. For example:

  • Code should be reasonably clear and consistently formatted.
  • The build process should be reasonably simple.
  • Error messages should be clear enough to facilitate debugging.
  • Web interfaces should work with all modern browsers.

Available Badge

The Artifact available badge is awarded to artifacts that meet at least the quality criteria of the functional badge and that are additionally stored in a long-term, publicly accessible archive. The archive must have a plan for permanent accessibility along the lines of Zenodo’s or FigShare’s policies. The submission guidelines contain instructions on submitting artifacts to Zenodo. Other archives are also accepted, but they must fulfil the two above criteria:

  • Long-term archival: the archive must ensure that artifacts are available indefinitely. This excludes commercial offerings such as GitHub which make no long-term commitments, as well as personal websites.
  • Public accessibility: the archive must be freely accessible to the general public.

Artifacts should generally consist of two components:

  1. A source tarball.
  2. Either a virtual machine image containing the same sources, with the artifact’s dependencies already installed, or a docker image containing the same.

We provide a VM image that you may use as a base for your own VM image. If you wish to use this image, follow the instructions in the next section. If you wish to create your own VM image from scratch, follow the instructions further down below.

Creating an Artifact Using the Base Image

Download an archive containing the base image and some supporting files:

base-image.tar.xz (468M)

To verify the download, create a file sha next to base-image.tar.xz with content

215f77689884ed09d43d3713a0ec849c5d64cb4e240247f66d8bbfdef6b18e98256a556f35435121c9f2a8ebcbb29bf1eb2f85c0c5538611f7800aebf3e8b116  base-image.tar.xz

Then run the command

$ sha512sum -c sha

Unpack base-image.tar.xz. This requires about 2.2GB of disk space. The unpacked directory contains a file README.md with further instructions.

The base VM image uses a virtual hard disk with a maximum capacity of 16GB. (Its actual size expands dynamically up to this maximum.) If you need more disk space, please create a custom VM image.

Creating an Artifact Using a Custom Image

Download an archive containing the supporting files. (These are the same files contained in base-image.tar.xz, minus the actual VM image.)

supporting-files.tar.xz (6K)

To verify the download, create a file sha next to supporting-files.tar.xz with content

7c82468ba2a6dc481048a1b6a1ba6def13db3f2dff0dacecd643f0a44721726f4fe6d0c5ce91c65325c4fa6146990376ed2bd80197bfc9cfd0b3dfd86847ca88  supporting-files.tar.xz

Then run the command

$ sha512sum -c sha

Unpack the archive. The unpacked directory contains a file ImageCreation.md detailing the steps we took to create the base VM image. When creating your own image, please follow these steps as closely as possible (while making any modifications you need). This makes sure that the reviewers can run your VM image without too many surprises.

Adjust the supporting files as necessary for your custom VM image. Check that the start.sh and start.bat scripts still work. Add a prominent notice for the reviewers to your README.md.

M1 Macs

This year we adapted the script start.sh to work on Apple Silicon M1 macs. However, you must use macOS >= 12.4 to run this script. Earlier versions of macOS contain a bug which will lead to a kernel panic on certain M1 chips (at least M1 Pro and M1 Max).

You can also use the UTM app as a graphical QEMU frontend. Create a VM with the following settings:

  • 4GB of RAM
  • Use disk.qcow as the virtual hard drive.
  • Forward port 22 of the VM to port 5555 on the host (if you want to SSH into the VM).

Specifying number of cores

In case you need to rely on more than one core in your VM, you can specify the number of cores as extra parameters when launching the start.sh or start.bat script, e.g., start.sh -smp 2 will launch QEMU with two available cores.

Questions? Use the ICFP Artifact Evaluation contact form.