Double-Ended Bit-Stealing for Algebraic Data Types (ICFP 2024 - ICFP Papers and Events)

Track

ICFP 2024 ICFP Papers and Events

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

Use conference time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, ViennaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 5 Sep 2024 14:06 - 14:24 at Green 1-2-3 - Memory Models / Memory Management / Low-Level Languages Chair(s): Stefan Monnier

Abstract

Algebraic data types are central to the development and evaluation of most functional programs. It is therefore important for compilers to choose compact and efficient representations of such types, in particular to achieve good memory footprints for applications.

Algebraic data types are most often represented using blocks of memory where the first word is used as a so-called tag, carrying information about the constructor, and the following words are used for carrying the constructor’s arguments. As an optimisation, lists are usually represented more compactly, using a technique called bit-stealing, which, in its simplest form, uses the word-alignment property of pointers to byte-addressed allocated memory to discriminate between the nil constructor (often represented as 0x1) and the cons constructor (aligned pointer to allocated pair). Because the representation supports that all values can be held uniformly in one machine word, possibly pointing to blocks of memory, type erasure is upheld.

However, on today’s 64-bit architectures, memory addresses (pointers) are represented using only a subset of the 64 bits available in a machine word, which leave many bits unused. In this paper, we explore the use, not only of the least-significant bits of pointers, but also of the most-significant bits, for representing algebraic data types in a full ML compiler. It turns out that, with such a particular utilisation of otherwise unused bits, which we call double-ended bit-stealing, it is possible to choose unboxed representations for a large set of data types, while still not violating the principle of uniform data-representations. Examples include Patricia trees, union-find data structures, stream data types, internal language representations for types and expressions, and mutually recursive ASTs for full language definitions.

The double-ended bit-stealing technique is implemented in the MLKit compiler and speedup ranges from 0 to 26 percent on benchmarks that are influenced by the technique. For MLKit, which uses abstract data types extensively, compilation speedups of around 9 percent are achieved for compiling MLton (another Standard ML compiler) and for compiling MLKit itself.

Link to Publication

https://elsman.com/pdf/icfp24main-p22-final.pdf

DOI

https://doi.org/10.1145/3674628

Bio

Professor in the Programming Languages and Theory of Computation (PLTC) section at Department of Computer Science, University of Copenhagen (DIKU). Conducts research in the design and implementation of programming languages, including compilation techniques for functional languages, parallelism, memory management, and program optimisation.

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

Use conference time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, ViennaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 5 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change

13:30 - 15:00	Memory Models / Memory Management / Low-Level LanguagesICFP Papers and Events at Green 1-2-3 Chair(s): Stefan Monnier Université de Montréal

13:30 18m Talk		Oxidizing OCaml with Modal Memory Management ICFP Papers and Events Anton Lorenzen University of Edinburgh, Leo White Jane Street, Stephen Dolan Jane Street, Richard A. Eisenberg Jane Street, Sam Lindley University of Edinburgh DOI Pre-print
13:48 18m Talk		A Two-Phase Infinite/Finite Low-Level Memory Model: Reconciling Integer–Pointer Casts, Finite Space, and undef at the LLVM IR Level of Abstraction ICFP Papers and Events Calvin Beck University of Pennsylvania, Irene Yoon Inria, Hanxi Chen University of Pennsylvania, Yannick Zakowski Inria, Steve Zdancewic University of Pennsylvania DOI
14:06 18m Talk		Double-Ended Bit-Stealing for Algebraic Data Types ICFP Papers and Events Martin Elsman University of Copenhagen Link to publication DOI
14:24 18m Talk		Beyond Trees: Calculating Graph-Based Compilers (Functional Pearl)Functional Pearl ICFP Papers and Events Patrick Bahr IT University of Copenhagen, Graham Hutton University of Nottingham DOI Pre-print
14:42 18m Talk		Sound Borrow-Checking for Rust via Symbolic Semantics ICFP Papers and Events Son Ho Inria, Aymeric Fromherz Inria, Jonathan Protzenko Microsoft Azure Research DOI

Double-Ended Bit-Stealing for Algebraic Data Types

Thu 5 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change

Martin Elsman

University of Copenhagen

Tracks

Workshops

Co-hosted Symposia

Double-Ended Bit-Stealing for Algebraic Data Types

Program Display Configuration

Program Display Configuration

Thu 5 SepDisplayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change

Martin Elsman

University of Copenhagen

Thu 5 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change