Data Model of a Programming Language

The data model of a programming language is a description of the language as a framework: the set of protocols, special methods and contracts that the language’s runtime understands, and which user-defined objects can implement to participate as first-class citizens.

In Fluent Python 2e, Luciano Ramalho talks about Python’s data model in exactly these terms: the foundation that determines how we work with the language, and how we make our own objects work well with the most idiomatic language features.

When we learn JavaScript, .NET languages, C++, or Python, much of the time we pick up the idiosyncrasies of each language without truly understanding why a given pattern is considered the “best practice” for that language in that scenario. Reasoning from the data model gives that answer: the idiom is whatever co-operates most directly with the protocols the runtime already speaks.

Why It Matters

The data model influences, and in many cases determines, the public API that objects in the language are expected to expose. The same conceptual operation surfaces very differently depending on what the data model expects:

In Python, any object that implements __len__ works with the built-in len(). Implementing __iter__ makes the object usable in a for loop and in comprehensions. The “Pythonic” style is, in large part, the style that hooks into these special methods rather than re-inventing them. This is also what makes much of Python’s polymorphism structural rather than nominal: an object acquires a capability by satisfying the protocol, without having to inherit from a named type.
In JavaScript, the equivalent participation is via well-known symbols such as Symbol.iterator and Symbol.asyncIterator, plus prototype-chain conventions. The idioms differ because the data model differs.
In C++, operator overloading and the rules of the standard library (iterators, allocators, traits) play a similar role: a type that “looks like” an STL container behaves like one because it satisfies the same compile-time data-model expectations.
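
To make the Python case concrete, here is a minimal sketch. The Playlist class and its track names are hypothetical, invented for this note; the point is only that defining __len__ and __iter__ is enough for len(), for loops and comprehensions to work, and that the standard library's abstract base classes recognise the object structurally.

```python
from collections.abc import Iterable, Sized


class Playlist:
    """Hypothetical container; it inherits from nothing but object."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # Called by the built-in len().
        return len(self._tracks)

    def __iter__(self):
        # Called by for loops, comprehensions and unpacking.
        return iter(self._tracks)


playlist = Playlist(["Intro", "Blue Train", "Naima"])

print(len(playlist))                  # 3
print([t.upper() for t in playlist])  # ['INTRO', 'BLUE TRAIN', 'NAIMA']

# Structural check: Playlist never subclassed Sized or Iterable, yet it is
# recognised as both because it defines the matching special methods.
print(isinstance(playlist, Sized), isinstance(playlist, Iterable))  # True True
```

Nothing in Playlist names a base class, so the isinstance results come purely from the methods it defines.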

In each case, the surface syntax and the “best practices” are downstream of the underlying data model. Knowing the data model is therefore high leverage: it explains the idioms instead of forcing you to memorise them.
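
The leverage can be sketched in Python with a second hypothetical class. Shelf and its book titles are assumptions made up for illustration. It defines only __len__ and __getitem__, yet indexing, slicing, membership tests, reversed(), sorted() and random.choice() all work, because each of them falls back on the sequence protocol.

```python
import random


class Shelf:
    """Hypothetical sequence-like class defining only two special methods."""

    def __init__(self, books):
        self._books = list(books)

    def __len__(self):
        return len(self._books)

    def __getitem__(self, position):
        # Delegating to the list means both integer indexes and slices work.
        return self._books[position]


shelf = Shelf(["Emma", "Ulysses", "Dune"])

print(shelf[0])                # Emma
print(shelf[1:])               # ['Ulysses', 'Dune']
print("Dune" in shelf)         # True (falls back to __getitem__)
print(list(reversed(shelf)))   # ['Dune', 'Ulysses', 'Emma']
print(sorted(shelf))           # ['Dune', 'Emma', 'Ulysses']

# random.choice only needs len() and integer indexing.
picked = random.choice(shelf)
print(picked in shelf)         # True
```

None of these features was written by hand in Shelf; they come from cooperating with protocols the runtime and standard library already understand.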

Relation to Other Language Characteristics

The data model is one of the characteristics that describes a programming language, alongside its data types, conceptual hierarchy, variance, and its position on the imperative-declarative spectrum. When learning an unfamiliar language, asking “what is its data model?” is often more productive than memorising syntax.

References

Ramalho, L. (2022). Fluent Python, 2nd Edition. O’Reilly.

