Python Memory Management

Overview

Python is a high-level language that provides robust, dynamic data manipulation features without requiring the user to manage memory directly, as languages such as C do.

Memory management can be divided into two categories: memory allocation and memory deallocation.

  • Memory Allocation: Handled by the Python Memory Manager, which takes care of aspects such as caching and segmentation. Allocation is done dynamically through the Python/C API in CPython.
  • Memory Deallocation: Handled by reference counting together with the Garbage Collector (GC), which reclaims objects that are kept alive only by reference cycles.
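The two categories above can be observed from Python code. The following is a minimal sketch using the standard tracemalloc module; the variable names (before, allocated, released) are chosen only for this illustration, and the exact byte counts depend on the interpreter version and platform.

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

# Allocation: the memory manager hands out memory for the list and its items.
data = [str(i) for i in range(100_000)]
allocated, _ = tracemalloc.get_traced_memory()

# Deallocation: dropping the only reference lets the memory be reclaimed.
del data
released, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"before allocation: {before} bytes")
print(f"after allocation:  {allocated} bytes")
print(f"after del:         {released} bytes")
```

The traced size should rise after the list is built and fall again after del, without any explicit allocation or free call in user code.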

Under the hood, memory management is based on a core mechanism called reference counting: each object carries a reference counter that is incremented by 1 when the object is referenced and decremented by 1 when it is dereferenced. Once the counter reaches zero, the object can be deallocated. In CPython this happens immediately; what is non-deterministic is when the GC runs to reclaim objects that are no longer reachable but still reference each other.
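The difference between plain reference counting and the GC can be sketched as follows. This assumes CPython; Node is a hypothetical helper class defined only for this example, and weakref.finalize is used to record when an object is actually freed.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


events = []

# Case 1: no cycle. The count drops to zero on `del`, so CPython frees the
# object right away and the finalizer runs immediately.
solo = Node("solo")
weakref.finalize(solo, events.append, "solo freed")
del solo
freed_immediately = "solo freed" in events

# Case 2: a reference cycle. Each object keeps the other's count above zero,
# so only the cyclic garbage collector can reclaim the pair.
was_enabled = gc.isenabled()
gc.disable()  # keep an automatic collection from running mid-example
left, right = Node("left"), Node("right")
left.partner, right.partner = right, left
weakref.finalize(left, events.append, "cycle freed")
del left, right
freed_before_collect = "cycle freed" in events
gc.collect()  # run the cyclic collector explicitly
freed_after_collect = "cycle freed" in events
if was_enabled:
    gc.enable()

print(freed_immediately, freed_before_collect, freed_after_collect)
```

On CPython this should show that the lone object is freed as soon as its last reference disappears, while the cycle survives until a collection runs.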

Basics

When we say Python, we are often referring to the CPython implementation, which is actually written in C. And quoting the Python documentation:

Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer”, code is also represented by objects.)

-- The Python Language Reference, 3. Data Model

C, however, does not natively support object-oriented programming. How, then, does CPython manage memory in an object-oriented way?

In CPython, a C struct, PyObject, represents every kind of data allocated on the heap. For a normal Python object, it contains the object's reference count and a pointer to the corresponding type object, and all object types are extensions of this type.
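To make the layout concrete, here is a simplified sketch of that header written with ctypes. PyObjectSketch is a hypothetical name; the two fields mirror the ob_refcnt and ob_type members of PyObject in a default CPython build, and the real struct carries extra detail in some builds (for example free-threaded ones), so this is an approximation rather than the actual definition.

```python
import ctypes


class PyObjectSketch(ctypes.Structure):
    """Simplified, hypothetical mirror of CPython's object header."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # the reference count
        ("ob_type", ctypes.c_void_p),     # pointer to the type object
    ]


header_size = ctypes.sizeof(PyObjectSketch)
print(f"sketched header size: {header_size} bytes")

# At the Python level, the type pointer is what type() reports, and every
# type object is itself an object whose type is `type`.
for value in (42, "text", [1, 2, 3]):
    print(type(value).__name__, type(type(value)).__name__)
```

Concrete types such as list or int extend this header with their own fields, which is how a plain C struct can serve as the base of every Python object.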

Python object layout (Source: CPython - GitHub)

Memory management uses the reference count field to track whether an object is still being referenced. The count increases when the object is assigned to a variable, passed to a function as an argument, included in a list, and so on. The current count can be read with the sys.getrefcount() function; the value is generally one higher than expected, because passing the object to getrefcount() creates a temporary reference of its own.

Once the reference count reaches 0, the object is no longer in use and the memory block it occupied can be deallocated for others to use. In CPython this happens as soon as the count hits zero; the garbage collector, which is enabled by default, additionally reclaims objects caught in reference cycles.

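import gc
import sys

def test(obj):
    # The call itself adds references to the object while it runs.
    print(f'input has ref count of: {sys.getrefcount(obj)}')
    # The extra references are released on function exit.

a = [1, 2, 3]
print(sys.getrefcount(a))  # prints: 2 (the name a, plus getrefcount's own argument)
b = a
print(sys.getrefcount(a))  # prints: 3
c = [a, 4]
print(sys.getrefcount(a))  # prints: 4
del c
print(sys.getrefcount(a))  # prints: 3
test(a)  # prints e.g.: input has ref count of: 5 (exact value varies by CPython version)

print(gc.isenabled())  # prints: True

Note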

Objects can be accessed by multiple threads, and manipulating the same memory simultaneously might result in unpredictable behaviour, such as a corrupted reference count. This is where the Global Interpreter Lock (GIL) comes in: it prevents multiple threads from executing Python bytecode simultaneously.
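The invariant at stake can be illustrated with a small sketch. The worker function churn and the names shared, baseline and final are hypothetical, chosen for this example; the code does not prove that the GIL is present, it only shows that reference counts stay consistent when several threads touch the same object.

```python
import sys
import threading

shared = [1, 2, 3]
baseline = sys.getrefcount(shared)


def churn(iterations=10_000):
    """Hypothetical worker: repeatedly create and drop a reference."""
    for _ in range(iterations):
        alias = shared  # increments the reference count of the list
        del alias       # decrements it again


threads = [threading.Thread(target=churn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = sys.getrefcount(shared)
print(baseline, final)
```

If increments and decrements from different threads could interleave unsafely, some updates would be lost and the final count would drift away from the baseline; with the interpreter's protection the two values should match.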

References