zeitbach.com

Python re caches compiled patterns

TIL: Whenever you compile or match a regular expression given as a string, (C)Python’s re module internally consults a cache of compiled patterns, so the same pattern is not compiled again on every call.
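A quick way to observe the cache from the outside is to compile the same pattern twice and compare object identity. This is only a small sketch: getting the identical object back is CPython implementation behavior rather than a documented guarantee, and the pattern used here is an arbitrary example.

```python
import re

re.purge()  # documented function: clears the regular expression cache

first = re.compile(r"\d+")
second = re.compile(r"\d+")
print(first is second)  # True on CPython: the second call is a cache hit

re.purge()  # empty the cache again
third = re.compile(r"\d+")
print(first is third)  # False: the pattern had to be compiled anew
```

Module-level functions such as re.search and re.match take the same route when you hand them a pattern string.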

At the time of writing, two caches are used:

  1. an LRU (least-recently-used) cache, which keeps the 512 most recently used patterns, and
  2. a FIFO (first-in-first-out) cache, which keeps the 256 patterns most recently added to it.

The FIFO cache is checked first, because a hit there is slightly cheaper: unlike the LRU cache, it does not need to be updated on a hit. Both are implemented as plain dictionaries, relying on dict’s insertion order, which has been stable in CPython since 3.6 and guaranteed by the language since 3.7. So if you’re using fewer than 512 distinct patterns (each combination of pattern and flags counts separately), every pattern stays cached after its first use, and calling re.compile yourself saves only the cache lookup, not a recompilation. However, explicitly compiling a pattern up front makes sure that the expensive compilation does not happen at a critical time in your application.
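The lookup order described above can be sketched with two plain dictionaries. This is a simplified illustration, not the actual CPython source: the class name TwoLevelCache, its parameters, and the build callback are made up for this example, and the tiny cache sizes in the usage part are chosen only to make evictions visible.

```python
from typing import Any, Callable, Dict, Hashable

_MISSING = object()  # sentinel, so that None could also be a cached value


class TwoLevelCache:
    """Toy cache: a small FIFO dict in front of a larger LRU dict."""

    def __init__(self, build: Callable[[Hashable], Any],
                 lru_size: int = 512, fifo_size: int = 256) -> None:
        self._build = build
        self._lru_size = lru_size
        self._fifo_size = fifo_size
        self._lru: Dict[Hashable, Any] = {}   # key order == recency of use
        self._fifo: Dict[Hashable, Any] = {}  # key order == order of arrival

    def get(self, key: Hashable) -> Any:
        # Fast path: a FIFO hit needs no bookkeeping at all.
        try:
            return self._fifo[key]
        except KeyError:
            pass

        # Slow path: pop from the LRU, so re-inserting moves the key to the end.
        value = self._lru.pop(key, _MISSING)
        if value is _MISSING:
            value = self._build(key)
            if len(self._lru) >= self._lru_size:
                # Drop the least recently used entry (the first key).
                del self._lru[next(iter(self._lru))]
        self._lru[key] = value

        if len(self._fifo) >= self._fifo_size:
            # Drop the oldest arrival, no matter how often it was hit.
            del self._fifo[next(iter(self._fifo))]
        self._fifo[key] = value
        return value


compiled = []  # records which keys were actually built


def build(key):
    compiled.append(key)
    return key.upper()  # stand-in for an expensive compilation


cache = TwoLevelCache(build, lru_size=3, fifo_size=2)
for key in ["a", "b", "c", "a"]:
    cache.get(key)
print(compiled)  # ['a', 'b', 'c']: the second "a" was served from the LRU
```

Note that "a" had already been pushed out of the two-entry FIFO by the time it was requested again, so it was the larger LRU that saved the rebuild.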

Anyway, I will keep compiling my patterns, if only out of habit.