Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — mention if repost
Now, let's see how to dump a stack trace when a specific signal is received. We will use SIGUSR1, but you can do the same for any signal.

import faulthandler
from signal import SIGUSR1
from time import sleep

faulthandler.register(SIGUSR1)
sleep(60)

Now, in a new terminal, find out the PID of the interpreter. If the file is named tmp.py, this is how you can do it (we add [] in grep to exclude the grep itself from the output):

ps -ax | grep '[t]mp.py'

The first number in the output is the PID. Now use it to send the signal. For PID 12345:

kill -SIGUSR1 12345

Back in the terminal with the running script, you will see the stack trace:

Current thread 0x00007f22edb29740 (most recent call first):
  File "tmp.py", line 6 in <module>

This trick can help you to see where your program has frozen without adding logs to every line. However, a better alternative can be something like py-spy, which allows you to dump the current stack trace without any changes to the code.
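For example, assuming py-spy is installed (pip install py-spy), you can dump the stack of the same process from another terminal:

py-spy dump --pid 12345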
PEP-563 (landed in Python 3.7) introduced postponed evaluation of type annotations. That means your type annotations aren't evaluated at runtime but are instead treated as strings.

The initial idea was to make it the default behavior in Python 3.10, but it was postponed after a negative reaction from the community. In short, in some cases it would be impossible to get type information at runtime, which is crucial for tools like pydantic or typeguard. For example, see pydantic#2678.

Either way, starting from Python 3.7, you can activate this behavior by adding from __future__ import annotations at the beginning of a file. It will improve the import time and allow you to reference in annotations objects that aren't defined yet.

For example:

class A:
    @classmethod
    def create(cls) -> A:
        return cls()


This code will fail at import time:

Traceback (most recent call last):
  File "tmp.py", line 1, in <module>
    class A:
  File "tmp.py", line 3, in A
    def create(cls) -> A:
NameError: name 'A' is not defined


Now add the magic import, and it will work:

from __future__ import annotations

class A:
    @classmethod
    def create(cls) -> A:
        return cls()


Another solution is to manually make annotations strings. So, instead of -> A: you could write -> 'A':.
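Note that postponed annotations can still be resolved at runtime with typing.get_type_hints, as long as the referenced names exist by the time you call it. A minimal sketch:

from __future__ import annotations
import typing

class A:
    @classmethod
    def create(cls) -> A:
        return cls()

A.create.__annotations__
# {'return': 'A'}

typing.get_type_hints(A.create)
# {'return': <class '__main__.A'>}

This is essentially how runtime tools like pydantic recover the actual types behind the strings.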
Often, your type annotations will have circular dependencies. For example, Article has an attribute category: Category, and Category has an attribute articles: list[Article]. If both classes are in the same file, adding from __future__ import annotations solves the issue. But what if they are in different modules? Then you can hide the imports you need only for type annotations inside an if TYPE_CHECKING block:

from __future__ import annotations
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .category import Category

@dataclass
class Article:
    category: Category
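For completeness, here is a sketch of how the other module could mirror the pattern (the file layout is assumed):

# category.py
from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .article import Article

@dataclass
class Category:
    articles: list[Article] = field(default_factory=list)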


Fun fact: this constant is defined as TYPE_CHECKING = False, so the block under it is never executed at runtime. But the type checker is a static analyzer; it doesn't care.
PEP-604 (landed in Python 3.10) introduced a new short syntax for typing.Union (as I predicted, but I mixed up union with intersection, shame on me):

def greet(name: str) -> str | None:
    if not name:
        return None
    return f"Hello, {name}"


You can already use it in older Python versions by adding from __future__ import annotations; type checkers will understand you.
LookupError is a base class for IndexError and KeyError:

LookupError.__subclasses__()
# [IndexError, KeyError, encodings.CodecRegistryError]

KeyError.mro()
# [KeyError, LookupError, Exception, BaseException, object]

IndexError.mro()
# [IndexError, LookupError, Exception, BaseException, object]


The main purpose of this intermediate exception is to slightly simplify lookups in deeply nested structures where either of these two exceptions may occur:

try:
    username = resp['posts'][-1]['authors'][0]['name']
except LookupError:
    username = None
The operator is checks if the two given objects are the same object in memory:

{} is {}  # False
d = {}
d is d # True

Since types are also objects, you can use it to compare types:

type(1) is int        # True
type(1) is float      # False
type(1) is not float  # True

And you can also use == for comparing types:

type(1) == int  # True

So, when to use is and when to use ==? There are some best practices:

+ Use is to compare with None: var is None.

+ Use is to compare with True and False. However, don't explicitly check for True and False in conditions, prefer just if user.admin instead of if user.admin is True. Still, the latter can be useful in tests: assert actual is True.

+ Use isinstance to compare types: if isinstance(user, LoggedInUser). The big difference is that it allows subclasses. So if you have a class Admin which is a subclass of LoggedInUser, it will pass the isinstance check (see the sketch after this list).

+ Use is in some rare cases when you explicitly want to allow only the given type without subclasses: type(user) is Admin. Keep in mind that mypy will narrow the type only for isinstance, not for type is.

+ Use is to compare enum members: color is Color.RED.

+ Use == in ORMs and query builders like sqlalchemy: session.query(User).filter(User.admin == True). The reason is that is behavior cannot be redefined using magic methods but == can (using __eq__).

+ Use == in all other cases. In particular, always use == to compare values: answer == 42.
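To illustrate the difference between isinstance and an exact type check (LoggedInUser and Admin here are made-up classes):

class LoggedInUser:
    pass

class Admin(LoggedInUser):
    pass

user = Admin()
isinstance(user, LoggedInUser)  # True, subclasses pass
type(user) is LoggedInUser      # False, exact type only
type(user) is Admin             # True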
The del statement is used to delete things. It has a few distinct behaviors, depending on the specified target.

If a variable is specified, it will be removed from the scope in which it is defined:

a = []
del a
a
# NameError: name 'a' is not defined

If the target has the form target[index], target.__delitem__(index) will be called. It is defined for built-in collections to remove items from them:

a = [1, 2, 3]
del a[0]
a # [2, 3]

d = {1: 2, 3: 4}
del d[3]
d # {1: 2}

Slices are also supported:

a = [1, 2, 3, 4]
del a[2:]
a # [1, 2]

And the last behavior: if target.attr is specified, target.__delattr__(attr) will be called. It is defined for object:

class A:
    b = 'default'

a = A()
a.b = 'overwritten'
a.b # 'overwritten'
del a.b
a.b # 'default'
del a.b # AttributeError
The method __del__ is called on the object by the garbage collector when the last reference to the object is removed:

class A:
    def __del__(self):
        print('destroying')

a = b = A()
del a
del b
# destroying

def f():
    a = A()

f()
# destroying


The method is used by Python's file object to close the descriptor when you don't need it anymore:

def f():
    a_file = open('a_file.txt')
    ...


However, you cannot safely rely on the destructor (as it's called in other languages, like C++) ever being called. For instance, it may never run in PyPy or MicroPython, or if the object is part of a reference cycle and the garbage collector is disabled using gc.disable().

The rule of thumb is to use the destructor only for unimportant things. For example, aiohttp.ClientSession uses __del__ to warn about an unclosed session:

def __del__(self) -> None:
    if not self.closed:
        warnings.warn(
            f"Unclosed client session {self!r}", ResourceWarning
        )
By using __del__ and global variables, it is possible to leave a reference to the object after it was "destroyed":

runner = None

class Lazarus:
    def __del__(self):
        print('destroying')
        global runner
        runner = self

lazarus = Lazarus()
print(lazarus)
# <__main__.Lazarus object at 0x7f853df0a790>
del lazarus
# destroying
print(runner)
# <__main__.Lazarus object at 0x7f853df0a790>


In the example above, runner points to the same object lazarus did: the destructor resurrected it. If you now remove this reference too, the "destroying" message won't appear again, because since PEP 442 (Python 3.4) the finalizer is called at most once per object:

del runner  # it will NOT produce the "destroying" message


So a destructor can quietly resurrect an object and then let it die later without any notice.

In Python 3.9, the function gc.is_finalized was introduced; it tells you whether the given object has already been finalized:

import gc
lazarus = Lazarus()
gc.is_finalized(lazarus) # False
del lazarus
gc.is_finalized(runner) # True


It's hard to imagine a situation where you'll need it, though. The main conclusion to draw from all this is that you can break things with a destructor, so don't overuse it.
The module warnings allows you to print, you've guessed it, warnings. Most often, it is used to warn users of a library that the module, function, or argument they use is deprecated.

import warnings

def g():
    return 2

def f():
    warnings.warn(
        "f is deprecated, use g instead",
        DeprecationWarning,
    )
    return g()

f()

The output:

example.py:7: DeprecationWarning: f is deprecated, use g instead
  warnings.warn(

Note that DeprecationWarning, as well as other warning categories, is built-in and doesn't need to be imported from anywhere.

When running tests, pytest will collect all warnings and report them at the end. If you want to get the full traceback for a warning or inspect it with a debugger, the easiest way is to turn all warnings into exceptions:

warnings.filterwarnings("error")

In production, you can suppress warnings. Or, better, turn them into proper log records, so they will be collected wherever you collect logs:

import logging
logging.captureWarnings(True)
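Captured warnings end up in the logger named py.warnings, so they can be routed like any other log records. A small sketch:

import logging
import warnings

logging.basicConfig(level=logging.WARNING)
logging.captureWarnings(True)

warnings.warn('this API is deprecated')
# logged by the py.warnings logger, something like:
# WARNING:py.warnings:example.py:7: UserWarning: this API is deprecated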
The string.Template class allows you to do $-style substitutions:

from string import Template
t = Template('Hello, $channel!')

t.substitute(dict(channel='@pythonetc'))
# 'Hello, @pythonetc!'

t.safe_substitute(dict())
# 'Hello, $channel!'

Initially, it was introduced to simplify translations of strings. However, PO-files now natively support the python-format flag, which indicates to translators that the string has str.format-style substitutions. And on top of that, str.format is much more powerful and flexible.

Nowadays, the main purpose of Template is to confuse newbies with one more way to format a string. Jokes aside, there are a few more cases when it can come in handy:

+ Template.safe_substitute can be used when the template might have variables that aren't defined and should be ignored.
+ The substitution format is similar to the string substitution in bash (and other shells), which is useful in some cases. For instance, if you want to write your own dotenv.
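For instance, here is a minimal sketch of expanding one line of a hypothetical .env file (a real dotenv parser is much more involved):

import os
from string import Template

line = 'GREETING=Hello, $USER!'
key, _, raw = line.partition('=')

# expand $-style references from the environment,
# leaving unknown ones untouched
value = Template(raw).safe_substitute(os.environ)
print(key, value)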
A long time ago, we already covered the chaining of comparison operations:
https://tttttt.me/pythonetc/411

A quick summary is that the right value of each comparison gets passed into the next one:

13 > 2 > 1  # same as `13 > 2 and 2 > 1`
# True

13 > 2 > 3 # same as `13 > 2 and 2 > 3`
# False


What's interesting is that is and in are also considered operators, so they can be chained too, which can lead to unexpected results:

a = None
a is None          # True, as expected
a is None is True  # False 🤔
a is None == True  # False 🤔
a is None is None  # True 🤯


The chain a is None is True unfolds into a is None and None is True, and the second comparison is False. The best practice is to use operator chaining only to check whether a value is in a range, using < and <=:

teenager = 13 < age < 19
This post is provided by @PavelDurmanov:

As you may know, generators in Python are executed step by step. This means it should be possible to "see" their state between the steps.

All generator's local variables are stored in frame locals, and we can access the frame through the gi_frame attribute on a generator:

def gen():
    x = 5
    yield x
    yield x
    yield x

g = gen()
next(g) # 5
g.gi_frame.f_locals # {'x': 5}


So if we can see it, we should be able to modify it, right?

g.gi_frame.f_locals["x"] = 10
next(g) # still gives us 5


The frame locals are returned as a dict that is freshly created from the actual frame variables, so the returned dict doesn't reference the real variables in the frame.

But there's a way to bypass that with the C API:

import ctypes

# after we've changed the frame locals, we need to "freeze" them:
# this tells the interpreter to update the underlying frame
# based on the modified f_locals dict
ctypes.pythonapi.PyFrame_LocalsToFast(
    ctypes.py_object(g.gi_frame),
    ctypes.c_int(0),
)


So now we can verify that the generator's locals have actually changed:

next(g)  # 10


You might wonder what ctypes.c_int(0) means. There are two "modes" for updating the underlying frame: 0 and 1. With c_int(0), the update only adds and/or updates frame variables; it cannot delete them. So if we removed x from the locals dict and called the update with c_int(0), nothing would happen.

If you want to actually delete a variable from the frame, call the update with c_int(1). That will replace the underlying frame locals with exactly what the .f_locals dict now contains.

And as you may know, coroutines in Python are implemented on top of generators, so the same logic applies to them as well; instead of gi_frame, it's cr_frame.
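A quick sketch of the coroutine counterpart (asyncio.sleep(0) here only serves as a suspension point):

import asyncio

async def coro():
    x = 5
    await asyncio.sleep(0)
    return x

c = coro()
c.send(None)  # advance to the first suspension point
c.cr_frame.f_locals  # {'x': 5}
c.close()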
The os.curdir is a trap!

import os
os.curdir
# '.'

It's a constant indicating how the current directory is denoted in the current OS. And for all OSes that CPython supports (Windows and POSIX), it's always a dot. It might be different, though, if you run your code with MicroPython on some niche OS.

Anyway, to actually get the path to the current directory, you need os.getcwd:

os.getcwd()
# '/home/gram'

Or use pathlib:

from pathlib import Path
Path().absolute()
# PosixPath('/home/gram')
Python 3.11 is released! The most interesting features:

+ Fine-grained error location in tracebacks.
+ ExceptionGroup and the new except* syntax to handle it.
+ A new module to parse TOML.
+ Atomic grouping and possessive quantifiers for regexes.
+ Significant performance improvements.
+ New Self type.
+ Variadic generics.
+ Data class transforms.

That's a lot of smart words! Don't worry, we'll tell you in detail about each of these features in the upcoming posts. Stay tuned!
PEP 657 (landed in Python 3.11) enhanced tracebacks so that they now include a quite precise location of where the error occurred:

Traceback (most recent call last):
  File "query.py", line 24, in add_counts
    return 25 + query_user(user1) + query_user(user2)
                ^^^^^^^^^^^^^^^^^
  File "query.py", line 32, in query_user
    return 1 + query_count(db, response['a']['b']['c']['user'], retry=True)
               ~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable

It shows not only where the error occurred for each frame, but also which code was executed. Beautiful!
PEP 678 (landed in Python 3.11) introduced a new method, add_note, for the BaseException class. You can call it on any exception to provide additional context, which will be shown at the end of the traceback for the exception:

try:
    1/0
except Exception as e:
    e.add_note('oh no!')
    raise

# Traceback (most recent call last):
#   File "<stdin>", line 2, in <module>
# ZeroDivisionError: division by zero
# oh no!

The PEP gives a good example of how it can be useful: the hypothesis library includes in the traceback the arguments that caused the tested code to fail.
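For instance, here is a small sketch of attaching the offending input to an exception (the records are made up):

import json

records = ['{"ok": 1}', 'not json']
for i, raw in enumerate(records):
    try:
        json.loads(raw)
    except json.JSONDecodeError as e:
        e.add_note(f'while parsing record #{i}: {raw!r}')
        raise

# json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
# while parsing record #1: 'not json'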
PEP 654 (landed in Python 3.11) introduced ExceptionGroup. It's an exception that nicely wraps and shows multiple exceptions:

try:
    1/0
except Exception as e:
    raise ExceptionGroup('wow!', [e, ValueError('oh no')])

# Traceback (most recent call last):
#   File "<stdin>", line 2, in <module>
# ZeroDivisionError: division by zero

# During handling of the above exception, another exception occurred:

#   + Exception Group Traceback (most recent call last):
#   |   File "<stdin>", line 4, in <module>
#   | ExceptionGroup: wow! (2 sub-exceptions)
#   +-+---------------- 1 ----------------
#     | Traceback (most recent call last):
#     |   File "<stdin>", line 2, in <module>
#     | ZeroDivisionError: division by zero
#     +---------------- 2 ----------------
#     | ValueError: oh no
#     +------------------------------------

It's very helpful in many cases when multiple unrelated exceptions have occurred and you want to show all of them: when retrying an operation or when calling multiple callbacks.
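For instance, a minimal sketch of the callbacks case (the callbacks themselves are made up):

def notify_all(callbacks):
    errors = []
    for callback in callbacks:
        try:
            callback()
        except Exception as e:
            errors.append(e)
    if errors:
        raise ExceptionGroup('some callbacks failed', errors)

notify_all([lambda: 1/0, lambda: None, lambda: [][0]])
# ExceptionGroup: some callbacks failed (2 sub-exceptions)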
PEP 654 introduced not only ExceptionGroup itself but also a new syntax to handle it. Let's start right with an example:

try:
    raise ExceptionGroup('', [
        ValueError(),
        KeyError('hello'),
        KeyError('world'),
        OSError(),
    ])
except* KeyError as e:
    print('caught1:', repr(e))
except* ValueError as e:
    print('caught2:', repr(e))
except* KeyError as e:
    1/0

The output:

caught1: ExceptionGroup('', [KeyError('hello'), KeyError('world')])
caught2: ExceptionGroup('', [ValueError()])
  + Exception Group Traceback (most recent call last):
  |   File "<stdin>", line 2, in <module>
  | ExceptionGroup:  (1 sub-exception)
  +-+---------------- 1 ----------------
    | OSError
    +------------------------------------

This is what happened:

1. When ExceptionGroup is raised, it's checked against each except* block.

2. The except* KeyError block catches an ExceptionGroup that contains a KeyError.

3. The matched except* block receives not the whole ExceptionGroup but a copy of it containing only the matched sub-exceptions. In the case of except* KeyError, it includes both KeyError('hello') and KeyError('world').

4. For each sub-exception, only the first match is executed (1/0 in the example wasn't reached).

5. While there are unmatched sub-exceptions, Python keeps trying to match them against the remaining except* blocks.

6. If there are still sub-exceptions left after all of that, a new ExceptionGroup containing them is raised. So, ExceptionGroup('', [OSError()]) was raised (and beautifully formatted).
There is one more thing you should know about except*: it can match not only sub-exceptions from an ExceptionGroup but regular exceptions too. And for simplicity of handling, regular exceptions will be wrapped into an ExceptionGroup:

try:
    raise KeyError
except* KeyError as e:
    print('caught:', repr(e))
# caught: ExceptionGroup('', (KeyError(),))