Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — mention if repost
Sometimes you want to check the syntax of a Python file without running it. Such a naive check may be useful as a commit hook or as a fast continuous integration check.

There is no direct way to do this. You can import the module (for example, python -c 'import module'), which prevents the traditional if __name__ == '__main__' block from running. Still, all module-level code, including imports, will be executed, and this may fail if you want to check syntax in an environment where the module can't be and shouldn't be run.

However, the Python standard library contains the py_compile module that generates bytecode from a Python source file without running it. That's exactly what we need:

$ python -m py_compile test.c
File "test.c", line 1
int main() {
^
SyntaxError: invalid syntax
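py_compile can also be used programmatically; here is a minimal sketch (the temporary file and its C contents are made up for illustration, mirroring the shell example above):

```python
import os
import py_compile
import tempfile

# write a file with invalid Python syntax (C code, as above)
src = tempfile.NamedTemporaryFile('w', suffix='.py', delete=False)
src.write('int main() {\n')
src.close()

try:
    # doraise=True turns compilation problems into PyCompileError
    py_compile.compile(src.name, doraise=True)
    ok = True
except py_compile.PyCompileError:
    ok = False

os.unlink(src.name)
print(ok)  # False: the file doesn't compile
```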
CPython supports two levels of optimization. You can enable them with -O and -OO flags.

-O sets __debug__ to False and removes all assert statements from the program. -OO does the same and also discards docstrings.

A regular version of a script is cached to a .pyc file, while an optimized one is cached to .pyo. However, since Python 3.5, .pyo is no longer a thing; PEP 488 introduced .opt-1.pyc and .opt-2.pyc suffixes instead.
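The effect of -O can be observed by spawning child interpreters with different flags; a quick sketch using subprocess:

```python
import subprocess
import sys

code = 'print(__debug__); assert False, "boom"'

# default mode: __debug__ is True and the assert fires
plain = subprocess.run([sys.executable, '-c', code],
                       capture_output=True, text=True)

# -O: __debug__ is False and the assert is stripped out entirely
optimized = subprocess.run([sys.executable, '-O', '-c', code],
                           capture_output=True, text=True)

print(plain.stdout.strip(), plain.returncode)      # True, non-zero exit
print(optimized.stdout.strip(), optimized.returncode)  # False 0
```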
To watch multiple file descriptors, asyncio uses the selectors module. It provides high-level access to whatever API your kernel supports, such as epoll (Linux) or kqueue (BSD), via corresponding classes (EpollSelector, KqueueSelector, etc.).

asyncio uses selectors.DefaultSelector, which is an alias for the most efficient implementation on the current platform (epoll|kqueue|devpoll > poll > select). If you ever need to use selectors manually, you should prefer DefaultSelector too.

selectors uses the low-level select module, written in C. It contains all the implementations supported by your system, which is decided at compile time.
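Here is a minimal sketch of using DefaultSelector directly, with a socketpair standing in for real connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
r, w = socket.socketpair()

# wait until r becomes readable
sel.register(r, selectors.EVENT_READ)
w.send(b'ping')

events = sel.select(timeout=1)
for key, _mask in events:
    data = key.fileobj.recv(4)
    print(data)  # b'ping'

sel.unregister(r)
r.close()
w.close()
```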
“Reduce” is a higher-order function that processes an iterable recursively, applying some operation to the next element of the iterable and the already calculated value. You may also know it as “fold”, “inject”, “accumulate”, or by some other name.

Reduce with result = result + element brings you the sum of all elements, result = min(result, element) gives you the minimum and result = element works for getting the last element of a sequence.

Python provides the reduce function (which was moved to functools.reduce in Python 3):

In : reduce(lambda s, i: s + i, range(10))
Out: 45
In : reduce(lambda s, i: min(s, i), range(10))
Out: 0
In : reduce(lambda s, i: i, range(10))
Out: 9

Also, if you ever need such simple lambdas as lambda a, b: a + b, Python has you covered with the operator module:

In : from operator import add
In : reduce(add, range(10))
Out: 45
SVG is a vector image format that stores image info by specifying, in XML, all the shapes and figures that need to be drawn. An orange circle can be represented as simply as that:

<svg xmlns="http://www.w3.org/2000/svg">
<circle cx="125" cy="125" r="75" fill="orange"/>
</svg>

Since SVG is just XML, it's pretty easy to create SVG files in any language. You can do it in Python with lxml, for example. However, there is the svgwrite module, which is intended precisely for creating SVGs.
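If you'd rather not add a dependency, the stdlib works too; a sketch of the same circle built with xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# build the same orange circle as in the hand-written SVG above
svg = ET.Element('svg', xmlns='http://www.w3.org/2000/svg')
ET.SubElement(svg, 'circle', cx='125', cy='125', r='75', fill='orange')

document = ET.tostring(svg, encoding='unicode')
print(document)
```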

Here is an example of how to visualize Recamán's sequence with a diagram.
To sort a sequence in Python, you use sorted:

In : sorted([1, -1, 2, -3, 3])
Out: [-3, -1, 1, 2, 3]


With the key argument you can provide a function that will be used to get a comparison key of each value. Let's sort the same sequence by absolute values:

In : sorted([1, -1, 2, -3, 3], key=abs)
Out: [1, -1, 2, -3, 3]


Let's suppose we also want to put the numbers with the same absolute value in ascending order. In that case, we can provide a tuple as a comparison key:

In : sorted([1, -1, 2, -3, 3], key=lambda x: (abs(x), x))
Out: [-1, 1, 2, -3, 3]


This is not some sorted magic, this is how tuples are sorted in general:

In : (1, 2) == (1, 2)
Out: True

In : (1, 2) > (1, 1)
Out: True

In : (1, 2) < (2, 1)
Out: True
Creating a new variable is essentially creating a new name for an already existing object. That's why it's called name binding in Python.

There are numerous ways to bind names; these are examples of how x can be bound:

x = y
import x
class x: pass
def x(): pass
def y(x): pass
for x in y: pass
with y as x: pass
except y as x: pass

You can also bind an arbitrary name by manipulating the global namespace:

In : x
NameError: name 'x' is not defined
In : globals()['x'] = 42
In : x
Out: 42

Note, however, that you cannot do the same with locals() since updates to the locals dictionary are ignored.
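A quick sketch demonstrating that writes to locals() are silently ignored inside a function (CPython behavior; the name phantom is made up for illustration):

```python
def f():
    locals()['phantom'] = 42  # ignored: no local binding is created
    try:
        return phantom        # the compiler treats this as a global lookup
    except NameError:
        return None

print(f())  # None
```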
When you use a variable in Python, it's first looked up in the current scope. If no such variable is found, the next enclosing scope is searched. That is repeated until the global namespace is reached.

x = 1
def scope():
    x = 2
    def inner_scope():
        print(x)  # prints 2
    inner_scope()
scope()

However, the variable assignment doesn't work the same way. The new variable is always created in the current scope unless global or nonlocal is specified:

x = 1
def scope():
    x = 2
    def inner_scope():
        x = 3
        print(x)  # prints 3
    inner_scope()
    print(x)  # prints 2
scope()
print(x)  # prints 1

global allows using variables from the global namespace, while nonlocal searches for the variable in the nearest enclosing scope. Compare:

x = 1
def scope():
    x = 2
    def inner_scope():
        global x
        x = 3
        print(x)  # prints 3
    inner_scope()
    print(x)  # prints 2
scope()
print(x)  # prints 3


x = 1
def scope():
    x = 2
    def inner_scope():
        nonlocal x
        x = 3
        print(x)  # prints 3
    inner_scope()
    print(x)  # prints 3
scope()
print(x)  # prints 1
Imagine you are moving your web API from HTTP to HTTPS. How do you handle all requests from clients who are not aware they should use HTTPS? You set up redirection rules.

What HTTP status code should you use? The choice is usually between 301 Moved Permanently and 302 Found. The first one is permanent (as the name states), the second one is temporary and not cached by default. Moving to HTTPS is usually permanent, so the choice seems obvious: 301 Moved Permanently.

The problem with both 301 and 302 is that they work correctly only for HEAD and GET requests. Though all other methods (such as POST) should work as well according to the RFC, in practice they don't. A lot of modern HTTP clients (your favorite browser probably included) issue a GET request after the redirection regardless of the original request method. That became so common that the RFC now explicitly says you can't rely on the client preserving the method.

To fight that problem, two other codes were introduced: 303 See Other and 307 Temporary Redirect. 303 implies “use GET for the new request” and 307 means “use the same method for the new request”. So basically, most clients act as if they got 303 when they get 302, while they should act as if they got 307.

Sadly, both 303 and 307 are temporary. To make a redirect that is both method-preserving and permanent, one can use 308 Permanent Redirect; it started out as experimental (RFC 7238) and later became a proposed standard (RFC 7538), but older clients may not support it.

So the correct solution for our HTTP-to-HTTPS migration is to use 307 Temporary Redirect. 308 is even better but can't always be relied on. Mind that human users usually start an interaction with a GET request, so the problem with 301 only applies to robots.
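A sketch of such a redirect with the stdlib http.server (the handler class and the request path are made up for illustration):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 308 keeps both the original method and the permanence
        self.send_response(308)
        self.send_header('Location',
                         'https://' + self.headers['Host'] + self.path)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('GET', '/api/users')
resp = conn.getresponse()
print(resp.status, resp.getheader('Location'))
server.shutdown()
```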
list allows you to store an array of arbitrary objects. This is quite flexible but may be inefficient. The array module can be used to represent arrays of basic values compactly. The supported types are various C types: char, int, long, double and so on. The actual representation is determined by the C implementation.

In : import array
In : a = array.array('B')
In : a.append(240)
In : a.append(159)
In : a.append(144)
In : a.append(180)
In : a.tobytes().decode('utf8')
Out: '🐴'
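For instance, a thousand small integers fit into roughly two kilobytes as an array of C shorts, while a list keeps a thousand separate Python int objects (exact sizes are implementation-dependent, so treat the numbers as a sketch):

```python
import array
import sys

nums = list(range(1000))
a = array.array('h', nums)   # 'h' = signed short, 2 bytes on most platforms

print(a.itemsize)                               # 2 on most platforms
print(sys.getsizeof(a) < sys.getsizeof(nums))   # the array object is more compact
print(a[0], a[-1])                              # 0 999
```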
The map function produces a new list by applying a function to each element of the original list:

>>> map(int, ["1", "2", "3"])
[1, 2, 3]


The thing is, that's only true for Python 2. In Python 3, map returns a lazy iterator instead, meaning you can apply it to other iterators, including infinite ones. The same laziness came to filter and range.

In : def gen():
...:     l = []
...:     while True:
...:         l.append(1)
...:         yield l
...:
In : map(len, gen())
Out: <map at 0x7f85c4a11978>
In : next(map(len, gen()))
Out: 1
If you want to iterate over several iterables at once, the zip function may be a good choice. It returns an iterator that yields tuples containing one element from each of the original iterables:

In : eng = ['one', 'two', 'three']
In : ger = ['eins', 'zwei', 'drei']
In : for e, g in zip(eng, ger):
...: print('{e} = {g}'.format(e=e, g=g))
...:
one = eins
two = zwei
three = drei

Notice that zip accepts iterables as separate arguments, not a list of them. To unzip values, you can use the * operator:

In : list(zip(*zip(eng, ger)))
Out: [('one', 'two', 'three'), ('eins', 'zwei', 'drei')]
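Note that zip stops at the shortest iterable; if you need the longer ones padded instead, there is itertools.zip_longest:

```python
from itertools import zip_longest

# zip truncates to the shortest input
print(list(zip([1, 2, 3], 'ab')))
# [(1, 'a'), (2, 'b')]

# zip_longest pads the shorter inputs with fillvalue
print(list(zip_longest([1, 2, 3], 'ab', fillvalue='-')))
# [(1, 'a'), (2, 'b'), (3, '-')]
```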
If you want to distribute a package across different paths, it can be done with namespace packages, a special kind of package that doesn't contain __init__.py files:

$ find dir1 dir2
dir1
dir1/package
dir1/package/a.py
dir2
dir2/package
dir2/package/b.py

$ PYTHONPATH='dir1:dir2' python -c 'import package.a; import package.b'

However, namespace packages weren't a thing until Python 3.3. Before that, Python provided pkgutil.extend_path to solve the problem. You create several packages with the same name but call extend_path in every __init__.py: an import loads one of those packages, and extend_path makes sure the others will be found too:

$ cat dir1/package/__init__.py
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
$ cat dir2/package/__init__.py
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
$ find dir1 dir2
dir1
dir1/package
dir1/package/a.py
dir1/package/__init__.py
dir1/package/__init__.pyc
dir2
dir2/package
dir2/package/__init__.py
dir2/package/b.py
$ PYTHONPATH='dir1:dir2' python2 -c 'import package.a; import package.b'

You can learn more in PEP 420.
The io module provides two types of in-memory file-like objects. Such objects may be helpful for interacting with interfaces that only support files, without the need to create a real one. The obvious example is unit testing.

These two types are BytesIO and StringIO, which work with bytes and strings respectively.

In : from io import StringIO
In : f = StringIO()
In : f.write('first\n')
Out: 6
In : f.write('second\n')
Out: 7
In : f.seek(0)
Out: 0
In : f.readline()
Out: 'first\n'
In : f.readline()
Out: 'second\n'
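BytesIO works the same way, but with bytes; a quick sketch:

```python
from io import BytesIO

f = BytesIO()
f.write(b'\x00\x01\x02')
f.seek(0)

data = f.read()
print(data)          # b'\x00\x01\x02'
print(f.getvalue())  # the whole buffer, regardless of the current position
```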
Python supports several ways of starting a script. The usual one is python foo.py; in that case, foo.py is simply executed.

However, you can also do python -m foo. If foo is not a package, then foo.py is found in sys.path and executed. If it is, then Python executes foo/__init__.py and foo/__main__.py after that. Note that __name__ is equal to foo during the __init__.py execution, but it's __main__ during the __main__.py execution.

You also can do python dir/ or even python dir.zip. In that case, dir/__main__.py is looked for and executed if found.

$ ls foo
__init__.py __main__.py
$ cat foo/__init__.py
print(__name__)
$ cat foo/__main__.py
print(__name__)

$ python -m foo
foo
__main__
$ python foo/
__main__
$ python foo/__init__.py
__main__
In Linux, a crontab file must end with a newline. This is a highly unusual requirement and may lead to unexpected behavior.

$ crontab file
new crontab file is missing newline before EOF, can't install.

To avoid dealing with that, you may add tests to your project that check all committed crontab files. Installing crontabs via a custom script that fixes them automatically is also possible. Finally, configuring your favorite editor to always add a trailing newline may be a good idea.
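Such a test can be as simple as checking the last byte of the file; a hypothetical pre-commit check (how you collect the paths is up to your project):

```python
def ends_with_newline(path):
    """Return True if the file ends with a newline (crontab-safe)."""
    with open(path, 'rb') as f:
        data = f.read()
    return data.endswith(b'\n')
```

Run it over every committed crontab and fail the check when it returns False.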
Converting a datetime object to the number of seconds since the start of the epoch was not a simple task before Python 3.3.

The most natural-looking solution for the problem is the strftime method that can format a datetime. Using %s as a format, you can seemingly get a timestamp. Look at the example:

from datetime import datetime
import pytz

naive_time = datetime(2018, 3, 31, 12, 0, 0)
utc_time = pytz.utc.localize(naive_time)
ny_time = utc_time.astimezone(
    pytz.timezone('US/Eastern'))

ny_time is exactly the same moment as utc_time, but written as New Yorkers see it:

# utc_time
datetime.datetime(2018, 3, 31, 12, 0,
    tzinfo=<UTC>)
# ny_time
datetime.datetime(2018, 3, 31, 8, 0,
    tzinfo=<DstTzInfo 'US/Eastern' ...>)

Since they are the same moments, their timestamps should be equal:

In : int(utc_time.strftime('%s')), int(ny_time.strftime('%s'))
Out: (1522486800, 1522468800)

Wait, what? They are not the same at all. In fact, you can't use strftime as a solution to this problem. Python's strftime doesn't even support %s as a format code; it merely works because the platform C library's strftime() is called internally. And, as you can see, the timezone of the datetime object is completely ignored.

The proper result can be achieved with straightforward subtraction:

In : epoch_start = pytz.utc.localize(
...:     datetime(1970, 1, 1))

In : (utc_time - epoch_start).total_seconds()
Out: 1522497600.0

In : (ny_time - epoch_start).total_seconds()
Out: 1522497600.0

Again, if you use Python 3.3+, you can solve the problem with the timestamp() method of datetime: utc_time.timestamp().
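With Python 3.3+ the stdlib alone suffices, no pytz needed; the same moment gives the same timestamp in any timezone:

```python
from datetime import datetime, timedelta, timezone

utc_time = datetime(2018, 3, 31, 12, 0, tzinfo=timezone.utc)
print(utc_time.timestamp())  # 1522497600.0

# the same moment expressed with a fixed UTC-4 offset: identical timestamp
ny_like = utc_time.astimezone(timezone(timedelta(hours=-4)))
print(ny_like.timestamp())   # 1522497600.0
```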
A lot of system calls can be interrupted by an incoming signal. If a programmer wants the call to be completed anyway, they have to issue it again.

The notable example is the sleep(x) function, which is expected to freeze the program for x seconds but in reality can return earlier if a signal arrives.

However, since Python 3.5, thanks to PEP 475, Python takes care of all such calls for you. The following program ends on the first SIGINT it receives in any Python before 3.5, but sleeps for the full five seconds regardless of signals in Python 3.5+.

import signal
import time

def signal_handler(signal, frame):
    print('Caught')

signal.signal(signal.SIGINT, signal_handler)

time.sleep(5)
List comprehensions may contain more than one for and if clause:

In : [(x, y) for x in range(3) for y in range(3)]
Out: [
(0, 0), (0, 1), (0, 2),
(1, 0), (1, 1), (1, 2),
(2, 0), (2, 1), (2, 2)
]

In : [
(x, y)
for x in range(3)
for y in range(3)
if x != 0
if y != 0
]
Out: [(1, 1), (1, 2), (2, 1), (2, 2)]
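The clauses read left to right like nested statements; the last comprehension above is equivalent to:

```python
result = []
for x in range(3):
    for y in range(3):
        if x != 0:
            if y != 0:
                result.append((x, y))

print(result)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```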

Also, each for and if clause may use all the variables defined before it:

In : [
(x, y)
for x in range(3)
for y in range(x + 2)
if x != y
]
Out: [
(0, 1),
(1, 0), (1, 2),
(2, 0), (2, 1), (2, 3)
]

You can mix ifs and fors however you want:

In : [
(x, y)
for x in range(5)
if x % 2
for y in range(x + 2)
if x != y
]
Out: [
(1, 0), (1, 2),
(3, 0), (3, 1), (3, 2), (3, 4)
]
Python supports the new @ operator since version 3.5. It's intended to be used for matrix multiplication. However, none of the standard types supports it; it was introduced primarily for the numpy module.

To make your objects support this operator, you should define one of the following methods: __matmul__, __rmatmul__ or __imatmul__.
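A minimal sketch of supporting @ in your own class (the dot-product semantics here are made up for illustration):

```python
class Vector:
    def __init__(self, values):
        self.values = values

    def __matmul__(self, other):
        # interpret a @ b as the dot product of two vectors
        return sum(a * b for a, b in zip(self.values, other.values))

print(Vector([1, 2, 3]) @ Vector([4, 5, 6]))  # 32
```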

You can learn more from PEP 465.