python, go, code quality, security, magic

Website and RSS:
https://itgram.orsinium.dev

Source:
https://github.com/orsinium/itgram

Author:
@orsinium
https://orsinium.dev/
I tried ray the other day. It's a wonderful library for running code across multiple processes, machines, and even GPUs. The interface is delightful and as simple as it gets. It's one of those great tools that manage to balance enormous capabilities with ease of use. Here is the code I ended up with to process a bunch of files in parallel:

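import sys
from pathlib import Path

import ray

# assumption: the original snippet doesn't show where `tmp` is defined
tmp = Path("tmp")

@ray.remote
def f(input_path, output_path):
    ...

ray.init()
remotes = []
for path in Path(sys.argv[1]).iterdir():
    # schedule one remote task per input file
    remotes.append(f.remote(
        input_path=str(path),
        output_path=str(tmp / path.stem),
    ))
# block until every task has finished
ray.get(remotes)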


Perfect.

Today's button is a pizza. Because I haven't yet seen a Python project with a pizza on its logo, even though pizza is awesome.

#python
python-prompt-toolkit -- a library for building all kinds of interactive CLIs, REPLs, and the like. It offers syntax highlighting via pygments, autocompletion, and so on. It powers pgcli, ipython, gitsome (smart autocomplete for git), EdgeDB, pyvim (vim in Python, just because), and much more.

#python
Our journey to type checking 4 million lines of Python -- a long story from the creators of MyPy about how the project grew, which problems it is meant to solve, and what difficulties they ran into along the way. A fairly interesting read about the motivation behind type annotations in Python and some non-obvious decisions (stub files, optional annotations).

#python #typing
Pyright is an alternative Python static type checker from Microsoft, written in TypeScript. The main difference from MyPy is its "watch mode", which re-runs type checks only for the changed parts of the code. That makes it faster for local development and better suited for IDEs.

Pylance is a new VS Code extension and language server for Python from Microsoft, empowering the main Python extension. It's built on top of pyright. Features: fast, knows a lot about types (hence better code discovery, go to definition, documentation pop-ups), semantic syntax highlighting, auto-import, running type checks as you type (VS Code can't run Flake8 without saving the file yet, AFAIK). See the announcement and changelog for more features. Or just give it a try.

#python #vscode
# Algebraic effects

1. One year ago I posted 📄 Algebraic Effects for the Rest of Us (the post is here 🇷🇺)

2. Last weekend I read 📄Why PLs Should Have Effect Handlers. It's a slightly older article, but simpler and shorter. And it reminded me of the topic...

3. Today I released eff — a small and simple Python library to work with algebraic effects. It is a true "pythonic" library, with context managers, metaclasses, type annotations, and global variables 🙃
https://github.com/orsinium-labs/eff

It's a part of "orsinium-labs" where I do quick experiments. Maybe, someone will find this experiment interesting and maybe even helpful. Since it's a very small library, I consider it "production ready". Have fun!

#python
🐍 pympler is a Python library for quick debugging of memory leaks. The idea is simple: record the size and count of objects before running a function, run the function, find the diff, show the new objects. For "pure" functions, no new objects should remain afterwards. The tool is quite powerful, it even has a Tk-based interface because why not.
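
The "snapshot, run, diff" idea can be sketched with the standard library alone. This is not pympler's API: it is a rough illustration that counts garbage-collector-tracked objects by type name, and `Leak`, `leaky`, `snapshot`, and `diff_after` are hypothetical names made up for the demo.

```python
import gc
from collections import Counter


class Leak:
    """Hypothetical object type used only for this demo."""


_cache = []  # module-level list that keeps objects alive (the "leak")


def leaky():
    # Creates three objects that outlive the call.
    _cache.extend(Leak() for _ in range(3))


def snapshot():
    """Count live GC-tracked objects, grouped by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def diff_after(func):
    """Run func and return the type names whose object count grew."""
    before = snapshot()
    func()
    after = snapshot()
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] > before[name]
    }


growth = diff_after(leaky)
print(growth.get("Leak"))
```

The diff may also contain a few bookkeeping objects created by the measurement itself (for example, the `Counter` holding the first snapshot), which is one reason to reach for a dedicated tool rather than a sketch like this.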

#python
🐍 atheris is a fuzzer for Python code written in C++. It generates random bytes, feeds them into the function, and checks if it fails. To cover more cases, it measures the coverage of the tested code on every run. It has an awful undocumented API but you don't need to know much to use it. Atheris has been around for some time but Google open-sourced it only about a month ago.

In general, there are not many fuzzers for Python, so it's great news. I know only 2 more:
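
The "random bytes in, watch for failures" loop is small enough to sketch by hand. The example below is a naive random fuzzer, not atheris and not its API: it has no coverage guidance at all, and `percent_of_first` is a hypothetical target with a planted bug.

```python
import random


def percent_of_first(data: bytes) -> int:
    """Hypothetical function under test; fails when the first byte is 0."""
    if not data:
        return 0
    return 100 // data[0]


def fuzz(target, runs=10_000, max_len=4, seed=0):
    """Feed random byte strings into target; return the first crashing input."""
    rng = random.Random(seed)
    for _ in range(runs):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception as exc:
            return data, exc
    return None


crash = fuzz(percent_of_first)
print(crash)
```

A coverage-guided fuzzer additionally keeps the inputs that reach new branches and mutates those, so it can find bugs hidden behind several nested conditions, where pure random generation like this would almost never get.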

+ python-afl
+ pythonfuzz

BTW, the next release of deal will have native support for fuzzers and hypothesis.

#python
🐍 folium is a Python library to render interactive maps right in Jupyter notebooks. Perfect for geo points visualization: easy to use, different styles and APIs, customizable markers.

#python
A long time ago, there was a post about profiling Python code 🇷🇺🐍. The conclusion was that pyflame works the best. A lot has changed since then. A few years ago, pyflame was archived. Now, it's one of the 158 dead projects from Uber.

However, py-spy has changed a lot for the better. Now it is cool, stable, and has saved me multiple times. I constantly use it for profiling, and a few times I used it to dump the stack trace of a running process. So, forget that old post, use py-spy.

#python
typesplainer is a fun little tool (CLI, vscode/vim extension, and website) for explaining type annotations in Python code in plain English. I think it has a pretty limited application (if you don't know a specific type, better read the official docs). Still, it might be helpful if you're starting on a project with type annotations but haven't worked with them before.

#python
Relatively recently, Bloomberg open-sourced memray, the first memory profiler for Python that doesn't suck. I especially like their pytest plugin. I tried it and it works pretty nicely: it has already found a few places in my code where memory consumption can be a bit less aggressive.

If you need to profile execution time, py-spy is still the best.

#python
If you Google "data visualization in Python", most of the tutorials you'll see will cover matplotlib. The truth is that, yes, matplotlib is a very powerful and flexible tool, but at the same time it is very low-level and hard to use. So, plenty of wrappers emerged on top of it.

The most famous wrapper is seaborn. It is very high-level and has sensible defaults to make nice looking commonly used charts with a call to a single function.

My personal favorite is plotnine, I use it all the time when I need to visualize something. It's based on the idea of "the grammar of graphics" as implemented in the R library ggplot2. The idea is pretty simple. In ggplot2 (and so in plotnine) the visualization consists of multiple layers: data (a data frame), aesthetics (which columns and aggregations to map to the x axis, the y axis, color, etc.), and one or more geometries (a way to visualize data using the aesthetics, like boxplot, bars, etc.). There are also "facets", "scales", and "stats", but they are also just layers.

Why this is cool:
1. It's easy to learn, explain, and understand.
2. It's easy to use and it's not so much code.
3. It's flexible, you can make any visualizations and combine them in any manner you want.

This is why the recent alpha release of seaborn (v0.12.0a1) introduced the Next-generation seaborn interface. The article nods to ggplot2 and plotnine as the sources of inspiration for the new interface. But it also says the interface will be different and more "Pythonic". All that sounds very cool. Maybe the new seaborn will win my heart over plotnine ❤️

Both libraries are one-man projects. It's crazy how much time some people can invest in a single project without getting paid. I was there, but at some point I gave up on dephell when I couldn't sleep for hours at night because of anxiety about unresolved issues... Be mindful of your health, folks.

#python
SimpleParsing is a little Python library that "does one job and does it well".

If you, like me, want everything in your Python code to be type annotated (for the sake of autocomplete, semantic syntax highlighting, and safety), you may know that the container that argparse returns (argparse.Namespace) isn't type checked, because mypy can't statically know which flags, and of what types, you registered in ArgumentParser. And if you want to make it type-safe, there is quite a bit of boilerplate: define flags, define a type-safe container, unpack flags into the container. And if the definition and the container mismatch, well, your code is wrong.

SimpleParsing solves exactly this problem. You define a dataclass, annotate attribute types, set defaults, add comments to attributes, and then SimpleParsing will turn it into CLI args. Attribute names will form names of the CLI flags, types will be checked, defaults will be respected, and comments will be turned into help messages.
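
To make the idea concrete, here is a minimal stdlib-only sketch of deriving argparse flags from a dataclass. It is not SimpleParsing's API or implementation, it ignores comments and help messages, and `Options` and `parse` are hypothetical names invented for the illustration.

```python
import argparse
from dataclasses import MISSING, dataclass, fields


@dataclass
class Options:
    """Hypothetical CLI options for the demo."""
    input_path: str
    workers: int = 4
    verbose: bool = False


def parse(cls, argv=None):
    """Build an ArgumentParser from the dataclass fields and parse argv."""
    parser = argparse.ArgumentParser()
    for field in fields(cls):
        flag = "--" + field.name.replace("_", "-")
        if field.type is bool:
            # Assumption: boolean fields default to False and act as switches.
            parser.add_argument(flag, action="store_true")
        elif field.default is MISSING:
            # No default in the dataclass means the flag is required.
            parser.add_argument(flag, type=field.type, required=True)
        else:
            parser.add_argument(flag, type=field.type, default=field.default)
    # argparse turns `--input-path` back into the `input_path` attribute.
    return cls(**vars(parser.parse_args(argv)))


opts = parse(Options, ["--input-path", "data.csv", "--workers", "8"])
print(opts)
```

Because `parse` returns an `Options` instance, a type checker knows that `opts.workers` is an `int`, which is exactly what `argparse.Namespace` can't offer.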

I've seen quite a few alternatives for that task, and all of them work on top of pydantic, click, or attrs. And I really like that SimpleParsing works with what we already have in stdlib, without bringing unnecessary dependencies.

And don't listen to anyone, click sucks. Testing it is quite hard, functions with 20 arguments and 20 decorators on top aren't nice, and passing all that stuff deeper into the code is hard and verbose. Using click encourages dirty and verbose code and bad practices. Oh, and IMHO the CLI it produces is worse than that of argparse.

#python
A week ago, SQLAlchemy 2.0.0 was released. Now, the default way to describe ORM models is with declarative definitions based on type annotations. So, instead of id = Column(Integer), a field can be described as id: Mapped[int] (with mapped_column() for extra options such as the primary key). In some cases, it can get more verbose, but it pays back with better IDE integration, syntax highlighting, type checking, and other cool stuff that comes with type annotations.

I wanted to try different alternative ORMs with asyncio and typing support, but never got to it. Now I think that it gets quite hard to beat SQLAlchemy. The project, despite being very old, still keeps up the pace (which I can't say about Django ORM, Pony ORM, or Peewee) and has very good support for modern practices. Namely, for asyncio and type annotations.

Anyway, there are some asyncio-powered ORMs that I haven't tried but that look interesting:

+ sqlmodel is a thin wrapper on top of pydantic and sqlalchemy from the author of FastAPI. It's not actively maintained, but there is not that much code for it to be a big problem. This is the most popular ORM on this list because the author is famous.
+ ormar is another wrapper on top of pydantic and sqlalchemy to consider. Don't get deceived by the number of commits, though, they all are from dependabot.
+ tortoise-orm is an asyncio ORM inspired by Django ORM. At this point, I'm not sure anymore that it's a good idea. A long time ago, I used to like Django ORM for its simplicity, but now I'm more skeptical about this simplicity, as I learned how much it costs in performance and testability. Internally, it uses pypika for building the queries.
+ piccolo has quite a nice query builder but model definitions aren't declarative. Also, they say it's "fully type annotated" but that's not what you might expect. There is Any all around, and no type safety at all in what the queries return.

#python
🐍 Pandas 2.0 is released! The biggest highlight is support for pyarrow instead of numpy as a backend, which gives better performance and better support for missing values. A few months ago, there was a great post about it in the pandas blog: pandas 2.0 and the Arrow revolution.

The release notes are a bit too verbose, so I recommend reading this TL;DR instead: Everything you need to know about pandas 2.0.0.

#python
ruff has been making a lot of buzz in the Python community recently. It's a Python linter written in Rust. It re-implements all built-in rules from flake8, most of the rules from pylint, and a lot of flake8 plugins. On top of having it all out-of-the-box, it's very fast and provides autofix logic for many stylistic checks.

It's still in "beta", and a few months ago, when I tried it for the first time, it exploded on most of the big projects I ran it on, but now it's stable enough. I recommend giving it a try on your pet projects.

For me, the number one feature of the tool is a vscode plugin that lints the code as you type it. It's something that I had in Atom years ago and that I've been missing the most in vscode. It even has autofixes as refactoring actions!

As for the disadvantages, I'd mention that it brings the clumsy configuration approach from flake8 and still doesn't have a way to write plugins for it. But I think fixing it is just a matter of time.

BTW, did you know Atom had been archived in March? It was objectively less stable and consistent than vscode, but it had some nice features, and it's always great for the ecosystem to have alternatives. I still use Atom hotkeys and theme in vscode. Probably, since Microsoft acquired GitHub, ditching Atom in favor of vscode was inevitable.

#python
msgspec is a serialization and validation library for Python with schemas defined declaratively with type annotations (like dataclasses). Think of it as a pydantic alternative. And it has quite a few advantages:

1. Performance. In their benchmarks they claim to tear apart everything else, especially pydantic. And while Pydantic 2.0 with the core implemented in Rust is on the horizon, it's not clear when it will be finished, and I still expect msgspec to stay ahead.

2. No implicit type conversion. If the field type is specified as int and you pass in a string, pydantic (by default) will try its best to convert the passed value to int. While that is useful in some cases, like parsing URL or GET query parameters, most often you'll use it for interacting with other APIs and providing your own API, and there it's better for types to be strict.

3. Clean and consistent API. Pydantic has a relatively long history, and it has accumulated a fair share of bad decisions. Again, it's something that the author wants to fix in pydantic 2.0, though it isn't clear when that will come out. Msgspec has a clean and functional API that doesn't mess with your classes.

4. Out-of-the-box support for many formats: JSON, YAML, TOML, and MessagePack. And if you need more, there is a function that converts your data into JSON-compatible primitive types that you can then serialize as you want.

A nice bonus is the support for dataclass and TypedDict, so you can give it a try without rewriting the models you have.

#python
refurb is yet another linter for Python. What makes it special is that it internally runs as a mypy plugin. This gives the linter static information about types for everything. As a result, refurb can give more precise and reliable suggestions. But of course it also means refurb is slower than something like pycodestyle, which works on regular expressions, or ruff, which is written in Rust and works only with the AST.

#python
Mutation testing is a technique for evaluating how good your tests are. The idea is that a special tool (or a very bored human) breaks the code by slightly changing ("mutating") something in it and then runs all the tests, and the tests must fail. If they don't fail, you either have dead code or the tests aren't good enough.

There are some prerequisites for when it makes sense to do that:
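
Here is a minimal sketch of a single mutation using only the standard `ast` module. It is not how mutmut works internally, and `is_adult`, `FlipComparison`, `weak_tests`, and `strong_tests` are hypothetical names made up for the illustration: the mutation turns `>=` into `>`, and only the test suite that checks the boundary value notices.

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""


class FlipComparison(ast.NodeTransformer):
    """The mutation: replace every `>=` with `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [
            ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops
        ]
        return node


def load(tree):
    """Compile a module AST and return the is_adult function from it."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["is_adult"]


def weak_tests(func):
    # Never checks the boundary value, so the mutant passes too.
    return func(30) is True and func(5) is False


def strong_tests(func):
    # Also checks the boundary, which the mutant gets wrong.
    return weak_tests(func) and func(18) is True


original = ast.parse(SOURCE)
mutant = ast.fix_missing_locations(FlipComparison().visit(ast.parse(SOURCE)))

print("weak tests pass on mutant:", weak_tests(load(mutant)))
print("strong tests pass on mutant:", strong_tests(load(mutant)))
```

When a test suite still passes on the mutated code, the mutant "survives", and that is the signal that a test is missing.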

1. The test coverage is 100%. If it's not, you already know what you should write tests for.
2. The tests are fast.
3. The tests are reliable.
4. The codebase is small.

mutmut is a tool for mutation testing of Python code. You simply point it to the directory with the code, directory with the tests, and what command to use to run the tests, and it will do the rest. When it's done, you can generate an HTML report with the list of diffs of mutations for each file that weren't detected by the tests.

I think it's an "advanced" testing technique. I don't use it that often, and only on small projects that need to be reliable. There are quite a few "false positives" (things that you can't and shouldn't test, like database connection options), but in my experience it's also a very reliable way to detect what tests you're missing.

#python