I've not tried Nuitka yet, but I've done similar things with 2 other tools to package up a Qt-based GUI tool that has a Windows installer, Mac ".dmg" or a Linux package:
* FBS (FMan's Build System) - worked well for a while, but we couldn't upgrade Qt past a certain version and the version we used was only happy with Python 3.6.
* Beeware - our current solution. Very, very happy with this; works smoothly, lets us use the latest Python and Qt versions and lets us make professional-looking installers that we can distribute.
Hey - first of all, looks great and I'll definitely look into it. Good job!
Question - do you have any comparisons on the size of the resulting distributed executable compared to Nuitka, pyinstaller?
This is the issue I've faced. Nontrivial programs (that, say, import pandas, numpy, selenium) are huge, implying a ~250MB+ download for the user - and then maybe up to 1GB on disk when it's all unpacked.
The OpenBLAS version of NumPy (from pip rather than conda), plus PyQt5, can fit in around 100 megabytes uncompressed on disk for a Win64 binary. However I no longer recommend writing number crunching apps in Python, since you don't have thread-based parallelism due to the GIL, and have to awkwardly shuffle data across processes. (Last time I tried multiprocessing, some data can only be sent to workers on Linux using fork-based worker spawning, and not on Windows which lacks fork and requires pickling.)
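A quick sketch of the fork-vs-pickling distinction mentioned above (the names here are made up for illustration): a lambda can't be pickled, so it can't be shipped to a "spawn"-started worker - the only start method Windows supports - but a "fork"-started worker on Linux simply inherits it through copied memory.

```python
import multiprocessing as mp
import pickle
import sys

# A lambda can't be pickled, so it can't be sent to a "spawn"-started
# worker (the Windows default and only option). A "fork"-started worker
# on Linux inherits it via the copied process memory instead.
double = lambda x: x * 2

def worker(x):
    # `double` is found in the forked child's inherited module state
    return double(x)

def main():
    try:
        pickle.dumps(double)
        print("lambda pickled fine")
    except (pickle.PicklingError, AttributeError):
        print("cannot pickle the lambda")

    if sys.platform != "win32":  # "fork" is unavailable on Windows
        ctx = mp.get_context("fork")
        with ctx.Pool(processes=1) as pool:
            # only the argument (an int) crosses the process boundary
            print("fork worker result:", pool.apply(worker, (21,)))

if __name__ == "__main__":
    main()
```

With "spawn", both the function and its data would have to survive pickling, which is exactly the awkward data-shuffling the comment describes.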
If the number crunching is numpy-heavy, multiple threads absolutely can use real thread-based parallelism. C extensions like numpy release the GIL and so can run in parallel.
Ah, great news. Thanks for making FBS; it is incredibly liberating to find a tool that lets us write Python GUI apps and have them appear as if they were made by XCode or Visual Studio.
Tried to use this recently. I don't understand the decision to limit the free version to Python versions < 3.6. It would be beneficial for you and your users if you at least had a free trial for versions above 3.6.
I track my time and have spent 570 hours since 2018 working on fbs. The vast majority of this (526h) was for the free version you are complaining about, the nice documentation etc. You are criticizing me for not giving you everything I do for free?
Beeware sounds really interesting. I've often wondered if there are any things like it out there that can help me build complex GUIs (e.g. user interfaces with interactive 3D stuff), and an interface that doesn't look like it's from the 90s. I'll look into Beeware more in depth, hopefully it can handle stuff like that.
Big fan of Nuitka, and specifically the author Kay.
I sent him an email years back, just thanking him for his work and asking how to donate. I was surprised when he wrote a very thoughtful and kind reply, acting rather shocked why someone would give him money. He's a real gem of a person, and I'm glad his work is getting (rightly) recognized.
Nuitka is the best python compiler I've used. I have tried at least three others and none worked as well as Nuitka. And I'm talking about complicated dependencies such as pytorch, selenium, tesseract - that type of stuff.
Based on my own tests, once you have a 'normal' program that does a bit of IO, calls out to some C-based libraries, and so on, I've rarely seen more than a 10-20% performance increase across the whole program run.
For individual functions you can see 2-4 times speed up over pure python.
Most importantly, I've never seen it result in slower than cpython performance.
It's all about compatibility to be honest. PyPy has done yeoman's work and it's still lacking in compatibility for important packages (e.g., psycopg2). Performance means little if we can't use major parts of the ecosystem.
A few years ago I experimented with running a Pyramid/SQLAlchemy/Postgres app with PyPy. I was able to get it to work with https://pypi.org/project/psycopg2cffi/
I didn't notice any particular speedup for this app - manipulating lots of JSON Python data structures is not really PyPy's sweet spot. Whereas for some processing of large genomic data files I saw a substantial speedup.
Is it preferable on PyPy to use a libpq wrapper over a native solution? As far as I can tell, even though CPython C API is in some way supported on PyPy it's really only advisable to use packages using it as a last resort for performance reasons. Aside from psycopg2cffi, which you should be able to use with PyPy, I'm pretty sure I saw other PEP 249 implementations for PostgreSQL the last time I checked (which admittedly was a few years ago).
> As far as I can tell, even though CPython C API is in some way supported on PyPy it's really only advisable to use packages using it as a last resort for performance reasons
Yes, but so much of the Python ecosystem uses the C API that the "last resort" and the "common case" are one and the same most of the time. As for `psycopg2cffi`, IIRC it's not very well supported. It looks like it was last updated in January of 2021, but before that the last update was from 2018. This was what prevented us from using it in 2020. Moreover, there are other packages besides Postgres drivers; that's just the one that I recall running into problems with. To be clear, I want PyPy to be successful, and the project is nothing short of amazing.
> I'm pretty sure I saw other PEP 249 implementations for PostgreSQL the last time I checked (which admittedly was a few years ago).
There was a pure Python version, but it didn't seem battle-tested and there was no indication of its quality or performance. I don't want to pull a package like that into production for something as important as a database driver. It's been a couple years since I looked into it as well, so perhaps things have changed for the better in the interim.
Yeah, that's a shame, because 1) that API is mildly awful, and 2) it really restricts implementation choices, to the extent that making another implementation of the language is a problem just because of the need to convince numerous existing C extensions that your implementation choices are the same as CPython's (when really your implementation is likely to work very differently on the inside). I guess Python people really programmed themselves into a corner here.
Without further clarification, when I read "compiler" I translate it as "standalone program which takes source code and emits exactly one binary which executes the program".
It can be a much broader term, for example you will run into people here arguing (with some justification!) that a "transpiler" doesn't exist since it's a mere subset of the broad definition of a compiler.
I'm just reporting on the mental image that calling such-and-such a compiler forms in my mind's eye. I expect I'm not alone in that.
There's nothing "broad" about a compiler not generating "exactly one binary". First, most of the time, even in conventional systems, it is the linker who creates that binary, not the compiler. Second, in other systems, such as Oberon, there's not even "exactly one binary" for anything, unless you're talking about the currently running system as a whole.
Is Nuitka able to create self-contained binaries that can be distributed as a single file? Or do they still have external dependencies such as the Python runtime?
I’ve chosen Go rather than Python for a few small projects recently. Not for performance, parallelism or a particular fondness for the language, but just because Go can build truly standalone executables.
The last time I tried this was quite a while ago and your question was exactly why I tried. But, it failed with some unintuitive errors in dependencies of my Python script, and I gave up after 10 mins of Googling.
I’m still wary of that experience and will avoid Python where I have such deployment needs unless the language natively comes up with such a build solution.
FWIW, I think borg (a backup solution written in Python and C) uses pyinstaller to get a single binary executable. It may be of interest to you.
The only problem I have found with Pyinstaller is that their "standalone" builds still depends on certain libraries that will change with the version of Python in the host OS, such as the glibc library, or others.
The proposed solution by Pyinstaller is to build your program on the "oldest" version you support, which is silly compared to build systems like Go's, which produce truly standalone binaries.
Having said that, I'm curious about this solution, since it seems to claim a true standalone build...
The other issue with PyInstaller is that, by default, it includes all dynamic libraries that the Python interpreter has on your machine. That makes sense: it needs an interpreter, so it collects whatever the interpreter needs to run.
Unfortunately this might include libreadline.so, which is licensed under GPL, making your resulting executable unable to be under a proprietary license.
There are ways to solve this issue, but one has to search and read documentation (and code, in my case -- when I was researching it the docs were not clear).
> [PyInstaller] includes all dynamic libraries that the Python interpreter has on your machine.
Yes - and last time I used it, it created either a large folder or a compressed archive containing all of those libraries. Only the latter gives a truly standalone executable - but it's very slow to start up because it has to extract the archive to disk every time it runs.
It sounds like Nuitka has a solution for this problem, at least on Linux: "[the binary] will not even unpack itself, but instead loop back mount its contents as a filesystem".
Indeed it does. The only downside of this is that the resulting binary is (at least in my case) almost tripled in size compared to Pyinstaller (5MB vs 14.6MB). But I can live with that.
Still doesn't statically link C libraries (or at least I didn't find the setting for it), or other libraries for that matter.
Pyinstaller binary build depends only on: libdl.so.2, libz.so.1 and libc.so.6.
Nuitka binary build depends on: libdl.so.2, libz.so.1 and libc.so.6 AND libpthread.so.0 (for the loopback mounts I suppose).
The one that always creates problems is libc.so.6, a recent enough version of which is usually not present on 4-year-old systems...
Go binaries are often not truly standalone. You can certainly make static Go binaries that don't use cgo, but you can't then use standard libraries which rely on it.
I believe that if you use MUSL for libc, you can statically link libc into the binary. That should make the standard library all statically linkable, I'm not aware of any other C dependencies (although I could very well be wrong).
Hmm, with Pyinstaller I generate true standalones which require no libs at all, nor Python to be installed on the host system. Do you use the --onefile option?
As an example, here's a little project which I also release as a single binary file that works everywhere I've tested, with no deps:
I just downloaded it and it works great. Took me a minute to figure out how it wanted me to enter my location (Austin, TX USA is what worked). Would it be hard to add a button to get the location from the OS? I think all the major operating systems have a location service these days.
Yeah but they only run on a single operating system, so all you've accomplished is going from O(n*2) files to O(n) which is good but it'll never be great like APE which needs 1 file.
Each binary runs on a single operating system, but you can easily cross-compile in Go. This is strictly better than the compiled Python case--either you're compiling Python to native code in which case you have the same problem and none of the easy-cross-compile benefits or else you ship a zip which inevitably contains lots of C dependencies that are also specific to one platform and you still can't easily cross compile.
> so all you've accomplished is going from O(n*2) files to O(n)
You go from N*M*2 where `M` is the number of target platforms to M (no need to use big-O notation as far as I can tell).
Are people actually using APE? It's a really clever hack but I can't imagine it's a good idea to actually distribute executables that rely on ancient COM / shell formats and a weird C library to work.
It's really not that hard to build standalone binaries for Mac, Linux and Windows using GitHub Actions. Especially if you use a reasonably modern language like Go, Dart or Rust.
> In the future Nuitka will be able to use type inferencing based on whole program analysis. It will apply that information in order to perform as many calculations as possible in C, using C native types, without accessing libpython.
They already claim a 300% speed boost, but with that type inferencing it will probably go a lot further... Can't wait for this project to mature.
I guess compiling to WebAssembly should be trivial then?
This looks pretty cool. A better summary at the top would be nice for new users. I had to scroll down a bit, and read a few lines, until I realized it translates Python to C and then uses a C compiler to do the final compilation (it is also quite ambiguous whether the executable is native or interpreted, until you read that part).
I'd be looking more into this, but does anyone else know of other dynamically typed languages which compile to native code, not JITed or bytecode? (I remember there were a few Lisps out there.)
Humorously, https://cliki.net/Python - "Python is the name of a free high-performance compiler originally developed at Carnegie-Mellon University for CMU Common Lisp, it is now used by SBCL ..."
There are some for Perl in various degrees of emitting bytecode to actually optimizing (perlcc, RPerl). I think others prefer emitting to Java or making their own JITs.
We have an internal tool that autoformats code (it calls clang for C++, some other tool for C#, and autopep8 for python). But for Python I did not want to rely on python, but have it a separate standalone .exe that can run by itself (almost, still needed python38.dll). So nuitka helped me there:
I wrote a small app, which is simply:
autopep8tool.py:
import autopep8; autopep8.main()
Then had this .bat file to do the .exe for me:
autopep8tool.make_exe.cmd:
@echo off
pushd %~dp0
call "%VS140COMNTOOLS%\..\..\VC\bin\amd64\vcvars64.bat"
p4 edit ..\tools\autopep8tool.*
p4 edit ..\tools\python38.dll
for /d %%i in (*.dist *.build) do rd %%i /s /q
echo import autopep8; autopep8.main() > autopep8tool.py
call nuitka --standalone --assume-yes-for-downloads --unstripped --windows-dependency-tool=pefile autopep8tool.py
copy autopep8tool.dist\autopep8tool.* ..\tools
copy autopep8tool.dist\python38.dll ..\tools
for /d %%i in (*.dist *.build) do rd %%i /s /q
popd
works great so far, and even though the produced .exe is 10MB plus a 4MB python38.dll, it's a great logistical saver (IMHO), as I can rotate the tool to other teams and depots without requiring a proper Python install (which is even trickier on Windows). It's probably not completely sandboxed (hermetic), but works well so far.
calling autopep8.exe --help even gives the right help.
So great tool (though I've been told about others, like google's subpar, etc.)
Do you find it increases the maintenance burden? Since now a previously 1-character change requires a massive recompilation, and updates to Python (which happen frequently...) also require recompilation.
Recompilation is the norm for compiled languages so this doesn’t seem like it would be very onerous or have any kind of substantial effect on maintainability.
I've only had to compile it once, unless we change python drastically and/or want autopep8.py changes (I would think this does not happen often, and if it does it also means we'll have to reformat our code).
Granted, my case is really not a general one, but nuitka fit well there (like I really wanted to avoid another 'freeze' tool that unpacks python to a temp folder and calls it from there just to run autopep8.py).
That said, I was not able to get nuitka working with some of the other popular formatting libs (I'll need to look again, it takes a bit of setup).
you could just set up a file system listener and then trigger recompiles at some arbitrarily decided-upon interval of time. Doesn't seem like a huge pain otherwise.
The problem with this method is that the user still needs to have Python on their system. Pyinstaller (Py2exe, cx_Freeze) bundle the python interpreter along with your app.
I'm one of the rare idiots who wrote a consumer-facing python+Qt program. I need to ship the right dependencies along with the right version of python to my customers.
Using Pyinstaller it works out in the end, but after everything, it would have been better to use a language meant for user-facing applications.
I need to, because otherwise I wouldn't be able to package it for Windows users.
The app is also closed source.
I cannot easily package the application for Linux users, because of so many different flavours of Linux, so many different combinations of Python and Qt versions.
Which is why I've given up on a pure Linux version and I'm in the middle of adapting it to work under WINE alongside Elite: Dangerous also running under WINE, instead.
Many systems still have both Python 2 and Python 3 installed. Also, in the standard case, the version of Python is controlled by the user, not by your app. Bundling it with the app lets you make sure that it's running Python 3.8, for example, so that you can use Python 3.8 features without having to add a bunch of compatibility shims in case the user is running an older version.
Even better: use OS packages. They allow listing what is installed on a host, remove it cleanly, do atomic updates, ship manuals, configuration files, systemd unit files, implement sandboxing and so on.
GvR and the old boys currently present themselves as CoC proponents, and strategically use the CoC against anyone who criticizes them.
Compare with his current Microsoft sponsored activities. If anyone would use that language against "his" (i.e. Mark Shannon's) project, they'd be out and suffer public defamation in no time.
I don't love the guy or anything. But you're pulling pieces together from different contexts over almost 10 years to tell your story.
The quote was about a presentation in 2013, not the project as it is now. Guido's complaint is mostly that they were equating "compiled" with "performance" and were not rigorous in backing that up. The presenters may have been young. But I'd have made the same complaint. The blog thing was just snarky.
That was a period of time when, by his own admission, he wasn't doing so well as a human being, and this is pretty low on the snark-meter compared to other BDFLs.
The project page as it is today says very little about performance 'wins'. In fact, it points out losses, in the same vein as the RedHat issue with Python linked to libpython posted here the other day. Their point doesn't seem to be "make your code go faster" but an easier mechanism of distribution for some cases.
I'm not getting the "his" shot. I haven't seen him take credit for the work, just that headlines do because journalists are lazy. I may have missed something.
But the MS team's activities today are things he would have dismissed years ago. He's learned and changed his mind. If he was completely consistent with himself from 2013, I'd think he wasn't that bright. (I'll never understand those kinds of arguments about politicians)
You're drawing a line from criticism of a 2013 paper, through his current career, to a malicious intent or conspiracy. That's a lot more defamatory than a valid (but a little rude) critique. You're attacking a person's character, not their work output.
contexts, events, whatever. It doesn't matter. If Kay had listened to any of them, the Python ecosystem would be lacking its best packaging tool, and a significant optimizing tool as well.
That is pretty unacceptable.
> In fact, it points out losses,
Because they have cleared away enough chaff to start encountering these problems with the design of the cpython code itself.
There isn't much point in saying X is faster if significant harassers in the python community will make a point of walking out of your talks and later belittling your work online, right?
I don't think so. Call me cynical, but I suspect that Guido went where the money was (one does not simply go out of retirement). Otherwise there would be an apology to the projects he's been consistently berating for years.
Good thing Kay Hayen has the perseverance and quite mad planning skills. Today we have a compiler that works better with each release and is capable of delivering results today, while van Rossum's work at Microsoft as of now is largely vaporware I don't have very high hopes for.
Installed Nuitka with no problems and compiled "hello world", also with no problems.
It created a 4,060,104 byte hello.bin with no symbol table, so no need to run strip to make it smaller, which again worked like a breeze.
(The developers say on GitHub they will work on smarter ways to keep the dependency list small, which can at present escalate for modules like pandas that themselves have >1000 dependencies.)
I remember this being promising but struggling with a dependency that was using entry points to dynamically load plugins, and I couldn't get Nuitka to register that. Seeing this is a good reminder to have another shot at it.
pyinstaller doesn't compile your code. instead, it bundles a copy of the python interpreter along with your code, and sets everything up with an executable header so you can double-click the file to run your code within the embedded interpreter.
I suppose if my target is on x version of y OS then I'll have to compile on that machine, yeah? Sorry, I don't have much experience creating this sort of package for distribution, but have recently found the need.
Used it on home projects a couple of years ago to compile code on macOS and run on Windows and on Debian (on a Pi). Worked well. Looking forward to retrying nuitka once my current project gets to the point where I've finished the firmware and get back to working on middleware.
Definitely worth a look for anybody who wants to distribute a Python-developed system without installing python or the inevitable many dependencies of projects. It just makes distribution a whole lot more predictable.
And to the reader who asked about GraalPython: it's on my todo list - I have it installed, but life, children, and work keep getting in the way.
I have a project that pyinstaller can make into a 12 MB single file. Docker turns that same project into a ~400 MB container. This is not considering the Docker install footprint.
I looked into this question for Graal Nodejs and I think it has the same problem in Python...
Basically Graal Python and Nodejs each provide a custom interpreter for the target lang, the main goal of which is to provide interoperability with the Graal polyglot ecosystem. So you can run your python code under the GraalPython interpreter and it will run JITed fast and can import libs from other Graal-supported langs.
But as far as outputting executable binaries, Graal only provides that for JVM projects and LLVM languages like C/C++/Rust.
So it's not impossible, but you have to build your own Java wrapper project that loads the Graal Python interpreter class in code and then runs your python lib inside that.
I expect eventually that boilerplate step can be automated as part of the Graal build tools.
I love Python so much. It's my favourite language. It's so easy.
I don't even use the dynamic nature of Python - monkey patching etc.
I wish more people wrote algorithms in Python for understandability, then compiled them into a compiled language for performance. I find that Python resembles pseudocode because it is syntactically sparse, unlike C++. In Python I don't need to worry about ownership or memory. It would probably need pointers to be a low-level systems language.
I used to take your view, but actually there are plenty of languages (e.g. ML-family languages) where you get the best of both worlds - light, clean syntax, but also type safety and decent performance.
I experimented with OCaml and some of my Python code. I found that using Python with Numba enabled me to get much better performance than the same algorithm written in OCaml.
In this case, using Numba for the performance-critical aspect was trivially simple, just the addition of a decorator. And I've found that is generally pretty close to true, at least for the kinds of things where I care about performance.
That's why I'm still in the Python world rather than the OCaml world for my algorithmic work. I really like OCaml as a language and would actually prefer to use it, but Python/Numba seems to give me substantially better performance, as well as all of Python's standard libraries.
And Numba lets you turn off the GIL, so you can get multiple cores going at once.
YMMV, of course. I'm not making a general claim that would be true for everyone. But I think it may be worth mentioning that it could be true for more uses than one might assume.
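To illustrate the "just add a decorator" workflow described above, here is a hedged sketch: `pairwise_min_dist` is a made-up example function, and the try/except fallback keeps it runnable even on a machine without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback: a no-op decorator so the code still runs (slowly)
    # on machines without Numba installed.
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

# The only change versus plain Python is the decorator: Numba
# JIT-compiles the nested loops to machine code on first call.
@njit
def pairwise_min_dist(pts):
    best = np.inf
    n = pts.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(pts.shape[1]):
                diff = pts[i, k] - pts[j, k]
                d += diff * diff
            if d < best:
                best = d
    return best ** 0.5

pts = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(pairwise_min_dist(pts))  # nearest pair is (0,0)-(0,1): 1.0
```

Numba's `nogil=True` and `parallel=True` options are what the remark about turning off the GIL refers to: a `@njit(nogil=True)` function can run on real threads concurrently.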
I'm sympathetic to this point of view. But then there is 25 yrs of stdlib that's tattooed onto people's brains and stackoverflow.
A pure Python implementation of the Python stdlib, ready to be transpiled, could be that bridge.
Also it's possible to support multiple source syntaxes in such an ecosystem. ML based, python based or even a hybrid. In the end all I need is a well documented AST.
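Python does already ship a well-documented AST in its stdlib, which is the kind of common target the comment above imagines; a minimal look:

```python
import ast

# Parse a tiny function and dump its syntax tree. Any front-end
# syntax that can produce this tree could share the same back end.
tree = ast.parse("def f(x):\n    return x * 2")
print(ast.dump(tree.body[0], indent=2))  # requires Python 3.9+ for indent=
```

The dump shows a `FunctionDef` node with its arguments and a `Return` of a `BinOp`, which is exactly the structure a transpiler would walk.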
One of the main reasons people choose Python is the huge stdlib and vast package ecosystem. This is difficult to ignore if you need to do some very heavy lifting.
Personally I moved to Scala, which has a poor reputation because of some symbol-heavy libraries, but you can use it to write very clean Python-like code. Version 3 has actually added a more Python-inspired syntax with colons and indentation instead of braces.
Standard ML was where I started. As a language it's great, as an ecosystem it's limited. OCaml seems to be where the action is (and even then its packaging/dependency management isn't great - but then it can't be worse than Python's).
F# has a very good reputation but I haven't used it a lot myself.
I use JetBrains PyCharm with type annotations in my code. I run the type-checker/linter ("Inspect Code") every 10 minutes or so, it's muscle-memory at this point. It covers most cases, problem resolved.
You fundamentally cannot build a statically typed variant of Python without cutting portions of the stdlib, most notably `mock.patch`. You could do a good majority of it, probably enough to be completely usable, but at what point does it stop being Python? Go use Nim if you want statically typed Python-ish syntax.
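As a concrete illustration of why `mock.patch` resists static typing: it rebinds an attribute of another module at runtime and restores it afterwards, something a static type system has no good way to model.

```python
import os
from unittest import mock

real = os.getcwd()

# mock.patch swaps os.getcwd for a stand-in at runtime,
# then restores the original when the context exits.
with mock.patch("os.getcwd", return_value="/fake/dir"):
    print(os.getcwd())         # the patched stand-in
print(os.getcwd() == real)     # the original is back
```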
py2many transpiles py3 to nim. Think of python as a way to interpret nim code if you like, with a large installed base of existing libraries.
There is demand for iterative creation of software: first get the logic out with the least friction and then think about types, resource leaks, good software engineering practices etc.
Note that type safety typically means better tooling than what Python people are used to. If you've only used Python you have no idea what you're missing out on with respect to refactoring, editor integrations, documentation generation, etc. And then there is the actual documentation for humans that is guaranteed to be correct and up to date so long as your build is green.
Python does support parallelism, so long as you're calling into a C library to do the work, because they (usually) release the GIL while doing the work. That sounds like a cop out but using C libraries is very common to do CPU intensive work anyway. Examples include numpy (e.g. you can call np.dot from multiple threads in Python and they will genuinely use multiple cores) and all the rest of the scipy stack, and even modules in the standard library (e.g. when using zipfile, even to process an in-memory buffer, the GIL is released).
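A small sketch of that claim - the timings below are only illustrative (with a multithreaded BLAS the serial loop may already saturate your cores, so don't read too much into the numbers), but the point is that the threaded calls genuinely overlap because `np.dot` releases the GIL:

```python
import threading
import time
import numpy as np

a = np.random.rand(400, 400)
b = np.random.rand(400, 400)
n = 8
results = [None] * n

def crunch(i):
    # np.dot releases the GIL while the BLAS routine runs, so
    # several of these can execute on different cores at once.
    results[i] = np.dot(a, b).sum()

t0 = time.perf_counter()
for i in range(n):
    crunch(i)
print(f"serial:   {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
threads = [threading.Thread(target=crunch, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded: {time.perf_counter() - t0:.3f}s")
```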
if you call np.dot on the same underlying ndarray from multiple threads, what are the semantics?
I don't know what they are but I think most people want a single invocation of np.dot in a single thread to use multiple worker threads (as described in https://scipy.github.io/old-wiki/pages/ParallelProgramming) by pushing the work to a thread-aware matrix library.
> if you call np.dot on the same underlying ndarray from multiple threads, what are the semantics?
Usually, this will be safe because np.dot only reads from its arguments, and produces a fresh new array to hold the result.
I just checked the documentation and found there is an out= argument to pass a destination array. (In any case there are other numpy functions that do modify their arguments.) If you pass the same array to that parameter in two different threads then I don't know what happens. Maybe numpy locks the GIL to write its output. Maybe it produces C-style undefined behaviour (this would be my guess ... and my preference).
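For reference, the out= parameter in question looks like this (this shows only the API, it doesn't settle the thread-safety question):

```python
import numpy as np

a = np.eye(2)
b = np.arange(4.0).reshape(2, 2)
out = np.empty((2, 2))

# np.dot writes the product into `out` in place
# instead of allocating a fresh result array.
np.dot(a, b, out=out)
print(out)  # identity times b is just b: [[0. 1.] [2. 3.]]
```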
> I think most people want a single invocation of np.dot in a single thread to use multiple worker threads
Maybe? I'm not convinced that's what I want! But it's pretty irrelevant either way to my point; np.dot was just an example. My real point is that lots of CPU-heavy Python functions are implemented in C and release the GIL, so in practice parallelism with Python is often possible.
99% of people want this: np.dot and others use all the cores on your system by default. This accelerates the common use case, which is individual user on a multicore machine who wants single result ASAP. Then, there's an env var that lets you configure the underlying matrix library to use only a single (or maybe another number) of cores, and you work with the multithreaded/multiprocess libraries to run many independent calculations (possibly on shared read-only source data); then that scheduler has a tunable set to saturate the cores or IO on the system.
This is what most BLASses use and it's how data analysts and supercomputer folks can get along.
To reiterate: None of this contradicts my main point that numpy.dot releases the GIL, which it does even if it calls into a BLAS library that uses multiple cores (although of course that would then have its own global lock - potentially even an interprocess lock - to manage concurrent uses). numpy.dot was only meant to be an example in the first place! I could just as easily have talked about some optimisation routine in scipy or something in the standard library (actually I did mention ZipFile.extract).
> I've worked in this field awhile.
This statement undermines your point more than it supports it. There is no "this field" for numpy. Just working in some field that uses it is no qualification for knowing whether users as a whole would like it multithreaded automatically. "Data analysts" and "supercomputer folks" no doubt include many users but certainly not all and possibly not even most (for all we know).
> 99% of people want this: ...
I can believe this might be true. I don't think I ever claimed otherwise. In fact I said "maybe"! I only claimed that I don't want this.
I will make one more claim though: I don't think there's any way that you could know this, even if it's really true.
But yes, I agree and already knew that it's how BLAS libraries work by default, including OpenBLAS which is what the prebuilt numpy wheels use. Even then, as you're probably already aware by the sounds of it, that's only for sufficiently large matrices. If you're just using small matrices and vectors (e.g. in 3D to represent real-world coordinates) then processing will be done directly in the calling thread (because the overhead of using multiple threads would be greater than the benefit).
No need to downvote a request for explanation; this is within HN rules. Otherwise, you're giving an ambiguous signal so I don't know whether you're unhappy with the current state of Python, or disagree with my technical statement (in which case, please provide a counterargument instead of a vote).
Requesting explanation of downvotes is among the things the policy against discussion of the votes on a post is directed at, and, IMO, also demonstrates a fundamental failure to understand (or respect) the purpose of downvotes, which is about managing signal-to-noise ratio by, among other things, not polluting threads with meta-level discussion (and invariably debates) of why a particular comment is (or is not) inappropriate.
I have 11,659 karma. I'm not bragging, but that means I have a pretty good understanding of and respect for the purpose of downvotes (because I learn from them how to earn more upvotes). Nothing is being polluted here, just a reasonable request for clarification (for example I would love to learn that I was factually wrong!). I'm guessing the downvote signal was actually somebody agreeing with me and showing displeasure with the problem, or believing I was technically incorrect, or that it was gauche to be pointing out the problem in the first place.
"Python does support parallelism, so long as you're calling into a C library to do the work"
Actually, you can use Numba to get the same effect while still writing Python. Numba lets you simply apply a decorator to critical Python code, which is then JIT-compiled to achieve C-like performance. I do this in my code, and it has worked great for my uses.
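A minimal sketch of that decorator pattern (the ImportError fallback is only there so the snippet still runs, slowly, without Numba installed; the function itself is illustrative):

```python
import numpy as np

try:
    from numba import njit, prange  # pip install numba
except ImportError:
    # Stubs so the sketch runs as plain Python when Numba is absent.
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]            # bare @njit usage
        return lambda func: func      # @njit(parallel=True) usage
    prange = range

@njit(parallel=True)
def rms(values):
    # Under Numba this loop is JIT-compiled to machine code, and the
    # prange iterations are spread across threads, free of the GIL.
    total = 0.0
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return (total / values.shape[0]) ** 0.5

x = np.ones(10_000)
print(rms(x))  # 1.0
```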
Not really. You can't build large data structures (not arrays, but trees, graphs, etc) with structural sharing between threads. Unless of course you translate all of your data to C too, at which point you are really doing everything in C (data and code).
OK, let's put it like this: In some contexts, but not all, Python supports a useful amount of parallelism.
I have definitely worked on Python projects in the past that were easily parallelisable and worked absolutely fine. I simply created threads with threading.Thread, passed messages around with queue.Queue and used some standard Python modules to process those messages. The messages were independent enough from each other that I didn't have any issues with marshalling access to them, while being large enough that the overhead of multithreading was still outweighed by its benefits. Rewriting the whole thing in C++ (etc.) would have made a negligible difference to performance.
I can certainly imagine a program similar to what you describe, where there are lots of messages that operate on a single monolithic data structure, which couldn't easily be made to work with true parallelisation in Python.
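That threading.Thread plus queue.Queue shape can be sketched with the standard library alone (the compress call stands in for the real message processing, and the names are mine, not from any particular project):

```python
import queue
import threading
import zlib

tasks: queue.Queue = queue.Queue()
done: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        msg = tasks.get()
        if msg is None:           # sentinel: shut this worker down
            tasks.task_done()
            break
        # The heavy lifting is a C-level call that releases the GIL,
        # so the workers genuinely overlap.
        done.put((msg[0], zlib.compress(msg[1])))
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(20):
    tasks.put((i, b"some fairly large message body " * 1000))
for _ in workers:
    tasks.put(None)               # one sentinel per worker

tasks.join()
print(done.qsize())  # 20
```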
> You can't build large data structures (not arrays, but trees, graphs, etc) with structural sharing between threads.
You probably do not want to do that in the first place. The reason that the GIL still exists is that the overhead of fine-grained locking for Python data structures would negate the speedup from multithreading.
It's not specific to Python. If you want to do fine-grained "parallelism" on a set of complex shared data structures, your cores might look busy, but really you're just wasting most of the time locking/unlocking stuff. That's not necessarily going to result in a speedup over a lock-free single-threaded implementation (that you didn't implement to compare against).
In the case of Python, the single-threaded implementation was already there, the implementation that removed the GIL in favor of fine-grained locking failed to deliver the goods.
If you want to actually take advantage of parallelism, you want data structures that are amenable to it, which is basically sections of arrays without a lot of potential read/write conflicts over multiple threads. If you're at that point, you probably want to use C/C++/Rust anyway, you don't want speed up something that's already 100x slower (the interpreter) by parallelizing it. Python offers that with C-extensions like NumPy.
It doesn't matter if you're reading or writing, if there's a potential conflict you need to lock, or you need some complex and probably buggy "lock-free" data structure. The complexity and memory overhead of a concurrent GC just goes on top of that.
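That array-sectioning shape can be sketched with just the standard library; each worker owns its slice outright, so there is nothing to lock (function names here are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    # Each worker gets an independent slice: no shared mutable state,
    # no read/write conflicts, hence no locks.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(values, n_workers=4):
    size = max(1, len(values) // n_workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # 285
```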
I understand that you (and I, for that matter) can't stop using Python altogether right now. I just wanted to point out that these two desirable qualities were built into Julia from the get-go.
Julia just isn't a replacement for Python yet. Recently did a deep dive researching various capabilities that would make Julia at least useful as a microservice/RPC target for a more I/O heavy language like Python. gRPC support is really poor. There's some OpenAPI/Swagger tooling, but it's not great. Likewise for ZeroRPC. The best candidate is probably the WebApi library [0] but even then, it does not inspire a ton of confidence for production. From cruising github issues and the Julia forums, nothing in this space (rpc/io/microservice) feels really bulletproof. Combined with no solid native executable support, it's just a nonstarter for me.
My gestalt sensation is the community is still too small, the tooling too unpolished, to be ready for anything production-worthy. If there were a bulletproof 0rpc library, I think it would go a huge way towards growing the community and mindshare through successive approximation. But the other problem is many of the folks on the forum seem totally disinterested in this problem.
Would support for parallelism have benefits other than increasing speed? If not, I'd rather see an increase in Python's single-threaded performance - which is what Guido and his colleagues at Microsoft are currently working on.
The problem with interpreting the Python way, using a switch loop in ceval.c, is that there is plenty of overhead in that loop: branch prediction fails, and in the worst case every opcode is compared before reaching the actual opcode in the switch.
I wonder if Python will ever take the JVM approach of having a TemplateInterpreter and mapping each bytecode to assembly directly. Of course you would have to do it for each platform.
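In miniature, the dispatch cost being described looks like this toy interpreter (purely illustrative; a dict lookup stands in for ceval.c's switch, and a template interpreter removes exactly that per-instruction dispatch by jumping straight to per-opcode machine code):

```python
# A toy stack-based bytecode interpreter: every instruction pays the
# dispatch cost (the dict lookup) before any real work happens.

def run(program):
    stack = []
    handlers = {
        "PUSH": lambda arg: stack.append(arg),
        "ADD":  lambda arg: stack.append(stack.pop() + stack.pop()),
        "MUL":  lambda arg: stack.append(stack.pop() * stack.pop()),
    }
    for opcode, arg in program:
        handlers[opcode](arg)   # dispatch overhead on every instruction
    return stack.pop()

# (2 + 3) * 4
print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]))  # 20
```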
I've tried multiprocessing but I found it buggy for my use case.
I tried to create a worker queue and submit items to worker processes with JoinableQueue, but eventually the system deadlocks and no progress is made. I'm not sure what the problem is; it never finishes.
It's a sentence correlator using multiprocessing - it generates correlations of words in your sentences.
Out of curiosity, is using multiprocessing.Pool’s map/etc.() methods not an option? It’s a higher-level API which would remove the need to manually manage workers/a queue.
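A minimal sketch of that Pool-based shape, with a stand-in worker function (the real correlation logic is the poster's and isn't shown here, so `correlate` is hypothetical):

```python
from multiprocessing import Pool

def correlate(sentence):
    # Hypothetical stand-in for the per-sentence work: emit ordered
    # word pairs from the sentence.
    words = sentence.split()
    return [(a, b) for a in words for b in words if a < b]

def run_all(sentences):
    # Pool owns the workers and the queue; map() blocks until every
    # item has been processed, so there is no join()/task_done()
    # bookkeeping to get wrong (a common source of JoinableQueue
    # deadlocks).
    with Pool(processes=4) as pool:
        return pool.map(correlate, sentences)

if __name__ == "__main__":
    print(run_all(["the cat sat", "the dog ran"]))
```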
Yes, for many of us, we really do want a CPython that runs multiple threads in the same interpreter, with memory sharing. That said, after decades of CPython not supporting it, I've rewritten most of my code to use multiprocessing with granular parallelism.
* dill-compat - Required by the dill module
* eventlet - Required by the eventlet package
* gevent - Required by the gevent package
* multiprocessing - Required by Python's multiprocessing module
* numpy - Required for numpy, scipy, pandas, matplotlib, etc.
* pmw-freezer - Required by the Pmw package
* pylint-warnings - Support PyLint / PyDev linting source markers
* qt-plugins - Required by the PyQt and PySide packages
* tensorflow - Required by the tensorflow package
* tk-inter - Required by Python's Tk modules
* torch - Required by the torch / torchvision packages