Annoying Things about Python

There's a lot of commentary out there on whether Python is good or bad and what's good and bad about it and so on. The thing is, when you want to grab data from some hardware that was never meant for what you're using it for, do some on-the-fly DSP you've just made up in the last few days on it, shoot it across a network, save it, and then analyze it properly later, you're pretty much out of luck with anything else. That is, R and Julia (and Matlab, Octave and a few others, I guess, but I try never to touch them) are great for the analysis part, and you can tie them in with C (or anything that'll produce C bindings) and Python code, but every language has its quirks, and having to think about two or three languages at once is far too distracting when the work you're trying to do is fundamentally challenging. Python will kind of work for all this stuff and has worked well enough for a long time, so there's a lot of good software out there to help. How good or bad the core language is almost doesn't factor into it anymore.

Anyway, this kind of means that I'm locked into Python, and while I'll occasionally write a high-performance module in C for some obscure transformation that isn't in NumPy/SciPy already, I pretty much never touch any other languages. All in all, I don't miss the days when I was writing this stuff in C++ and had to read ten pages' worth of compiler errors every time I messed up a template parameter. That doesn't mean Python is all that great, though. Presented here are a few things I find annoying, in no particular order, so I can get them off my chest.

Confusing Syntax of Comprehensions

The comprehension syntax really is all over the place in terms of which clause can go where and how many of them there are:

>>> mat = [[1, 2, 3], [4, 5], [6]]
>>> [ [ i for i in row ] for row in mat ] # cool, makes sense
[[1, 2, 3], [4, 5], [6]]
>>> [ i for row in mat for i in row ] # wait, we're flipping the fors now? Why?
[1, 2, 3, 4, 5, 6]
>>> [ i for row in mat if len(row) < 3 for i in row ] # sure, why not?
[4, 5, 6]
>>> [ i for row in mat for i in row if len(row) < 3 ] # probably slower, but okay
[4, 5, 6]
>>> [ i for row in mat if len(row) < 3 for i in row if 4 < i < 6 ] # fine
[5]
>>> [ i for row in mat if len(row) < 3 for i in row if 4 < i if i < 6 ] # two ifs per for? Why?
[5]

Multiple ifs per for seem totally useless. You know, "There should be one-- and preferably only one --obvious way to do it," and the logical and is that obvious way. Still, that's a feature I can comfortably ignore. The for order flipping between nested and flat comprehensions is annoying as hell, and there's really no reason for it.
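For what it's worth, the flat order does at least match one thing: the clauses appear in the same order as the statements of the equivalent nested loop, read top to bottom. A small sketch, reusing mat from above (flat and loop_result are names made up for the comparison):

```python
mat = [[1, 2, 3], [4, 5], [6]]

# Flat comprehension: clauses read left to right...
flat = [i for row in mat if len(row) < 3 for i in row if 4 < i < 6]

# ...in the same order as the statements of the equivalent nested loop.
loop_result = []
for row in mat:
    if len(row) < 3:
        for i in row:
            if 4 < i < 6:
                loop_result.append(i)

print(flat, loop_result)
```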

Unnecessary Inconsistencies in Assignments

These are just annoying and the asterisks mean way too many things:

>>> numbers = 1, 2, 3; numbers
(1, 2, 3) # tuple, got it (based on what's right of the =)
>>> a, *rest = 1, 2, 3; a, rest
(1, [2, 3]) # number and list (based on the * on the left of the =)

>>> *numbers, = 1, 2, 3; numbers
[1, 2, 3] # okay, okay

>>> (a, *rest) = 1, 2, 3; a, rest
(1, [2, 3]) # okay, that works, too

>>> def f(*args): print(args)
...
>>> f(1, 2, 3)
(1, 2, 3) # wait, it's a tuple now?

>>> def f(**kwargs): print(kwargs)
...
>>> f(a=1, b=2)
{'a': 1, 'b': 2} # okay, so there's something similar for mappings

>>> **mydict, = a=1, b=2
  File "<input>", line 1
    **mydict, = a=1, b=2
    ^
SyntaxError: invalid syntax # but only for assigning to function arguments...

I can almost hear all the complaints about how a comma defines a tuple and how you can have multiple assignments, so you can't just stick argument-style equal signs anywhere, etc. Those were decisions somebody made at some point, though, rather than fundamental limitations. What if direct assignment and function-argument assignment had the same semantics? Why shouldn't we be able to write this

def xydist((x0, y0), (x1, y1)):
    return ((x1 - x0)**2 + (y1 - y0)**2)**0.5

xydist(mypt0, mypt1) # with mypt0 and mypt1 each having two elements

or this

((a0, a1), b, *args, c=7, d=8, **kwargs) = ((1.0, 1.1), 2, 3, 4, *[5, 6], x=8, **{ 'c': 77 })
# a0 = 1.0, a1 = 1.1, b = 2, args = (3, 4, 5, 6),
# c = 77, d = 8, kwargs = {'x': 8}

and raise errors when the bit doing the assigning and the one being assigned to don't match up?
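Something close to the second example can be had today by borrowing the function-call machinery directly. This is only a sketch: the helper name _pattern is made up, and a nested target like (a0, a1) still has to be unpacked by hand. inspect.signature(...).bind() raises TypeError when the two sides don't match up.

```python
import inspect

# Hypothetical "pattern" function: only its signature matters.
def _pattern(pair, b, *args, c=7, d=8, **kwargs):
    pass

bound = inspect.signature(_pattern).bind(
    (1.0, 1.1), 2, 3, 4, *[5, 6], x=8, **{"c": 77}
)
bound.apply_defaults()  # fill in d=8, which the right-hand side did not supply
values = bound.arguments

a0, a1 = values["pair"]  # nested unpacking has to be done separately
print(a0, a1, values["b"], values["args"], values["c"], values["d"], values["kwargs"])

# A mismatch raises TypeError, just as a bad function call would.
try:
    inspect.signature(_pattern).bind()
    mismatch_caught = False
except TypeError as exc:
    mismatch_caught = True
    print("mismatch:", exc)
```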

It's the same with slices. Why isn't start:stop:step just universally an alias for slice(start, stop, step) instead of only inside indexing brackets? We could have things like range(:10:2) instead of the three-argument version so it's consistent with slicing. numpy.r_[], numpy.c_[] and numpy.mgrid[], which are meant to be used as functions, could actually be called as functions.
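Until then, the workaround is the one those NumPy objects use: abuse indexing to get at the slice syntax. A minimal sketch of a range lookalike built that way (srange and _SliceRange are made-up names, not part of any library):

```python
class _SliceRange:
    """Turn srange[start:stop:step] into range(start, stop, step)."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("srange expects start:stop:step")
        if key.stop is None:
            raise ValueError("srange needs an explicit stop")
        start = 0 if key.start is None else key.start
        step = 1 if key.step is None else key.step
        return range(start, key.stop, step)

srange = _SliceRange()

evens = list(srange[:10:2])
print(evens)
```

It works, but it is exactly the kind of helper object that wouldn't be needed if the colon syntax were allowed in call arguments.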

The Index (Item) Operator Doesn't Take Keywords

Regarding dictionaries, assuming you didn't know a thing about Python, which of these would seem more obvious:

mydict.get(mykey, something_else) or mydict[mykey, default=something_else]

or worse still:

mydict.get(mykey) or mydict[mykey, default=None]

or maybe:

mydict.setdefault(mykey, []).append(value) or mydict[mykey, default=[], set=True].append(value)

? There is a correct answer, and it's the one that's a syntax error in Python. How about something a bit more in my field. Which of these would you expect not to make a copy of a matrix's data:

matrix.flat[:7] or
matrix.ravel()[:7] or
matrix.flatten()[:7] or
matrix[:7, index='flat']

? The first three are all in NumPy, by the way, and they all do slightly different things: flatten() always copies the whole array, ravel() returns a view when it can and a copy when it can't, and flat is an iterator object whose slices come back as copies. There is, as far as I know, no way to do flat indexing on an n-D array without creating some sort of helper object.

From a NumPy-centric point of view, I'd even advocate for full-on function-calling semantics for indexing, but then mydict[1,] and mydict[1] would have to mean the same thing and I'm not sure I like that.
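The difference is easy to see by looking at what the index operator actually hands to __getitem__: everything between the brackets arrives as a single object, and a comma turns it into a tuple. A quick sketch (the Probe class is made up for illustration):

```python
class Probe:
    """Return whatever the index operator passes in."""

    def __getitem__(self, key):
        return key

probe = Probe()

single = probe[1]        # the bare object
one_tuple = probe[1,]    # a one-element tuple
mixed = probe[:7, ...]   # slices and Ellipsis arrive inside a tuple

print(single, one_tuple, mixed)
```

There is no slot for keywords in that single key object, which is why something like default=... has nowhere to go.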

Operations on Functions are too Wordy

Say we have two functions f and g that we want to compose. We have to write

gf = lambda x: g(f(x))

which isn't bad at all, although with more than two functions you do get this "))))" sort of effect at the end. Also, assigning a lambda to a name goes against PEP 8, but let's go with it. Now, we want to pass some parameters to f and g, too. No big deal:

gf = lambda x: g(f(x, a, b), c)

Now, maybe those parameters are computed somehow:

gf = lambda x: g(f(x, a(), b()), c())

We have a problem because we don't want to recompute the parameters. Instead of saving three temporary variables, we bring in functools.partial:

f_ = functools.partial(f, a=a(), b=b())
g_ = functools.partial(g, c=c())
gf = lambda x: g_(f_(x))

or maybe make a new function whose default arguments we never intend to change:

def gf(x, *, a=a(), b=b(), c=c()):
    return g(f(x, a, b), c)

but now the parameters aren't defined where they're used (that is, they're not meaningfully bound to the function). We could also decorate the parameter-computing functions with functools.lru_cache, which is overkill for functions whose arguments don't change, or we could manually cache the results, or a bunch of other things that are way too hard.
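To make the functools.partial version concrete, here is a small runnable sketch. The bodies of f, g, a, b and c are invented stand-ins; the counter is only there to show that each parameter is computed once, when the partial is built, rather than on every call.

```python
import functools

calls = {"a": 0, "b": 0, "c": 0}

def a():
    calls["a"] += 1
    return 2

def b():
    calls["b"] += 1
    return 3

def c():
    calls["c"] += 1
    return 10

def f(x, a, b):
    return x * a + b

def g(y, c):
    return y - c

# The parameters are evaluated here, exactly once.
f_ = functools.partial(f, a=a(), b=b())
g_ = functools.partial(g, c=c())

def gf(x):
    return g_(f_(x))

results = [gf(x) for x in range(3)]
print(results, calls)
```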

Now, imagine we could define partials with [] and compose functions somehow, say with a new binary -> operator:

gf = f[a=a(), b=b()] -> g[c=c()]

Note that this is backwards from how composition is defined in math, where "g ∘ f" means "f then g", but I'm sticking with it because I think writing some things from right to left and others from left to right is kind of dumb.

The only other common function manipulation is to rearrange the arguments somehow. For example,

f.transpose(1, 2)

that works like

lambda a1, a2, *args: f(*args[:1], a1, *args[1:1], a2, *args[1:])

and then we can do stuff like

gf = f.transpose(1, 2)[a(), b()] -> g.transpose(1)[c()]

which more or less covers situations where passing keywords is a no-go.

If this feels unnecessary, imagine a signal-processing pipeline with some twenty stages that keep changing, each of which takes parameters that need to be computed, and otherwise takes in one argument and returns one result. There is no succinct way to describe the pipeline right now. You end up writing pipeline objects that are glorified lists of callables and all sorts of other things. It could all be made easier if functions were just a bit less rudimentary.
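Some of this can be approximated today with a thin wrapper class. The sketch below is one possible shape, not a recommendation: Fn, bind, scale and offset are all made-up names, >> stands in for the imaginary -> operator, and bind is a method call because the index operator can't take keywords.

```python
import functools

class Fn:
    """Wrap a callable so it can be partially applied and chained."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def bind(self, *args, **kwargs):
        # Stand-in for the hypothetical f[a=..., b=...] syntax.
        return Fn(functools.partial(self.func, *args, **kwargs))

    def __rshift__(self, other):
        # Stand-in for the hypothetical "->": self runs first, then other.
        return Fn(lambda x: other(self(x)))

def scale(x, gain):
    return x * gain

def offset(x, bias):
    return x + bias

# Each stage takes one argument and returns one result.
pipeline = Fn(scale).bind(gain=2.0) >> Fn(offset).bind(bias=1.0) >> Fn(abs)
print(pipeline(-3.0))
```

It reads left to right, but every function has to be wrapped first, which is the sort of ceremony built-in support would remove.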

The Syntax for Empty Collections is Inconsistent

x = () is a tuple, but x = (1) isn't and x = (1,) is a tuple and x = , and x = (,) are syntax errors. (e for e in ...) is a generator, but that's not really the parentheses' doing: a generator expression just cannot exist without delimiters on each side.

x = [] is a list and x = [1] is a list and x = [1,] is a list and x = [,] is a syntax error. [e for e in ...] is a list.

x = {} is a dict and x = {1} is a set and x = {1,} is a set and x = {,} is a syntax error. { e for e in ... } is a set. { k: v for k, v in ... } is a dict.

We can't have (...) denoting anything other than order of precedence, which is fine, so, if anything, (,) should denote the tuple because the comma is the only thing that makes a tuple a tuple in any other context. An alternative might be to use optional <> to denote tuples, then <> becomes an empty tuple and <e for e in ...> returns a tuple, tup = x, becomes the far less confusing tup = <x>, func((x,)) becomes func(<x>), etc.

The list notation is consistent throughout and a set would be too if {} denoted a set, which it should. {:} could denote a dict so as to at least be consistent with the comprehension syntax.

I would also be happy with every sequence-like thing being (,), [,] and {,} and the dict being {:}.
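For reference, here is what the interpreter makes of each spelling today (a quick check, nothing more; samples and kinds are just illustrative names). Note that set() is the only way to write an empty set.

```python
samples = {
    "()": (),
    "(1)": (1),
    "(1,)": (1,),
    "[]": [],
    "{}": {},
    "{1}": {1},
    "set()": set(),
}

# Map each spelling to the name of the type it produces.
kinds = {text: type(value).__name__ for text, value in samples.items()}
for text, kind in kinds.items():
    print(f"{text:6} -> {kind}")
```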

pass is Redundant and Not Well Named

At some point, someone decided that ... should be a thing. You can assign it to a variable, and you can put it on a line all on its own just like pass, and it means about as much to someone who doesn't know anything about Python. In fact, I'd argue that seeing an ellipsis somewhere is more of a visual hint that some code is a placeholder than the word pass. I'm not actually sure what the ellipsis is really supposed to be for. I know you can stick it in an indexing argument in NumPy to mean "and the rest" but it's mostly useless.
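A short sketch of how the two behave as placeholders. The function names are invented; the only real difference is that ... is an expression that evaluates to the Ellipsis singleton, while pass is a statement.

```python
def todo_with_pass():
    pass

def todo_with_ellipsis():
    ...

marker = ...  # an ordinary object that can be assigned and compared

print(todo_with_pass(), todo_with_ellipsis(), marker is Ellipsis)
```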

Final Thoughts

As with anything else that sees constant use, it's easy to focus on Python's problems and take all the good for granted. Maybe one day something better will come along or grow out of Python, but as flexible, easy-to-use programming languages go, it's about as good as I'm likely to find.
