Optional kwarg making attrgetter & itemgetter always return a tuple

Post by Masklinn
attrgetter and itemgetter are both very useful functions, but both have
a significant pitfall if the arguments passed in are validated but not
controlled: when the arguments (a list of attributes, keys or indexes)
come from an external source and are *-applied, and that source passes
a sequence of one element, both functions return the bare element
rather than a singleton (1-element tuple).
This means such code, for instance code "slicing" a matrix of some sort
to get only some columns and getting the slicing information from its
caller (in situations where extracting a single column may be perfectly
sensible), will have to implement a manual dispatch between a "manual"
getitem (or getattr) and an itemgetter (resp. attrgetter) call, e.g.
slicer = (operator.itemgetter(*indices) if len(indices) > 1
          else lambda ar: (ar[indices[0]],))
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could be
given an optional keyword argument making them always return a tuple:
# works the same no matter what len(indices) is
slicer = operator.itemgetter(*indices, force_tuple=True)
In the pure-Python equivalents given in the documentation [0], this would
simply override the `len` check to False (`len(items) == 1` would become
`len(items) == 1 and not force_tuple`).
The argument is backward-compatible as neither function currently
accepts any keyword argument.
Open question: should force_tuple (or whatever it ends up being called)
also silence the error raised when len(indices) == 0, returning an empty
tuple rather than raising a TypeError?
[0] http://docs.python.org/dev/library/operator.html#operator.attrgetter

This seems like a plausible idea. The actual C version requires at least one
argument. The Python equivalent in the docs does not (hence its different
signature): given an empty *items it would return an empty tuple.
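For concreteness, a minimal sketch of how the proposed flag could slot into a pure-Python equivalent along the lines of the one in the docs. The name `itemgetter_ft` and the `force_tuple` keyword are hypothetical (neither exists in the `operator` module), and the keyword-only argument uses Python 3 syntax. It applies the `len(items) == 1 and not force_tuple` check from the proposal, and with no items at all it returns an empty tuple, as the docs' equivalent would.

```python
def itemgetter_ft(*items, force_tuple=False):
    """Hypothetical itemgetter with the proposed force_tuple flag (not in the stdlib)."""
    if len(items) == 1 and not force_tuple:
        item = items[0]

        def g(obj):
            # Single index, flag off: return the bare element (current behaviour).
            return obj[item]
    else:
        def g(obj):
            # Several (or zero) indexes, or flag on: always build a tuple.
            return tuple(obj[i] for i in items)
    return g


row = ['a', ('b', 'c'), 'd']
print(itemgetter_ft(1)(row))                        # ('b', 'c'): the bare element
print(itemgetter_ft(1, force_tuple=True)(row))      # (('b', 'c'),): always a 1-tuple
print(itemgetter_ft(1, 2, force_tuple=True)(row))   # (('b', 'c'), 'd')
```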

--
Terry Jan Reedy

Steven D'Aprano

2012-09-14 01:20:38 UTC

For those who, like me, had to read this three or four times to work out
what Masklinn is talking about, I think he is referring to the fact that
attrgetter and itemgetter both return a single element if passed a single
index, otherwise they return a tuple of results.

If a call itemgetter(*args)(some_list) returns a tuple, was that tuple
a single element (and args contained a single index) or was the tuple
a collection of individual elements (and args contained multiple
indexes)?

py> itemgetter(*[1])(['a', ('b', 'c'), 'd'])
('b', 'c')
py> itemgetter(*[1, 2])(['a', 'b', 'c', 'd'])
('b', 'c')

Post by Masklinn
This means such code, for instance code "slicing" a matrix of some sort
to get only some columns and getting the slicing information from its
caller (in situations where extracting a single column may be perfectly
sensible), will have to implement a manual dispatch between a "manual"
getitem (or getattr) and an itemgetter (resp. attrgetter) call, e.g.
slicer = (operator.itemgetter(*indices) if len(indices) > 1
          else lambda ar: (ar[indices[0]],))

Why is this a problem? If you don't like writing this out in place, write
it once in a helper function. Not every short code snippet needs to be in
the standard library.

Post by Masklinn
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could be
given an optional keyword argument making them always return a tuple.

-1

There is no need to add extra complexity to itemgetter and attrgetter for
something best solved in your code. Write a helper:

from operator import itemgetter

def slicer(*indexes):
    getter = itemgetter(*indexes)
    if len(indexes) == 1:
        return lambda seq: (getter(seq), )  # Wrap in a tuple.
    return getter

--
Steven

Masklinn

2012-09-14 07:43:38 UTC

Why is this a problem?

Because it adds significant complexity to the code, and that's for the
trivial version of itemgetter, attrgetter also does keypath resolution
so the code is nowhere near this simple.

It's also anything but obvious what this snippet does on its own.

Post by Steven D'Aprano
If you don't like writing this out in place, write
it once in a helper function. Not every short code snippet needs to be in
the standard library.

It's not really "every short code snippet" in this case, it's a way to
avoid a sometimes deleterious special case and irregularity of the stdlib.

Post by Masklinn
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could be
given an optional keyword argument making them always return a tuple.

-1
There is no need to add extra complexity to itemgetter and attrgetter for
something best solved in your code.

I don't agree with this statement, the stdlib flag adds very little
extra complexity, way less than the original irregularity/special case
and way less than necessary to do it outside the stdlib. Furthermore, it
makes the solution (to having a regular output behavior for
(attr|item)getter) far more obvious and makes the code itself much simpler
to read.

Steven D'Aprano

2012-09-14 09:02:54 UTC

Why is this a problem?

Because it adds significant complexity to the code,

I don't consider that to be *significant* complexity.

Post by Masklinn
and that's for the
trivial version of itemgetter, attrgetter also does keypath resolution
so the code is nowhere near this simple.

I don't understand what you mean by "keypath resolution". attrgetter
simply looks up the attribute(s) by name, just like obj.name would do. It
has the same API as itemgetter, except with attribute names instead of
item indexes.

Post by Masklinn
It's also anything but obvious what this snippet does on its own.

Once you get past the ternary if operator, the complexity is pretty much
entirely in the call to itemgetter. You don't even use itemgetter in the
else clause! Beyond the call to itemgetter, it's trivially simple Python
code.

slicer = operator.itemgetter(*indices, force_tuple=flag)

is equally mysterious to anyone who doesn't know what itemgetter does.

Post by Steven D'Aprano
If you don't like writing this out in place, write
it once in a helper function. Not every short code snippet needs to be in
the standard library.

It's not really "every short code snippet" in this case, it's a way to
avoid a sometimes deleterious special case and irregularity of the stdlib.

I disagree that this is a "sometimes deleterious special case". itemgetter
and attrgetter have two APIs:

itemgetter(index)(L) => element
itemgetter(index, index, ...)(L) => tuple of elements

and likewise for attrgetter:

attrgetter(name)(L) => attribute
attrgetter(name, name, ...)(L) => tuple of attributes

Perhaps it would have been better if there were four functions rather than
two. Or if the second API were:

itemgetter(sequence_of_indexes)(L) => tuple of elements
attrgetter(sequence_of_names)(L) => tuple of attributes

so that the two getters always took a single argument, and dispatched on
whether that argument is an atomic value or a sequence. But either way,
it is not what I consider a "special case" so much as two related non-
special cases.
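A rough sketch of the single-argument alternative described here, where the getter dispatches on whether its argument is atomic or a sequence. `seq_itemgetter` is a hypothetical name, and what counts as "a sequence" is an assumption of this sketch: lists and tuples are treated as several indexes, anything else (including strings) as one atomic index.

```python
def seq_itemgetter(spec):
    """Hypothetical single-argument getter (not in the stdlib).

    A list or tuple means "several indexes" and always yields a tuple;
    anything else is treated as one atomic index and yields the bare element.
    """
    if isinstance(spec, (list, tuple)):
        indexes = tuple(spec)
        return lambda obj: tuple(obj[i] for i in indexes)
    return lambda obj: obj[spec]


data = ['a', 'b', 'c', 'd']
print(seq_itemgetter(1)(data))       # prints b
print(seq_itemgetter([1])(data))     # prints ('b',)
print(seq_itemgetter([1, 2])(data))  # prints ('b', 'c')
```

With this shape a one-element sequence still yields a 1-tuple, because the caller's intent is carried by the argument's type rather than by how many arguments were passed.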

But let's not argue about definitions. Special case or not, can you
demonstrate that the situation is not only deleterious, but cannot be
reasonably fixed with a helper function?

Whenever you call itemgetter, there is no ambiguity because you always know
whether you are calling it with a single index or multiple indexes.

Post by Masklinn
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could be
given an optional keyword argument making them always return a tuple.

-1
There is no need to add extra complexity to itemgetter and attrgetter for
something best solved in your code.

I don't agree with this statement, the stdlib flag adds very little
extra complexity, way less than the original irregularity/special case

Whether or not it is empirically less than the complexity already there in
itemgetter, it would still be adding extra complexity. It simply isn't
possible to end up with *less* complexity by *adding* features.

(Complexity is not always a bad thing. If we wanted to program in something
simple, we would program using a Turing machine.)

The reader now has to consider "what does the force_tuple argument do?"
which is not necessarily trivial nor obvious. I expect a certain number of
beginners who don't read documentation will assume that you have to do this:

slicer = itemgetter(1, 2, 3, force_tuple=False)

if they want to pass something other than a tuple to slicer. Don't imagine
that adding an additional argument will make itemgetter and attrgetter
*simpler* to understand.

To me, a major red-flag for your suggested API can be seen here:

itemgetter(1, 2, 3, 4, force_tuple=False)

What should this do? I consider all the alternatives to be less than
ideal:

- ignore the explicit keyword argument and return a tuple anyway
- raise an exception

To say nothing of more... imaginative... semantics:

- return a list, or a set, anything but a tuple
- return a single element instead of four (but which one?)

The suggested API is not as straight-forward as you seem to think it is.

Post by Masklinn
and way less than necessary to do it outside the stdlib. Furthermore, it
makes the solution (to having a regular output behavior for
(attr|item)getter) far more obvious and makes the code itself much simpler
to read.

The only thing I will grant is that it aids in discoverability of a
solution: you don't have to think of the (trivial) solution yourself, you
just need to read the documentation. But I don't see either the problem
or the solution to be great enough to justify adding an argument, writing
new documentation, and doubling the number of tests for both itemgetter and
attrgetter.

--
Steven

Masklinn

2012-09-14 09:29:47 UTC

Post by Masklinn
and that's for the
trivial version of itemgetter, attrgetter also does keypath resolution
so the code is nowhere near this simple.

It takes dotted paths, not just attribute names.
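To illustrate what "keypath resolution" refers to: `operator.attrgetter` accepts dotted names and follows them one attribute at a time. The `Point` and `Segment` types below are made up purely to have nested attributes to look up.

```python
from collections import namedtuple
from operator import attrgetter

# Made-up types, just to have nested attributes to look up.
Point = namedtuple('Point', 'x y')
Segment = namedtuple('Segment', 'start end')

seg = Segment(start=Point(0, 1), end=Point(2, 3))

print(attrgetter('start.x')(seg))           # 0, i.e. seg.start.x
print(attrgetter('start.x', 'end.y')(seg))  # (0, 3)
print(attrgetter('end')(seg))               # Point(x=2, y=3)
```

A hand-written replacement for attrgetter therefore has to reproduce this path walking as well, not just call getattr once.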

Post by Masklinn
It's also anything but obvious what this snippet does on its own.

Once you get past the ternary if operator, the complexity is pretty much
entirely in the call to itemgetter. You don't even use itemgetter in the
else clause! Beyond the call to itemgetter, it's trivially simple Python
code.
slicer = operator.itemgetter(*indices, force_tuple=flag)
is equally mysterious to anyone who doesn't know what itemgetter does.

I would expect it to be obvious in the context of its usage, given either
prior knowledge of itemgetter or a quick read of its documentation.

Post by Steven D'Aprano
If you don't like writing this out in place, write
it once in a helper function. Not every short code snippet needs to be in
the standard library.

It's not really "every short code snippet" in this case, it's a way to
avoid a sometimes deleterious special case and irregularity of the stdlib.

I disagree that this is a "sometimes deleterious special case". itemgetter
and attrgetter have two APIs:
itemgetter(index)(L) => element
itemgetter(index, index, ...)(L) => tuple of elements
and likewise for attrgetter:
attrgetter(name)(L) => attribute
attrgetter(name, name, ...)(L) => tuple of attributes
Perhaps it would have been better if there were four functions rather than
two. Or if the second API were:
itemgetter(sequence_of_indexes)(L) => tuple of elements
attrgetter(sequence_of_names)(L) => tuple of attributes
so that the two getters always took a single argument, and dispatched on
whether that argument is an atomic value or a sequence. But either way,
it is not what I consider a "special case" so much as two related non-
special cases.

Which conflict for a sequence of length 1, and that conflict is the very
reason I started this thread.

Post by Steven D'Aprano
But let's not argue about definitions. Special case or not, can you
demonstrate that the situation is not only deleterious, but cannot be
reasonably fixed with a helper function?

Which, as usual, hinges on the definition of "reasonably". Of course the
situation can be "fixed" (with "reasonably" being a wholly personal
value judgement) with a helper function or a from-scratch reimplementation
of an (attr|item)getter-like function, as it pretty much always can be.
I don't see that as a very useful benchmark.

Post by Steven D'Aprano
Whenever you call itemgetter, there is no ambiguity because you always know
whether you are calling it with a single index or multiple indexes.

That is not quite correct. Even ignoring that you have to call `len` to
find out when the indices are provided by a third party, the correct code
gets more complex still: the third party could provide an iterator, which
would have to be reified before being passed to len(), increasing the
complexity of the "helper" yet again.
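A minimal sketch of the more defensive helper this implies, assuming the indices arrive from a third party as an arbitrary iterable (possibly a one-shot iterator). `tuple_itemgetter` is a hypothetical name. It reifies the input with `tuple()` before checking its length, and, as a choice made for this sketch, returns an empty tuple for zero indices instead of letting `itemgetter()` raise TypeError.

```python
from operator import itemgetter


def tuple_itemgetter(indices):
    """Hypothetical helper: build a getter that always returns a tuple."""
    # Reify: iterators have no len() and can only be consumed once.
    indices = tuple(indices)
    if not indices:
        # itemgetter() with no arguments raises TypeError; return () instead.
        return lambda obj: ()
    getter = itemgetter(*indices)
    if len(indices) == 1:
        # A single index makes itemgetter return the bare element: wrap it.
        return lambda obj: (getter(obj),)
    return getter


row = ['a', ('b', 'c'), 'd']
print(tuple_itemgetter(iter([1]))(row))          # (('b', 'c'),)
print(tuple_itemgetter(i for i in (0, 2))(row))  # ('a', 'd')
```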

Post by Masklinn
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could be
given an optional keyword argument making them always return a tuple.

-1
There is no need to add extra complexity to itemgetter and attrgetter for
something best solved in your code.

I don't agree with this statement, the stdlib flag adds very little
extra complexity, way less than the original irregularity/special case

At no point did I deny that, as far as I know or can see.

Post by Steven D'Aprano
(Complexity is not always a bad thing. If we wanted to program in something
simple, we would program using a Turing machine.)
The reader now has to consider "what does the force_tuple argument do?"
which is not necessarily trivial nor obvious. I expect a certain number of
beginners who don't read documentation will assume that you have to do this:
slicer = itemgetter(1, 2, 3, force_tuple=False)
if they want to pass something other than a tuple to slicer. Don't imagine
that adding an additional argument will make itemgetter and attrgetter
*simpler* to understand.
To me, a major red-flag for your suggested API can be seen here:
itemgetter(1, 2, 3, 4, force_tuple=False)
What should this do?

The exact same as `itemgetter(1, 2, 3, 4)`, since `force_tuple` defaults
to False.

Post by Steven D'Aprano
I consider all the alternatives to be less than
ideal:
- ignore the explicit keyword argument and return a tuple anyway
- raise an exception
To say nothing of more... imaginative... semantics:
- return a list, or a set, anything but a tuple
- return a single element instead of four (but which one?)

I have trouble seeing how such interpretations can be drawn up from
explicitly providing the default value for the argument. Does anyone
really expect dict.get(key, None) to always return None?

Post by Steven D'Aprano
The suggested API is not as straight-forward as you seem to think it is.

It's simply a proposal to fix what I see as an issue (as befits
python-ideas); you're getting way too hung up on something which can
quite trivially be discussed and changed.

The only thing I will grant is that it aids in discoverability of a
solution

It also aids in the discoverability of the problem in the first place, and
in limiting the surprise when unexpectedly encountering it for the first
time.

alex23

2012-09-14 09:41:43 UTC

# works the same no matter what len(indices) is
slicer = operator.itemgetter(*indices, force_tuple=True)

I'd be inclined to write that as:

slicer = force_tuple(operator.itemgetter(*indices))

With force_tuple then just being another decorator.
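One way to read this suggestion, as a minimal sketch: `force_tuple` (a hypothetical name, not in the stdlib) wraps any getter and packs a non-tuple result into a 1-tuple. Because it only sees the result, it cannot tell a single element that happens to be a tuple from several elements, which is the same ambiguity illustrated earlier in the thread with `('b', 'c')`.

```python
from operator import itemgetter


def force_tuple(getter):
    """Hypothetical decorator: make a getter's result a tuple (not in the stdlib)."""
    def wrapper(obj):
        result = getter(obj)
        # Only non-tuple results get wrapped; a tuple-valued single element
        # looks exactly like a multi-index result and is passed through as-is.
        return result if isinstance(result, tuple) else (result,)
    return wrapper


row = ['a', 'b', 'c']
print(force_tuple(itemgetter(1))(row))     # ('b',)
print(force_tuple(itemgetter(0, 2))(row))  # ('a', 'c')
# Limitation: prints ('b', 'c'), not (('b', 'c'),)
print(force_tuple(itemgetter(1))(['a', ('b', 'c')]))
```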

Nick Coghlan

2012-09-14 11:01:04 UTC

Both attrgetter and itemgetter are really designed to be called with
*literal* arguments, not via *args. In particular, they are designed
to be useful as arguments bound to a "key" parameter, where the object
vs singleton tuple distinction doesn't matter.

If that behaviour is not desirable, *write a different function* that
does what you want, and don't use itemgetter or attrgetter at all.
These tools are designed as convenience functions for a particular use
case (specifically sorting, and similar ordering operations). Outside
those use cases, you will need to drop back down to the underlying
building blocks and produce your *own* tool from the same raw
materials.

For example:

def my_itemgetter(*subscripts):
    def f(obj):
        return tuple(obj[x] for x in subscripts)
    return f

I agree attrgetter is slightly more complex due to the fact that it
*also* handles chained lookups, where getattr does not, but that's a
matter of making the case for providing chained lookup (or even
str.format style field value lookup) as a more readily accessible
building block, not for making the attrgetter API more complicated.
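As a sketch of the kind of building block described here: a chained attribute lookup can be written with `functools.reduce` over `getattr`, and an attrgetter-like function that always returns a tuple can then be built on top of it, in the same style as `my_itemgetter`. Both `resolve_path` and `my_attrgetter` are hypothetical names, not stdlib functions.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_path(obj, path):
    """Hypothetical helper: follow a dotted path, e.g. 'a.b.c' -> obj.a.b.c."""
    return reduce(getattr, path.split('.'), obj)


def my_attrgetter(*paths):
    """Hypothetical attrgetter variant that always returns a tuple."""
    def f(obj):
        return tuple(resolve_path(obj, p) for p in paths)
    return f


cfg = SimpleNamespace(db=SimpleNamespace(host='localhost', port=5432), debug=True)
print(my_attrgetter('debug')(cfg))               # (True,)
print(my_attrgetter('db.host', 'db.port')(cfg))  # ('localhost', 5432)
```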

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Masklinn

2012-09-14 11:36:39 UTC

Post by Nick Coghlan

It was my understanding that they are also designed to be useful for
mapping (such a usage is shown in itemgetter's examples), which is
a superset of the use case outlined here.
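For reference, the mapping usage meant here looks roughly like the following, modeled on the inventory example in the `operator` documentation. With a single index `map` yields bare elements; with two it yields tuples, so the shape of the output again depends on how many indexes were passed.

```python
from operator import itemgetter

inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)]

# Mapping with a single index yields bare elements.
counts = list(map(itemgetter(1), inventory))
print(counts)  # [3, 2, 5, 1]

# Mapping with two indexes yields tuples.
print(list(map(itemgetter(1, 0), inventory)))
# [(3, 'apple'), (2, 'banana'), (5, 'pear'), (1, 'orange')]

# The same kind of getter also works as a sort key.
print(sorted(inventory, key=itemgetter(1)))
# [('orange', 1), ('banana', 2), ('apple', 3), ('pear', 5)]
```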

Post by Nick Coghlan
If that behaviour is not desirable, *write a different function* that
does what you want, and don't use itemgetter or attrgetter at all.
These tools are designed as convenience functions

And save for one stumbling block, they are utilities I love for their
convenience and their plain clarity of purpose.

Oscar Benjamin

2012-09-14 13:23:53 UTC

Post by Nick Coghlan

It was my understanding that they are also designed to be useful for
mapping (such a usage is shown in itemgetter's examples), which is
a superset of the use case outlined here.

I can see why you would expect different behaviour here, though. I tend not
to think of the functions in the operator module as convenience functions
but as *efficient* nameable functions referring to operations that are
normally invoked with a non-function syntax. Which is more convenient out
of the following:

1) using operator
import operator
result = sorted(values, key=operator.attrgetter('name'))

2) using lambda
result = sorted(values, key=lambda v: v.name)

I don't think that the operator module is convenient and I think that it
damages readability in many cases. My primary reason for choosing it in
some cases is that it is more efficient than the lambda expression.

There is no special syntax for 'get several items as a tuple'. I didn't
know about this extended use for attrgetter, itemgetter. I can't see any
other functions in the operator module (abs, add, and_, ...) that extend
the semantics of the operation they are supposed to represent in this way.

In general it is bad to conflate scalar/sequence semantics so that a caller
should get a different type of object depending on the length of a
sequence. I can see how practicality beats purity in adding this feature
for people who want to use these functions for sorting by a couple of
elements/attributes. I think it would have been better though to add these
as separate functions itemsgetter and attrsgetter that always return tuples.
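A minimal sketch of the separate, always-tuple functions suggested here. `itemsgetter` and `attrsgetter` are hypothetical names that do not exist in the `operator` module. The attribute version reuses `operator.attrgetter` per name so that dotted paths keep working.

```python
from operator import attrgetter


def itemsgetter(*items):
    """Hypothetical: like itemgetter, but always returns a tuple."""
    def g(obj):
        return tuple(obj[item] for item in items)
    return g


def attrsgetter(*names):
    """Hypothetical: like attrgetter, but always returns a tuple."""
    getters = [attrgetter(name) for name in names]  # keeps dotted-name support

    def g(obj):
        return tuple(get(obj) for get in getters)
    return g


row = ['a', ('b', 'c'), 'd']
print(itemsgetter(1)(row))                   # (('b', 'c'),)
print(itemsgetter(1, 2)(row))                # (('b', 'c'), 'd')
print(attrsgetter('real', 'imag')(3 + 4j))   # (3.0, 4.0)
```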

Oscar

Jim Jewett

2012-09-14 21:02:31 UTC


Post by Oscar Benjamin
I can see why you would expect different behaviour here, though. I tend not
to think of the functions in the operator module as convenience functions
but as *efficient* nameable functions referring to operations that are
normally invoked with a non-function syntax. Which is more convenient out
1) using operator
import operator
result = sorted(values, key=operator.attrgetter('name'))

I would normally write that as

from operator import attrgetter as attr
... # may use it several times

result=sorted(values, key=attr('name'))

which is about the best I could hope for, without being able to use
the dot itself.

Post by Oscar Benjamin
2) using lambda
result = sorted(values, key=lambda v: v.name)

And I honestly think that would be worse, even if lambda didn't have a
code smell. It focuses attention on the fact that you're creating a
callable, instead of on the fact that you're grabbing the name
attribute.

Post by Oscar Benjamin
In general it is bad to conflate scalar/sequence semantics so that a caller
should get a different type of object depending on the length of a
sequence.

Yeah, but that can't really be solved well in python, except maybe by
never extending an API to handle sequences. I would personally not
consider that an improvement.

Part of the problem is that the cleanest way to take a variable number
of arguments is to turn them into a sequence under the covers (*args),
even if they weren't passed that way.

-jJ

Oscar Benjamin

2012-09-15 11:09:12 UTC

Post by Jim Jewett

I would normally write that as
from operator import attrgetter as attr
... # may use it several times
result=sorted(values, key=attr('name'))
which is about the best I could hope for, without being able to use
the dot itself.

To be clear, I wasn't complaining about the inconvenience of importing and
referring to attrgetter. I was saying that if the obvious alternative
(lambda functions) is at least as convenient then it's odd to describe
itemgetter/attrgetter as convenience functions.

Post by Jim Jewett

Post by Oscar Benjamin
2) using lambda
result = sorted(values, key=lambda v: v.name)

I disagree here. I find the fact that a lambda function shows me the
expression I would normally use to get the quantity I'm interested in makes
it easier for me to read. When I look at it I don't see it as a callable
function but as an expression that I'm passing for use somewhere else.

Post by Jim Jewett

Post by Oscar Benjamin
In general it is bad to conflate scalar/sequence semantics so that a caller
should get a different type of object depending on the length of a
sequence.

Yeah, but that can't really be solved well in python, except maybe by
never extending an API to handle sequences. I would personally not
consider that an improvement.
Part of the problem is that the cleanest way to take a variable number
of arguments is to turn them into a sequence under the covers (*args),
even if they weren't passed that way.
-jJ

You can extend an API to support sequences by adding a new entry point.
This is a common idiom in python: think list.append vs list.extend.

Oscar

Nick Coghlan

2012-09-15 12:43:59 UTC