PEP pre-draft: Support for indexing with keyword arguments

Discussion:

Stefano Borini

2014-07-01 22:36:48 UTC

Dear all,

after the first mailing list feedback, and further private discussion
with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for
keyword arguments in indexing. The document is available here.

https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

The document is not in final form when it comes to specifications. In
fact, it requires additional discussion about the best strategy to
achieve the desired result. Particular attention has been devoted to
presenting alternative implementation strategies and their pros and cons. I
will examine all feedback tomorrow morning European time (in approx 10
hrs), and apply any pull requests or comments you may have.

When the specification is finalized, or this community suggests that the
PEP is in a form suitable for official submission despite potential open
issues, I will submit it to the editor panel for further discussion, and
deploy an actual implementation according to the agreed specification
for a working test run.

I apologize for potential mistakes in the PEP drafting and submission
process, as this is my first PEP.

Kind Regards,

Stefano Borini

Chris Angelico

2014-07-02 01:06:24 UTC

On Wed, Jul 2, 2014 at 8:36 AM, Stefano Borini

Post by Stefano Borini
https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

A good start!

"""
C0: a[1]         -> idx = 1                    # integer
    a[1,2]       -> idx = (1,2)                # tuple
C1: a[Z=3]       -> idx = {"Z": 3}             # dictionary with single key
C2. a[Z=3, R=4]  -> idx = {"Z": 3, "R": 4}     # dictionary/ordereddict [*]
                 or idx = ({"Z": 3}, {"R": 4}) # tuple of two single-key dict [**]
...
C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3})
"""

Another possibility for the keyword arguments is a two-item tuple,
which would mean that C1 comes up as ("Z", 3) (or maybe (("Z", 3),) -
keyword arguments forcing a tuple of all args for
consistency/clarity), C2 as (("Z", 3), ("R", 4)), and C5 as (1, 2,
("Z", 3)). This would be lighter and easier to use than the tuple of
dicts, and still preserves order (unlike the regular dict); however,
it doesn't let you easily fetch up the one keyword you're interested
in, which is normally something you'd want to support for a
**kwargs-like feature:

def __getitem__(self, item, **kwargs):
    # either that, or kwargs is part of item in some way
    ret = self.base[item]
    if "precis" in kwargs:
        ret.round(kwargs["precis"])
    return ret

To implement that with a tuple of tuples, or a tuple of dicts, you'd
have to iterate over it and check each one - much less clean code.
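To make the difference concrete, here is a minimal sketch comparing the two shapes the keyword part could take. It is not from the PEP draft: the helper name lookup_pairs and the sample values are made up for illustration.

```python
# Hypothetical helper, for illustration only: fetch one keyword from the
# "tuple of two-item tuples" form, e.g. (("Z", 3), ("R", 4)).
def lookup_pairs(pairs, name, default=None):
    for key, value in pairs:
        if key == name:
            return value
    return default

as_pairs = (("Z", 3), ("precis", 2))
as_dict = dict(as_pairs)

# The dict form needs no loop in user code.
assert as_dict.get("precis") == 2
# The tuple form has to be scanned item by item.
assert lookup_pairs(as_pairs, "precis") == 2
assert lookup_pairs(as_pairs, "missing") is None
# The tuple form carries the order of the keywords explicitly.
assert [key for key, _ in as_pairs] == ["Z", "precis"]
```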

I would be inclined to simply state, in the PEP, that keyword
arguments in indexing are equivalent to kwargs in function calls, and
equally unordered (that is to say: if a proposal to make function call
kwargs ordered is accepted, the same consideration can be applied to
this, but otherwise they have no order). This does mean that it
doesn't fit the original use-case, but it seems very odd to start out
by saying "here, let's give indexing the option to carry keyword args,
just like with function calls", and then come back and say "oh, but
unlike function calls, they're inherently ordered and carried very
differently".

For the OP's use-case, though, it would actually be possible to abuse
slice notation. I don't remember this being mentioned, but it does
preserve order; the cost is that all the "keywords" have to be defined
as objects.

class kw: pass # because object() doesn't have attributes
def make_kw(names):
    for i in names.split():
        globals()[i] = obj = kw()
        obj.keyword_arg = i
make_kw("Z R X")

# Now you can use them in indexing
some_obj[5, Z:3]
some_obj[7, Z:3, R:4]

The parameters will arrive in the item tuple as slice objects, where
the start is a signature object and the stop is its value.

>>> some_obj[5, Z:3]
getitem: (5, slice(<__main__.kw object at 0x016C5E10>, 3, None))

Yes, it uses a colon rather than an equals sign, but on the flip side,
it already works :)
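For what it's worth, the receiving side of this trick could look roughly like the sketch below. The class name SliceKeywords is made up, and the markers are created by hand instead of through globals() to keep the example self-contained; it only shows one way of separating the positional parts from the marker:value slices.

```python
class kw:
    """Marker object standing in for a keyword name."""
    def __init__(self, name):
        self.keyword_arg = name

Z, R = kw("Z"), kw("R")

class SliceKeywords:
    # Hypothetical receiver: splits the index into positional parts and
    # "keyword" parts spelled as marker:value slices.
    def __getitem__(self, item):
        if not isinstance(item, tuple):
            item = (item,)
        args, kwargs = [], {}
        for part in item:
            if isinstance(part, slice) and isinstance(part.start, kw):
                kwargs[part.start.keyword_arg] = part.stop
            else:
                args.append(part)
        return tuple(args), kwargs

some_obj = SliceKeywords()
print(some_obj[7, Z:3, R:4])   # ((7,), {'Z': 3, 'R': 4})
```

Ordinary slices such as some_obj[1:3] still come through as positional parts, because their start is not a kw marker.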

ChrisA

C Anthony Risinger

2014-07-02 02:58:44 UTC

Post by Chris Angelico
[...]
For the OP's use-case, though, it would actually be possible to abuse
slice notation. I don't remember this being mentioned, but it does
preserve order; the cost is that all the "keywords" have to be defined
as objects.
class kw: pass # because object() doesn't have attributes
def make_kw(names):
    for i in names.split():
        globals()[i] = obj = kw()
        obj.keyword_arg = i
make_kw("Z R X")
# Now you can use them in indexing
some_obj[5, Z:3]
some_obj[7, Z:3, R:4]
The parameters will arrive in the item tuple as slice objects, where
the start is a signature object and the stop is its value.

>>> some_obj[5, Z:3]
getitem: (5, slice(<__main__.kw object at 0x016C5E10>, 3, None))
Yes, it uses a colon rather than an equals sign, but on the flip side,
it already works :)

This works great, IIRC you can pretty much pass *anything*:

dict[{}:]
dict[AType:lambda x: x]
dict[::]
dict[:]

...don't forget extended slice possibilities :)

I've dabbled with this in custom dict implementations and it usefully
excludes all normal dicts, which quickly reject slice objects.

--
C Anthony [mobile]

Rob Cliffe

2014-07-02 02:36:23 UTC

A small bit of uninformed feedback (no charge :-) ):

1) Ahem, doesn't a[3] (usually) return the *fourth* element of a ?

2)

""" Compare e.g. a[1:3, Z=2] with a.get(slice(1,3,None), Z=2). """

I think this is slightly unfair as the second form can be abbreviated to a.get(slice(1,3), Z=2),
just as the first is an abbreviation for a[1:3:None, Z=2].

3) You may not consider this relevant. But as an (I believe)
intelligent reader, but one unfamiliar with the material, I cannot
understand what your first example

""" low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) """

is about, and whether it is really (conceptually) related to indexing,
or just a slick hack. I guess it could be anything, depending on the
implementation of __getitem__.

Best wishes,
Rob Cliffe

Stefano Borini

2014-07-02 10:08:25 UTC

Post by Rob Cliffe
1) Ahem, doesn't a[3] (usually) return the *fourth* element of a ?

Yes. I changed the indexes many times for consistency and that slipped
through. It used to be a[2]

Post by Rob Cliffe
low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) """
is about, and whether it is really (conceptually) related to indexing,
or just a slick hack. I guess it could be anything, depending on the
implementation of __getitem__.

The reason for using indexing is that the BasisSet object could be internally
represented as a numeric table, where rows are associated with individual elements
(e.g. rows 0:5 with element 1, rows 5:8 with element 2) and each column is associated
with a given degree of accuracy (e.g. the first column is low accuracy, the second
column is medium accuracy, etc.). You could say that users are not concerned with the
internal representation, but if they are eventually allowed to create these
basis sets in this tabular form, it makes a nice conceptual model to keep the
association column <-> accuracy and keep it explicit in the interface.

Xavier Combelle

2014-07-02 11:47:03 UTC

In this case:

C1: a[Z=3] -> idx = {"Z": 3} # P1/P2 dictionary with single key

as we can index with any object, I wonder how one could differentiate between
the call a[Z=3] and the actual a[{"Z": 3}]. Should they return the same result?

Stefano Borini

2014-07-02 12:20:03 UTC

Post by Xavier Combelle
C1: a[Z=3] -> idx = {"Z": 3} # P1/P2 dictionary with single key
as we can index with any object, I wonder how one could differentiate between
the call a[Z=3] and the actual a[{"Z": 3}]. Should they return the same result?

Indeed you can't, and if I recall correctly I wrote that somewhere. The question is
ultimately whether such a distinction is worth considering or whether, instead, the
two cases should be handled as degenerate (equivalent) notations.

IMHO, they should be kept distinct, and this disqualifies that implementation strategy.
Too much magic would happen otherwise.

Nick Coghlan

2014-07-02 07:06:53 UTC

It's a well written PEP, but the "just use call notation instead"
argument is going to be a challenging one to overcome.

Given that part of the rationale given is that "slice(start, stop,
step)" is uglier than the "start:stop:step" permitted in an indexing
operation, the option of allowing "[start:]",
"[:stop]","[start:stop:step]", etc as dedicated slice syntax should
also be explicitly considered.

Compare:

a.get(slice(1,3), Z=2) # today
a.get([1:3], Z=2) # slice syntax
a[1:3, Z=2] # PEP

Introducing a more general slice notation would make indexing *less*
special (reducing the current "allows slice notation" special case to
"allows slice notation with the surrounding square brackets implied").

The reduction of special casing could be taken further, by allowing
the surrounding square brackets to be omitted in tuple and list
displays, just as they are in indexing operations.
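Until such syntax exists, a small helper object can stand in for a dedicated slice notation (NumPy's np.s_ works along the same lines). A rough sketch, where S is a made-up name:

```python
class _SliceMaker:
    # Indexing this helper simply hands back whatever the brackets produced.
    def __getitem__(self, item):
        return item

S = _SliceMaker()  # hypothetical name, for illustration

assert S[1:3] == slice(1, 3, None)
assert S[:5] == slice(None, 5, None)
assert S[::2, 1:3] == (slice(None, None, 2), slice(1, 3, None))
```

With that, the "today" spelling above could be written a.get(S[1:3], Z=2), which is still noisier than either of the other two forms.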

I'm not saying such a proposal would necessarily be accepted - I just
see a proposal that takes an existing special case and proposes to
make it *less* special as more appealing than one that proposes to
make it even *more* special.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Nicholas Cole

2014-07-02 07:45:47 UTC

Post by Nick Coghlan
It's a well written PEP, but the "just use call notation instead"
argument is going to be a challenging one to overcome.

+1

The advantages the PEP suggests are very subjective ones to do with
readability.

Stefano Borini

2014-07-02 07:59:54 UTC

Post by Nicholas Cole

Post by Nick Coghlan
It's a well written PEP, but the "just use call notation instead"
argument is going to be a challenging one to overcome.

+1
The advantages the PEP suggests are very subjective ones to do with
readability.

To be honest, I agree with this point of view myself. It's not _needed_.
It would be a nice additional feature, but maybe only rarely used and in very
specialized cases, and again, there are always workarounds.

Even if it is rejected in the long run, the PEP rationalizes and analyzes the
motivations and alternatives, and formally records why it's a "not worth it"
scenario.

Thank you for all the feedback. I am including all the raised points in the PEP
and I'll follow up with a revised version ASAP.

Stefano Borini

Joseph Martinot-Lagarde

2014-07-02 19:17:15 UTC

Post by Nick Coghlan
It's a well written PEP, but the "just use call notation instead"
argument is going to be a challenging one to overcome.
+1
The advantages the PEP suggests are very subjective ones to do with
readability.

Well, "Readability counts" is in the Zen of Python!

Having recently translated a Matlab program to Python, I can assure you
that the notation difference between call and indexing is really useful.
.get() does not look like indexing.

Akira Li

2014-07-02 15:14:47 UTC

Strategy 3b: builtin named tuple

C0. a[2] -> idx = 2                  # scalar
    a[2,3] -> idx = (2, 3)           # tuple
        idx[0] == 2
        idx[1] == 3
C1. a[Z=3] -> idx = (Z=3)            # builtin named tuple (picklable, etc)
        idx[0] == idx.Z == 3
C2. a[Z=3, R=2] -> idx = (Z=3, R=2)
        idx[0] == idx.Z == 3
        idx[1] == idx.R == 2
C3. a[1, Z=3] -> idx = (1, Z=3)
        idx[0] == 1
        idx[1] == idx.Z == 3
C4. a[1, Z=3, R=2] -> idx = (1, Z=3, R=2)
        idx[0] == 1
        idx[1] == idx.Z == 3
        idx[2] == idx.R == 2
C5. a[1, 2, Z=3] -> idx = (1, 2, Z=3)
C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, Z=3, R=4)
C7. a[1, Z=3, 2, R=4] -> SyntaxError: non-keyword arg after keyword arg

Pros:

- looks nice
- easy to explain: a[1,b=2] is equivalent to a[(1,b=2)] like a[1,2] is
equivalent to a[(1,2)]
- it makes `__getitem__` *less special* if Python supports a builtin
named tuple and/or ordered keyword args (the call syntax)

Cons:

- Python currently has no builtin named tuple (an ordered collection of
optionally named values)
- Python currently doesn't support ordered keyword args (which might have
made the implementation trivial)

Note: `idx = (Z=3)` is a SyntaxError so it is safe to produce a named tuple
instead of a scalar.
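The behaviour listed above can be approximated with collections.namedtuple. The sketch below is only an emulation: make_idx is a made-up helper, the positional items get placeholder field names (_0, _1, ...), and it relies on keyword arguments keeping their order, which function calls only guarantee on Python 3.6 and later.

```python
from collections import namedtuple

def make_idx(*args, **kwargs):
    # Hypothetical helper: positional items get the placeholder field
    # names _0, _1, ...; keyword items keep their own names.
    names = ["_%d" % i for i in range(len(args))] + list(kwargs)
    # rename=True lets namedtuple accept the underscore-prefixed placeholders.
    cls = namedtuple("idx", names, rename=True)
    return cls(*args, *kwargs.values())

idx = make_idx(1, Z=3, R=2)   # stands in for a[1, Z=3, R=2] (case C4)
assert idx[0] == 1
assert idx[1] == idx.Z == 3
assert idx[2] == idx.R == 2
```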

--
Akira

d***@public.gmane.org

2014-07-02 16:40:43 UTC

Hello, just some remarks:

Ad degeneracy of notation: The case of a[Z=3] and a[{"Z": 3}] is similar to the current a[1, 2] and a[(1, 2)]. Even though one may argue that the parentheses are not actually part of the tuple notation but are just needed because of syntax, it may look like a degeneracy of notation when compared to a function call: f(1, 2) is not the same thing as f((1, 2)).
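A tiny check of this with current Python, using a made-up Probe class that just hands back whatever index it receives:

```python
class Probe:
    # Hypothetical class that returns the index object unchanged.
    def __getitem__(self, idx):
        return idx

def f(*args):
    return args

a = Probe()
assert a[1, 2] == a[(1, 2)] == (1, 2)   # indexing: same index either way
assert f(1, 2) != f((1, 2))            # calls: (1, 2) versus ((1, 2),)
assert a[{"Z": 3}] == {"Z": 3}         # a dict is already a legal index today
```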

Ad making dict.get() obsolete: The frequently used a_dict.get(key) would still have to be spelled a_dict[key, default=None] in index notation.

The _n keys used in strategy 3 may be indexed from zero like list indices.

Regards, Drekin

Tim Delaney

2014-07-02 20:12:30 UTC

One option I don't see is to have a[b=1, c=2] be translated to
a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.
That completely takes care of backwards compatibility in __getitem__ (no
change at all), and also deals with your issue with abusing slice objects:

a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2)))

And using that we can have an ordered dict "literal"

import collections

class OrderedDictLiteral(object):
    def __getitem__(self, t):
        try:
            i = iter(t)
        except TypeError:
            i = (t,)

        return collections.OrderedDict((s.start, s.stop) for s in i)

odict = OrderedDictLiteral()

o = odict[a=1, b='c']
print(o) # prints OrderedDict([('a', 1), ('b', 'c')])
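Since the a=1 spelling is only proposed syntax, the idea can be tried today by writing the pairs as explicit 'name':value slices. A sketch along those lines (the isinstance check replaces the try/except purely for readability):

```python
import collections

class OrderedDictLiteral(object):
    def __getitem__(self, t):
        # A single slice is not iterable, so wrap it in a one-item tuple.
        if isinstance(t, slice):
            t = (t,)
        return collections.OrderedDict((s.start, s.stop) for s in t)

odict = OrderedDictLiteral()

o = odict['a':1, 'b':'c']   # valid today; stands in for odict[a=1, b='c']
assert list(o.items()) == [('a', 1), ('b', 'c')]
assert list(odict['k':0].items()) == [('k', 0)]
```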

On a related note, if we combined this with the idea that kwargs should be
constructed using the type of the passed dict (i.e. if you pass an
OrderedDict as **kwargs you get a new OrderedDict in the function) we could
do:

kw = OrderedDictLiteral()

def f(**kw):
    print(kw)

f('a', 'b', **kw[c='d', e=2])

always resulting in:

{'c': 'd', 'e': 2}

Tim Delaney

Tim Delaney

2014-07-02 20:14:00 UTC

Post by Tim Delaney
a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2)))

Of course, that should have been:

a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2), None))

Tim Delaney

Stefano Borini

2014-07-02 21:29:53 UTC

Post by Tim Delaney
One option I don't see is to have a[b=1, c=2] be translated to
a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.

It would be weird, since it's not technically a slice, but it would work.
I personally think that piggybacking on the slice would appear hackish.
One could perhaps have a keyword() object similar to slice(),
but then it's basically a single-item dictionary (Strategy 1) with a fancy
name.

Tim Delaney

2014-07-02 23:10:18 UTC

Post by Stefano Borini

Post by Tim Delaney
One option I don't see is to have a[b=1, c=2] be translated to
a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.

I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c,
'd':e]. It's simple to explain, and gives the greatest backwards
compatibility. In particular, libraries that already abused slices in this
way will just continue to work with the new syntax.

I'd thought that maybe a subclass of slice, with .key (= .start) and .value
(= .stop) attributes, would work, but slice isn't subclassable so it would be
a bit more difficult. That would also be backwards-compatible with existing
__getitem__ that used slice, but would preclude people calling that
__getitem__ with slice syntax, which I personally don't think is
desirable. Instead, maybe recommend something like:

ordereddict = OrderedDictLiteral() # using the definition from previous email

class GetItemByName(object):
    def __getitem__(self, t):
        # convert the parameters to a dictionary
        d = ordereddict[t]
        return d['name']

Hmm - here's an anonymous named tuple "literal" as another example:

class AnonymousNamedTuple(object):
    def __getitem__(self, t):
        d = ordereddict[t]
        t = collections.namedtuple('_', d)
        return t(*d.values())

namedtuple = AnonymousNamedTuple()
print(namedtuple[a='b', c=1]) # _(a='b', c=1)
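The same example can be run on current Python if the keyword pairs are written as 'name':value slices, which is the equivalence being argued for. A self-contained sketch (it builds the field list directly instead of going through the ordered dict helper):

```python
import collections

class AnonymousNamedTuple(object):
    def __getitem__(self, t):
        if isinstance(t, slice):      # a single 'name':value pair
            t = (t,)
        names = [s.start for s in t]
        values = [s.stop for s in t]
        cls = collections.namedtuple('_', names)
        return cls(*values)

namedtuple = AnonymousNamedTuple()
nt = namedtuple['a':'b', 'c':1]      # stands in for namedtuple[a='b', c=1]
assert nt == ('b', 1)
assert nt.a == 'b' and nt.c == 1
assert nt._fields == ('a', 'c')      # the order of the "keywords" is kept
```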

As you can see, I'm in favour of keeping the order of the keyword arguments
to the index - losing it would prevent things like the above.

Tim Delaney

Ethan Furman

2014-07-02 23:40:39 UTC

+0.5 for keywords in __getitem__

+1 for this version of it

~Ethan~

Bruce Leban

2014-07-03 07:37:45 UTC

Post by Ethan Furman

Post by Tim Delaney
I really do think that a[b=c, d=e] should just be syntax sugar for
a['b':c, 'd':e]. It's simple to explain, and gives
the greatest backwards compatibility. In particular, libraries that
already abused slices in this way will just continue
to work with the new syntax.

+0.5 for keywords in __getitem__
+1 for this version of it

If there weren't already abuse of slices for this purpose, would this be
the first choice? I think not. This kind of abuse makes it more likely that
there will be mysterious failures when someone tries to use keyword
indexing for objects that don't support it. In contrast, using kwargs means
you'll get an immediate meaningful exception.

Tangentially, I think the PEP can reasonably reserve the keyword argument
name 'default' for default values specifying that while __getitem__ methods
do not need to support default, they should not use that keyword for any
other purpose.

Also, the draft does not explain why you would not allow defining
__getitem__(self, idx, x=1, y=2) rather than only supporting the kwargs
form. I don't know if I think it should or shouldn't at this point, but I
definitely think it needs to be discussed and justified one way or the other.

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
https://www.linkedin.com/in/bruceleban

Stephan Hoyer

2014-07-03 17:57:48 UTC

I don't have strong opinions about the implementation, but I am strongly
supportive of this PEP for the second case it lists -- the ability to index
a multi-dimensional array by axis name or label instead of position.

Why? Suppose you're working with high dimensional data, where arrays may
have any number of axes such as time, x, y and z. I work with this sort of
data every day, as do many scientists.

It is awkward and error prone to use the existing __getitem__ and
__setitem__ syntax, because it's difficult to reliably keep track of axis
order with this many indices:

a[:, :, 0:10]

vs.

a[y=0:10]

Keyword getitem syntax should be encouraged for the same reasons that
keyword arguments are often preferable to positional arguments: it is both
explicit (no implicit reliance on axis order), and more flexible (the same
code will work on arrays with transposed or altered axes). This is
particularly important because it is typical to be working with arrays that
use some but not all the same axes.

A method does allow for an explicit (if verbose) alternative to __getitem__
syntax:

a.getitem(y=slice(0, 10))

But it's worse for __setitem__:

a.setitem(dict(y=slice(0, 10)), 0)

vs.

a[y=0:10] = 0
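As a rough illustration of what such getitem/setitem methods have to do internally, the sketch below turns axis-name keywords into the equivalent positional index. The helper name positional_index and the axis order are assumptions made for the example, not part of any existing library.

```python
def positional_index(dims, **labels):
    """Turn axis-name keywords into the equivalent positional index tuple."""
    unknown = set(labels) - set(dims)
    if unknown:
        raise KeyError("unknown axis name(s): %s" % sorted(unknown))
    # Axes that are not mentioned get a full slice, like a bare ':'.
    return tuple(labels.get(name, slice(None)) for name in dims)

dims = ("time", "x", "y")   # assumed axis order, for illustration
# a.getitem(y=slice(0, 10)) would then index like a[:, :, 0:10]
assert positional_index(dims, y=slice(0, 10)) == (
    slice(None), slice(None), slice(0, 10))
# Reordering the axes changes the positions, not the calling code.
assert positional_index(("y", "time", "x"), y=slice(0, 10)) == (
    slice(0, 10), slice(None), slice(None))
```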

------------

Another issue: The PEP should address whether expressions with slice
abbreviations like the following should be valid syntax:

a[x=:, y=:5, z=::-1]

These look pretty strange (=: looks like a form of assign), but the
functionality would certainly be nice to support in some way.

Surrounding the indices with [] might help:

a[x=[:], y=[:5], z=[::-1]]

-------------

Post by Bruce Leban
Tangentially, I think the PEP can reasonably reserve the keyword argument
name 'default' for default values specifying that while __getitem__ methods
do not need to support default, they should not use that keyword for any
other purpose.

-1 from me. The existing get method handles this case pretty well, with
fewer keystrokes than the keyword only "default" index (as I think has
already been pointed out).

In my opinion, case 1 (labeled indices for a physics DSL) and case 2
(labeled indices to remove ambiguity) are basically the same, and the only
use-cases that should be encouraged. Labeling tensor indices with names in
mathematical notation is standard for precisely the same reasons that it's
a good idea for Python.

Best,
Stephan

(note: apologies for any redundant messages, I tried sending this message
from the google groups mirror before I signed up, which didn't go out to
the main mailing list)

Stefano Borini

2014-07-03 19:33:56 UTC

Post by Stephan Hoyer
I don't have strong opinions about the implementation, but I am strongly
supportive of this PEP for the second case it lists -- the ability to index
a multi-dimensional array by axis name or label instead of position.

Thinking aloud:
The biggest problem is that there's no way of specifying which labels the
object supports, and therefore no way of binding a specified keyword, unless
the __getitem__ signature is deeply altered.

Post by Stephan Hoyer
It is awkward and error prone to use the existing __getitem__ and
__setitem__ syntax, because it's difficult to reliably keep track of axis
a[:, :, 0:10]
vs.
a[y=0:10]

This is indeed an important use case. I should probably stress it more in the
PEP.

Post by Stephan Hoyer
Another issue: The PEP should address whether expressions with slice
a[x=:, y=:5, z=::-1]

looks ugly indeed

Post by Stephan Hoyer
a[x=[:], y=[:5], z=[::-1]]

better, but unusual

Post by Stephan Hoyer
-1 from me. The existing get method handles this case pretty well, with
fewer keystrokes than the keyword only "default" index (as I think has
already been pointed out).
In my opinion, case 1 (labeled indices for a physics DSL) and case 2
(labeled indices to remove ambiguity) are basically the same, and the only
use-cases that should be encouraged. Labeling tensor indices with names in
mathematical notation is standard for precisely the same reasons that it's
a good idea for Python.

That would mean dropping keyword indexing for the "options" use cases.

Stephan Hoyer

2014-07-03 19:43:59 UTC

On Thu, Jul 3, 2014 at 12:33 PM, Stefano Borini <

Post by Stefano Borini
thinking aloud.
The biggest problem is that there's no way of specifying which labels the
object supports, and therefore no way of binding a specified keyword, unless
the __getitem__ signature is deeply altered.

I don't think I follow you here. The object itself handles the __getitem__ logic
in whatever way it sees fit, and it would be up to it to raise KeyError
when an invalid label is supplied, much like the current situation with
invalid keys.

Stephan

Stefano Borini

2014-07-03 19:59:11 UTC

Post by Stephan Hoyer
On Thu, Jul 3, 2014 at 12:33 PM, Stefano Borini <

NB: Still thinking aloud here...

True, but the problem is that in a function

def foo(x,y,z): pass

calling the following will give the exact same result

foo(1,2,3)
foo(x=1, y=2, z=3)
foo(z=3, x=1, y=2)

This happens because at function definition you can specify the argument names.
With __getitem__ you can't express this binding; its current form precludes it:

__getitem__(self, idx)

If you use a[1,2,3], you have no way of saying that "the first index is called x", so
there is no way for these two to be equivalent the way they are for a function:

a[1,2,3]
a[z=3, x=1, y=2]

unless you allow getitem in the form

__getitem__(self, x, y, z)

which I feel would be a wasps' nest in terms of backward compatibility, both at the
Python and C level. I doubt this would fly.

So if you want to keep the __getitem__ signature unchanged, you will have to map labels
to positions manually inside __getitem__, a potentially complex task. Not even
strategy 3 (namedtuple) would solve this issue.
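A rough sketch of what that manual mapping might look like, assuming the keywords reach __getitem__ as a trailing dict (in the spirit of Strategy 1). The Grid class and its _lookup method are invented for the example; the trick is to reuse ordinary function-call binding through a helper method.

```python
class Grid:
    # Hypothetical container with three named axes.
    def _lookup(self, x, y, z):
        # Placeholder for the real lookup; it just reports the binding.
        return (x, y, z)

    def __getitem__(self, idx):
        if not isinstance(idx, tuple):
            idx = (idx,)
        positional = [i for i in idx if not isinstance(i, dict)]
        keywords = {}
        for i in idx:
            if isinstance(i, dict):
                keywords.update(i)
        # Reuse ordinary function-call binding to map labels to positions.
        return self._lookup(*positional, **keywords)

a = Grid()
assert a[1, 2, 3] == (1, 2, 3)
assert a[{"z": 3, "x": 1, "y": 2}] == (1, 2, 3)   # stands in for a[z=3, x=1, y=2]
assert a[1, 2, {"z": 3}] == (1, 2, 3)             # stands in for a[1, 2, z=3]
```

Note that this sketch cannot tell a keyword dict from a dict that was meant as an ordinary index, which is the ambiguity raised earlier in the thread.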

Stephan Hoyer

2014-07-03 20:20:45 UTC

On Thu, Jul 3, 2014 at 12:59 PM, Stefano Borini <

Post by Stefano Borini
So if you want to keep __getitem__ signature unchanged, you will have to map labels
to positions manually inside __getitem__, a potentially complex task. Not even
strategy 3 (namedtuple) would solve this issue.

Yes, this is true. However, in practice many implementations of labeled
arrays would have generic labeled axes, so they would need to use their own
logic to do the mapping in __getitem__ anyways.

Sturla Molden

2014-07-03 20:48:06 UTC

Post by Stephan Hoyer
Yes, this is true. However, in practice many implementations of labeled
arrays would have generic labeled axes, so they would need to use their own
logic to do the mapping in __getitem__ anyways.

If you are thinking about Pandas, then each keyword should be allowed to
take a slice as well.

dataframe[apples=1:3, oranges=2:6]

Sturla

Stephan Hoyer

2014-07-03 21:00:20 UTC

Post by Sturla Molden

Post by Stephan Hoyer
Yes, this is true. However, in practice many implementations of labeled
arrays would have generic labeled axes, so they would need to use their own
logic to do the mapping in __getitem__ anyways.

If you are thinking about Pandas, then each keyword should be allowed to
take a slice as well.
dataframe[apples=1:3, oranges=2:6]

Yes, I am indeed thinking about pandas and other similar libraries.
Supporting slices with keywords would be essential.

Stephan

Nick Coghlan

2014-07-03 21:48:10 UTC

Post by Stephan Hoyer

Post by Sturla Molden

If you are thinking about Pandas, then each keyword should be allowed to
take a slice as well.
dataframe[apples=1:3, oranges=2:6]

Yes, I am indeed thinking about pandas and other similar libraries.
Supporting slices with keywords would be essential.

Some more concrete pandas-based examples could definitely help make a
more compelling case. I genuinely think the hard part here is to make
the case for offering the feature *at all*, so adding a "here is
current real world pandas based code" and "here is how this PEP could
make that code more readable" example could be worthwhile.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Stefano Borini

2014-07-04 06:25:13 UTC

Post by Nick Coghlan
Some more concrete pandas-based examples could definitely help make a
more compelling case. I genuinely think the hard part here is to make
the case for offering the feature *at all*, so adding a "here is
current real world pandas based code" and "here is how this PEP could
make that code more readable" example could be worthwhile.

I agree. I will examine pandas this evening for more context.

Stefano Borini

2014-07-04 15:44:30 UTC

Post by Stefano Borini

I agree. I will examine pandas this evening for more context.

Ok, I examined pandas, and I think it solves a completely different problem:

In [27]: df.loc[:,['A','B']]
Out[27]:
                   A         B
2013-01-01  0.469112 -0.282863
2013-01-02  1.212112 -0.173215

Pandas is naming the columns. With keyword arguments you would be naming the _axes_.

Stefano Borini

2014-07-03 17:00:36 UTC

Post by Tim Delaney
I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c,
'd':e]. It's simple to explain, and gives the greatest backwards
compatibility

This is indeed a good point, as the initialization of a dictionary looks very, very
similar. However, it would definitely collide with the slice object. At the very least,
it would be confusing.

Post by Tim Delaney
In particular, libraries that already abused slices in this
way will just continue to work with the new syntax.

Are there any actual examples in the wild of this behavior?

Stefano Borini

2014-07-03 17:15:09 UTC

Post by Stefano Borini
https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

I committed and pushed the most recent changes and they are now available.
Some points have been clarified and expanded. Also, there's a new section about
C interface compatibility. Please check the diffs for tracking the changes.

Tonight I will comb the document and the thread again, further distilling the
current hot spots.

Stefano Borini

2014-07-03 18:30:59 UTC

Post by Stefano Borini

Post by Stefano Borini
https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

Forgot: I also added a possibility P4 for the first strategy: keyword
(alternative name "keyindex") which was proposed in the thread.
This solution would look rather neat

>>> a[3]
>>> a[3:1]
slice(3, 1, None)
>>> a[slice(3,1,None)]  # <- Note how this notation is a long and equivalent form of the
slice(3, 1, None)       # syntactic sugar above
>>> a[z=4]              # <- Again, note how this notation would be a syntactic sugar
keyindex("z", 4)        # for a[keyindex("z", 4)]
>>> a[z=1:5:2]          # <- Supports slices too.
keyindex("z", slice(1,5,2))  # No ambiguity with dictionaries, and C compatibility is
                             # straightforward
>>> keyindex("z", 4).key
"z"

Another thing I observed is that the point of an indexing operation is indexing,
and a keyed _index_ is not the same thing as a keyed _option_ during an
indexing operation. This has been stated during the thread, but it's worth
pointing out explicitly in the PEP (it currently isn't). Using it for options such as
default would technically be a misuse, but an acceptable one for... broad
definitions of indexing.

The keyindex object could be made to implement the same interface as its value
through forwarding, so it can behave just as its value if your logic cares only about
position, and not key

>>> keyindex("z", 4) + 1
5
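A minimal sketch of what such a keyindex object might look like. This is not an existing type: only addition and integer conversion are forwarded here, and a fuller version would have to forward many more operations explicitly, since special methods are looked up on the class and cannot be caught with __getattr__.

```python
import operator

class keyindex:
    # Hypothetical sketch of the proposed object; only a few operations
    # are forwarded to the wrapped value.
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __repr__(self):
        return "keyindex(%r, %r)" % (self.key, self.value)

    def __add__(self, other):
        return self.value + other

    def __radd__(self, other):
        return other + self.value

    def __index__(self):
        # Lets sequences treat the object as a plain integer position.
        return operator.index(self.value)

k = keyindex("z", 4)
assert k.key == "z"
assert k + 1 == 5
assert 1 + k == 5
assert [10, 20, 30, 40, 50][k] == 50
```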

Another rationalization: current indexing has only one degree of freedom, that
is: positioning. Add keywords and now there are two degrees of freedom: position
and key. How are these two degrees of freedom supposed to interact?

Jonas Wielicki

2014-07-04 08:21:53 UTC

Post by Stefano Borini
The keyindex object could be made to implement the same interface as its value
through forwarding, so it can behave just as its value if your logic cares only about
position, and not key

keyindex("z", 4) + 1

What about a value which has a .key attribute?

regards,
jwi

Stefano Borini

2014-07-04 09:20:50 UTC

Post by Jonas Wielicki

keyindex("z", 4) + 1

What about a value which has a .key attribute?

that would have to be added, and unless you copy the passed index it would be a
side effect of getitem on the passed entity, which would not be nice.

d***@public.gmane.org

2014-07-04 09:29:34 UTC

Just some ideas, not claiming they are good:

• As already stated in the thread and also in the PEP, there are two different classes of use cases for indexing with keyword arguments: as a named index, and as an option contextual to the indexing. I think that the two cases ask for different signatures. Even if I have a complex indexing scheme, the signature is (assuming Strategy 1 or 3):

def __getitem__(self, idx): ...

However, if I now want to add support for a default value, I would do it like this:

_Empty = object()
def __getitem__(self, idx, *, default=_Empty): ...

That leads to the following strategies.

• Just for the sake of completeness, maybe the easiest and also the most powerful strategy would be to simply copy the behaviour of a function call, with the arguments going to __getitem__ instead of __call__, while allowing the syntactic sugar for slices (which would raise the question of whether to allow slice literals in function calls too, or even in every expression).

This strategy has two serious problems:
1. It is not backwards compatible with the current mechanism of automatic packing of positional arguments.
2. It is not clear how to incorporate the additional parameter of __setitem__.

• This takes me to the following hybrid strategy. Both strategies 1 and 3 pack everything into one idx object, whereas strategy 2 leaves key indices in a separate kwargs parameter. The hybrid strategy takes as much as possible from the function call strategy and generalizes strategies 1, 2 and 3 at the same time.

The general signature looks like this:
def __getitem__(self, idx, *, key1, key2=default, **kwargs): ...
During the call, every provided keyword argument that has a corresponding parameter is put into that parameter. If there is a **kwargs parameter, the remaining keyword arguments are put into kwargs; if not, they are somehow (strategy 1 or 3) packed into the idx parameter.

Also, the additional __setitem__ argument is just added as a positional argument:
def __setitem__(self, idx, value, *, key1, key2=default, **kwargs): ...

Regards, Drekin

Stefano Borini

2014-07-04 18:10:51 UTC

Post by Stefano Borini
https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

I just added a new strategy. This one cuts the problem down.

Strategy 4: Strict dictionary
-----------------------------

This strategy accepts that __getitem__ is special in accepting only one object,
and the nature of that object must be non-ambiguous in its specification of the
axes: it can be either by order, or by name. As a result of this assumption,
in presence of keyword arguments, the passed entity is a dictionary and all
labels must be specified.

C0. a[1]; a[1,2] -> idx = 1; idx=(1, 2)
C1. a[Z=3] -> idx = {"Z": 3}
C2. a[Z=3, R=4] -> idx = {"Z"=3, "R"=4}
C3. a[1, Z=3] -> raise SyntaxError
C4. a[1, Z=3, R=4] -> raise SyntaxError
C5. a[1, 2, Z=3] -> raise SyntaxError
C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError

Pros:
- strong conceptual similarity between the tuple case and the dictionary case.
In the first case, we are specifying a tuple, so we are naturally defining
a plain set of values separated by commas. In the second, we are specifying a
dictionary, so we are specifying a homogeneous set of key/value pairs, as
in dict(Z=3, R=4)
- simple and easy to parse on the __getitem__ side: if it gets a tuple,
determine the axes using positioning. If it gets a dictionary, use
the keywords.
- C interface does not need changes.

Cons:
- degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
for a[(2,3)] and a[2,3].
- very strict.
- destroys the use case a[1, 2, default=5]
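
On the receiving side, strategy 4 reduces to a single type check. The following is a minimal sketch of what such a __getitem__ might look like. Since the keyword syntax does not exist, a[Z=1, R=2] is spelled with an explicit dictionary; the class name Grid and the axis names are invented for the example.

```python
class Grid:
    """Toy 2-D container whose axes can be addressed by position or by name."""

    axes = ("Z", "R")  # assumed axis names, in positional order

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # Named form: every label must be given, and no extra ones.
            if set(idx) != set(self.axes):
                raise KeyError(f"expected exactly the labels {self.axes}")
            idx = tuple(idx[name] for name in self.axes)
        z, r = idx
        return self.rows[z][r]


g = Grid([[0, 1, 2], [10, 11, 12]])
by_position = g[1, 2]
by_name = g[dict(R=2, Z=1)]  # stands in for the proposed g[Z=1, R=2]
```

Both lookups address the same element; a partial specification such as g[dict(Z=1)] is rejected, which is the "very strict" behaviour listed under the cons.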

Oleg Broytman

2014-07-04 18:20:18 UTC

Post by Stefano Borini
C1. a[Z=3] -> idx = {"Z": 3}
C2. a[Z=3, R=4] -> idx = {"Z"=3, "R"=4}

Huh? Shouldn't it be
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
???

Post by Stefano Borini
- degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
for a[(2,3)] and a[2,3].

There is no degeneration in the second case. Tuples are created by
commas, not parentheses (except for an empty tuple), hence (2,3) and 2,3
are simply the same thing. While Z=3, R=4 is far from being the same as
{"Z": 3, "R": 4}.

Oleg.

--
Oleg Broytman http://phdru.name/ phd-54Rvo0EEewRBDLzU/***@public.gmane.org
Programmers don't die, they just GOSUB without RETURN.

Stefano Borini

2014-07-04 18:34:24 UTC

Post by Oleg Broytman

Post by Stefano Borini
C1. a[Z=3] -> idx = {"Z": 3}
C2. a[Z=3, R=4] -> idx = {"Z"=3, "R"=4}

Huh? Shouldn't it be
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}

yes. typo. already fixed in the PEP

Post by Oleg Broytman

Post by Stefano Borini
- degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
for a[(2,3)] and a[2,3].

There is no degeneration in the second case. Tuples are created by
commas, not parentheses (except for an empty tuple), hence (2,3) and 2,3
are simply the same thing.

We discussed this point above in the thread, and you are of course
right in saying so, yet it stresses the fact that no matter what you pass
inside those square brackets, they always end up funneled inside a single
object, which happens to be a tuple that you just created

Post by Oleg Broytman
While Z=3, R=4 is far from being the same as
{"Z": 3, "R": 4}.

but dict(Z=3, R=4) is the same as {"Z": 3, "R": 4}.
this is exactly like tuple((2,3)) is the same as (2,3)
See the similarity? the square brackets "call a constructor"
on its content. This constructor is tuple if entries are not
key=values (except for the single index case, of course),
and dict if entries are key=values.
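
The parts of this analogy that concern existing syntax can be checked in current Python. The snippet below is only a small demonstration; the Echo class is a throwaway helper, not anything from the PEP.

```python
class Echo:
    """Returns the single object that __getitem__ receives."""

    def __getitem__(self, idx):
        return idx


a = Echo()

# Commas, not parentheses, build the tuple, so both spellings pass the same object.
same_tuple = a[2, 3] == a[(2, 3)] == tuple((2, 3))

# dict(Z=3, R=4) and the literal build equal dictionaries, so an explicit
# dictionary index already works today.
same_dict = a[dict(Z=3, R=4)] == a[{"Z": 3, "R": 4}]

# A single index is passed through unchanged, without a wrapping tuple.
single = a[2]
```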

Oleg Broytman

2014-07-04 18:39:15 UTC

Post by Stefano Borini

Post by Oleg Broytman
Z=3, R=4 is far from being the same as
{"Z": 3, "R": 4}.

I didn't like the idea from the beginning and I am still against it.

d = dict
a[d(Z=3, R=4)]

looks good enough for me without adding any magic to the language.

Oleg.

--
Oleg Broytman http://phdru.name/ phd-54Rvo0EEewRBDLzU/***@public.gmane.org
Programmers don't die, they just GOSUB without RETURN.

Stefano Borini

2014-07-04 18:40:56 UTC

Post by Stefano Borini
but dict(Z=3, R=4) is the same as {"Z": 3, "R": 4}.
this is exactly like tuple((2,3)) is the same as (2,3)
See the similarity? the square brackets "call a constructor"
on its content. This constructor is tuple if entries are not
key=values (except for the single index case, of course),
and dict if entries are key=values.

In this regard, one can of course do

idx=(2,3)
print(a[idx])

idx={"x":2, "y":3}
print(a[idx])

the above syntax is already legal today, and calls back to a comment from
a previous post. keywords would just be a shorthand for it.

Tim Delaney

2014-07-04 20:10:15 UTC

1. I think you absolutely *must* address the option of purely syntactic
sugar in the PEP. It will come up on python-dev, so address it now.

a[b, c=d, e=f:g:h]
-> a[b, 'c':d, 'e':slice(f, g, h)]

The rationale is readability and being both backwards and forwards
compatible - existing __getitem__ designed to abuse slices will continue to
work, and __getitem__ designed to work with the new syntax will work by
abusing slices in older versions of Python.
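
Since the right-hand spelling is already valid syntax, the receiving side of the sugar-only option can be sketched in current Python. The helper name split_index and the rule "a slice with a string start and no step is a keyword" are assumptions made for this example, not something the proposal specifies.

```python
def split_index(idx):
    """Split an index into positional items and 'key':value pseudo-keywords."""
    if not isinstance(idx, tuple):
        idx = (idx,)
    positional, keywords = [], {}
    for item in idx:
        # A slice with a string start and no step is read as key:value.
        if (isinstance(item, slice) and isinstance(item.start, str)
                and item.step is None):
            keywords[item.start] = item.stop
        else:
            positional.append(item)
    return tuple(positional), keywords


class Echo:
    def __getitem__(self, idx):
        return split_index(idx)


a = Echo()
result = a[1, "c":2, "e":slice(3, 4, 5)]
label_slice = a["a":"b"]
```

With this rule a genuine label slice such as a["a":"b"] is also classified as the keyword a with value "b", so a __getitem__ written this way cannot tell the two uses apart.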

Pandas could be cited as an example of an existing library that could
potentially benefit. It would be good if there were precise examples of
Pandas syntax that would benefit immediately, but I don't know it beyond a
cursory glance over the docs. My gut feeling from that is that if the
syntax were available Pandas might be able to use it effectively.

2. I think you're at the point that you need to pick a single option as
your preferred option, and everything else needs to be in the alternatives.

FWIW, I would vote:

+1 for syntax-sugar only (zero backwards-compatibility concerns). If I were
starting from scratch this would not be my preferred option, but I think
compatibility is important.

+0 for a keyword(key, value) parameter object i.e.

a[b, c=d, e=f:g:h]
-> a[b, keyword('c', d), keyword('e', slice(f, g, h))]

My objection is that either __getitem__ will be more complicated if you
want to support earlier versions of Python (abuse slices for earlier
versions, use the keyword object for current ones), or it imposes an
additional burden on the caller in earlier versions (who needs to create a
keyword-equivalent object to call with). If we were starting from scratch
this would be one of my preferred options.

-1 to any option that loses the order of the parameters (I'm strongly in
favour of bringing order to keyword arguments - let's not take a backwards
step here).

-0 to any option that doesn't allow arbitrary ordering of positional and
keyword arguments i.e. any option where the following is not legal:

a[b, c=d, e]

This is something we can do now (albeit in a fairly verbose way at times)
and I think restricting this is likely to remove options for DSLs, etc.

-0 for namedtuple (BTW you might want to mention that
collections.namedtuple() already has precedent for _X positional parameter
names)

My objection is that it's not possible to determine definitively in
__getitem__ if the call was:

a[b, c]

or

a[_0=b, _1=c]

which might be important in some use cases. The same objection would apply
to passing an OrderedDict (but that's got additional compatibility issues).

Cheers,

Tim Delaney

Nathaniel Smith

2014-07-04 20:39:00 UTC

Post by Tim Delaney
1. I think you absolutely *must* address the option of purely syntactic
sugar in the PEP. It will come up on python-dev, so address it now.
a[b, c=d, e=f:g:h]
-> a[b, 'c':d, 'e':slice(f, g, h)]
The rationale is readability and being both backwards and forwards
compatible - existing __getitem__ designed to abuse slices will continue to
work, and __getitem__ designed to work with the new syntax will work by
abusing slices in older versions of Python.

I don't know of any existing code that abuses slices in this way (so
worrying about compatibility with it seems odd?).

Post by Tim Delaney
Pandas could be cited as an example of an existing library that could
potentially benefit. It would be good if there were precise examples of
Pandas syntax that would benefit immediately, but I don't know it beyond a
cursory glance over the docs. My gut feeling from that is that if the syntax
were available Pandas might be able to use it effectively.

Your hack (aside from being pointlessly ugly) would actually prevent
pandas from using this feature. In pandas, slices like foo["a":"b"]
already have a meaning (i.e., take all items from the one labeled "a"
to the one labeled "b").

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Tim Delaney

2014-07-04 20:46:58 UTC

Post by Nathaniel Smith

Post by Tim Delaney
work, and __getitem__ designed to work with the new syntax will work by
abusing slices in older versions of Python.

pandas from using this feature. In pandas, slices like foo["a":"b"]
already have a meaning (i.e., take all items from the one labeled "a"
to the one labeled "b").

If that's the case then it should be listed as a reason in the PEP for a
change larger than syntax sugar, otherwise this important information will
be lost.

One of the first suggestions when this PEP came up was to just (ab)use
slices - people will use the syntax they have available to them.

Tim Delaney

Ethan Furman

2014-07-04 21:07:41 UTC

Post by Nathaniel Smith
Your hack (aside from being pointlessly ugly) would actually prevent
pandas from using this feature. In pandas, slices like foo["a":"b"]
already have a meaning (i.e., take all items from the one labeled "a"
to the one labeled "b").

Isn't that the standard way slices are supposed to be used though? Instead of
integers, Pandas is allowing strings. How would Pandas use the new feature?

--
~Ethan~

Tim Delaney

2014-07-04 21:40:23 UTC

Post by Ethan Furman

Isn't that the standard way slices are supposed to be used though?
Instead of integers Pandas is allowing strings. How would Pandas use the
new feature?

I think Nathaniel is saying that pandas is already using string slices in
an appropriate way (rather than abusing them), and so if this was just
syntax sugar they wouldn't be able to use the new syntax for new
functionality (since you couldn't distinguish the two).

It would be possible to make both approaches "work" by having an object
that had all of .start, .stop, .step, .key and .value (and trying
.key/.value first), but IMO that's going too far - I'd rather have a
separate object with just .key and .value to test for.

Tim Delaney

Stefano Borini

2014-07-04 21:41:44 UTC

Post by Ethan Furman
Isn't that the standard way slices are supposed to be used though?
Instead of integers Pandas is allowing strings. How would Pandas use the
new feature?

It would not. Pandas is using slices to use labels as indexes. Adding
keywords would allow naming the axes. These are two completely
different use cases.

For example, one could have a table containing the temperature
with the city on one axis and the time on the other axis.

So one could have

temperature["London", 12]

Pandas would have text indexes for "London", "New York", "Chicago" and
so on. One could say

temperature["London":"Chicago", 12]

to get the temperature of the cities between "London" and "Chicago" at noon.

The PEP would instead allow naming the axes in the query:

temperature[city="London":"Chicago", hour=12]
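
To make the difference between a label and an axis name concrete, here is a toy table written in current Python. It is not pandas and does not use the proposed syntax: the named query is passed as an explicit dictionary, the label slice is assumed to include both end labels, and the class name and temperature readings are made up for the example.

```python
class Temperature:
    """Toy table indexed by (city, hour); an illustration only, not pandas."""

    axes = ("city", "hour")  # axis names, in positional order

    def __init__(self, cities, readings):
        self.cities = cities        # ordered city labels
        self.readings = readings    # readings[city][hour]

    def _select_cities(self, city):
        if isinstance(city, slice):
            # Label slice: assumed to include both end labels.
            start = self.cities.index(city.start)
            stop = self.cities.index(city.stop)
            return self.cities[start:stop + 1]
        return [city]

    def __getitem__(self, idx):
        if isinstance(idx, dict):   # named axes, given in any order
            idx = tuple(idx[name] for name in self.axes)
        city, hour = idx
        return {c: self.readings[c][hour] for c in self._select_cities(city)}


# Made-up readings, only to have something to look up.
temperature = Temperature(
    ["London", "New York", "Chicago"],
    {"London": {12: 18}, "New York": {12: 24}, "Chicago": {12: 22}},
)

by_label = temperature["London":"Chicago", 12]
by_axis_name = temperature[{"hour": 12, "city": slice("London", "Chicago")}]
```

The strings "London" and "Chicago" are labels along one axis, while "city" and "hour" name the axes themselves, which is why the second query can list them in any order.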

Ethan Furman

2014-07-04 20:19:02 UTC

Also +1 for this approach.

--
~Ethan~

Alexander Belopolsky

2014-07-04 19:00:56 UTC

On Fri, Jul 4, 2014 at 2:10 PM, Stefano Borini <

Post by Stefano Borini
I just added a new strategy. This one cuts the problem down.
Strategy 4: Strict dictionary

Did anyone consider treating = inside [] in a similar way as : is treated
now? One can even (re/ab)use the slice object:

a[1, 2, 5:7, Z=42] -> a.__getitem__((1, 2, slice(5, 7, None), slice('Z', '=', 42)))

>>> class C:
...     def __getitem__(self, key):
...         print(key)
...
>>> c = C()
>>> c['Z':'=':42]
slice('Z', '=', 42)

Greg Ewing

2014-07-04 23:05:22 UTC

Post by Stefano Borini
Strategy 4: Strict dictionary
-----------------------------
in presence of keyword arguments, the passed entity is a dictionary and all
labels must be specified.

This wouldn't solve the OP's problem, because he apparently
needs to preserve the order of the keywords.

I don't really understand what he's trying to do, but
labelling the axes doesn't seem to be it, or at least not
just that.

--
Greg

Bruce Leban

2014-07-05 03:16:59 UTC

On Fri, Jul 4, 2014 at 11:10 AM, Stefano Borini <

Post by Stefano Borini

Post by Stefano Borini
https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

Strategy 4: Strict dictionary
-----------------------------
This strategy accepts that __getitem__ is special in accepting only one object,
and the nature of that object must be non-ambiguous in its specification of the
axes: it can be either by order, or by name. As a result of this assumption,
in presence of keyword arguments, the passed entity is a dictionary and all
labels must be specified.

The result that "all labels must be specified" does not follow from that
assumption that the object must be unambiguous. Numbers are not valid
keyword names but are perfectly useful as index values. See below. Note
that I am not advocating for/against strategy 4, just commenting on it.

C0. a[1]; a[1,2] -> idx = 1; idx=(1, 2)
C1. a[Z=3] -> idx = {"Z": 3}
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3] -> {0: 1, "Z": 3}
C4. a[1, Z=3, R=4] -> {0: 1, "Z": 3, "R": 4}
C5. a[1, 2, Z=3] -> {0: 1, 1: 2, "Z": 3}
C6. a[1, 2, Z=3, R=4] -> {0: 1, 1: 2, "Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError

Note that idx[0] would have the same value it would have in the normal
__getitem__ call while in all cases above idx[3] would raise an exception.
It would not be the case that a[1,2] and a[x=1,y=2] would be
interchangeable as they would for function calls. That would still have to
be handled by the __getitem__ function itself. But it's fairly easy to
write a function that does that:

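def extract_indexes(idx, args):
    # idx is the dictionary received by __getitem__; args is a list of
    # tuples, either (key, default) or (key,) if there is no default
    result = []
    for i, arg in enumerate(args):
        if i in idx and arg[0] in idx:
            # specified both positionally and by name
            raise IndexError
        if i in idx:
            result.append(idx[i])
        elif arg[0] in idx:
            result.append(idx[arg[0]])
        else:
            # arg[1] raises IndexError when no default was given
            result.append(arg[1])
    return result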

This raises IndexError if a key value is specified both positionally and by
name or if a missing key value does not have a default. It should also (but
does not) raise IndexError when idx contains extra keys not listed in args.
It also doesn't support unnamed (positional only) indexes. Neither of those
is difficult to add.
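
As a rough illustration of this variant, the sketch below emulates the packing shown in C0 to C6 with an ordinary function and adds the missing check for extra keys. The names pack_index and extract_indexes_strict are invented for the example, and the error messages are arbitrary.

```python
def pack_index(*positional, **keywords):
    """Build the idx this variant would pass for a[*positional, **keywords]."""
    if not keywords:
        # No keywords: keep today's behaviour (single object or tuple).
        return positional[0] if len(positional) == 1 else positional
    idx = dict(enumerate(positional))  # positions become integer keys
    idx.update(keywords)
    return idx


def extract_indexes_strict(idx, args):
    """Like extract_indexes, but also rejects keys that are not listed in args."""
    allowed = {arg[0] for arg in args} | set(range(len(args)))
    if set(idx) - allowed:
        raise IndexError("unexpected keys")
    result = []
    for i, arg in enumerate(args):
        if i in idx and arg[0] in idx:
            raise IndexError("given both by position and by name")
        if i in idx:
            result.append(idx[i])
        elif arg[0] in idx:
            result.append(idx[arg[0]])
        elif len(arg) > 1:
            result.append(arg[1])
        else:
            raise IndexError("missing index without default")
    return result


spec = [("X",), ("Z", 0), ("R", 9)]    # X is required; Z and R have defaults
c3 = pack_index(1, Z=3)                # stands in for a[1, Z=3]
values = extract_indexes_strict(c3, spec)
```

In this run X is taken from position 0, Z from the keyword, and R falls back to its default.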

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
https://www.linkedin.com/in/bruceleban