Discussion: lazy tuple unpacking
Paul Tagliamonte
2014-07-05 00:59:30 UTC
Permalink
    >>> def g_range(n):
    ...     for y in range(n):
    ...         yield y
    ...
    >>> a, b, c, *others = g_range(100)
Works great. Super useful stuff there. Looks good.

I also notice that this causes *others to consume the generator
in a greedy way.
    >>> type(others)
    <class 'list'>

And this makes me sad.
    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    # (ok, unless you've got a really fancy bit of kit) in:
    Killed
    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
Of course, this leads to all sorts of fun errors, like the fact that you
can't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.

Thoughts?
Paul
Chris Rebert
2014-07-05 01:10:44 UTC
Permalink
On Fri, Jul 4, 2014 at 5:59 PM, Paul Tagliamonte <paultag-***@public.gmane.org> wrote:
<snip>
Post by Paul Tagliamonte
    >>> a, b, c, *others = g_range(100)
Works great. Super useful stuff there. Looks good.
I also notice that this causes *others to consume the generator
in a greedy way.
    >>> type(others)
    <class 'list'>
And this makes me sad.
    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    Killed
    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
Of course, this leads to all sorts of fun errors, like the fact you
couldn't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.
Thoughts?
It would mean an (IMHO undesirable) loss of consistency/symmetry,
type-wise, with other unpackings where this generator optimization
isn't possible:

Python 3.4.1 (default, May 19 2014, 13:10:29)
>>> x = [1,2,3,4,5,6,7]
>>> a,b,*c,d = x
>>> c
[3, 4, 5, 6]
>>> *e,f,g = x
>>> e
[1, 2, 3, 4, 5]

Cheers,
Chris
Paul Tagliamonte
2014-07-05 01:12:49 UTC
Permalink
Post by Chris Rebert
It would mean an (IMHO undesirable) loss of consistency/symmetry,
type-wise, with other unpackings where this generator optimization
isn't possible:
Python 3.4.1 (default, May 19 2014, 13:10:29)
>>> x = [1,2,3,4,5,6,7]
>>> a,b,*c,d = x
>>> c
[3, 4, 5, 6]
>>> *e,f,g = x
>>> e
[1, 2, 3, 4, 5]
Cheers,
Chris
Euch, good point. This feature might just be DOA.

Thanks!
Paul
Andrew Barnert
2014-07-05 11:26:26 UTC
Permalink
Post by Chris Rebert
<snip>
    >>> a, b, c, *others = g_range(100)
Works great. Super useful stuff there. Looks good.
I also notice that this causes *others to consume the generator
in a greedy way.
    >>> type(others)
    <class 'list'>
And this makes me sad.
    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    Killed
    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
    >>>
Of course, this leads to all sorts of fun errors, like the fact you
couldn't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.
Thoughts?
It would mean an (IMHO undesirable) loss of consistency/symmetry,
type-wise, with other unpackings where this generator optimization
isn't possible:
Python 3.4.1 (default, May 19 2014, 13:10:29)
>>> x = [1,2,3,4,5,6,7]
>>> a,b,*c,d = x
>>> c
[3, 4, 5, 6]
>>> *e,f,g = x
>>> e
[1, 2, 3, 4, 5]
When I was experimenting with adding lazy lists (sequences that wrap an iterator and request each value on first request), I played around with PyPy (Python 2.x, not 3.x), making unpacking (as well as map, filter, etc.) return them instead of iterators. (I also tried to make generator functions and expressions return them, but I couldn't get that to work in a quick hack…)

It worked nicely, and I think it would work with the expanded unpacking in 3.x. In Paul's original example, others is a lazy list of 9999999997 elements, which will only evaluate the ones you actually ask for. In Chris's examples, c and e are fully-evaluated lazy lists of 4 or 3 elements, respectively.

But, even if this weren't a ridiculously radical change to the language, I don't think it's what you'd want.

First, an iterator over 9999999997 elements is a lot more useful than a lazy list of 9999999997 elements, because you can iterate the whole thing without running out of memory.

Second, it wouldn't help in a case like this:

    a, b, *c, d, e = range(10000000000)

To make that work, you'd need something smarter than just using a lazy list instead of an iterator.

One way to solve it is to try to keep the original type, so unpacking maps to something like this:

    try:
        a, b, c, d, e = i[0], i[1], i[2:-2], i[-2], i[-1]
    except TypeError:
        a, b, *c, d, e = i  # fall back to the current behavior

Then c ends up as range(2, 9999999998), which is the best possible thing you could get there.
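For instance (just illustrating the mapping above by hand, not output from a patched interpreter):

    i = range(10000000000)
    a, b, c, d, e = i[0], i[1], i[2:-2], i[-2], i[-1]
    # c is range(2, 9999999998): a constant-size object rather than a
    # ten-billion-element list, and it can be indexed, sliced, and
    # re-iterated as many times as you like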

You could take this even further by adding either a notion of bidirectional and forward-only sequences, or a notion of reversible iterables, but that's getting much farther into left field; if anyone's interested, see http://stupidpythonideas.blogspot.com/2014/07/lazy-tuple-unpacking.html for details.
Ian Cordasco
2014-07-05 01:19:01 UTC
Permalink
Post by Paul Tagliamonte
    >>> def g_range(n):
    ...     for y in range(n):
    ...         yield y
    ...
    >>> a, b, c, *others = g_range(100)
Works great. Super useful stuff there. Looks good.
I also notice that this causes *others to consume the generator
in a greedy way.
    >>> type(others)
    <class 'list'>
And this makes me sad.
    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    Killed
    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
Of course, this leads to all sorts of fun errors, like the fact you
couldn't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.
Thoughts?
I agree that the behaviour is suboptimal, but as Chris already pointed
out it would introduce a significant inconsistency in the API of
unpacking. I'm struggling to see a *good* way of doing this. My first
instinct was that we could make something like this do what you want:

    >>> a, b, c, others = g_range(some_really_big_number)
    >>> others
    <generator ...>

But this doesn't work today, because Python raises a ValueError: there
were too many values to unpack.
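For reference, here's roughly what that assignment looks like today (an
illustrative transcript, reusing Paul's g_range):

    >>> a, b, c, others = g_range(100)
    Traceback (most recent call last):
      ...
    ValueError: too many values to unpack (expected 4)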
I'm also against introducing some new syntax to add the behaviour.
Todd
2014-07-08 08:30:16 UTC
Permalink
Post by Paul Tagliamonte
    >>> def g_range(n):
    ...     for y in range(n):
    ...         yield y
    ...
    >>> a, b, c, *others = g_range(100)
Works great. Super useful stuff there. Looks good.
I also notice that this causes *others to consume the generator
in a greedy way.
    >>> type(others)
    <class 'list'>
And this makes me sad.
    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    Killed
    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
Of course, this leads to all sorts of fun errors, like the fact you
couldn't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.
Thoughts?
Paul
Besides the issues others have discussed, another issue I see here is that
you are basically copying the iterator. In a case like this, where "gen_a"
is a generator:

    >>> a, b, c, *others = gen_a

"others" should be the same as "gen_a" in the end (i.e. "others is gen_a ==
True"). This seems redundant, especially when we have the itertools "take"
recipe which can be used to retrieve the first "n" values of an iterator,
which can then be unpacked in whatever way you want.
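For reference, that "take" recipe from the itertools docs is just a thin
wrapper around islice, so the manual version of this is only a couple of
lines (gen_a here is just an example generator):

    from itertools import islice

    def take(n, iterable):
        "Return first n items of the iterable as a list (itertools recipe)."
        return list(islice(iterable, n))

    gen_a = (x * x for x in range(10**12))
    a, b, c = take(3, gen_a)   # a, b, c = 0, 1, 4
    # gen_a has only been advanced past its first three values and can
    # keep being consumed lazily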


However, there might be an alternative. You could have something where, if
you are unpacking an iterable to N variables, you can tell it to just
unpack the first N values, and the iterable then remains at position N
(similar to "take", but integrated more deeply). For the case of something
like a list or tuple, it will just unpack those variables and skip the rest:

    >>> a, b, c = gen_a.unpack()

Or some sort of syntax to say that the remaining values should be skipped
(although I don't really know what syntax would be good here; the syntax I
use here is just a placeholder):

    >>> a, b, c, [] = gen_a
Of course with "take" so simple to implement, this is probably way
overkill. I also don't know if it is even possible for the right side of
the expression to know the layout of the left side in that way.
Paul Tagliamonte
2014-07-08 16:17:37 UTC
Permalink
Post by Todd
Besides the issues others have discussed, another issue I see here is that
you are basically copying the iterator.  In the case of this, where
a, b, c, *others = gen_a
"others" should be the same as "gen_a" in the end (i.e. "others is gen_a
== True").  This seems redundant, especially when we have the itertools
"take" recipe which can be used to retrieve the first "n" values of an
iterator, which can then be unpacked in whatever way you want.
However, there might be an alternative.  You could have something where,
if you are unpacking an iterable to N variables, you can tell it to just
unpack the first N values, and the iterable then remains at position N
(similar to "take", but integrated more deeply).  For the case of
something like a list or tuple, it will just unpack those variables and
a, b, c = gen_a.unpack()
Or some sort of syntax to say that the remaining values should be skipped
(although I don't really know what syntax would be good here, the syntax I
a, b, c, [] = gen_a
Of course with "take" so simple to implement, this is probably way
overkill.  I also don't know if it is even possible for the right side of
the expression to know how the layout of the left in that way.
Yeah, I think all the productive ideas (thanks, Andrew and Todd) to make
this happen are mostly starting to converge on full-blown lazy lists,
which is to say, generators which are indexable, sliceable, and work from both
ends (in other words: more than the current iterator protocol).

I totally like the idea, not sure how keen everyone will be about it.

I'm not sure I have the energy or drive to try and convince everyone on
python-dev this is a good idea, but I'd totally love to play around
with this. Anyone else?

Cheers,
Paul
--
#define sizeof(x) rand()
</paul>
:wq
Andrew Barnert
2014-07-08 18:25:36 UTC
Permalink
Post by Paul Tagliamonte
Yeah, I think all the productive ideas (thanks, Andrew and Todd) to make
this happen are mostly starting to converge on full-blown lazy lists,
which is to say, generators which are indexable, sliceable, and work from both
ends (which is to say: more than the current iterator protocol).
I don't think that's necessarily true.

First, I think the idea of just trying to index and slice the iterable and falling back to the current behavior is at least worth thinking about. It wouldn't solve the problem for iterators, but it would for your example (range), or any other kind of sequence that knows how to slice itself in a better way than making a list.

And to take that farther, you don't necessarily need to replace iterators with lazy lists, just with some kind of sequence. A view is just as indexable, sliceable, and reusable as a lazy list, and better in many ways, and Python already has views like dict_keys built in, and NumPy already uses a similar idea for slicing.
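To make the "view" point concrete, here's a tiny illustration with plain dicts (nothing new here, just standard behaviour):

    d = {'a': 1, 'b': 2}
    ks = d.keys()      # a view: not a copy, and not a one-shot iterator
    d['c'] = 3
    list(ks)           # ['a', 'b', 'c'] -- the view reflects the later insertion
    sorted(ks)         # still usable; views can be re-iterated, unlike iterators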

I believe either Guido or Nick has a writeup somewhere on why list slices being new lists rather than views is a good thing, not just a historical accident we're stuck with. But that doesn't mean that having views for many of the cases we use iterators for today (including a view comprehension, a viewtools library, etc.) would necessarily be a bad idea.

And, as I mentioned, expanding the notion of sequence to include weaker notions of bidirectional-only sequence and forward-only sequence eliminates many of the other needs for iterators (but not all—a general generator function obviously can't return a reusable forward-only sequence).

If you're interested in more on this, see http://stupidpythonideas.blogspot.com/2014/07/lazy-tuple-unpacking.html and http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-views.html for some ideas.
Post by Paul Tagliamonte
I totally like the idea, not sure how keen everyone will be about it.
I'm not sure I have the energy or drive to try and convince everyone on
python-dev this is a good idea, but I'd totally love to play around
with this. Anyone else?
Before trying to convince anyone this is a good idea, first you want to build a lazy-list library: a lazy list type, lazy-list versions of map/filter/dropwhile, etc.

It's actually pretty simple. A lazy list basically looks like this:

    import collections.abc

    class LazyList(collections.abc.Sequence):
        def __init__(self, iterable):
            self.lst, self.it = [], iter(iterable)
        def __len__(self):  # Sequence requires it; note len() forces full evaluation
            self.lst.extend(self.it)
            return len(self.lst)
        def __getitem__(self, index):
            while index >= len(self.lst):
                try:
                    self.lst.append(next(self.it))
                except StopIteration:
                    raise IndexError(index)
            return self.lst[index]

You have to add slice support, set/del, etc., but it's all pretty simple. The only tricky question is what to do about slicing, because you have a choice there. You could just loop over the slice and get/set/del each index, or you could return a new LazyList around islice(self), or you could do the latter if stop is None else the former.
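For instance, the "islice when the slice is open-ended, realize it otherwise" option could look roughly like this (a sketch only; it assumes the LazyList above plus non-negative, step-1 slices):

    import itertools

    def lazylist_getslice(lazy, sl):
        if sl.stop is None:
            # open-ended slice: stay lazy by wrapping an islice of the LazyList itself
            return LazyList(itertools.islice(lazy, sl.start or 0, None))
        # bounded slice: just realize the requested elements
        return [lazy[i] for i in range(sl.start or 0, sl.stop)]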

And then all the lazy functions just call the iterator function and wrap the result in a LazyList.
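Something along these lines (the lazy_* names are just made up for the sketch):

    import itertools

    def lazy_map(func, *iterables):
        return LazyList(map(func, *iterables))

    def lazy_filter(pred, iterable):
        return LazyList(filter(pred, iterable))

    def lazy_dropwhile(pred, iterable):
        return LazyList(itertools.dropwhile(pred, iterable))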

The big problem with lazy lists is that once a value is instantiated, it stays around as long as the list does. So, if you use a lazy list as an iterable, you're basically building the whole list in memory. Iterators obviously don't have that problem.


It's worth looking at Haskell and other lazy functional languages to see why they don't have that problem. Their lists are conses (singly-linked lists with tail sharing). So, making it lazy automatically means that if you just iterate L without keeping a reference, only one cons is around at a time, while if you keep a reference to L, the whole list is available in memory. That won't work for array-based lists like Python's, and I'm not sure how you'd solve that without completely changing the way iteration works in Python. (Of course you can easily implement cons lists in Python, and then make them lazy, but then they're not sequences—in particular, they're not indexable, and generally won't work with any typical Python algorithms that aren't already happy with iterators.)
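If you want to see the cons idea in Python terms, a deliberately minimal sketch (all of the names here are made up) might look like:

    class LazyCons:
        # one cell of a lazy singly-linked list: a head value plus a thunk for the rest
        def __init__(self, head, tail_thunk=None):
            self.head = head
            self._tail_thunk = tail_thunk  # callable producing the next cell; None at the end
            self._tail = None
            self._forced = False
        @property
        def tail(self):
            if not self._forced:
                self._tail = self._tail_thunk() if self._tail_thunk else None
                self._tail_thunk = None
                self._forced = True
            return self._tail

    def naturals(n=0):
        # an "infinite" lazy list; each cell is built only when its tail is forced
        return LazyCons(n, lambda: naturals(n + 1))

    cell = naturals()
    for _ in range(5):
        print(cell.head)   # 0 1 2 3 4
        cell = cell.tail   # cells we've walked past can now be garbage collected

As in Haskell, if you keep a reference to the head, every cell you've forced stays in memory; if you only keep the current cell, the rest can be collected.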
Andrew Barnert
2014-07-08 17:41:09 UTC
Permalink
Post by Todd
a, b, c, *others = gen_a
"others" should be the same as "gen_a" in the end (i.e. "others is gen_a == True").  This seems redundant, especially when we have the itertools "take" recipe which can be used to retrieve the first "n" values of an iterator, which can then be unpacked in whatever way you want.
I don't think the issue he's trying to solve is that others is gen_a, but just that gen_a is not exhausted and copied into a list. If others were a wrapper around gen_a instead, I think that would solve all of the interesting use cases. (But I don't want to put words in Paul Tagliamonte's mouth here, so make sure he confirms it before replying in depth…)
Post by Todd
a, b, c = gen_a.unpack()
a, b, c, [] = gen_a
    >>> a, b, *, c, d = range(10)
    >>> a, b, c, d
    (0, 1, 8, 9)
Post by Todd
Of course with "take" so simple to implement, this is probably way overkill.  I also don't know if it is even possible for the right side of the expression to know how the layout of the left in that way.
There was a thread a few months back about enhancing the unpacking protocol by asking the iterable itself to do the unpacking (possibly feeding it more information about what needs to be unpacked, something like an __unpack__(self, prestar_count, star_flag, poststar_count)), which would allow the flexibility you're looking for.

I don't want to repeat someone else's use cases and arguments out of my own faulty memory; if you're interested in following up, search the python-ideas archive.

Anyway, your explicit method version could be written today. Obviously it wouldn't work on generators or other arbitrary iterators, but you could write a simple wrapper that takes an iterator and returns an iterator with an unpack method, which I think would be enough for experimenting with the feature.
    >>> a, b, *others = [1, 2, 3, 4, 5]
    >>> others
    <list_iterator at 0x12345678>
    >>> next(others)
    3
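A rough sketch of such a wrapper (the Unpackable name and the explicit count argument are just for illustration; since the right-hand side can't see how many targets the left-hand side has, the count has to be passed in explicitly):

    import itertools

    class Unpackable:
        # illustrative wrapper: an iterator that adds an explicit unpack(n) method
        def __init__(self, iterable):
            self._it = iter(iterable)
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._it)
        def unpack(self, n):
            # pull exactly n values; the wrapper stays positioned right after them
            values = list(itertools.islice(self._it, n))
            if len(values) != n:
                raise ValueError("not enough values to unpack "
                                 "(expected %d, got %d)" % (n, len(values)))
            return values

    gen_a = Unpackable(x * x for x in range(10))
    a, b, c = gen_a.unpack(3)   # a, b, c = 0, 1, 4
    rest = list(gen_a)          # the remaining squares, 9 through 81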