Discussion:
pop multiple elements of a list at once
Diego Jacobi
2010-07-11 17:58:43 UTC
Permalink
Hi.
As recommended here: http://bugs.python.org/issue9218
I am posting this to this list.



I am currently working with buffer in an USB device and pyusb.
So when i read from the buffer of endpoint, i get an array.Array() list.
I handle this chunk of data with a thread to send a receive the
information that i need.
In this thread, i load a list with all the information that is read
from the USB device, and another layer with pop this information from
the threads buffer.

The thing i found is that, to pop a variable chunk of data from this
buffer without copying it and deleting the elements, i have to pop one
element at the time.

def get_chunk(self, size):
for x in range(size):
yield self.recv_buffer.pop()

I guess that it would be improved if i can just pop a defined number
of elements, like this:

pop self.recv_buffer[:-size]
or
self.recv_buffer.pop(,-size)

That would be... "pop from (the last element minus size) to (the last element)"
in that way there is only one memory transaction.
The new list (or maybe a tuple) points to the old memory address and
the recv_buffer is advanced to a one new address. Data is not moved.

Note that i like the idea of using "pop" as the "del" operator for
lists, but i am concient that this would not be backward compatible.

Thanks.
Diego
Brett Cannon
2010-07-11 18:47:53 UTC
Permalink
Post by Diego Jacobi
Hi.
As recommended here: http://bugs.python.org/issue9218
I am posting this to this list.
I am currently working with buffer in an USB device and pyusb.
So when i read from the buffer of endpoint, i get an array.Array() list.
I handle this chunk of data with a thread to send a receive the
information that i need.
In this thread, i load a list with all the information that is read
from the USB device, and another layer with pop this information from
the threads buffer.
The thing i found is that, to pop a variable chunk of data from this
buffer without copying it and deleting the elements, i have to pop one
element at the time.
           yield self.recv_buffer.pop()
I guess that it would be improved if i can just pop a defined number
pop self.recv_buffer[:-size]
or
self.recv_buffer.pop(,-size)
That would be... "pop from (the last element minus size) to (the last element)"
in that way there is only one memory transaction.
The new list (or maybe a tuple) points to the old memory address and
the recv_buffer is advanced to a one new address. Data is not moved.
Why can't you do ``del self.recv_buffer[-size:]``?
Post by Diego Jacobi
Note that i like the idea of using "pop" as the "del" operator for
lists, but i am concient that this would not be backward compatible.
Too specialized, so that will never fly.

-Brett
Nick Coghlan
2010-07-11 21:18:49 UTC
Permalink
I think you overestimate how standardised we could make this across
all platforms and data structures. Under the hood, any such expansion
to the .pop API would almost certainly be defined as equivalent to:

def pop(self, index):
result = self[index]
del self[index]
return result

such that slice objects could be passed in as well as integers (or
integer equivalents). (Currently pop on builtin objects rejects slice
objects, as it only works with integers)

In the meantime, if you want to manipulate memory while minimising
copying, then the 2.7 memoryview object may be for you (assuming you
can switch to the later version).

Cheers,
Nick.
--
Nick Coghlan   |   ncoghlan-***@public.gmane.org   |   Brisbane, Australia
Terry Reedy
2010-07-12 05:29:01 UTC
Permalink
Post by Diego Jacobi
The thing i found is that, to pop a variable chunk of data from this
buffer without copying it and deleting the elements, i have to pop one
element at the time.
In CPython, popping copies a reference and them deletes it from the
list. The item popped is not copied. It is a convenience, which I
proposed, but not a necessity. You can easily write a function that
returns a slice after deleting it.

def pop_slice(lis, n):
tem = lis[:-n]
del lis[:-n]
return tem

I expect this to run faster than popping more than a few items one at a
time.
--
Terry Jan Reedy
Guido van Rossum
2010-07-12 06:35:13 UTC
Permalink
Post by Diego Jacobi
I guess that it would be improved if i can just pop a defined number
pop self.recv_buffer[:-size]
or
self.recv_buffer.pop(,-size)
That would be... "pop from (the last element minus size) to (the last element)"
in that way there is only one memory transaction.
The new list (or maybe a tuple) points to the old memory address and
the recv_buffer is advanced to a one new address. Data is not moved.
I think yo misunderstand the implementation of lists (and the
underlying malloc()). You can't break the memory used for the list
elements into two pieces and give new ownership to a (leading) section
of it. However you also seem to be worrying about "copying" too much
-- the only things that get copied are the *pointers* to the objects
popped off the stack, which is very cheap compared to the rest of the
operation. It is true that to pop off a whole slice there is a more
efficient way than calling pop() repeatedly -- but there's no need for
a new primitive operation, as it can already be done by copying and
then deleting the slice (again, the copying only copies the pointers).

Try reading up on Python's memory model for objects, it will be quite
enlightening.
--
--Guido van Rossum (python.org/~guido)
Nick Coghlan
2010-07-12 12:45:10 UTC
Permalink
Post by Guido van Rossum
I think yo misunderstand the implementation of lists (and the
underlying malloc()). You can't break the memory used for the list
elements into two pieces and give new ownership to a (leading) section
of it. However you also seem to be worrying about "copying" too much
-- the only things that get copied are the *pointers* to the objects
popped off the stack, which is very cheap compared to the rest of the
operation. It is true that to pop off a whole slice there is a more
efficient way than calling pop() repeatedly -- but there's no need for
a new primitive operation, as it can already be done by copying and
then deleting the slice (again, the copying only copies the pointers).
Note that the original poster was apparently talking about
array.array() rather than an actual list (at least, that's the way I
interpreted the phrase "array,Array() list"). In that context, the
desire to avoid copying when invoking pop() makes a lot more sense
than it does when using a builtin list.

I agree that the suggestion of reassigning ownership of a chunk of an
array is based on a misunderstanding of the way memory allocation
works at the pymalloc and OS levels though.

For the record, neither pymalloc nor the OS support breaking a chunk
of already allocated memory in two that way - you need some master
object to maintain control of it, and then use other pointers to look
at subsections. Since memoryview objects in 3.x and 2.7 are designed
specifically to provide a window onto a chunk of memory owned by
another object (such as the storage underlying an array object)
without copying, it seems like that is the kind of thing the original
poster is actually looking for.

(That said, supporting slice objects in pop() still doesn't strike me
as an insane idea, although I'd probably want to see some use cases
before we went to the hassle of adding it).

Cheers,
Nick.
--
Nick Coghlan   |   ncoghlan-***@public.gmane.org   |   Brisbane, Australia
Antoine Pitrou
2010-07-12 12:59:56 UTC
Permalink
On Mon, 12 Jul 2010 22:45:10 +1000
Post by Nick Coghlan
For the record, neither pymalloc nor the OS support breaking a chunk
of already allocated memory in two that way - you need some master
object to maintain control of it, and then use other pointers to look
at subsections. Since memoryview objects in 3.x and 2.7 are designed
specifically to provide a window onto a chunk of memory owned by
another object (such as the storage underlying an array object)
without copying, it seems like that is the kind of thing the original
poster is actually looking for.
memoryviews don't provide a high-level view over their chunk of memory,
though, only bytes-level.
(they were specified to provide such a view, but it was never
implemented)
Nick Coghlan
2010-07-12 13:35:57 UTC
Permalink
Post by Antoine Pitrou
memoryviews don't provide a high-level view over their chunk of memory,
though, only bytes-level.
(they were specified to provide such a view, but it was never
implemented)
True, but the use case the original poster mentioned was for a bytes level view.

Cheers,
Nick.
--
Nick Coghlan   |   ncoghlan-***@public.gmane.org   |   Brisbane, Australia
Diego Jacobi
2010-07-12 15:13:34 UTC
Permalink
Hi.
I apologize if i am having difficulties to explain myself in English.
Also i just realise that i wasnt answering to the list, but directly
to Brett Cannon. Sorry for that.

Also i am taking into account that you are right and my concept of how
memory is handled here for lists is way too different as how i am more
used to work in firmwares.

Anyway, the concept of my idea comes of my understanding of an
low-leveled array.

I do understand that poping returns only a pointer to that element.
I didnt understand that every element inside a list is also a pointer
to a python type, so the data is not copied, but the pointer is.
The thing is that the elements on my list are just bytes. (unsigned
char), and i think that an array of this type of data is called in
python immutable, which means that i may not be using the right
datatype.

Anyway. slice support on pop(), and/or the ability to skip elements in
a for loop without restarting the iteration will clear up a lot of
code.



If my scenario is yet no clear i give below some more details:

When i think on how this will work on memory:
def pop_slice(lis, n):
tem = lis[:-n]
del lis[:-n]
return tem

I think in "copying the elements of an array":

BYTE* pop_slice(BYTE* list, unsigned int n){
BYTE* temp = malloc(n*sizeof(BYTE));
int i;
for(i=0 ; i < n ; i++) {
temp[i] = list[i];
}
free(list, n);
return temp;
}


Most python books and tutorials clearly says that this operation
L[start:end] copies the elements requested. And being copy i
understand the above behavior, which is less efficient than advancing
the pointer.



But i wanted to do (with "pop multiple bytes at once") is:


typedef unsigned char BYTE;

BYTE array[BIG_SIZE];
BYTE* incomming_buffer_pointer = &array[0];
BYTE* incomming_packet_pointer = &array[0];

BYTE* pop_slice(BYTE* list, unsigned int n){
BYTE* temp;
temp = list;
list = &list[n];
return temp;
}
..
incomming_packet_pointer = pop_slice( incomming_buffer_pointer , PACKET_SIZE)
if (parse_packet_is_not_corrupt( incomming_packet_pointer ) )
parse_new_packet( incomming_packet_pointer );
else
....
..



Thanks for analizing the idea.
Jacobi Diego
Nick Coghlan
2010-07-12 15:41:48 UTC
Permalink
Post by Diego Jacobi
Hi.
I apologize if i am having difficulties to explain myself in English.
Also i just realise that i wasnt answering to the list, but directly
to Brett Cannon. Sorry for that.
Also i am taking into account that you are right and my concept of how
memory is handled here for lists is way too different as how i am more
used to work in firmwares.
Anyway, the concept of my idea comes of my understanding of an
low-leveled array.
For builtin lists and most other Python container types, think more
along the lines of a list of pointers rather than a list of numbers.
However, for the behaviour of array.array (rather than the builtin
list), your understanding of the cost of slicing was actually pretty
close to correct (since storing actual values rather than pointers is
the way array.array gains its comparative memory efficiency).

Python doesn't actually excel at memory efficiency when manipulating
large data sets using conventional syntax (e.g. slicing). The old
buffer objects were an initial attempt at providing memory efficient
access to segments of data buffers, while the recently added
memoryview objects are a more sophisticated (and safer) approach to
the same idea. The NumPy extension, on the other hand, is able to very
efficiently provide multiple views of the same region of memory
without requiring copying (NumPy itself isn't particularly small
though).

Cheers,
Nick.
--
Nick Coghlan   |   ncoghlan-***@public.gmane.org   |   Brisbane, Australia
Continue reading on narkive:
Loading...