Discussion:
Bitwise operations on bytes class
Nathaniel McCallum
2014-06-16 18:03:30 UTC
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.

Nathaniel
Terry Reedy
2014-06-16 19:20:33 UTC
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray).
If you are often doing and/or/xor on large arrays, as one might do for
bitmap images, you should probably be using numpy or a derivative thereof.

What use do you have for shifting bits across byte boundaries, where the
bytes are really bytes? Why would you not turn multiple bytes
considered together into an int?
Post by Nathaniel McCallum
I can't think of any other reasonable use for these operators.
I don't understand this. They are routinely used on ints for various
purposes.
--
Terry Jan Reedy
Nathaniel McCallum
2014-06-16 19:43:33 UTC
Post by Terry Reedy
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray).
If you are often doing and/or/xor on large arrays, as one might do for
bitmap images, you should probably be using numpy or a derivative thereof.
What use do you have for shifting bits across byte boundaries, where the
bytes are really bytes? Why would you not turn multiple bytes
considered together into an int?
There are many reasons. Anything relating to cryptography, key
derivation, asn1 BitString, etc. Many network protocols have specialized
algorithms which require bit rotations or bitwise operations on blocks.
Post by Terry Reedy
Post by Nathaniel McCallum
I can't think of any other reasonable use for these operators.
I don't understand this. They are routinely used on ints for various
purposes.
I meant that, for instance, I can't think of any other reasonable
interpretation for what "bytes() ^ bytes()" would mean other than a
bitwise xor of the bytes in the arrays. Yes, of course the operators
have meanings in other contexts. But in this context, I think the
meaning of the operators is self-evident and precise.

Perhaps some code will clarify what I'm proposing. Attached is a class I
have found continual reuse for over the last few years. It implements
bitwise operators on a bytes subclass. Something similar could be done
for bytearray.

Nathaniel
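
The attachment is not reproduced in this thread, so the following is only a minimal sketch of what a bytes subclass with bitwise operators might look like. The class name `BitBytes`, the helper `_combine`, and the rule that operands must have equal length are assumptions for illustration, not the attached code.

```python
import operator


class BitBytes(bytes):
    """Hypothetical bytes subclass with element-wise bitwise operators."""

    def _combine(self, other, op):
        # Assumption: operands of different lengths are rejected outright.
        if len(self) != len(other):
            raise ValueError("operands must have the same length")
        return type(self)(op(a, b) for a, b in zip(self, other))

    def __xor__(self, other):
        return self._combine(other, operator.xor)

    def __and__(self, other):
        return self._combine(other, operator.and_)

    def __or__(self, other):
        return self._combine(other, operator.or_)


left = BitBytes(b"\x0f\xf0")
right = BitBytes(b"\xff\xff")
result = left ^ right  # b"\xf0\x0f"
```

Every operation builds a new object, which is part of why a pure-Python version like this is slow.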
Stefan Behnel
2014-06-16 19:55:40 UTC
Post by Nathaniel McCallum
Perhaps some code will clarify what I'm proposing. Attached is a class I
have found continual reuse for over the last few years. It implements
bitwise operators on a bytes subclass. Something similar could be done
for bytearray.
Ok, according to your code, you don't want a SIMD type but rather an
arbitrary size integer type. Why don't you just use the "int" ("long" in
Py2) type for that? It has way faster operations than your multiple copy
implementation.

Stefan
Nathaniel McCallum
2014-06-16 20:16:13 UTC
Post by Stefan Behnel
Post by Nathaniel McCallum
Perhaps some code will clarify what I'm proposing. Attached is a class I
have found continual reuse for over the last few years. It implements
bitwise operators on a bytes subclass. Something similar could be done
for bytearray.
Ok, according to your code, you don't want a SIMD type but rather an
arbitrary size integer type. Why don't you just use the "int" ("long" in
Py2) type for that? It has way faster operations than your multiple copy
implementation.
Of course my attached code is slow. This is precisely why I'm proposing
native additions to the bytes class.

However, in most algorithms, there is a single operation like this on a
block of data which is otherwise not treated as an integer. This
operation often takes the form of something like:

blocks.append(blocks[-1] ^ block)

In all the surrounding code, you are dealing with bytes *as* bytes.
Converting into alternate types breaks up the readability of the
algorithm. And given the security requirements of such algorithms,
readability is extremely important.

The above code example has both simplicity and obviousness. Currently,
in py3k, this is AFAICS the best alternative for readability:

blocks.append(bytes(a ^ b for a, b in zip(blocks[-1], block)))

While this is infinitely better than Python 2.x, I think my proposal is
still significantly more readable. When implemented natively, my
proposal is also far more performant than this.

Nathaniel
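
For comparison, here is a small runnable sketch of the chaining pattern described above using what Python 3 offers today. The helper name `xor_bytes`, the all-zero starting block, and the sample data are assumptions made up for the example.

```python
def xor_bytes(left, right):
    """Return the byte-wise XOR of two equal-length bytes-like objects."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return bytes(a ^ b for a, b in zip(left, right))


# Chain each incoming block onto the previous result.
blocks = [bytes(4)]  # hypothetical all-zero starting block
for block in (b"\x01\x02\x03\x04", b"\xff\x00\xff\x00"):
    blocks.append(xor_bytes(blocks[-1], block))
```

With the proposed operator, the loop body would shrink to `blocks.append(blocks[-1] ^ block)`.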
Nathaniel McCallum
2014-06-16 20:22:28 UTC
Post by Nathaniel McCallum
Post by Stefan Behnel
Post by Nathaniel McCallum
Perhaps some code will clarify what I'm proposing. Attached is a class I
have found continual reuse for over the last few years. It implements
bitwise operators on a bytes subclass. Something similar could be done
for bytearray.
Ok, according to your code, you don't want a SIMD type but rather an
arbitrary size integer type. Why don't you just use the "int" ("long" in
Py2) type for that? It has way faster operations than your multiple copy
implementation.
Of course my attached code is slow. This is precisely why I'm proposing
native additions to the bytes class.
However, in most algorithms, there is a single operation like this on a
block of data which is otherwise not treated as an integer. This
operation often takes the form of something like:
blocks.append(blocks[-1] ^ block)
In all the surrounding code, you are dealing with bytes *as* bytes.
Converting into alternate types breaks up the readability of the
algorithm. And given the security requirements of such algorithms,
readability is extremely important.
The above code example has both simplicity and obviousness. Currently,
in py3k, this is AFAICS the best alternative for readability:
blocks.append(bytes(a ^ b for a, b in zip(blocks[-1], block)))
While this is infinitely better than Python 2.x, I think my proposal is
still significantly more readable. When implemented natively, my
proposal is also far more performant than this.
Also, when implemented on bytearray, you can get things like this:
cksum ^= block.

This can be very fast as it can be done with no copies. It is also
extremely readable.

Nathaniel
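
Until something like `cksum ^= block` exists on bytearray, the in-place behaviour can be approximated with a loop. This is only a sketch; the function name `ixor` and the equal-length requirement are assumptions.

```python
def ixor(target, block):
    """XOR block into target in place; target must be a mutable bytearray."""
    if len(target) != len(block):
        raise ValueError("operands must have the same length")
    for index, value in enumerate(block):
        target[index] ^= value  # mutates the existing buffer, no new object


cksum = bytearray(4)
alias = cksum  # second reference to the same buffer
for block in (b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"):
    ixor(cksum, block)
```

Because the buffer is modified rather than replaced, `alias` sees the updated checksum as well.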
Greg Ewing
2014-06-16 21:53:03 UTC
Post by Nathaniel McCallum
In all the surrounding code, you are dealing with bytes *as* bytes.
Converting into alternate types breaks up the readability of the
algorithm. And given the security requirements of such algorithms,
readability is extremely important.
Not to mention needlessly inefficient.

There's also the issue that you are usually dealing
with a specific number of bits. When you convert to
an int, you lose any notion of it having a size, so
you have to keep track of that separately, and take
its effect on the bitwise operations into account
manually.

E.g. the bitwise complement of an N-bit string is
another N-bit string. But the bitwise complement of
a positive int is a bit string with an infinite
number of leading 1 bits, which you have to mask
off. The bitwise complement of a bytes object, on
the other hand, would be another bytes object of
the same size.
--
Greg
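
The complement example above can be checked directly. The function name `invert` is hypothetical; it stands in for what `~` on a bytes object might do.

```python
value = 0x0F                # meant as an 8-bit quantity
int_complement = ~value     # -16, because Python ints have no fixed width
masked = ~value & 0xFF      # 0xF0 once the width is re-imposed by hand


def invert(data):
    """Complement every bit; the result has the same length as the input."""
    return bytes(b ^ 0xFF for b in data)


inverted = invert(b"\x0f\x00")  # b"\xf0\xff"
```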
Nathaniel McCallum
2014-06-17 13:24:57 UTC
Post by Greg Ewing
Post by Nathaniel McCallum
In all the surrounding code, you are dealing with bytes *as* bytes.
Converting into alternate types breaks up the readability of the
algorithm. And given the security requirements of such algorithms,
readability is extremely important.
Not to mention needlessly inefficient.
There's also the issue that you are usually dealing
with a specific number of bits. When you convert to
an int, you lose any notion of it having a size, so
you have to keep track of that separately, and take
its effect on the bitwise operations into account
manually.
E.g. the bitwise complement of an N-bit string is
another N-bit string. But the bitwise complement of
a positive int is a bit string with an infinite
number of leading 1 bits, which you have to mask
off. The bitwise complement of a bytes object, on
the other hand, would be another bytes object of
the same size.
+1
Chris Angelico
2014-06-16 22:59:30 UTC
On Tue, Jun 17, 2014 at 6:16 AM, Nathaniel McCallum
Post by Nathaniel McCallum
Of course my attached code is slow. This is precisely why I'm proposing
native additions to the bytes class.
I presume you're aware that the bytes type is immutable, right? You're
still going to have at least some copying going on, whereas with a
mutable type you might well be able to avoid that. Efficiency suggests
bytearray instead.

ChrisA
Greg Ewing
2014-06-17 00:00:20 UTC
Post by Chris Angelico
I presume you're aware that the bytes type is immutable, right? You're
still going to have at least some copying going on, whereas with a
mutable type you might well be able to avoid that. Efficiency suggests
bytearray instead.
Why not both?
--
Greg
Chris Angelico
2014-06-17 00:03:24 UTC
On Tue, Jun 17, 2014 at 10:00 AM, Greg Ewing
Post by Greg Ewing
Post by Chris Angelico
I presume you're aware that the bytes type is immutable, right? You're
still going to have at least some copying going on, whereas with a
mutable type you might well be able to avoid that. Efficiency suggests
bytearray instead.
Why not both?
If you do a series of operations on a large bytes object, each one
will involve a full copy. If you do the same series of operations on a
large mutable object, they can be optimized down to non-copying. Why
both?

ChrisA
Nick Coghlan
2014-06-17 06:02:36 UTC
Post by Chris Angelico
On Tue, Jun 17, 2014 at 10:00 AM, Greg Ewing
Post by Greg Ewing
Post by Chris Angelico
I presume you're aware that the bytes type is immutable, right? You're
still going to have at least some copying going on, whereas with a
mutable type you might well be able to avoid that. Efficiency suggests
bytearray instead.
Why not both?
If you do a series of operations on a large bytes object, each one
will involve a full copy. If you do the same series of operations on a
large mutable object, they can be optimized down to non-copying. Why
both?
Because the two APIs are currently in sync outside mutating operations, and
there isn't a compelling reason to break that symmetry, even if this
proposal was put forward as a PEP and ultimately accepted.

Cheers,
Nick.
Post by Chris Angelico
ChrisA
_______________________________________________
Python-ideas mailing list
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Chris Angelico
2014-06-17 06:03:40 UTC
Post by Nick Coghlan
Because the two APIs are currently in sync outside mutating operations, and
there isn't a compelling reason to break that symmetry, even if this
proposal was put forward as a PEP and ultimately accepted.
Ah! That would be why. Sorry for the noise!

ChrisA
Nick Coghlan
2014-06-17 08:36:42 UTC
Post by Chris Angelico
Post by Nick Coghlan
Because the two APIs are currently in sync outside mutating operations, and
there isn't a compelling reason to break that symmetry, even if this
proposal was put forward as a PEP and ultimately accepted.
Ah! That would be why. Sorry for the noise!
Clarifying non-obvious design principles isn't noise on python-ideas,
it's one of the reasons the list exists :)

Cheers,
Nick.
Chris Angelico
2014-06-17 08:40:55 UTC
Post by Nick Coghlan
Post by Chris Angelico
Post by Nick Coghlan
Because the two APIs are currently in sync outside mutating operations, and
there isn't a compelling reason to break that symmetry, even if this
proposal was put forward as a PEP and ultimately accepted.
Ah! That would be why. Sorry for the noise!
Clarifying non-obvious design principles isn't noise on python-ideas,
it's one of the reasons the list exists :)
Then I'm glad to have been able to play the role of The Watson [1] for
the benefit of the audience :)

ChrisA
[1] http://tvtropes.org/pmwiki/pmwiki.php/Main/TheWatson
Steven D'Aprano
2014-06-17 00:55:33 UTC
Post by Chris Angelico
On Tue, Jun 17, 2014 at 6:16 AM, Nathaniel McCallum
Post by Nathaniel McCallum
Of course my attached code is slow. This is precisely why I'm proposing
native additions to the bytes class.
I presume you're aware that the bytes type is immutable, right? You're
still going to have at least some copying going on, whereas with a
mutable type you might well be able to avoid that. Efficiency suggests
bytearray instead.
The very first sentence of Nathaniel's first post in this thread:

"I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray)."

So yes, I think he is aware of it :-)
--
Steven
Daniel Holth
2014-06-16 20:01:11 UTC
Interesting idea. I like it.

I notice Python 3 has int.from_bytes() and int.to_bytes().
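
As a rough illustration of that route, a byte-wise XOR can be done by round-tripping through int. The function name `xor_via_int` and the choice of big-endian order are assumptions; note that the length has to be carried separately, because the int itself does not remember it.

```python
def xor_via_int(left, right):
    """XOR two equal-length byte strings by converting to int and back."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    as_int = int.from_bytes(left, "big") ^ int.from_bytes(right, "big")
    # The original length must be passed back in to keep leading zero bytes.
    return as_int.to_bytes(len(left), "big")


result = xor_via_int(b"\x00\x0f", b"\x00\xf0")  # b"\x00\xff"
```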

Guido van Rossum
2014-06-16 20:21:51 UTC
As additional input to this discussion I would like to remind you all that
it's not a good idea to have every operator apply to every data type, as
this increases the chances that bugs percolate up to a point where it's
hard to figure out where an unexpected value was generated. IOW, just
because there's no current meaning for e.g. b^b, that doesn't necessarily
make it a good idea to add one. (There are other arguments from language
usability against adding new operations indiscriminately, but this in
particular jumped out at me.)
--
--Guido van Rossum (python.org/~guido)
Nathaniel McCallum
2014-06-16 20:28:00 UTC
Post by Guido van Rossum
As additional input to this discussion I would like to remind you all
that it's not a good idea to have every operator apply to every data
type, as this increases the chances that bugs percolate up to a point
where it's hard to figure out where an unexpected value was generated.
IOW, just because there's no current meaning for e.g. b^b, that
doesn't necessarily make it a good idea to add one. (There are other
arguments from language usability against adding new operations
indiscriminately, but this in particular jumped out at me.)
Agreed. My only thought here was that this addition seems to me to be
extremely natural and emulates the precise grammar that is very often
seen in algorithms in IETF RFCs (for instance). But the precise
threshold of "too many operators" can be difficult to gauge. That is
probably above my pay grade. :)

Nathaniel
Antoine Pitrou
2014-06-16 20:38:00 UTC
There's a bitstring package on PyPI, perhaps it has the desired operations:
https://pypi.python.org/pypi/bitstring/

Regards

Antoine.
Serhiy Storchaka
2014-06-17 19:29:56 UTC
Post by Antoine Pitrou
https://pypi.python.org/pypi/bitstring/
And bitarray:

https://pypi.python.org/pypi/bitarray
Nick Coghlan
2014-06-16 22:48:51 UTC
Post by Nathaniel McCallum
Post by Terry Reedy
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray).
If you are often doing and/or/xor on large arrays, as one might do for
bitmap images, you should probably be using numpy or a derivative thereof.
What use do you have for shifting bits across byte boundaries, where the
bytes are really bytes? Why would you not turn multiple bytes
considered together into an int?
There are many reasons. Anything relating to cryptography, key
derivation, asn1 BitString, etc. Many network protocols have specialized
algorithms which require bit rotations or bitwise operations on blocks.
I used to want something like this when trying to deal with bit slips on
serial channels - sliding a pattern one bit to the left or right was a pain.

It makes more sense on the bytes type to me than it does on multibyte array
formats (which would suffer from messy endianness issues).

As Nathaniel noted, there's no other obvious meaning for these operations
on the binary data types, and it would definitely make bitbashing in Python
easier (something that will only become more common with the rise of things
like Arduino, Raspberry Pi and MicroPython).

Cheers,
Nick.
Stefan Behnel
2014-06-16 19:25:58 UTC
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.
ISTM that what you're asking for is essentially a SIMD data type, which
certainly has a lot of nice applications. However, restricting it to byte
values seems to be a rather niche use case to me. IMHO, this seems much
better suited for the array module than the "bytes as in string" general
purpose bytes type. The array module has support for all sorts of C-ish
integer types.

Different ways to handle errors (e.g. overflows) across the array would be
another reason to not push this into the bytes type.

Stefan
Ethan Furman
2014-06-16 19:03:08 UTC
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.
Could you give a couple examples?

--
~Ethan~
Ethan Furman
2014-06-17 19:35:02 UTC
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.
I like the idea, but one question I have: when shifting, are the incoming bits set to 0 or 1? Why?

--
~Ethan~
Antoine Pitrou
2014-06-17 20:37:29 UTC
Post by Ethan Furman
I like the idea, but one question I have: when shifting, are the
incoming bits set to 0 or 1? Why?
By convention, 0. Historically, that's how CPUs do it.
(and also because it provides a quick way of multiplying / dividing by 2^N).

Regards

Antoine.
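
A zero-filling shift across byte boundaries could be sketched as follows. The function names `lshift` and `rshift`, and the assumption that the first byte holds the most significant bits, are illustrative choices rather than anything settled in this thread.

```python
def lshift(data, count):
    """Shift left by count bits; zeros enter on the right, length is kept."""
    width = len(data) * 8
    mask = (1 << width) - 1  # drops bits shifted past the left edge
    value = int.from_bytes(data, "big")
    return ((value << count) & mask).to_bytes(len(data), "big")


def rshift(data, count):
    """Shift right by count bits; zeros enter on the left, length is kept."""
    value = int.from_bytes(data, "big")
    return (value >> count).to_bytes(len(data), "big")


shifted_left = lshift(b"\x01\x80", 1)   # b"\x03\x00"
shifted_right = rshift(b"\x01\x80", 1)  # b"\x00\xc0"
```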
MRAB
2014-06-17 21:33:29 UTC
Post by Antoine Pitrou
Post by Ethan Furman
I like the idea, but one question I have: when shifting, are the
incoming bits set to 0 or 1? Why?
By convention, 0. Historically, that's how CPUs do it.
(and also because it provides a quick way of multiplying / dividing by 2^N).
That's sometimes known as a "logical shift".

When shifting to the right, there's also the "arithmetic shift", which
preserves the most significant bit.

Do we need that too? (I don't think so.) If yes, then what should the
operator be? Just a 'normal' method call?
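
To make the difference concrete, here is a sketch of both right shifts on a single 8-bit value. The function names and the fixed 8-bit width are assumptions chosen for the example.

```python
def logical_rshift8(value, count):
    """Right shift of an 8-bit value with zeros entering on the left."""
    return (value & 0xFF) >> count


def arithmetic_rshift8(value, count):
    """Right shift of an 8-bit value that copies the most significant bit."""
    value &= 0xFF
    # Reinterpret as a signed 8-bit number so Python's >> replicates the sign.
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF


logical = logical_rshift8(0b10010000, 2)        # 0b00100100
arithmetic = arithmetic_rshift8(0b10010000, 2)  # 0b11100100
```

The two only differ when the top bit is set; for a value such as `0x40` both give `0x10`.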
Nick Coghlan
2014-06-17 22:10:26 UTC
Post by MRAB
Post by Antoine Pitrou
Post by Ethan Furman
I like the idea, but one question I have: when shifting, are the
incoming bits set to 0 or 1? Why?
By convention, 0. Historically, that's how CPUs do it.
(and also because it provides a quick way of multiplying / dividing by 2^N).
That's sometimes known as a "logical shift".
My bitbashing-with-Python work was all serial communications protocol
based, so logical shifts were what I wanted (I was also in the fortunate
position of being able to tolerate the slow speed of doing them in Python,
because HF radio comms are so slow the data streams to be analysed weren't
very big).
Post by MRAB
When shifting to the right, there's also the "arithmetic shift", which
preserves the most significant bit.
Do we need that too? (I don't think so.) If yes, then what should the
operator be? Just a 'normal' method call?
Wanting an arithmetic shift would be a sign that one is working with
integers rather than arbitrary binary data, and ints or one of the fixed
width types from NumPy would likely be a better fit. So leaving that out of
any proposal sounds fine to me.

Cheers,
Nick.
MRAB
2014-06-17 23:30:36 UTC
Post by Nick Coghlan
Post by MRAB
Post by Antoine Pitrou
Post by Ethan Furman
I like the idea, but one question I have: when shifting, are the
incoming bits set to 0 or 1? Why?
By convention, 0. Historically, that's how CPUs do it.
(and also because it provides a quick way of multiplying /
dividing by 2^N).
Post by Nick Coghlan
Post by MRAB
That's sometimes known as a "logical shift".
My bitbashing-with-Python work was all serial communications protocol
based, so logical shifts were what I wanted (I was also in the fortunate
position of being able to tolerate the slow speed of doing them in
Python, because HF radio comms are so slow the data streams to be
analysed weren't very big).
Post by Nick Coghlan
Post by MRAB
When shifting to the right, there's also the "arithmetic shift", which
preserves the most significant bit.
Do we need that too? (I don't think so.) If yes, then what should the
operator be? Just a 'normal' method call?
Wanting an arithmetic shift would be a sign that one is working with
integers rather than arbitrary binary data, and ints or one of the fixed
width types from NumPy would likely be a better fit. So leaving that out
of any proposal sounds fine to me.
What about rotates?
Nick Coghlan
2014-06-18 02:34:42 UTC
Post by MRAB
Post by Nick Coghlan
Wanting an arithmetic shift would be a sign that one is working with
integers rather than arbitrary binary data, and ints or one of the fixed
width types from NumPy would likely be a better fit. So leaving that out of
any proposal sounds fine to me.
Post by MRAB
What about rotates?
Bitwise rotation would be a bit of a pain to build on top of bitwise
masking and logical shifts, but it could be done, so I think it would make
more sense to keep a proposal minimal.

Cheers,
Nick.
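
For reference, a rotation built from masking and logical shifts might look like the sketch below. The name `rotate_left` and the big-endian bit order are assumptions; this is not the rotate from the attached class.

```python
def rotate_left(data, count):
    """Rotate the bits of a byte string left by count positions."""
    width = len(data) * 8
    if width == 0:
        return bytes(data)
    count %= width
    mask = (1 << width) - 1
    value = int.from_bytes(data, "big")
    # Bits pushed off the left edge are brought back in on the right.
    rotated = ((value << count) | (value >> (width - count))) & mask
    return rotated.to_bytes(len(data), "big")


rotated = rotate_left(b"\x80\x01", 1)  # b"\x00\x03"
```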
Nathaniel McCallum
2014-06-18 05:03:02 UTC
Post by MRAB
Post by Nick Coghlan
Wanting an arithmetic shift would be a sign that one is working
with integers rather than arbitrary binary data, and ints or one of
the fixed width types from NumPy would likely be a better fit. So
leaving that out of any proposal sounds fine to me.
Post by MRAB
What about rotates?
Bitwise rotation would be a bit of a pain to build on top of bitwise
masking and logical shifts, but it could be done, so I think it would
make more sense to keep a proposal minimal.
Agreed. The code that I attached to one of my early replies actually
implemented rotate, but I don't think that is what should be implemented
by default in this proposal.

Nathaniel
Nathaniel McCallum
2014-06-18 15:35:28 UTC
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.
So it seems to me that there is a consensus that something like this is
a good idea, with perhaps the exception of Guido's reminder to not
overpopulate the operators (is that a no for this proposal?).

Summarizing:

1. In lshift, what bits are introduced on the right-hand side? Zero is
traditional.

2. In rshift, what bits are introduced on the left-hand side? An
argument can be made for either zero (logical) or retaining the
left-most bit (arithmetic). The 'arithmetic shift' seems to fit the
sphere of NumPy. Zero should be preferred.

3. Rotates and other common operations are out of scope for this
proposal.

4. One question not discussed is what to do when attempting to
and/or/xor against a bytes() or bytearray() that is of a different
length. Should we left-align the shorter of the two? Right-align? Throw
an exception?

Also, I'm new to this process. Where should I go from here? Do I need to
form a PEP?

Nathaniel
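
The three options in point 4 can be compared side by side with a small sketch. The function `xor_mismatched`, its `policy` parameter, and the use of zero bytes as padding are all assumptions made for illustration; the thread has not settled on any of them.

```python
def xor_mismatched(left, right, policy="error"):
    """XOR two byte strings, handling unequal lengths according to policy.

    policy "error": raise ValueError on a length mismatch
    policy "left":  left-align the shorter operand (zero padding on the right)
    policy "right": right-align the shorter operand (zero padding on the left)
    """
    if policy not in ("error", "left", "right"):
        raise ValueError("unknown policy: %r" % (policy,))
    if len(left) != len(right):
        if policy == "error":
            raise ValueError("operands differ in length")
        size = max(len(left), len(right))
        pad = bytes.ljust if policy == "left" else bytes.rjust
        left = pad(bytes(left), size, b"\x00")
        right = pad(bytes(right), size, b"\x00")
    return bytes(a ^ b for a, b in zip(left, right))


left_aligned = xor_mismatched(b"\xff\xff", b"\x0f", policy="left")    # b"\xf0\xff"
right_aligned = xor_mismatched(b"\xff\xff", b"\x0f", policy="right")  # b"\xff\xf0"
```

The two alignments give different answers for the same inputs, which is an argument for raising an exception rather than guessing.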
Antoine Pitrou
2014-06-18 15:51:36 UTC
Post by Nathaniel McCallum
Post by Nathaniel McCallum
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators. Is upstream Python interested in this kind of behavior by
default? At the least, it would make many algorithms very easy to read
and write.
So it seems to me that there is a consensus that something like this is
a good idea, with perhaps the exception of Guido's reminder to not
overpopulate the operators (is that a no for this proposal?).
Rather than adding new operations to bytes/bytearray, an alternative is
a separate type ("bitview"?) which would take a writable buffer as
argument and then provide the operations over that buffer.

It would also make the operations compatible with other writable buffer
types such as numpy arrays, etc.

Regards

Antoine.
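
One possible shape for such a type, sketched in pure Python on top of `memoryview`. The class name `BitView` comes from the suggestion above, but its constructor, the in-place `^=` support, and the equal-length rule are assumptions; no such type exists in the standard library.

```python
class BitView:
    """Hypothetical wrapper providing in-place bitwise ops on a writable buffer."""

    def __init__(self, buffer):
        view = memoryview(buffer)
        if view.readonly:
            raise TypeError("BitView needs a writable buffer")
        self._view = view.cast("B")  # treat the buffer as unsigned bytes

    def __len__(self):
        return len(self._view)

    def __ixor__(self, other):
        other = memoryview(other).cast("B")
        if len(other) != len(self._view):
            raise ValueError("operands must have the same length")
        for index, value in enumerate(other):
            self._view[index] ^= value  # writes through to the wrapped buffer
        return self


data = bytearray(b"\x0f\xf0")
view = BitView(data)
view ^= b"\xff\xff"  # data is now bytearray(b"\xf0\x0f")
```

The wrapped object is modified directly, so the same class would work for any buffer exporter that offers a writable, contiguous buffer.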
Alexander Belopolsky
2014-06-18 16:05:30 UTC
Post by Antoine Pitrou
Rather than adding new operations to bytes/bytearray, an alternative is a
separate type ("bitview"?) which would take a writable buffer as argument
and then provide the operations over that buffer.
+1

.. and it does not have to be part of stdlib. The advantage of
implementing this outside of stdlib is that users of older versions of
Python will benefit immediately.
Nathaniel McCallum
2014-06-18 16:20:37 UTC
Post by Antoine Pitrou
Rather than adding new operations to bytes/bytearray, an
alternative is a separate type ("bitview"?) which would take a
writable buffer as argument and then provide the operations
over that buffer.
+1
.. and it does not have to be part of stdlib. The advantage of
implementing this outside of stdlib is that users of older versions of
Python will benefit immediately.
Older versions of Python can just do:
third = [a ^ b for a, b in zip(first, second)]

The problem is that this is more expensive and less readable than:
third = first ^ second
... or ...
first ^= second

I'm not making this proposal on the basis that something can't be done
already, but based on the fact that implementing it natively as part of
the base types is a natural growth of the language.

Of course this can be implemented in a module at the cost of "batteries
included," a new dependency, readability and perhaps some additional
overhead. I, for one, would not use such a module and would just
implement the operations myself (as I have done for the last several
years).

The reason for this proposal is that such operations seem to me to be
extremely natural to bytes/bytearray. And I think at least some others
agree.

Nathaniel
Serhiy Storchaka
2014-06-18 18:52:04 UTC
Post by Antoine Pitrou
Rather than adding new operations to bytes/bytearray, an alternative is
a separate type ("bitview"?) which would take a writable buffer as
argument and then provide the operations over that buffer.
+1
