Discussion:
An identity dict
Benjamin Peterson
2010-05-30 03:27:53 UTC
Permalink
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().

This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.

An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.

Regards,
Benjamin
Lie Ryan
2010-05-30 09:07:46 UTC
Permalink
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
what's wrong with dict[id(key)] = foo?
Post by Benjamin Peterson
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
Masklinn
2010-05-30 10:35:42 UTC
Permalink
Post by Lie Ryan
Post by Benjamin Peterson
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
Actually, I'd say the exact opposite: CPython's identity being cheap is an
implementation detail, not the other way around, and that is what shouldn't
be relied on. In that light, a special-purpose identity dictionary independent
from implementation-specific and artificially low id() computation costs is a
pretty good idea.
Lie Ryan
2010-05-30 11:14:14 UTC
Permalink
Post by Masklinn
Post by Lie Ryan
Post by Benjamin Peterson
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
Actually, I'd say the exact opposite: CPython's identity being cheap is an
implementation detail, not the other way around, and that is what shouldn't
be relied on. In that light, a special-purpose identity dictionary independent
from implementation-specific and artificially low id() computation costs is a
pretty good idea.
the performance of any Python statement/function is implementation
detail. Whether any statement/function is fast or cheap shouldn't be
relied on. IMHO, adding a brand new collection module just because a
Python implementation has a fast/cheap operations isn't a good reason.
Benjamin Peterson
2010-05-30 14:23:02 UTC
Permalink
Post by Lie Ryan
what's wrong with dict[id(key)] = foo?
For one, you can't get the value of the key out of the dict.
Post by Lie Ryan
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
A "clutch"? Even ignoring PyPy, an identitydict is still a useful primitive.
Lie Ryan
2010-05-30 14:56:32 UTC
Permalink
Post by Benjamin Peterson
Post by Lie Ryan
what's wrong with dict[id(key)] = foo?
For one, you can't get the value of the key out of the dict.
mydict[id(key)] = (key, value)
Post by Benjamin Peterson
Post by Lie Ryan
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
A "clutch"? Even ignoring PyPy, an identitydict is still a useful primitive.
it may be useful, but if you can emulate it relatively easily with
regular dict, why bother with a new type?
Benjamin Peterson
2010-05-30 15:20:41 UTC
Permalink
Post by Lie Ryan
it may be useful, but if you can emulate it relatively easily with
regular dict, why bother with a new type?
Why defaultdict then? It can be implemented just fine with dict.setdefault.
identitydict avoids the unnatural method of having the key and value as value of
the dictionary, resulting in contortions everytime the dictionary is manipulated.
Raymond Hettinger
2010-05-31 04:11:54 UTC
Permalink
Post by Lie Ryan
Post by Lie Ryan
what's wrong with dict[id(key)] = foo?
. . .
Post by Lie Ryan
it may be useful, but if you can emulate it relatively easily with
regular dict, why bother with a new type?
I agree that this would be a waste.

Also, I haven't seen much of a discussion of use cases.
It's been possible to make identity dictionaries for a very
long time, yet I can't remember the last time I saw one used.
The Python language makes very few guarantees about
object identity so it is mainly only usable in a few circumstances
where an object is continually referenced by something outside
the identity dict.

I would like to see an identity dict posted an ASPN recipe
to see if anyone actually uses it.


Raymond
Terry Reedy
2010-05-31 18:05:57 UTC
Permalink
It appears that different people are talking about two different ideas
of identity dict, with different properties and possible use cased. So
the statement of some do not apply to the idea held by others.

A. key object is not stored in dict.
d[k] = o ==> _d[id(k)] = o
Faster but limited in that cannot extract keys and user code must keep
keys alive.

B. key object stored in dict in one of at least two ways.
1. d[k] = o ==> _d[id(k)] = (k,o)
2. d[i] = o ==> i=id(k); _dk[i] = k; _do[i] = o
Slower, but (I believe) can fully emulate dicts.

Either type can be generalized to f instead of id as the key transform.


current vote: -.3
I am also not yet convinced, but perhaps could be, that either type,
with or without generalization should be in the stdlib. Instances of
user class without custom equality are already compared by identity. The
use cases for keying immutables by identify is pretty sparse. That
pretty much leave mutables with custom equality (by value rather than
identity).

Terry Jan Reedy
Jack Diederich
2010-05-31 23:31:06 UTC
Permalink
Post by Terry Reedy
current vote: -.3
I am also not yet convinced, but perhaps could be, that either type, with or
without generalization should be in the stdlib. Instances of user class
without custom equality are already compared by identity. The use cases for
keying immutables by identify is pretty sparse. That pretty much leave
mutables with custom equality (by value rather than identity).
I'm -1 on the idea without a strong use case. I vaguely recall
implementing one of these before but I think I was using it as a hacky
weakrefdict. Looking in my libmisc.py for dict-alikes I see an
OrderedDict (obsoleted), a ForgivingDict (obsoleted by defaultdict), a
ProxyDict, and a DecorateDict. The ProxyDict can push/pop dicts and
does lookups across all of them, most recent first, and performs sets
in the most recent. The DecorateDict calls a function on the value
before returning it. Django has classes with almost the exact same
code (not contributed by me).

Django:
http://code.djangoproject.com/svn/django/trunk/django/utils/datastructures.py
Me:
http://bazaar.launchpad.net/~odbrazen/leanlyn/trunk/annotate/head:/libmisc.py

-Jack
Benjamin Peterson
2010-06-01 00:44:41 UTC
Permalink
Post by Raymond Hettinger
Also, I haven't seen much of a discussion of use cases.
Here's a selection of use cases from PyPy's source (You can search for
"identity_dict" to see its use):

In a algorithm for breaking cycles in graphs:
http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py

Keeping track of all the allocated objects in a model of a low level runtime:
http://codespeak.net/svn/pypy/trunk/pypy/rpython/lltypesystem/lltype.py

Tracing the source of a certain kind of type as our type checker annotate
RPython: http://codespeak.net/svn/pypy/trunk/pypy/annotation/bookkeeper.py

Traversing the blocks of a function's graph:
http://codespeak.net/svn/pypy/trunk/pypy/objspace/flow/model.py

Essentially these are places where defined equality should not matter.


I could also use it here:
http://code.activestate.com/recipes/577242-calling-c-level-finalizers-without-__del__/
Greg Ewing
2010-05-31 00:30:25 UTC
Permalink
Post by Benjamin Peterson
Post by Lie Ryan
what's wrong with dict[id(key)] = foo?
For one, you can't get the value of the key out of the dict.
Another thing is that it doesn't keep the key object
alive, so if the app doesn't do anything else to ensure
this, it's possible for the key to disappear and be
replaced by another object having the same id.
--
Greg
Raymond Hettinger
2010-05-31 17:25:30 UTC
Permalink
Post by Greg Ewing
Post by Benjamin Peterson
Post by Lie Ryan
what's wrong with dict[id(key)] = foo?
For one, you can't get the value of the key out of the dict.
Another thing is that it doesn't keep the key object
alive, so if the app doesn't do anything else to ensure
this, it's possible for the key to disappear and be
replaced by another object having the same id.
If you don't have an external reference to the object,
how are you ever going to look it up in the dictionary?

Since we can attach values directly to many objects
(using instance variables, function attributes, etc)
or can keep them in a tuple, there appear to be remarkably
few use cases for an identity dictionary.

Also, there hasn't been much discussion of implementation,
but unless you're willing to copy and paste most of the
code in dictobject.c, you're going to end-up with something
much slower than d[id(obj)]=value.


Raymond
Benjamin Peterson
2010-06-01 00:45:45 UTC
Permalink
Post by Raymond Hettinger
Also, there hasn't been much discussion of implementation,
but unless you're willing to copy and paste most of the
code in dictobject.c, you're going to end-up with something
much slower than d[id(obj)]=value.
It can be implemented simply in Python:
http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
Raymond Hettinger
2010-06-01 01:23:18 UTC
Permalink
Post by Benjamin Peterson
Post by Raymond Hettinger
Also, there hasn't been much discussion of implementation,
but unless you're willing to copy and paste most of the
code in dictobject.c, you're going to end-up with something
much slower than d[id(obj)]=value.
http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
That code is pretty much what I expected.
In CPython, it is dramatically slower than
using a regular dictionary with d[id(obj)]=value.
In PyPy, it makes sense because the code
gets optimized as if it were hand coded in C.
IOW, identity_dict.py doesn't make much sense
for other implementations.
Post by Benjamin Peterson
Here's a selection of use cases from PyPy's source (You can search for
http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py
This is code that doesn't require or benefit from using an identity dictionary.
Regular dicts work just fine here. And since, identity-implies-equality
for regular CPython dicts, you already get excellent performance
(i.e. the __eq__ methods never get called when the object identities
already match).
Post by Benjamin Peterson
http://codespeak.net/svn/pypy/trunk/pypy/rpython/lltypesystem/lltype.py
This is a ton of code and I can't easily tell what it is doing or comment on it.
Post by Benjamin Peterson
Tracing the source of a certain kind of type as our type checker annotate
RPython: http://codespeak.net/svn/pypy/trunk/pypy/annotation/bookkeeper.py
Looks to be another case where a regular dict works just fine.
Post by Benjamin Peterson
http://codespeak.net/svn/pypy/trunk/pypy/objspace/flow/model.py
This code also works fine with a regular dictionary or a regular python set.
If you used the identity_dict.py code mentioned above, it would just
slow down the code. This isn't really even a dictionary use case,
a set would be a better choice.
Post by Benjamin Peterson
Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.

Since instances have a default hash equal to the id and since
identity-implies-equality for dictionary keys, we already have
a dictionary that handles these cases. You don't even
have to type: d[id(k)]=value, it would suffice to write: d[k]=value.

Sorry, but I think this idea is a total waste. Perhaps post it as
a recipe, but it doesn't make sense to try to inject it into the
standard library.


Raymond
Jacob Holm
2010-06-01 07:52:44 UTC
Permalink
Received: from localhost (HELO mail.python.org) (127.0.0.1)
by albatross.python.org with SMTP; 01 Jun 2010 10:01:09 +0200
X-Greylist: delayed 493 seconds by postgrey-1.31 at albatross;
Tue, 01 Jun 2010 10:01:09 CEST
Received: from smtp190.iad.emailsrvr.com (smtp190.iad.emailsrvr.com
[207.97.245.190])
(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
(No client certificate requested)
by mail.python.org (Postfix) with ESMTPS
for <python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>; Tue, 1 Jun 2010 10:01:09 +0200 (CEST)
Received: from relay9.relay.iad.mlsrvr.com (localhost [127.0.0.1])
by relay9.relay.iad.mlsrvr.com (SMTP Server) with ESMTP id 9A9201E510F;
Tue, 1 Jun 2010 03:52:51 -0400 (EDT)
Received: by relay9.relay.iad.mlsrvr.com (Authenticated sender:
jh-AT-improva.dk) with ESMTPSA id 0CC811E3067;
Tue, 1 Jun 2010 03:52:50 -0400 (EDT)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4
In-Reply-To: <A13D6A8C-C5EE-4AC7-A419-85CEC08F981A-***@public.gmane.org>
X-Enigmail-Version: 1.0.1
X-BeenThere: python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Discussions of speculative Python language ideas
<python-ideas.python.org>
List-Unsubscribe: <http://mail.python.org/mailman/options/python-ideas>,
<mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=unsubscribe>
List-Archive: <http://mail.python.org/pipermail/python-ideas>
List-Post: <mailto:python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>
List-Help: <mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=help>
List-Subscribe: <http://mail.python.org/mailman/listinfo/python-ideas>,
<mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=subscribe>
Sender: python-ideas-bounces+gcpi-python-ideas=m.gmane.org-+ZN9ApsXKcEdnm+***@public.gmane.org
Errors-To: python-ideas-bounces+gcpi-python-ideas=m.gmane.org-+ZN9ApsXKcEdnm+***@public.gmane.org
Archived-At: <http://permalink.gmane.org/gmane.comp.python.ideas/7251>
Post by Raymond Hettinger
Post by Benjamin Peterson
Essentially these are places where defined equality should not matter.
"should not matter" is the important part here. It might have been
clearer to say "should be ignored" instead. I think Raymond is
misunderstanding it.
Post by Raymond Hettinger
Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.
It is necessary, because the objects involved might define their own
__hash__ and __cmp__/__eq__, and these should *not* be used.
Post by Raymond Hettinger
Sorry, but I think this idea is a total waste. Perhaps post it as
a recipe, but it doesn't make sense to try to inject it into the
standard library.
I don't think it is a total waste, but I have seen two ideas in this
thread that I find more generally useful. One is
"collections.keyfuncdict", which could be trivially used as an
identitydict. The other is a WeakIdentityDict, which is a WeakKeyDict
that uses only the identity of the keys for hashing/equality. These two
are independent, one cannot be used to implement the other (unless
collections.keyfuncdict grows an option to not keep strong refs to the
keys, perhaps by providing the inverse keyfunc instead).

Anyway, +0.1 on identitydict and +1 on each of collection.keyfuncdict
and WeakIdentityDict.

- Jacob
Benjamin Peterson
2010-06-02 17:42:38 UTC
Permalink
Post by Raymond Hettinger
Also, there hasn't been much discussion of implementation,
but unless you're willing to copy and paste most of the
code in dictobject.c, you're going to end-up with something
much slower than d[id(obj)]=value.
A slightly hacky way to implement this on CPython without copying much would be
to implement __getitem__, and __setitem__ in C and subclass it in Python to call
those methods in the rest of dictionary implementation.
Imri Goldberg
2010-05-30 22:34:59 UTC
Permalink
Post by Benjamin Peterson
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd
like
Post by Benjamin Peterson
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely
lookup
Post by Benjamin Peterson
keys based on the key's id().
This dict is very useful for keep track of objects that should not be
compared
Post by Benjamin Peterson
by normal comparison methods. For example, I would use an identitydict to
hold
Post by Benjamin Peterson
weak references that would otherwise fallback to their referant object's
hash.
My 2 bits:

1. I implemented and used such a dict a few years ago, as part of a
graph/tree algorithm. While I don't have access to the source anymore, I
mainly wanted to bring up an example for the usage of such a data structure.
Still, I haven't needed it since, so take from it what you will.
Post by Benjamin Peterson
what's wrong with dict[id(key)] = foo?
2. Mostly that you want other operators to work as well. In your example,
"key in dic" will return False, as __contains__ is the standard
implementation.

Cheers,
Imri
--
Imri Goldberg
--------------------------------------
http://plnnr.com/ - automatic trip planning
http://www.algorithm.co.il/blogs/
--------------------------------------
-- insert signature here ----
Benjamin Peterson
2010-06-01 02:31:39 UTC
Permalink
Post by Lie Ryan
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
The stdlib isn't just about CPython. We already have optimized primitives for
CPython, so I don't see why helping other implementations isn't a good cause.
Raymond Hettinger
2010-06-01 04:37:05 UTC
Permalink
Post by Benjamin Peterson
Post by Lie Ryan
that their id() is expensive is implementation details, and the
developer of PyPy should solve that instead of adding a clutch to the
stdlib.
The stdlib isn't just about CPython. We already have optimized primitives for
CPython, so I don't see why helping other implementations isn't a good cause.
Benjamin, could you elaborate of several points that are unclear:

* If id() is expensive in PyPy, then how are they helped by the code in
http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
which uses id() for the gets and sets and contains?

* In the examples you posted (such as http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py ),
it appears that PyPy already has an identity dict, so how are they helped by adding one to the collections module?
... pass
Post by Benjamin Peterson
Post by Lie Ryan
a = A()
d = {a: 10}
assert d[a] == 10 # uses a's identity for lookup
* Is the proposal something needed for all implementations or is it just an optimization for a particular, non-CPython implementation?


Raymond
Benjamin Peterson
2010-06-01 21:17:25 UTC
Permalink
* If id() is expensive in PyPy, then how are they helped by the code in 
http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
which uses id() for the gets and sets and contains?
At the top of that file, it imports from the special module __pypy__ which
contains an optimized version of the dict.
* In the examples you posted (such
as http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py ),
it appears that PyPy already has an identity dict,  so how are they helped by
adding one to the collections module?

My purpose with those examples was to prove it as a generally useful utility.
* Most of the posted examples already work with regular dicts (which check
identity before they check equality) -- don't the other implementations already
implement regular dicts which need to have identity-implied-equality in order to
pass the test suite?  I would expect the following snippet to work under all
    >>> class A: 
    ...         pass
    >>> a = A()
    >>> d = {a: 10}
    >>> assert d[a] == 10   # uses a's identity for lookup
Yes, but that would be different if you have two "a"s with __eq__ defined to be
equal and you want to hash them separately.
* Is the proposal something needed for all implementations or is it just an
optimization for a particular, non-CPython implementation?

My contention is that an identity dictionary or at least a dictionary with
custom hash and keys is a useful primitive that should be in the standard
library. However, I also see its advantage in avoiding bad performance of id()
based identity dicts in non-CPython implementations.

It is useful to let the implementation optimize it any time there is moving GC
as in Jython and IronPython where id also is expensive. (Basically a mapping has
to be maintained for all objects on which id is called.)
Marcin 'Qrczak' Kowalczyk
2010-06-01 22:05:31 UTC
Permalink
Post by Benjamin Peterson
My contention is that an identity dictionary or at least a dictionary with
custom hash and keys is a useful primitive that should be in the standard
library. However, I also see its advantage in avoiding bad performance of id()
based identity dicts in non-CPython implementations.
It is useful to let the implementation optimize it any time there is moving GC
as in Jython and IronPython where id also is expensive. (Basically a mapping has
to be maintained for all objects on which id is called.)
Here is how I designed this for my language:

You can request the ObjectId of the given object. If an ObjectId
corresponding to the given object is still alive, you always get it
back again, but it can be GC'ed and later created afresh. ObjectIds
are hashable and comparable (with an arbitrary ordering). Hash values
and the ordering are preserved when ObjectIds are kept alive, but they
may be different if ObjectIds are created afresh. An ObjectId contains
an integer index which is unique among ObjectIds being alive at the
same time.

You can make a dictionary with a specified key function. It is
internally backed by something equivalent to f(k) -> (k, v) dict. A
dictionary with ObjectId constructor as the key is an identity
dictionary; it works because it keeps both k and f(k) alive.

An advantage of this scheme is that with a moving GC the id mapping
must be maintained only for objects for which the program keeps their
ObjectIds alive.

A disadvantage is that the program must be careful to not use
ObjectIds in a manner which does not keep them alive yet expects
consistent hashing and ordering. In particular a key-mapped dict which
would store only (k, v) pairs and compute f(k) on the fly would not
work. Also ObjectIds cannot be used to generate printable unique
identifiers which would be valid without having to keep ObjectIds
alive, like in Python's default repr.
--
Marcin Kowalczyk
Raymond Hettinger
2010-06-02 16:37:07 UTC
Permalink
Post by Raymond Hettinger
Post by Raymond Hettinger
* In the examples you posted (such
as http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py ),
Post by Raymond Hettinger
it appears that PyPy already has an identity dict, so how are they helped by
adding one to the collections module?
My purpose with those examples was to prove it as a generally useful utility.
Post by Raymond Hettinger
* Most of the posted examples already work with regular dicts (which check
identity before they check equality) -- don't the other implementations already
implement regular dicts which need to have identity-implied-equality in order to
pass the test suite? I would expect the following snippet to work under all
Post by Raymond Hettinger
... pass
a = A()
d = {a: 10}
assert d[a] == 10 # uses a's identity for lookup
Yes, but that would be different if you have two "a"s with __eq__ defined to be
equal and you want to hash them separately.
None of the presented examples take advantage of that property.
All of them work with regular dictionaries. This proposal is still
use case challenged.

AFAICT from code searches, the idea of needing to override
an existing __eq__ with an identity-only comparison seems
to never come up. It would not even be popular as an ASPN recipe.

Moreover, I think that including it in the standard library would be harmful.
The language makes very few guarantees about object identity.
In most cases a user would far better off using a regular dictionary.
If a rare case arose where __eq__ needed to be overridden with an
identity-only check, it is not hard to write d[id(obj)]=value.

Strong -1 on including this in the standard library.


Raymond


P.S. ISTM that including subtly different variations of a data type
does more harm than good. Understanding how to use an
identity dictionary correctly requires understanding the nuances
of object identity, how to keep the object alive outside the dictionary
(even if the dictionary keeps it alive, a user still needs an external reference
to be able to do a lookup), and knowing that the version proposed for
CPython has dramatically worse speed/space performance than
a regular dictionary. The very existence of an identity dictionary in
collections is likely to distract a user away from a better solution using:
d[id(obj)]=value.
Benjamin Peterson
2010-06-02 17:38:46 UTC
Permalink
Post by Raymond Hettinger
None of the presented examples take advantage of that property.
All of them work with regular dictionaries. This proposal is still
use case challenged.
Besides that ASPN recipe of mine I mentioned before [1], here are some
more examples:

- copy.deepcopy() and pickle use it for an object memo.

- keeping track of protocol versions in Twisted. [2]

- memo in a serialization protocol. [3]


[1]
http://code.activestate.com/recipes/577242-calling-c-level-finalizers-
without-__del__/
[2] http://twistedmatrix.com/trac/browser/trunk/twisted/persisted/
styles.py
[3] http://twistedmatrix.com/trac/browser/trunk/twisted/spread/jelly.py
Post by Raymond Hettinger
AFAICT from code searches, the idea of needing to override
an existing __eq__ with an identity-only comparison seems
to never come up. It would not even be popular as an ASPN recipe.
Moreover, I think that including it in the standard library would be harmful.
The language makes very few guarantees about object identity.
In most cases a user would far better off using a regular dictionary.
If a rare case arose where __eq__ needed to be overridden with an
identity-only check, it is not hard to write d[id(obj)]=value.
Post by Antoine Pitrou
d = {}
d[id([])] = 10
d[id([])]
10
Post by Raymond Hettinger
Strong -1 on including this in the standard library.
How do you feel about Antoine's keyfuncdict proposal?
Post by Raymond Hettinger
Raymond
P.S. ISTM that including subtly different variations of a data type
does more harm than good. Understanding how to use an
identity dictionary correctly requires understanding the nuances
of object identity,
We're all adults here. We provide WeakKeyedDict and friends even though
they rely on unpredictable and subtle garbage collection.
Post by Raymond Hettinger
how to keep the object alive outside the dictionary
(even if the dictionary keeps it alive, a user still needs an
external reference
to be able to do a lookup), and knowing that the version proposed for
CPython has dramatically worse speed/space performance than
a regular dictionary. The very existence of an identity dictionary in
collections is likely to distract a user away from a better solution
using: d[id(obj)]=value.
I would argue that that's not a better solution given the above example.
Anyone using id(obj) would have understand the nuances of object identity
perhaps more than a real identity dictionary.
Raymond Hettinger
2010-06-02 17:58:50 UTC
Permalink
Post by Benjamin Peterson
Post by Raymond Hettinger
Strong -1 on including this in the standard library.
How do you feel about Antoine's keyfuncdict proposal?
His proposal is more interesting because it is more general.
For example, a keyfuncdict may make it easy to implement
a case-insensitive dictionary. I would like to see a concrete
implementation to see what design choices it makes.


Raymond
Philip Jenvey
2010-06-03 19:02:44 UTC
Permalink
Post by Raymond Hettinger
Moreover, I think that including it in the standard library would be harmful.
The language makes very few guarantees about object identity.
In most cases a user would far better off using a regular dictionary.
If a rare case arose where __eq__ needed to be overridden with an
identity-only check, it is not hard to write d[id(obj)]=value.
Strong -1 on including this in the standard library.
P.S. ISTM that including subtly different variations of a data type
does more harm than good. Understanding how to use an
identity dictionary correctly requires understanding the nuances
of object identity, how to keep the object alive outside the dictionary
(even if the dictionary keeps it alive, a user still needs an external reference
to be able to do a lookup), and knowing that the version proposed for
CPython has dramatically worse speed/space performance than
a regular dictionary. The very existence of an identity dictionary in
d[id(obj)]=value.
Post by Benjamin Peterson
Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.
Using id() is a workaround but again, a potentially expensive one for platforms with moving GCs. Every object calling for an id() forces additional bookkeeping on their ends. This is only a better solution for CPython.

Whereas abstracting this out into an identitydict type gives all platforms the chance to provide their own optimized versions.
Post by Raymond Hettinger
Since instances have a default hash equal to the id and since
identity-implies-equality for dictionary keys, we already have
a dictionary that handles these cases. You don't even
have to type: d[id(k)]=value, it would suffice to write: d[k]=value.
No, the default hash backed by id is a CPython implementation detail.

Another use case is just the fact that Python allows you to completely change the semantics of __eq__ (and for good reason). Though this is rare, take SQLAlchemy's SQL expression DSL for example, that has it generate a where clause:

table.select(table.c.id == 4) # table.c.id == 4 returns a "<sql statement> where id == 4" object

I don't see how a platform like Jython can provide an optimized identitydict that avoids id() calls via keyfuncdict(key=id). The keys() of said dict would need to be actual results of id() calls.

I'm +1 on an identitydict as long as its CPython implementation doesn't provide worse performance than the id workaround.

--
Philip Jenvey
Raymond Hettinger
2010-06-03 19:43:10 UTC
Permalink
Post by Philip Jenvey
Post by Raymond Hettinger
P.S. ISTM that including subtly different variations of a data type
does more harm than good. Understanding how to use an
identity dictionary correctly requires understanding the nuances
of object identity, how to keep the object alive outside the dictionary
(even if the dictionary keeps it alive, a user still needs an external reference
to be able to do a lookup), and knowing that the version proposed for
CPython has dramatically worse speed/space performance than
a regular dictionary. The very existence of an identity dictionary in
d[id(obj)]=value.
Post by Benjamin Peterson
Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.
Using id() is a workaround but again, a potentially expensive one for platforms with moving GCs. Every object calling for an id() forces additional bookkeeping on their ends. This is only a better solution for CPython.
To be clear, most the examples given so far work with regular dictionaries even without using id(). The exception was something system specific such as the pickling mechanism.

So what we're talking about is the comparatively rare case when an object has an __eq__ method and you want that method to be ignored. For example, you have two tuples (3,5) and (3,5) which are equal but happen to be distinct in memory and your needs are:
* to treat the two equal objects as being distinct for some purpose
* to run faster than id() runs on non-CPython implementations
* don't care if the code is dog slow on CPython (i.e. slower than if you had used id())
* don't care that the two tuples being distinct is memory is not a guaranteed behavior across implementations (i.e. any implementation is free to make all equal tuples share the same id via interning)

FWIW, I spoke with Jim Baker about this yesterday and he believes that Jython has no need for an identity dict.


Raymond


P.S. If Antoine's keyfuncdict proposal gains traction, it would be possible for other implementations to create a fast special case for key=id.
Antoine Pitrou
2010-06-03 20:04:29 UTC
Permalink
On Thu, 3 Jun 2010 12:02:44 -0700
Post by Philip Jenvey
Using id() is a workaround but again, a potentially expensive one for platforms with moving
GCs. Every object calling for an id() forces additional bookkeeping on their ends. This is
only a better solution for CPython.
Well, CPython and all other implementations with a non-moving GC.
Post by Philip Jenvey
Whereas abstracting this out into an identitydict type gives all platforms the chance to
provide their own optimized versions.
That's really premature optimization.

Regards

Antoine.

Michael
2010-05-30 09:29:35 UTC
Permalink
+1 - this has been useful to me in other languages.

(apologies for top-post, sent from phone.)

Michael Foord


--
http://www.ironpythoninaction.com
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and
collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an
identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
Regards,
Benjamin
_______________________________________________
Python-ideas mailing list
http://mail.python.org/mailman/listinfo/python-ideas
Antoine Pitrou
2010-05-30 10:34:00 UTC
Permalink
On Sun, 30 May 2010 03:27:53 +0000 (UTC)
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
gives the real key value used for lookups.
Your identity dict would be created as:
d = collections.keyfuncdict(id)

But of course, you can just use a normal dict:
d = {}
d[id(key)] = key, value

(actually, this could be how a collections.keyfuncdict gets implemented)

Regards

Antoine.
Lie Ryan
2010-05-30 11:07:52 UTC
Permalink
Received: from localhost (HELO mail.python.org) (127.0.0.1)
by albatross.python.org with SMTP; 30 May 2010 13:10:09 +0200
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
by mail.python.org (Postfix) with ESMTP
for <python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>; Sun, 30 May 2010 13:10:08 +0200 (CEST)
Received: from list by lo.gmane.org with local (Exim 4.69)
(envelope-from <gcpi-python-***@m.gmane.org>) id 1OIgP7-0000cM-K0
for python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org; Sun, 30 May 2010 13:10:05 +0200
Received: from 110-175-240-90.static.tpgi.com.au ([110.175.240.90])
by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
id 1AlnuQ-0007hv-00
for <python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>; Sun, 30 May 2010 13:10:05 +0200
Received: from lie.1296 by 110-175-240-90.static.tpgi.com.au with local
(Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
for <python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>; Sun, 30 May 2010 13:10:05 +0200
X-Injected-Via-Gmane: http://gmane.org/
connect(): No such file or directory
Lines: 22
X-Complaints-To: usenet-mkLwACOSqtU9smdsby/***@public.gmane.org
X-Gmane-NNTP-Posting-Host: 110-175-240-90.static.tpgi.com.au
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
rv:1.9.1.9) Gecko/20100413 Thunderbird/3.0.4
In-Reply-To: <20100530123400.726a9fe1-xNDA5Wrcr86sTnJN9+***@public.gmane.org>
X-Enigmail-Version: 1.0.1
X-BeenThere: python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Discussions of speculative Python language ideas
<python-ideas.python.org>
List-Unsubscribe: <http://mail.python.org/mailman/options/python-ideas>,
<mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=unsubscribe>
List-Archive: <http://mail.python.org/pipermail/python-ideas>
List-Post: <mailto:python-ideas-+ZN9ApsXKcEdnm+***@public.gmane.org>
List-Help: <mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=help>
List-Subscribe: <http://mail.python.org/mailman/listinfo/python-ideas>,
<mailto:python-ideas-request-+ZN9ApsXKcEdnm+***@public.gmane.org?subject=subscribe>
Sender: python-ideas-bounces+gcpi-python-ideas=m.gmane.org-+ZN9ApsXKcEdnm+***@public.gmane.org
Errors-To: python-ideas-bounces+gcpi-python-ideas=m.gmane.org-+ZN9ApsXKcEdnm+***@public.gmane.org
Archived-At: <http://permalink.gmane.org/gmane.comp.python.ideas/7220>
Post by Antoine Pitrou
On Sun, 30 May 2010 03:27:53 +0000 (UTC)
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
gives the real key value used for lookups.
d = collections.keyfuncdict(id)
d = {}
d[id(key)] = key, value
(actually, this could be how a collections.keyfuncdict gets implemented)
+1 on this
Mathias Panzenböck
2010-05-30 13:43:36 UTC
Permalink
Post by Antoine Pitrou
On Sun, 30 May 2010 03:27:53 +0000 (UTC)
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
gives the real key value used for lookups.
d = collections.keyfuncdict(id)
d = {}
d[id(key)] = key, value
(actually, this could be how a collections.keyfuncdict gets implemented)
Regards
Antoine.
Yes, something like this or a dict wihch you can pass a custom hash and compare function would be a
good idea, imho.

+1

-panzi
Benjamin Peterson
2010-05-30 14:27:05 UTC
Permalink
Post by Antoine Pitrou
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
Perhaps; I'm only really interested in the identity version. That could be
specified by keyfuncdict(None, None).
Antoine Pitrou
2010-05-30 14:54:08 UTC
Permalink
On Sun, 30 May 2010 14:27:05 +0000 (UTC)
Post by Benjamin Peterson
Post by Antoine Pitrou
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
Perhaps; I'm only really interested in the identity version. That could be
specified by keyfuncdict(None, None).
Just keyfuncdict(None) or keyfuncdict(id), then.
I'm not proposing that the hash() and eq() function be passed, but
really a factory function which is used for computing the actual
lookup key when doing inserts and lookups (which then implicitly use
the natural hash and eq functions for that lookup key, as a normal
dict does).

Regards

Antoine.
Daniel Stutzbach
2010-05-30 19:37:18 UTC
Permalink
Post by Antoine Pitrou
Perhaps it would be more useful to add a generic
collections.keyfuncdict, taking a function which applied to a key
gives the real key value used for lookups.
d = collections.keyfuncdict(id)
For what it's worth, my sorteddict class (part of the blist library) already
supports exactly that syntax. The full signature is:

sorteddict([key[, arg]], **kw)

where "arg" and "kw" do exactly what they do for dict(), and "key" specifies
a key function just as in your example. (Although now that I think about
it, perhaps "arg" and "key" should be reversed to more closely match the
signature of dict())
Post by Antoine Pitrou
from blist import sorteddict
d = sorteddict(id)
d[(1,2)] = True
d[(1,2)] = True
d.keys()
[(1, 2), (1, 2)]

I'm not suggesting that Benjamin use sorteddict for his use-case, since
keeping the keys sorted by id() is probably not useful. ;-)

In some ways it would be nice if dict() itself accepted a "key" function as
an argument, but I don't see a way to provide that functionality without
adding another field to PyDictObject (yuck) or abusing the ma_lookup field
(again, yuck).

I favor adding the more general keyfuncdict over identitydict.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
spir ☣
2010-05-30 11:20:47 UTC
Permalink
On Sun, 30 May 2010 03:27:53 +0000 (UTC)
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This is how Lua works, systematically. To make this work, all "primitive" values (non-table, since table is the single container type) are interned. Thus, for "values", equality and identity are equivalent. Tables instead are considered referenced objects, even when they are used as collections.
So that Lua can quietly hash the ref instead of the value.
* simplicity
* performance
* anything can be used as key
Post by Benjamin Peterson
p1 = {x=1,y=2}
p2 = {x=1,y=2}
t = {[p1]="abc"}
print(t[p1],tp2)
abc nil
Conversely, collections which, conceptually, are referenced things, do not erroneaously return false positive because of value comparison. (The case does not happen in python since collections cannot be used as keys).

For the scheme to really extend to any use case, there should be a kind of mark allowing the programmer to state whether a table is, conceptually, a value (simply happens to be structured) or a referenced thing. Value tables should then be interned like other values -- but this is certainly difficult and costly.


Denis
________________________________

vit esse estrany ☣

spir.wikidot.com
Bill Janssen
2010-06-01 17:03:44 UTC
Permalink
Denis, if you're going to post to python-ideas, would you mind taking
that biohazard symbol out of your user name? My Emacs-based mail reader
thrashes for quite a while trying to find a glyph for it before it gives
up and renders it as a hollow rectangle.

I'd normally just add you to my kill-file, but I hate to give up on
python-ideas people that fast. I'm sure you're not really a biohazard :-).

Bill
spir ☣
2010-05-30 11:28:02 UTC
Permalink
On Sun, 30 May 2010 03:27:53 +0000 (UTC)
Post by Benjamin Peterson
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
The main point is certainly that the limitation of "hashability" disappears... Everything can be used as key, if actually the ref is hashed instead of the value.

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com
Nick Coghlan
2010-05-30 12:05:20 UTC
Permalink
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
How would dict equality be defined? In terms of values, or in terms of
value identity?

Cheers,
Nick.
--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia
---------------------------------------------------------------
Benjamin Peterson
2010-05-30 14:18:00 UTC
Permalink
Post by Nick Coghlan
How would dict equality be defined? In terms of values, or in terms of
value identity?
Completely identity. __eq__ and __hash__ should never be called.
Mike Graham
2010-05-30 17:04:39 UTC
Permalink
What is the use case for such a collection? (This will help answer
questions about how stuff like comparing two identity dicts or an
identity dict and another mapping will work.)

By default, objects __eq__ and __hash__ methods allow them to be used
in dicts by identity. When I define __eq__ to mean something else, I
find that I don't typically have something I can really store in a
dict with the behavior I want.

Should an ID Dict really be a weak ref identity dict? The one time I
wanted an ID dict is when I was writing a memoization routine for
methods, where I wanted to store the caches for each object.
(Incidentally, I ended up refactoring not to require this anyhow.) Is
this the main basic use case? Am I missing something?

Best regards,
Mike
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
Regards,
Benjamin
Greg Ewing
2010-05-31 00:37:27 UTC
Permalink
Should an ID Dict really be a weak ref identity dict? ... Is
this the main basic use case?
No, I don't think it's the main use case, although if an
identitydict is added, it would make sense to also add a
WeakIdentityDict to the weakref module.
--
Greg
M.-A. Lemburg
2010-05-30 21:16:27 UTC
Permalink
Post by Benjamin Peterson
In the spirit of collections.OrderedDict and collections.defaultdict, I'd like
to propose collections.identitydict. It would function just like a normal
dictionary, but ignore hash values and comparison operators and merely lookup
keys based on the key's id().
This dict is very useful for keep track of objects that should not be compared
by normal comparison methods. For example, I would use an identitydict to hold
weak references that would otherwise fallback to their referant object's hash.
An advantage of formalizing this in collections would be to enable other Python
implementations like PyPy, where id() is expensive, to provide an optimized
identitydict.
Are you sure this is a good idea for the stdlib ?

id(obj) gives you the current storage address of an object
in CPython and these can be reused over time.

Using a key object to such a dict would not secure it's persistence
throughout the usage lifetime in that dict, since d[id(key)] = 1
doesn't increase the refcount of key.

Such reuse is bound to happen often for some Python object
types, due to the free-lists we're using in CPython and
the Python allocator which is optimized for reuse of
fixed-size memory chunks.

For experts, such a dictionary may have some value (e.g. to
work around the hash() requirement of normal dictionaries),
but I feel that newbies would have a hard time understanding
all the implications.
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 30 2010)
Post by Benjamin Peterson
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK 49 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Benjamin Peterson
2010-05-30 22:17:59 UTC
Permalink
Post by M.-A. Lemburg
Are you sure this is a good idea for the stdlib ?
id(obj) gives you the current storage address of an object
in CPython and these can be reused over time.
But since the dictionary will hold a reference to the key, it won't be reused.
Post by M.-A. Lemburg
For experts, such a dictionary may have some value (e.g. to
work around the hash() requirement of normal dictionaries),
but I feel that newbies would have a hard time understanding
all the implications.
I'm not proposing a dictionary that would not hold a reference to its keys.
Loading...