Replacing the standard IO streams (was Re: changing sys.stdout encoding)

Nick Coghlan

2012-06-09 09:55:09 UTC

So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
following:

sys.stdin = open(sys.stdin.fileno(), 'r', <new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w', <new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w', <new settings>)

Ditto for the other standard streams. It seems it already *is* as
simple as with any other file, we just collectively forgot about:

1. The fact open() accepts file descriptors directly in Python 3
2. The fact that text streams still report the underlying file
descriptor correctly

*That* is something we can happily advertise in the standard library
docs. If you could check to make sure it works properly for your use
case and then file a docs bug at bugs.python.org to get it added to
the std streams documentation, that would be very helpful.
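
A minimal sketch of the idiom, for anyone who wants to try it without touching a real standard stream. It reopens the write end of an ordinary pipe instead of sys.stdout. The helper name reopen_text and the closefd=False argument are assumptions added for illustration and are not part of the message above.

```python
import os


def reopen_text(stream, mode, **settings):
    """Return a new text stream over the same file descriptor as `stream`.

    Hypothetical helper: it relies only on open() accepting a file
    descriptor and on text streams reporting it through fileno().
    """
    # closefd=False keeps the descriptor open when this wrapper is closed.
    return open(stream.fileno(), mode, closefd=False, **settings)


read_fd, write_fd = os.pipe()
original = open(write_fd, 'w', encoding='ascii')   # stands in for sys.stdout
replacement = reopen_text(original, 'w', encoding='utf-8', newline='\n')

# This text could not be encoded with the original 'ascii' settings.
replacement.write('caf\u00e9\n')
replacement.flush()
received = os.read(read_fd, 100)

replacement.close()   # leaves the descriptor open (closefd=False)
original.close()      # closes write_fd
os.close(read_fd)
```

The same call shape applies to sys.stdin, sys.stdout and sys.stderr; only the mode and the new settings differ.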

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Paul Moore

2012-06-09 11:00:37 UTC

Post by Nick Coghlan
So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
sys.stdin = open(sys.stdin.fileno(), 'r', <new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w', <new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w', <new settings>)
Ditto for the other standard streams. It seems it already *is* as

One minor point - if sys.stdout is redirected, *and* you have already
written to sys.stdout, this resets the file pointer. With test.py as

import sys
print("Hello!")
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")

test.py >a gives one line in a, not two (tested on Windows, Unix may
be different). And changing to "a" doesn't resolve this...

Of course, the actual use case is to change the encoding before
anything is written - so maybe a small note saying "don't do this" is
enough. But it's worth mentioning before we get the bug report saying
"Python lost my data" :-)

Paul.

Paul Moore

2012-06-09 13:00:03 UTC

Post by Paul Moore

Post by Nick Coghlan
So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
sys.stdin = open(sys.stdin.fileno(), 'r', <new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w', <new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w', <new settings>)
Ditto for the other standard streams. It seems it already *is* as

One minor point - if sys.stdout is redirected, *and* you have already
written to sys.stdout, this resets the file pointer. With test.py as
import sys
print("Hello!")
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
test.py >a gives one line in a, not two (tested on Windows, Unix may
be different). And changing to "a" doesn't resolve this...

Ignore me - you need to flush stdout before reopening it, is all. Dumb
mistake, sorry for the noise :-(
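
A small sketch of the flush-then-reopen order, using a temporary file in place of a redirected stdout so that it can be run safely. The temporary file and the variable names are assumptions for illustration, and the sketch does not try to reproduce the Windows-specific behaviour reported elsewhere in this thread.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()     # fd plays the role of a redirected stdout
old = open(fd, 'w', encoding='ascii', newline='\n', closefd=False)
old.write('Hello!\n')
old.flush()                       # push buffered text to the descriptor first

# Reopen the same descriptor with new settings, as in test.py.
new = open(old.fileno(), 'w', encoding='utf-8', newline='\n', closefd=False)
new.write('Hello!\n')
new.flush()

# Read the file back without newline translation to see what was written.
with open(path, encoding='utf-8', newline='') as check:
    contents = check.read()

old.close()
new.close()
os.close(fd)
os.remove(path)
```

Both lines end up in the file, in order, because the first wrapper's buffer was emptied before the second wrapper started writing at the shared file position.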

Paul.

MRAB

2012-06-09 16:42:53 UTC

Post by Paul Moore

Post by Nick Coghlan
So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
sys.stdin = open(sys.stdin.fileno(), 'r',<new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w',<new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w',<new settings>)
Ditto for the other standard streams. It seems it already *is* as

One minor point - if sys.stdout is redirected, *and* you have already
written to sys.stdout, this resets the file pointer. With test.py as
import sys
print("Hello!")
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
test.py>a gives one line in a, not two (tested on Windows, Unix may
be different). And changing to "a" doesn't resolve this...
Of course, the actual use case is to change the encoding before
anything is written - so maybe a small note saying "don't do this" is
enough. But it's worth mentioning before we get the bug report saying
"Python lost my data" :-)

I find that this:

print("Hello!")
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")

prints the string "Hello!\r\r\n", but this:

print("Hello!")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")

prints the string "Hello!\r\nHello!\r\r\n".

I had hoped that the flush would be enough, but apparently not.

Serhiy Storchaka

2012-06-09 20:02:19 UTC

Post by Nick Coghlan
So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
sys.stdin = open(sys.stdin.fileno(), 'r',<new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w',<new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w',<new settings>)

sys.stdin = io.TextIOWrapper(sys.stdin.detach(), <new settings>)
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), <new settings>)
...

None of these methods are not guaranteed to work if the input or output
have occurred before.

Mark Lawrence

2012-06-09 21:22:41 UTC

Post by Serhiy Storchaka
None of these methods are not guaranteed to work if the input or output
have occurred before.

That's a double negative so I'm not sure what you meant to say. Can you
please rephrase it. I assume that English is not your native language,
so I'll let you off :)

--
Cheers.

Mark Lawrence.

Serhiy Storchaka

2012-06-10 14:34:08 UTC

Post by Serhiy Storchaka
None of these methods are not guaranteed to work if the input or output
have occurred before.

That's a double negative so I'm not sure what you meant to say. Can you
please rephrase it. I assume that English is not your native language,
so I'll let you off :)

open(sys.stdin.fileno()) is not guaranteed to work if input or output
has already occurred. Likewise, io.TextIOWrapper(sys.stdin.detach()) is
not guaranteed to work if input or output has already occurred.
The internal buffer of sys.stdin can contain characters that have been
read but not yet used. The internal buffer of sys.stdin.buffer can
contain bytes that have been read but not yet used. With a multibyte
encoding, the internal buffer of sys.stdin.decoder can contain an
incomplete multibyte character.
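
The read-ahead problem can be sketched with a temporary file standing in for stdin. This is an illustration under assumptions: the file, the variable names and the sample data are invented here, and the result relies on the default read-ahead of CPython's buffered text streams for a small input.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b'first line\nsecond line\n')
os.lseek(fd, 0, os.SEEK_SET)

old = open(fd, 'r', encoding='ascii', closefd=False)   # stands in for sys.stdin
first = old.readline()     # returns one line, but reads ahead into its buffer

# A second stream over the same descriptor starts at the descriptor's
# current position, which is already past the data buffered by `old`.
new = open(fd, 'r', encoding='utf-8', closefd=False)
rest_seen_by_new = new.read()
rest_still_in_old = old.read()

old.close()
new.close()
os.close(fd)
os.remove(path)
```

The second line is not lost, but it is only reachable through the old stream, so replacing the stream after input has started silently skips it.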

Nick Coghlan

2012-06-10 15:44:08 UTC

Post by Serhiy Storchaka
None of these methods are not guaranteed to work if the input or output
have occurred before.

That's a double negative so I'm not sure what you meant to say. Can you
please rephrase it. I assume that English is not your native language,
so I'll let you off :)

open(sys.stdin.fileno()) is not guaranteed to work if the input or output
have occurred before. And io.TextIOWrapper(sys.stdin.detach()) is not
guaranteed to work if the input or output have occurred before. sys.stdin
internal buffer can contains read by not used characters. sys.stdin.buffer
internal buffer can contains read by not used bytes. With multibyte encoding
sys.stdin.decoder internal buffer can contains uncompleted multibyte
character.

Right, but the point of this discussion is to document the cleanest
available way for an application to change these settings at
*application start* (e.g. to support an "--encoding" parameter). Yes,
there are potential issues if you use any of these mechanisms while
there is data in the buffers, but that's a much harder problem and not
one we're trying to solve here.

Regardless, the advantage of the "open + fileno" idiom is that it
works for *any* level of change. If you want to force your streams to
unbuffered binary IO rather than merely changing the encoding:

sys.stdin = open(sys.stdin.fileno(), 'rb', buffering=0, closefd=False)
sys.stdout = open(sys.stdout.fileno(), 'wb', buffering=0, closefd=False)
sys.stderr = open(sys.stderr.fileno(), 'wb', buffering=0, closefd=False)

Keep them as text, but force them to permissive utf-8, no matter how
the interpreter originally created them?:

sys.stdin = open(sys.stdin.fileno(), 'r', encoding="utf-8",
                 errors="surrogateescape", closefd=False)
sys.stdout = open(sys.stdout.fileno(), 'w', encoding="utf-8",
                  errors="surrogateescape", closefd=False)
sys.stderr = open(sys.stderr.fileno(), 'w', encoding="utf-8",
                  errors="surrogateescape", closefd=False)

This approach also has the advantage of leaving
sys.__std(in/out/err)__ in a somewhat usable state.
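
For the application-start use case mentioned above, the wiring for an "--encoding" option might look roughly like the sketch below. The function names parse_encoding, apply_encoding and main are hypothetical, and the sketch assumes that nothing has been read or written before apply_encoding runs.

```python
import argparse
import sys


def parse_encoding(argv):
    """Parse a hypothetical --encoding option; return None if it is absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--encoding', default=None)
    return parser.parse_args(argv).encoding


def apply_encoding(encoding, errors=None):
    """Reopen the three standard streams as text with the given encoding."""
    sys.stdin = open(sys.stdin.fileno(), 'r', encoding=encoding,
                     errors=errors, closefd=False)
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding=encoding,
                      errors=errors, closefd=False)
    sys.stderr = open(sys.stderr.fileno(), 'w', encoding=encoding,
                      errors=errors, closefd=False)


def main(argv=None):
    encoding = parse_encoding(sys.argv[1:] if argv is None else argv)
    if encoding is not None:
        # Replace the streams before any input or output has happened.
        apply_encoding(encoding)
    print('ready')
    return 0
```

A script would call main() as its first action, so that the replacement happens while all stream buffers are still empty.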

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Stephen J. Turnbull

2012-06-10 16:41:14 UTC

Post by Nick Coghlan

open(sys.stdin.fileno()) is not guaranteed to work if the input or output
have occurred before.

[...]

Post by Nick Coghlan
Right, but the point of this discussion is to document the cleanest
available way for an application to change these settings at
*application start* (e.g. to support an "--encoding" parameter). Yes,
there are potential issues if you use any of these mechanisms while
there is data in the buffers,

+1

The OP's problem is a real one. His use case (the "--encoding"
parameter) seems to be the most likely one in production use, so the
loss of buffered data issue should rarely come up. Changing encodings
on the fly offers plenty of ways to lose data besides incomplete
buffers, anyway.

I am a little concerned with MRAB's report that

import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")

doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

MRAB

2012-06-10 18:12:55 UTC

Post by Stephen J. Turnbull

Post by Nick Coghlan

open(sys.stdin.fileno()) is not guaranteed to work if the input or output
have occurred before.

[...]

Post by Nick Coghlan
Right, but the point of this discussion is to document the cleanest
available way for an application to change these settings at
*application start* (e.g. to support an "--encoding" parameter). Yes,
there are potential issues if you use any of these mechanisms while
there is data in the buffers,

+1
The OP's problem is a real one. His use case (the "--encoding"
parameter) seems to be the most likely one in production use, so the
loss of buffered data issue should rarely come up. Changing encodings
on the fly offers plenty of ways to lose data besides incomplete
buffers, anyway.
I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
but still not quite right:

Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"

All on Windows.

Paul Moore

2012-06-10 18:34:04 UTC

Post by Stephen J. Turnbull
I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

Not here (Win 7 32-bit):

PS D:\Data> type t.py
import sys
print("Hello!")
sys.stdout.flush()

sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
PS D:\Data> py -3.2 t.py | od -c
0000000 H e l l o ! \r \n H e l l o ! \r \n
0000020

Paul.

MRAB

2012-06-10 19:01:21 UTC

Post by Stephen J. Turnbull
I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

PS D:\Data> type t.py
import sys
print("Hello!")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
PS D:\Data> py -3.2 t.py | od -c
0000000 H e l l o ! \r \n H e l l o ! \r \n
0000020

I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding ==
"cp1252".

Paul Moore

2012-06-10 20:07:00 UTC

I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

PS D:\Data> type t.py
import sys
print("Hello!")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
PS D:\Data> py -3.2 t.py | od -c
0000000 H e l l o ! \r \n H e l l o ! \r \n
0000020

I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding ==
"cp1252".

PS D:\Data> py -3 -c "import sys; print(sys.stdout.encoding)"
cp850

This is at the console (Powershell) - are you running from within
something like idle, or a GUI environment?

Paul.

MRAB

2012-06-10 20:28:14 UTC

Post by Stephen J. Turnbull
I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

PS D:\Data> type t.py
import sys
print("Hello!")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
PS D:\Data> py -3.2 t.py | od -c
0000000 H e l l o ! \r \n H e l l o ! \r \n
0000020

I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding ==
"cp1252".

PS D:\Data> py -3 -c "import sys; print(sys.stdout.encoding)"
cp850
This is at the console (Powershell) - are you running from within
something like idle, or a GUI environment?

It's at the system command prompt. When I redirect the script's stdout
to a file (on the command line using ">output.txt") I get those 15 bytes
from Python 3.2.

Your output appears to be 32 bytes (the second line starts with
"0000020").

Paul Moore

2012-06-10 20:38:14 UTC

I am a little concerned with MRAB's report that
import sys
print("hello")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("hello")
doesn't work as expected, though. (It does work for me on Mac OS X,
both as above -- of course there are no '\r's in the output -- and
with 'print("hello", end="\r\n")'.)

That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

PS D:\Data> type t.py
import sys
print("Hello!")
sys.stdout.flush()
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
print("Hello!")
PS D:\Data> py -3.2 t.py | od -c
0000000 H e l l o ! \r \n H e l l o ! \r \n
0000020

I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding ==
"cp1252".

PS D:\Data> py -3 -c "import sys; print(sys.stdout.encoding)"
cp850
This is at the console (Powershell) - are you running from within
something like idle, or a GUI environment?

It's at the system command prompt. When I redirect the script's stdout to a
file
(on the command line using ">output.txt") I get those 15 bytes from Python
3.2.
Your output appears to be 32 bytes (the second line starts with
"0000020").

Well spotted - PowerShell does funny things with Unicode in pipes, I'd
forgotten. Indeed, I get the same output as you from cmd.

Odd.
Paul

Stephen J. Turnbull

2012-06-11 06:16:07 UTC

Post by MRAB
That's actually Python 3.1. From Python 3.2 it's slightly different,
Python 3.1: "hello\r\nhello\r\r\n"
Python 3.2: "hello\nhello\r\n"
Python 3.3.0a4: "hello\nhello\r\n"
All on Windows.

<stifle o="self"/>

Hm. Maybe it's that port's implementation of universal newlines or
something like that? What happens if you use an explicit "end="
argument? (I don't have a Python 3 to check on Windows easily
available.)

Paul Moore

2012-06-11 08:06:42 UTC

> That's actually Python 3.1. From Python 3.2 it's slightly different,
>
> Python 3.1: "hello\r\nhello\r\r\n"
> Python 3.2: "hello\nhello\r\n"
> Python 3.3.0a4: "hello\nhello\r\n"
>
> All on Windows.
<stifle o="self"/>
Hm. Maybe it's that port's implementation of universal newlines or
something like that? What happens if you use an explicit "end="
argument? (I don't have a Python 3 to check on Windows easily
available.)

Explicit end= makes no difference to the behaviour. In fact, a minimal
test suggests that universal newline mode is not enabled on Windows in
Python 3. That's a regression from 2.x. See below.

D:\Data>py -3 -c "print('x')" | od -c
0000000 x \n
0000002

D:\Data>py -2 -c "print('x')" | od -c
0000000 x \r \n
0000003

D:\Data>py -3 -V
Python 3.2.2

D:\Data>py -2 -V
Python 2.7.2

Paul.

Amaury Forgeot d'Arc

2012-06-11 08:11:34 UTC

Post by Paul Moore
Explicit end= makes no difference to the behaviour. In fact, a minimal
test suggests that universal newline mode is not enabled on Windows in
Python 3. That's a regression from 2.x. See below.
D:\Data>py -3 -c "print('x')" | od -c
0000000 x \n
0000002
D:\Data>py -2 -c "print('x')" | od -c
0000000 x \r \n
0000003

This is certainly related to http://bugs.python.org/issue11990

--
Amaury Forgeot d'Arc

Serhiy Storchaka

2012-06-10 16:43:51 UTC

Post by Nick Coghlan
This approach also has the advantage of leaving
sys.__std(in/out/err)__ in a somewhat usable state.

And then sys.std* and sys.__std*__ have their own inconsistent buffers.

Nick Coghlan

2012-06-11 06:12:45 UTC

Post by Serhiy Storchaka

Post by Nick Coghlan
This approach also has the advantage of leaving
sys.__std(in/out/err)__ in a somewhat usable state.

And then sys.std* and sys.__std*__ have their own inconsistent buffers.

Correct, but using detach() leaves sys.__std*__ completely broken
(either throwing exceptions or silently failing to emit output).
Creating two independent streams that share the underlying file handle
is much closer to the 2.x behaviour when replacing sys.std*.
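
The difference between the two approaches can be sketched with an ordinary file, where a second reference to the stream plays the role of sys.__stdout__. The temporary file and the variable names are assumptions for illustration only.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Case 1: detach(). `saved` plays the role of sys.__stdout__.
stream = open(path, 'w', encoding='ascii')
saved = stream
rewrapped = io.TextIOWrapper(stream.detach(), encoding='utf-8')
try:
    saved.write('x')
    detached_still_usable = True
except ValueError:
    # The old wrapper no longer has an underlying buffer.
    detached_still_usable = False
rewrapped.close()

# Case 2: a second, independent stream over the same descriptor.
stream = open(path, 'w', encoding='ascii')
saved = stream
reopened = open(stream.fileno(), 'w', encoding='utf-8', closefd=False)
saved.write('x')      # the original stream still works
saved.flush()
reopened.close()      # leaves the descriptor open for `saved`
saved.close()
os.remove(path)
```

In the second case the two streams keep separate buffers, which is the inconsistency noted above, but neither one is left unusable.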

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Nick Coghlan

2012-06-10 02:26:17 UTC

Calling detach() on the standard streams is a bad idea - the interpreter
uses the originals internally, and calling detach() breaks them.

--
Sent from my phone, thus the relative brevity :)

Post by Nick Coghlan
So, after much digging, it appears the *right* way to replace a
standard stream in Python 3 after application start is to do the
sys.stdin = open(sys.stdin.fileno(), 'r',<new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w',<new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w',<new settings>)

sys.stdin = io.TextIOWrapper(sys.stdin.detach(), <new settings>)
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), <new settings>)
...
None of these methods are not guaranteed to work if the input or output
have occurred before.

Antoine Pitrou

2012-06-10 07:17:02 UTC

Post by Nick Coghlan
Calling detach() on the standard streams is a bad idea - the interpreter
uses the originals internally, and calling detach() breaks them.

Where does it do that? The interpreter certainly shouldn't hardwire the
original objects internally.

Moreover, your snippet is wrong because if someone replaces the streams
for a second time, garbage collecting the previous streams will close
the file descriptors. You should use closefd=False.
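
The effect of closefd can be sketched with two pipes. The sketch closes the wrappers explicitly rather than waiting for garbage collection, which has the same effect on the descriptor; the helper descriptor_is_open is a hypothetical name introduced for this illustration.

```python
import os


def descriptor_is_open(fd):
    """Return True if `fd` still refers to an open file descriptor."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


# Default closefd=True: closing the wrapper closes the descriptor too.
read_fd, write_fd = os.pipe()
owning = open(write_fd, 'w')
owning.close()
open_after_owning_close = descriptor_is_open(write_fd)

# closefd=False: the descriptor outlives the wrapper.
read_fd2, write_fd2 = os.pipe()
borrowing = open(write_fd2, 'w', closefd=False)
borrowing.close()
open_after_borrowing_close = descriptor_is_open(write_fd2)

os.close(read_fd)
os.close(read_fd2)
os.close(write_fd2)
```

With the default setting, discarding a replacement stream takes the process's real stdin, stdout or stderr descriptor with it.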

Regards

Antoine.

Nick Coghlan

2012-06-10 13:16:24 UTC

Post by Antoine Pitrou

Post by Nick Coghlan
Calling detach() on the standard streams is a bad idea - the interpreter
uses the originals internally, and calling detach() breaks them.

Where does it do that? The interpreter certainly shouldn't hardwire the
original objects internally.

At the very least, sys.__std(in/out/err)__. Doing "sys.stderr =
io.TextIOWrapper(sys.stderr.detach(), line_buffering=True)" also seems
to suppress display of exception tracebacks at the interactive prompt
(perhaps the default except hook is using a cached reference?). I
believe Py_FatalError and other APIs that are used deep in the
interpreter won't respect the module level setting.

Basically, it's dangerous to use detach() on a stream where you don't
hold the sole reference, and the safest approach with the standard
streams is to assume that other code is holding references to them.
Detaching the standard streams is just as likely to cause problems as
closing them.

Post by Antoine Pitrou
Moreover, your snippet is wrong because if someone replaces the streams for
a second time, garbage collecting the previous streams will close the file
descriptors. You should use closefd=False.

True, although that nicety is all the more reason to encapsulate this
idiom in a new IOBase.reopen() method:

def reopen(self, mode=None, buffering=-1, encoding=None,
           errors=None, newline=None, closefd=False):
    if mode is None:
        # Reuse the stream's own mode if it has one, else read-only text
        mode = getattr(self, 'mode', 'r')
    return open(self.fileno(), mode, buffering, encoding, errors,
                newline, closefd)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan-***@public.gmane.org | Brisbane, Australia

Serhiy Storchaka

2012-06-10 14:45:02 UTC

Post by Nick Coghlan
Calling detach() on the standard streams is a bad idea - the interpreter
uses the originals internally, and calling detach() breaks them.

If the interpreter uses the standard streams, then it uses the raw C
streams (FILE *) stdin/stdout/etc. Calling open(sys.stdin.fileno())
bypasses the internal buffering in sys.stdin, sys.stdin.buffer,
sys.stdin.decoder and the raw C stdin (if it is used at a lower level),
and can lose or break multibyte characters.

Victor Stinner

2012-06-12 21:44:00 UTC

sys.stdin = open(sys.stdin.fileno(), 'r',<new settings>)
sys.stdout = open(sys.stdout.fileno(), 'w',<new settings>)
sys.stderr = open(sys.stderr.fileno(), 'w',<new settings>)

sys.stdin = io.TextIOWrapper(sys.stdin.detach(), <new settings>)
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), <new settings>)
...
None of these methods are not guaranteed to work if the input or output have
occurred before.

You should set the newline option for sys.std* files. Python 3 does
something like this:

if sys.platform == "win32":
    # translate "\r\n" to "\n" for sys.stdin on Windows
    newline = None
else:
    newline = "\n"
sys.stdin = io.TextIOWrapper(sys.stdin.detach(), newline=newline,
                             <new settings>)
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), newline="\n", <new settings>)
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), newline="\n", <new settings>)
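
What the newline option does on output can be sketched with an in-memory byte buffer, which keeps the result independent of the platform. The helper encoded_output is a hypothetical name used only for this illustration.

```python
import io


def encoded_output(text, newline):
    """Write `text` through a TextIOWrapper and return the raw bytes produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding='ascii', newline=newline)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# newline="\n": "\n" is written unchanged.
untranslated = encoded_output('hello\n', newline='\n')

# newline="\r\n": every "\n" written becomes "\r\n".
translated = encoded_output('hello\n', newline='\r\n')

# Text that already contains "\r\n" gets a second "\r".
doubled = encoded_output('hello\r\n', newline='\r\n')
```

Translation applied twice, as in the last case, is one way a sequence such as "\r\r\n" can appear in output; whether that is what happens in the Windows reports earlier in this thread is not established here.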

--

Lib/test/regrtest.py uses the following code which is not exactly
correct (it creates a new buffered writer instead of reusing
sys.stdout buffered writer):

def replace_stdout():
    """Set stdout encoder error handler to backslashreplace (as stderr error
    handler) to avoid UnicodeEncodeError when printing a traceback"""
    import atexit

    stdout = sys.stdout
    sys.stdout = open(stdout.fileno(), 'w',
                      encoding=stdout.encoding,
                      errors="backslashreplace",
                      closefd=False,
                      newline='\n')

    def restore_stdout():
        sys.stdout.close()
        sys.stdout = stdout
    atexit.register(restore_stdout)

Victor

Rurpy

2012-06-10 04:22:03 UTC

Post by Nick Coghlan
Calling detach() on the standard streams is a bad idea - the
interpreter uses the originals internally, and calling detach()
breaks them.

The documentation for sys.std* specifically describes
using detach() on the standard streams:

| To write or read binary data from/to the standard
| streams, use the underlying binary buffer.

and gives example code.

The only caveat mentioned is that detach() "can raise
AttributeError or io.UnsupportedOperation" if the stream
has been replaced with something that does not support
detach().
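
Writing binary data through the underlying binary buffer, as in the documentation passage quoted above, can be sketched as follows. An in-memory stream stands in for sys.stdout so the example is safe to run; the helper write_bytes and the name fake_stdout are assumptions for illustration.

```python
import io


def write_bytes(text_stream, data):
    """Write raw bytes to the binary buffer underneath a text stream."""
    text_stream.flush()              # keep earlier text ahead of the bytes
    text_stream.buffer.write(data)
    text_stream.buffer.flush()


raw = io.BytesIO()
# Stand-in for sys.stdout: a text layer over a binary buffer.
fake_stdout = io.TextIOWrapper(raw, encoding='ascii', newline='\n')
fake_stdout.write('header\n')
write_bytes(fake_stdout, b'\x00\xff')
result = raw.getvalue()
```

Going through the buffer attribute leaves the text stream itself intact, unlike detach(), which leaves the original wrapper unusable.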
