Data loss in TcpStream::read_exact with read_timeout

chernomor · 20.Mar.2019 19:34:06

https://gist.github.com/chernomor/2425e2caf5143cc567f2f7fe57ea7108 is a working test (at least on my machine).

The gist of it: if you enable a read timeout on a TcpStream and try to read a given number of bytes, and fewer than that arrive before the timeout, then the bytes that did arrive are lost after the timeout. The next read_exact call will not return them.
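For readers without access to the gist, here is a separate minimal reproduction sketch (not the gist's code). It assumes a loopback listener on an ephemeral port and timings chosen just for this example (a 200 ms read timeout and a 600 ms pause on the sender side). The server sends 3 bytes, pauses, then sends 5 more. The client asks read_exact for 5 bytes with the timeout set, then clears the timeout and asks for 5 more.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Lets the first read_exact time out after only part of the requested bytes
/// arrived, then returns what a second read_exact receives.
fn demonstrate_loss() -> io::Result<Vec<u8>> {
    // Ephemeral loopback port (an assumption of this sketch).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Server: send 3 bytes, pause longer than the client's timeout, send 5 more.
    let server = thread::spawn(move || -> io::Result<()> {
        let (mut sock, _) = listener.accept()?;
        sock.write_all(b"abc")?;
        thread::sleep(Duration::from_millis(600));
        sock.write_all(b"defgh")?;
        Ok(())
    });

    let mut client = TcpStream::connect(addr)?;
    // Give "abc" time to arrive before reading starts.
    thread::sleep(Duration::from_millis(100));
    client.set_read_timeout(Some(Duration::from_millis(200)))?;

    // Ask for 5 bytes: only 3 arrive before the timeout fires.
    let mut first = [0u8; 5];
    let err = client.read_exact(&mut first).unwrap_err();
    // Unix typically reports WouldBlock, Windows may report TimedOut.
    assert!(matches!(err.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut));

    // Now wait without a timeout for the next 5 bytes.
    client.set_read_timeout(None)?;
    let mut second = [0u8; 5];
    client.read_exact(&mut second)?;

    server.join().expect("server thread panicked")?;
    Ok(second.to_vec())
}

fn main() -> io::Result<()> {
    let second = demonstrate_loss()?;
    // The failed call consumed "abc"; the second call only sees later data.
    println!("{}", String::from_utf8_lossy(&second));
    Ok(())
}
```

The second call prints "defgh": the "abc" consumed by the failed call is no longer in the socket, and read_exact did not report how many bytes it had taken.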

Do I understand correctly that there is no way around this, short of building my own contraption around read (which behaves fine)?

IBUzPE9 · 25.Mar.2019 08:58:42

This behavior is described in the documentation for std::io::Read::read_exact():

If any other read error is encountered then this function immediately returns. The contents of buf are unspecified in this case.
If this function returns an error, it is unspecified how many bytes it has read, but it will never read more than would be necessary to completely fill the buffer.

Again, the answer is in the documentation for std::io::Read::read():

If this function encounters any form of I/O or other error, an error variant will be returned. If an error is returned then it must be guaranteed that no bytes were read.

std::io::Read::read_exact() is a provided method: you get it automatically once std::io::Read::read() is implemented. Its code is quite simple:

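    // Default implementation of Read::read_exact in std at the time of this thread.
    fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> {
        while !buf.is_empty() {
            match self.read(buf) {
                // End of stream: stop and report UnexpectedEof below.
                Ok(0) => break,
                // Skip past the n bytes just written into buf. The running
                // total is not kept anywhere the caller can see it.
                Ok(n) => { let tmp = buf; buf = &mut tmp[n..]; }
                // Interrupted reads are simply retried.
                Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
                // Any other error (a read timeout included) is returned as is;
                // how many bytes were already copied into buf is not reported.
                Err(e) => return Err(e),
            }
        }
        if !buf.is_empty() {
            Err(Error::new(ErrorKind::UnexpectedEof,
                           "failed to fill whole buffer"))
        } else {
            Ok(())
        }
    }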
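To see what this loop does on a timeout without a network, here is a sketch with a scripted in-memory reader. The Scripted type is hypothetical, made up for this example: it returns "abc", then a TimedOut error, then "defgh". Since Scripted only implements read, its read_exact is the default implementation.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Hypothetical test double: each `read` call plays the next scripted step.
/// `Ok(bytes)` delivers data (an unread tail is kept for the next call);
/// `Err(kind)` simulates a timeout and, as `Read::read` requires, reads nothing.
struct Scripted(VecDeque<Result<Vec<u8>, ErrorKind>>);

impl Read for Scripted {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.0.pop_front() {
            None => Ok(0), // end of stream
            Some(Err(kind)) => Err(kind.into()),
            Some(Ok(mut chunk)) => {
                let n = chunk.len().min(buf.len());
                buf[..n].copy_from_slice(&chunk[..n]);
                if n < chunk.len() {
                    self.0.push_front(Ok(chunk.split_off(n)));
                }
                Ok(n)
            }
        }
    }
}

/// Runs two default `read_exact` calls and returns what the second one got.
fn second_read_after_timeout() -> Vec<u8> {
    let mut r = Scripted(VecDeque::from(vec![
        Ok(b"abc".to_vec()),
        Err(ErrorKind::TimedOut),
        Ok(b"defgh".to_vec()),
    ]));
    let mut first = [0u8; 5];
    // Consumes "abc", then hits the timeout and returns only the error:
    // the count 3 is not reported anywhere.
    let err = r.read_exact(&mut first).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::TimedOut);

    let mut second = [0u8; 5];
    r.read_exact(&mut second).unwrap();
    second.to_vec()
}

fn main() {
    // The three bytes consumed by the failed call never show up again.
    println!("{}", String::from_utf8_lossy(&second_read_after_timeout()));
}
```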

It is easy to write your own function that returns the number of bytes already read together with the error.
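A minimal sketch of such a function, assuming the count is returned as a tuple alongside the error (the name read_exact_counted and the tuple shape are choices made for this example, not std API). It relies on the read() guarantee quoted above: a call that returns an error has read nothing, so the count is exact. The demo resumes after a timeout by passing the unfilled tail of the buffer; Scripted is a hypothetical in-memory reader used only for the demo.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: like `Read::read_exact`, but on failure it also
/// returns how many bytes at the start of `buf` were filled.
fn read_exact_counted<R: Read + ?Sized>(
    reader: &mut R,
    buf: &mut [u8],
) -> Result<(), (io::Error, usize)> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => {
                let eof = io::Error::new(ErrorKind::UnexpectedEof, "failed to fill whole buffer");
                return Err((eof, filled));
            }
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
            // A failed `read` read nothing, so `filled` is exact here.
            Err(e) => return Err((e, filled)),
        }
    }
    Ok(())
}

// Hypothetical scripted reader: Ok(bytes) delivers data, Err(kind) fails.
struct Scripted(VecDeque<Result<Vec<u8>, ErrorKind>>);

impl Read for Scripted {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.0.pop_front() {
            None => Ok(0),
            Some(Err(kind)) => Err(kind.into()),
            Some(Ok(mut chunk)) => {
                let n = chunk.len().min(buf.len());
                buf[..n].copy_from_slice(&chunk[..n]);
                if n < chunk.len() {
                    self.0.push_front(Ok(chunk.split_off(n)));
                }
                Ok(n)
            }
        }
    }
}

fn main() {
    let mut r = Scripted(VecDeque::from(vec![
        Ok(b"abc".to_vec()),
        Err(ErrorKind::TimedOut),
        Ok(b"defgh".to_vec()),
    ]));
    let mut buf = [0u8; 5];
    let mut done = 0;
    // After a timeout, continue from where the previous call stopped.
    while done < buf.len() {
        match read_exact_counted(&mut r, &mut buf[done..]) {
            Ok(()) => done = buf.len(),
            Err((e, n)) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => done += n,
            Err((e, _)) => panic!("read failed: {}", e),
        }
    }
    println!("{}", String::from_utf8_lossy(&buf)); // abcde
}
```

Accepting both WouldBlock and TimedOut follows the TcpStream::set_read_timeout documentation, which notes that platforms report a timed-out read differently.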

My guess as to why this isn't available out of the box: read_exact() was introduced in version 1.6, and by then the std::io::Error structure had apparently already been stabilized, so the number of bytes read was not added to it to avoid a breaking change.

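The alternative, where the data stays in an internal buffer after an error, would rule out the automatic implementation of read_exact(): a default trait method built on top of read() has no state of its own in which to keep those bytes.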
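That state can live in a wrapper type instead of the trait. A sketch, assuming fixed-size frames (FrameReader is a hypothetical type, not std API): when read fails, the partially received frame stays inside the wrapper, so a later call picks up where the previous one stopped. With a real connection it would wrap a TcpStream that has set_read_timeout applied; the demo uses a hypothetical scripted in-memory reader instead.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};
use std::mem;

/// Hypothetical wrapper: collects fixed-size frames and keeps a partially
/// received frame across errors such as read timeouts.
struct FrameReader<R> {
    inner: R,
    frame: Vec<u8>, // bytes of the current frame received so far
    len: usize,     // frame size in bytes
}

impl<R: Read> FrameReader<R> {
    fn new(inner: R, len: usize) -> Self {
        FrameReader { inner, frame: Vec::with_capacity(len), len }
    }

    /// Returns the next complete frame. On error the bytes received so far
    /// stay in `self.frame` and the next call continues from there.
    fn next_frame(&mut self) -> io::Result<Vec<u8>> {
        let mut chunk = [0u8; 512];
        while self.frame.len() < self.len {
            let want = (self.len - self.frame.len()).min(chunk.len());
            match self.inner.read(&mut chunk[..want]) {
                Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
                Ok(n) => self.frame.extend_from_slice(&chunk[..n]),
                Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        // Hand out the finished frame and start the next one empty.
        Ok(mem::take(&mut self.frame))
    }
}

// Hypothetical scripted reader: Ok(bytes) delivers data, Err(kind) fails.
struct Scripted(VecDeque<Result<Vec<u8>, ErrorKind>>);

impl Read for Scripted {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.0.pop_front() {
            None => Ok(0),
            Some(Err(kind)) => Err(kind.into()),
            Some(Ok(mut chunk)) => {
                let n = chunk.len().min(buf.len());
                buf[..n].copy_from_slice(&chunk[..n]);
                if n < chunk.len() {
                    self.0.push_front(Ok(chunk.split_off(n)));
                }
                Ok(n)
            }
        }
    }
}

fn main() {
    let src = Scripted(VecDeque::from(vec![
        Ok(b"abc".to_vec()),
        Err(ErrorKind::WouldBlock),
        Ok(b"defgh".to_vec()),
    ]));
    let mut frames = FrameReader::new(src, 4);
    // Timeout after "abc": the error is returned, but "abc" is kept.
    let err = frames.next_frame().unwrap_err();
    assert_eq!(err.kind(), ErrorKind::WouldBlock);
    // The next calls complete frames instead of losing "abc".
    println!("{}", String::from_utf8_lossy(&frames.next_frame().unwrap())); // abcd
    println!("{}", String::from_utf8_lossy(&frames.next_frame().unwrap())); // efgh
}
```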

IBUzPE9 · 25.Mar.2019 12:15:10

I found the RFC for read_exact(). Here is what it says about the buffer contents after an error:

This RFC proposes that the contents of the output buffer be undefined on an error return. It might be untouched, partially overwritten, or completely overwritten (even if less bytes could be read; for instance, this method might in theory use it as a scratch space).

Two possible alternatives could be considered: do not touch it on failure, or overwrite it with valid data as much as possible.

Never touching the output buffer on failure would make it much more expensive for the default implementation (which calls read in a loop), since it would have to read into a temporary buffer and copy to the output buffer on success. Any implementation which cannot do an early return for all failure cases would have similar extra costs.

Overwriting as much as possible with valid data makes some sense; it happens without any extra cost in the default implementation. However, for optimized implementations this extra work is useless; since the caller can’t know how much is valid data and how much is garbage, it can’t make use of the valid data.

Users who need finer control should use the read method directly.
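For comparison, here is a sketch of the first alternative the RFC rejects, never touching the output buffer on failure (read_exact_untouched is a hypothetical name, and Scripted is a hypothetical in-memory reader for the demo). It makes the extra allocation and copy visible. It also shows that this alternative would not help with the original problem: the bytes consumed before the error are dropped together with the scratch buffer.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Hypothetical variant: read into a scratch buffer and copy into `buf`
/// only on success, so `buf` is untouched when an error is returned.
fn read_exact_untouched<R: Read + ?Sized>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut tmp = vec![0u8; buf.len()]; // extra allocation
    reader.read_exact(&mut tmp)?;       // on error, tmp (and its data) is dropped
    buf.copy_from_slice(&tmp);          // extra copy on success
    Ok(())
}

// Hypothetical scripted reader: Ok(bytes) delivers data, Err(kind) fails.
struct Scripted(VecDeque<Result<Vec<u8>, ErrorKind>>);

impl Read for Scripted {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.0.pop_front() {
            None => Ok(0),
            Some(Err(kind)) => Err(kind.into()),
            Some(Ok(mut chunk)) => {
                let n = chunk.len().min(buf.len());
                buf[..n].copy_from_slice(&chunk[..n]);
                if n < chunk.len() {
                    self.0.push_front(Ok(chunk.split_off(n)));
                }
                Ok(n)
            }
        }
    }
}

fn main() {
    let mut r = Scripted(VecDeque::from(vec![
        Ok(b"abc".to_vec()),
        Err(ErrorKind::TimedOut),
        Ok(b"defgh".to_vec()),
    ]));
    let mut buf = *b"-----";
    assert!(read_exact_untouched(&mut r, &mut buf).is_err());
    // The caller's buffer is unchanged...
    assert_eq!(&buf, b"-----");
    // ...but "abc" was still consumed from the reader.
    read_exact_untouched(&mut r, &mut buf).unwrap();
    println!("{}", String::from_utf8_lossy(&buf)); // defgh
}
```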

chernomor · 05.Apr.2019 11:37:49

Thanks for the detailed answer.