Discussion:
dd posix spec.
Rob Landley
2017-07-12 08:22:36 UTC
Last conversation about the dd spec on here, people suggested the spec
was literally a practical joke. That said, posix published the darn
thing, so let's look through it.

First, we will not be implementing the whole thing. It requires
ascii/ebcdic mapping, which was dead 30 years ago. So the question is
what _subset_ is worth implementing.

bs= without data modifying conversions means you output what you input.
(If you got a short read, you do a write of that size.) Otherwise, you
collate input blocks into output blocks of the requested size.
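
A minimal sketch of those two modes, in Python for brevity (toybox itself is C). plan_writes is a hypothetical helper, not anything from the spec: it takes the chunks that successive read() calls returned and lists the writes that would follow, assuming every write succeeds in full.

```python
def plan_writes(reads, obs=None):
    """Return the write() payloads produced by a sequence of read() results.

    obs=None models bs= with no data-modifying conversion: each read,
    short or not, is written out at the size it arrived.
    Otherwise input is collated into obs-sized output blocks, and any
    leftover is flushed as one final partial block.
    """
    if obs is None:
        return [chunk for chunk in reads if chunk]

    writes = []
    pending = b""
    for chunk in reads:
        pending += chunk
        while len(pending) >= obs:
            writes.append(pending[:obs])
            pending = pending[obs:]
    if pending:
        writes.append(pending)
    return writes
```

With reads of 3, 1 and 4 bytes, the bs= mode produces three writes of 3, 1 and 4 bytes, while obs=4 produces two writes of 4 bytes each.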

Question: what happens if there's a short write? Do you collate to the
next full output block size, or do you re-write the missing chunk as a
short write?
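
For illustration, the second option (re-write only the missing chunk) could look like the Python sketch below. This is one possible answer, not what the spec mandates. The write argument is a stand-in for something like os.write bound to a file descriptor, and make_short_writer is a hypothetical test double that simulates short writes.

```python
def write_block(write, block):
    """Write one output block, retrying the unwritten tail after a short write.

    Returns the number of write() calls it took, so the caller can decide
    how to count full versus partial records.
    """
    view = memoryview(block)
    calls = 0
    while len(view):
        done = write(view)
        if done <= 0:
            raise OSError("write made no progress")
        calls += 1
        view = view[done:]
    return calls


def make_short_writer(limit, sink):
    """Test stand-in: a write() that accepts at most `limit` bytes per call."""
    def write(data):
        taken = bytes(data[:limit])
        sink.append(taken)
        return len(taken)
    return write
```

Feeding an 8 byte block to a writer that takes at most 3 bytes per call results in three writes: 3, 3 and 2 bytes.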

Question: what if bs= and obs= are both specified? (Answer: bs= wins.)

sync is silly. swab is silly. But both are easy to do...

Question: lcase and ucase are utf8 now, and any fixed block size is
going to chop characters in the middle. It says conversions operate
independent of input blocking, so I guess I need a minimum buffer size
of 512 or so... (gotta look up what this block/unblock stuff is doing...)
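
A rough Python sketch of the chopped-character problem, assuming the input really is valid UTF-8. It uses the standard library's incremental decoder, which holds back an incomplete trailing sequence until the next block arrives. ucase_blocks is a made-up name, and a C implementation would have to carry those leftover bytes by hand.

```python
import codecs


def ucase_blocks(blocks):
    """Uppercase UTF-8 data that arrives in arbitrarily split blocks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = []
    for block in blocks:
        # Bytes of a character cut off at the block boundary stay buffered
        # inside the decoder until the rest of the character shows up.
        text = decoder.decode(block)
        out.append(text.upper().encode("utf-8"))
    # final=True raises UnicodeDecodeError if the input ended mid-character.
    out.append(decoder.decode(b"", final=True).upper().encode("utf-8"))
    return b"".join(out)
```

Note that case conversion can change the byte length (uppercasing "ß" yields "SS"), so output blocks cannot be assumed to match input blocks byte for byte.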

Ok, block or unblock do nothing unless you specify cbs= "conversion
block size". Which is different from ibs=, obs=, and bs=. Right, I'm
going to throw that in the "did not implement" pile and wait for
somebody to complain.

There's a bs=123x456 in the spec, we didn't previously implement that,
I'm not adding it now because it's crazy and $((123*456)) exists.
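
For reference, this is roughly what that operand syntax means, as a Python sketch (parse_size is a hypothetical name). It covers only what the spec describes: decimal numbers, an optional k (times 1024) or b (times 512) suffix, and x as a multiplication separator.

```python
def parse_size(text):
    """Parse a POSIX dd size operand such as '512', '2k', '1b' or '123x456'."""
    total = 1
    for factor in text.split("x"):
        scale = 1
        if factor.endswith("k"):
            scale, factor = 1024, factor[:-1]
        elif factor.endswith("b"):
            scale, factor = 512, factor[:-1]
        if not (factor.isascii() and factor.isdigit()):
            raise ValueError("bad size operand: %r" % text)
        total *= int(factor) * scale
    return total
```

So parse_size("123x456") gives 56088, the same value as $((123*456)).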

No if= means default to stdin, no of= means default to stdout. Got it.

sigint causes progress indicator output, but I have a "todo" that says
it's not ending the process...

Question: if bs= _isn't_ specified (and neither is ibs= or obs=) I vaguely
recall the default block size is 512 bytes. Does that count as bs= being
specified for the "write what you read" behavior, or do we fill
up 512 byte output blocks if we read less than that? (This matters if
you dd from /dev/ttyS0 and get bytes typed by humans.)

If your output block is a short write, do you retry the rest of that
block or do a whole next block?

of= is truncated by default, to seek= position if that's specified.
Disabled with conv=notrunc.
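
A small Python sketch of that rule, assuming a regular output file (ftruncate() can fail on devices and pipes, which a real implementation would have to tolerate). open_output and demo are hypothetical names.

```python
import os
import tempfile


def open_output(path, seek_blocks=0, obs=512, notrunc=False):
    """Open of=, truncate it to the seek= position unless notrunc, then seek."""
    offset = seek_blocks * obs
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
    if not notrunc:
        # Everything past the seek position is discarded; a shorter file
        # is extended with zeroes up to the seek position instead.
        os.ftruncate(fd, offset)
    os.lseek(fd, offset, os.SEEK_SET)
    return fd


def demo():
    """Return the size of a 2048 byte file after seek=1, without and with notrunc."""
    sizes = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out")
        for notrunc in (False, True):
            with open(path, "wb") as f:
                f.write(b"x" * 2048)
            os.close(open_output(path, seek_blocks=1, notrunc=notrunc))
            sizes.append(os.path.getsize(path))
    return tuple(sizes)
```

Here demo() returns (512, 2048): the default cuts the file at the seek position and notrunc leaves it alone.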

Edge case: If you specify ibs=prime1 obs=prime2 then the smallest
internal buffer you can have without memcpy is ibs*obs... except even
_that_ won't work if you have a short read that screws up the alignment and

If you're willing to do memcpy to preserve block size, then you just
need ibs+obs as your worst case, and can memcpy to realign after each
block if necessary.
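
The memcpy variant can be sketched like this in Python, with slice assignment standing in for memcpy/memmove and a list of chunks standing in for read() results. reblock is a hypothetical name. The point is only that a buffer of ibs+obs bytes is enough, because fewer than obs bytes are ever left over before the next read.

```python
def reblock(read_chunks, ibs, obs):
    """Collate reads of at most ibs bytes into obs-sized writes."""
    buf = bytearray(ibs + obs)   # worst case: obs-1 leftover plus one full read
    have = 0                     # number of valid bytes at the start of buf
    writes = []
    for chunk in read_chunks:
        if len(chunk) > ibs:
            raise ValueError("read larger than ibs")
        buf[have:have + len(chunk)] = chunk
        have += len(chunk)
        off = 0
        while have - off >= obs:
            writes.append(bytes(buf[off:off + obs]))
            off += obs
        if off:
            # Realign: move the unwritten tail back to the start of the buffer.
            buf[0:have - off] = buf[off:have]
            have -= off
    if have:
        writes.append(bytes(buf[:have]))   # final partial output block
    return writes
```

With ibs=3 and obs=5, reads of "abc", "de", "fgh" and "i" come out as the writes "abcde" and "fghi", short read included.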

Three potential output formats:

"%u+%u records in\n", <number of whole input blocks>,
    <number of partial input blocks>

"%u+%u records out\n", <number of whole output blocks>,
    <number of partial output blocks>

"%u truncated %s\n", <number of truncated blocks>, "record" (if
    <number of truncated blocks> is one) "records" (otherwise)

"Another point is that a failed read on a regular file or a disk
generally does not increment the file offset, and dd must then
seek past the block on which the error occurred; otherwise, the
input error occurs repetitively. When the input is a magnetic
tape, however, the tape normally has passed the block containing
the error when the error is reported, and thus no seek is necessary."

So... try to seek after an error but ignore failure of lseek()?
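
That approach might look like the following Python sketch. skip_bad_block and demo are hypothetical names, and stepping forward by exactly ibs is an assumption about where the failed read started.

```python
import os
import tempfile


def skip_bad_block(fd, ibs):
    """After a failed read, try to step over the bad input block.

    Returns True if the seek worked and False if the input is not seekable
    (pipe, tty, tape), in which case the failure is simply ignored.
    """
    try:
        os.lseek(fd, ibs, os.SEEK_CUR)
        return True
    except OSError:
        return False


def demo():
    """Try the helper on a regular file and on a pipe."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * 2048)
        f.flush()
        os.lseek(f.fileno(), 0, os.SEEK_SET)
        on_file = skip_bad_block(f.fileno(), 512)
        position = os.lseek(f.fileno(), 0, os.SEEK_CUR)
    r, w = os.pipe()
    try:
        on_pipe = skip_bad_block(r, 512)
    finally:
        os.close(r)
        os.close(w)
    return on_file, position, on_pipe
```

On a regular file the offset advances by one block; on a pipe lseek() fails and the function just reports that, so demo() returns (True, 512, False).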

The bit about writing a partial block after an error without noerror...
I guess that's from obs being larger than ibs? (Because read() either
returns data _or_ error, not both...)

Sigh. What a mess. Next up, a similarly close reading of the man page...

Rob
Rob Landley
2017-07-14 02:50:39 UTC
Post by Rob Landley
Next up, a similarly close reading of the man page...
Using ubuntu 14.04 as reference:

Still dropping conv=ascii,ebcdic,ibm because that was already obsolete
30 years ago.

I can see adding conv=excl,nocreat (sure, you got notrunc already)
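
These three map fairly directly onto open() flags. A Python sketch under that assumption (output_open_flags is a hypothetical name, and seek= handling is ignored here):

```python
import os


def output_open_flags(conv):
    """Translate conv=excl,nocreat,notrunc into flags for opening of=."""
    conv = set(conv)
    if {"excl", "nocreat"} <= conv:
        # excl needs the file to be absent, nocreat needs it to exist.
        raise ValueError("excl and nocreat contradict each other")
    flags = os.O_WRONLY
    if "nocreat" not in conv:
        flags |= os.O_CREAT
    if "excl" in conv:
        flags |= os.O_EXCL
    if "notrunc" not in conv:
        flags |= os.O_TRUNC
    return flags
```

With no conv= options this yields O_WRONLY|O_CREAT|O_TRUNC, the usual "create or overwrite" open.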

Adding conv=fsync but not conv=fdatasync (If this breaks somebody's
script I can add it but is "sync data but not metadata" a hair we want
to split?)

Let's see... iflag= and oflag= are kind of silly. There's already
conv=excl,nocreat,notrunc so why is append not there too? Why can
you have iflag=append? "direct" is micromanaging nonsense (it was
introduced for oracle's database code, we have sync if we want to be
sure it hit disk). And "directory" is just nonsensical on Linux.
(Apparently bsd can read binary directory info, but there's no _use_ for
it?)

https://lists.gnu.org/archive/html/coreutils/2014-08/msg00032.html

Sigh. But somebody did dig up a use case for nonblock:

https://stackoverflow.com/questions/32057396/how-to-flush-named-pipefifo-in-non-blocking-mode-in-busybox-shell-script

But busybox git still doesn't implement it so there can't be that big of
a demand? Hmmm...

Alright, these _might_ make sense:

iflag=noctty,noatime,nonblock,nofollow
oflag=append,noctty,noatime,nonblock

These are silly:

?flag=direct,directory,dsync,fullblock,nocache

(If you want to drop caches for some sort of benchmark you can
echo 3 > /proc/sys/vm/drop_caches)

What on earth is count_bytes? answer: how dd should have worked in the
first place? Oh this is just a terribly designed command all around.
conv=bytes makes sense, having separate count_bytes, seek_bytes, and
skip_bytes and having them care whether you iflag= or oflag= them is
NONSENSE.
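
To make the difference concrete, here is a Python sketch of how count= would be read in blocks versus in bytes. read_sizes is a hypothetical helper that lists the sizes passed to read(), and it ignores short reads.

```python
def read_sizes(count, ibs, count_is_bytes=False):
    """List the read() sizes needed to honor count=.

    In the traditional mode count is a number of ibs-sized blocks.
    In byte mode count is a byte total, so the last read may be smaller.
    """
    if not count_is_bytes:
        return [ibs] * count
    full, rest = divmod(count, ibs)
    return [ibs] * full + ([rest] if rest else [])
```

For example, count=1000 with ibs=512 in byte mode becomes one read of 512 bytes and one of 488.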

This business with sending a signal to get status? If you want dd to
output periodic progress reports (or a progress bar), DO THAT.
status=bar or status=count would have been the obvious things to do.
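
As a sketch of what a periodic report could print (Python for brevity): progress_line is a hypothetical name and the format is made up for illustration, not the one any existing dd prints.

```python
def progress_line(bytes_done, seconds):
    """Format one progress report with a throughput figure in MB/s."""
    rate = bytes_done / seconds if seconds > 0 else 0.0
    return "%d bytes copied, %.1f s, %.1f MB/s" % (bytes_done, seconds, rate / 1e6)
```

The copy loop would call this on a timer (say once a second) rather than waiting for a signal, and write the result to stderr.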

Ok, looking at the iflag/oflag things: noctty, noatime, nofollow, and
append don't make sense to specify for input/output. If you want it for
one you presumably want it for both. The only one that really needs to
distinguish input from output is "nonblock" (although not for the one
use case I could find, since /dev/null should never block...). And if we
needed to distinguish within conv it could just have done
conv=inonblock,ononblock.
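
A Python sketch of that idea, Unix only. The option names (inonblock, ononblock and friends under conv=) are the hypothetical ones floated in this message and exist in no dd. Applying append to the output side only is a judgment call, and noatime is left out because O_NOATIME is Linux specific.

```python
import os

# Options that would apply to both the input and the output file.
SHARED = {"noctty": os.O_NOCTTY, "nofollow": os.O_NOFOLLOW}


def split_conv(conv):
    """Turn a conv= string into (input open flags, output open flags)."""
    in_flags = out_flags = 0
    for name in conv.split(","):
        if name in SHARED:
            in_flags |= SHARED[name]
            out_flags |= SHARED[name]
        elif name == "append":
            out_flags |= os.O_APPEND      # only meaningful when writing
        elif name == "inonblock":
            in_flags |= os.O_NONBLOCK
        elif name == "ononblock":
            out_flags |= os.O_NONBLOCK
        else:
            raise ValueError("unknown conv option: %r" % name)
    return in_flags, out_flags
```

So conv=append,inonblock would open the input with O_NONBLOCK and the output with O_APPEND, with no separate iflag=/oflag= needed.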

Sigh. Nobody's requested any of this stuff yet, it's not in posix, and
even busybox hasn't bothered to implement it so far...

Right, question: should I

A) copy the existing insanity

B) just do:

conv=append,noctty,inonblock,ononblock,nofollow,bytes
status=bar,count

C) ignore it all and wait for people to complain about its absence?

Does anybody have strong opinions on this? I'm leaning towards option B
myself. Posix has had 15 years to catch up on this, and declined.

Rob
Alain Toussaint
2017-07-14 03:41:53 UTC
Use case here:

dd if=/dev/zero of=/dev/sd[a,b,c,d,etc...] bs=64MB (could go higher)
conv=fsync status=progress

The important bit for me is the status=progress, or whatever design you
come up with, to figure out disk throughput.

It's the only use I have for dd.

Al
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net
Alain Toussaint
2017-07-14 03:44:03 UTC
In the feature request department, I would love a wipefs command that
works the way I use dd. The current wipefs from util-linux leaves
some artifacts behind when asked to wipe disk signatures...

Al
Rob Landley
2017-07-14 18:48:47 UTC
Post by Alain Toussaint
In the feature request department, I would love a wipefs command that
works the way I use dd. The current wipefs from util-linux leaves
some artifacts behind when asked to wipe disk signatures...
Al
Define "some artifact"? It looks like it's basically blkid traversing
the table and overwriting any matches it finds with zeroes?

And yeah, I always used dd for that. Didn't know there was a command.
And I don't see how making it not there but recoverable is useful? (You
can back up the contents of a block device with "cat" if you really need
to...)

Rob
Alain Toussaint
2017-07-14 19:30:39 UTC
Post by Rob Landley
Define "some artifact"? It looks like it's basically blkid traversing
the table and overwriting any matches it finds with zeroes?
Basically, I would wipe a disk's partitions using wipefs --all /dev/sda
and load up cfdisk to write new partitions, but it would find the already
existing partitions (boot or swap, or whatever) whenever I replicated part
of the old partition structure. Example:

old /dev/sda (1TB disk, dos partitioning)

/boot 512MB
swap-v1 4096MB
/ 48GB
/home 96GB
/srv $REST_OF_DISK_SPACE (typically 750-800GB, database space mostly
but web server and other odds and ends).

new /dev/sda (after cfdisk):

/boot 512MB (found by cfdisk despite partition signature not there anymore).
swap-v1 4096MB (ditto).
/ 48GB (ditto).
/home 192GB (ditto because of the starting point of the partition)
/var 96GB (no signature found by cfdisk, mail server space).
/srv $REST_OF_DISK_SPACE (no signature found).
Post by Rob Landley
And yeah, I always used dd for that. Didn't know there was a command.
And I don't see how making it not there but recoverable is useful? (You
can back up the contents of a block device with "cat" if you really need
to...)
Rob
My use case for dd is not backup (I'm not that inclined to
save a bunch of infected windows boxes and their ilk) but cleanup.

Al
Rob Landley
2017-07-15 16:50:39 UTC
Post by scsijon
A lot of this reminds me of my old SCSI and SASI days, and the PDP
series of minicomputers, where everything internal was connected by one or
the other and these commands were de rigueur to ensure the commands and
data 'got through' between devices; after all, the bus was ribbon cable.
Both scsi and especially sasi are still out there and going well, just
under other names.
I started out on a commodore 64. I'm not adding a command to translate
petascii, run commodore basic, or read 1541 floppy images. :)
Post by scsijon
However some of these 'obscure' dd settings are used today in the likes
of firmware updating packages so they are needed. Whether they should
appear as more than that, such as in documentation details, is something
to consider.
I can wait for something like that to actually try to get used, break,
and then see whether it's easier to fix their script than this dd
implementation.

Remember: I'm mostly trying to figure out what to do about the non-posix
stuff. If it's in posix, merely being silly isn't enough to yank it.
Post by scsijon
And the commands should be and are available in the full dd command;
whether they are also in toybox is a moot point, but personally, with a little
thought, I think I'd say "not required!" but I may change my mind later.
I'm leaning not required as well.
Post by scsijon
Oh, and yes EBCDIC is still used for external devices in the engineering,
science, chemistry and metallurgy industries for devices reading
resistance, capacitance and the like (spectrometric devices come easily
to mind), so conv=ascii,ebcdic,ibm are still alive and maybe worth
keeping for the tinkerers out there.
There are still 6 bit devices out there too. I'm not implementing a
conversion for 'em in this dd. There's presumably a "tr" command line
that could do it.
Post by scsijon
I'll leave it to you if you want to feed this to the lists or not.
I prefer to have stuff on the list so I can find it again. :)
Post by scsijon
regards
scsijon
Thanks,

Rob
