Discussion:
[PATCH] Add support for 1024 as well as 1000 to human_readable.
enh
2015-08-15 22:20:34 UTC
Add support for 1024 as well as 1000 to human_readable.

This fixes the issue found with du, and paves the way for ls -lh (in a
separate patch). In manual testing this produces similar results to
coreutils, but better in some cases, presumably due to bad rounding in
coreutils. (toybox's 5675-byte main.c, for example, should be rounded
to 5.5Ki, not the 5.6 that coreutils reports.)

I've preserved support for SI multiples of 1000, but haven't bothered
to work out whether dd actually wants that. "It's in pending for a
reason", after all.

diff --git a/lib/lib.c b/lib/lib.c
index c16cffe..2f80be3 100644
--- a/lib/lib.c
+++ b/lib/lib.c
@@ -866,23 +866,22 @@ void names_to_pid(char **names, int (*callback)(pid_t pid, char *name))
closedir(dp);
}

-// display first few digits of number with power of two units, except we're
-// actually just counting decimal digits and showing mil/bil/trillions.
int human_readable(char *buf, unsigned long long num, int style)
{
- int end, len;
+ double amount = num;
+ const char *unit = (style&HR_SI) ? " kMGTPE" : " KMGTPE";
+ int divisor = (style&HR_SI) ? 1000 : 1024;
+ int end;

- len = sprintf(buf, "%lld", num)-1;
- end = (len%3)+1;
- len /= 3;
-
- if (len && end == 1) {
- buf[2] = buf[1];
- buf[1] = '.';
- end = 3;
+ while (amount > divisor && *unit) {
+ amount /= divisor;
+ ++unit;
}
+ if (amount < 10) end = sprintf(buf, "%.1lf", amount);
+ else end = sprintf(buf, "%.0lf", amount);
+
if (style & HR_SPACE) buf[end++] = ' ';
- if (len) buf[end++] = " KMGTPE"[len];
+ if (*unit && *unit != ' ') buf[end++] = *unit;
if (style & HR_B) buf[end++] = 'B';
buf[end++] = 0;

diff --git a/lib/lib.h b/lib/lib.h
index 17a4a97..5805295 100644
--- a/lib/lib.h
+++ b/lib/lib.h
@@ -177,8 +177,9 @@ void replace_tempfile(int fdin, int fdout, char **tempname);
void crc_init(unsigned int *crc_table, int little_endian);
void base64_init(char *p);
int yesno(char *prompt, int def);
-#define HR_SPACE 1
-#define HR_B 2
+#define HR_SPACE 1 // "20 K"; default "20K".
+#define HR_B 2 // "4B"; default "4".
+#define HR_SI 4 // /1000; default /1024.
int human_readable(char *buf, unsigned long long num, int style);
int qstrcmp(const void *a, const void *b);
int xpoll(struct pollfd *fds, int nfds, int timeout);
Rob Landley
2015-08-17 19:02:03 UTC
On 08/15/2015 05:20 PM, enh wrote:
> Add support for 1024 as well as 1000 to human_readable.
>
> This fixes the issue found with du, and paves the way for ls -lh (in a
> separate patch). In manual testing this produces similar results to
> coreutils, but better in some cases, presumably due to bad rounding in
> coreutils. (toybox's 5675-byte main.c, for example, should be rounded
> to 5.5Ki, not the 5.6 that coreutils reports.)
>
> I've preserved support for SI multiples of 1000, but haven't bothered
> to work out whether dd actually wants that. "It's in pending for a
> reason", after all.
>
> diff --git a/lib/lib.c b/lib/lib.c
> index c16cffe..2f80be3 100644
> --- a/lib/lib.c
> +++ b/lib/lib.c
> @@ -866,23 +866,22 @@ void names_to_pid(char **names, int (*callback)(pid_t pid, char *name))
> closedir(dp);
> }
>
> -// display first few digits of number with power of two units, except we're
> -// actually just counting decimal digits and showing mil/bil/trillions.
> int human_readable(char *buf, unsigned long long num, int style)
> {
> - int end, len;
> + double amount = num;
> + const char *unit = (style&HR_SI) ? " kMGTPE" : " KMGTPE";
> + int divisor = (style&HR_SI) ? 1000 : 1024;
> + int end;

config TOYBOX_FLOAT
bool "Floating point support"
default y
help
Include floating point support infrastructure and commands that
require it.

Not relevant to android, but I've been trying to let toybox work on
systems without even emulated floating point support. (If I'm trying
to displace busybox, leaving it obvious niches that busybox does
but toybox doesn't do is unfair to those niches.)

I'll take a stab at making it work with long long, and if it's nontrivial
I'll bite the bullet and merge the double version. (What would be really
nice would be test cases, but I'll see what I can come up with.)

It's too bad du --apparent-size hasn't got a short option or I'd add
that so I could hijack du.test to test human_readable() in various
rounding conditions via truncate -s. (Might still anyway, but that's
a really stupid longopt name that gnu came up with there...)

Rob
enh
2015-08-17 20:38:13 UTC
On Mon, Aug 17, 2015 at 12:02 PM, Rob Landley <***@landley.net> wrote:
> On 08/15/2015 05:20 PM, enh wrote:
>> Add support for 1024 as well as 1000 to human_readable.
>>
>> This fixes the issue found with du, and paves the way for ls -lh (in a
>> separate patch). In manual testing this produces similar results to
>> coreutils, but better in some cases, presumably due to bad rounding in
>> coreutils. (toybox's 5675-byte main.c, for example, should be rounded
>> to 5.5Ki, not the 5.6 that coreutils reports.)
>>
>> I've preserved support for SI multiples of 1000, but haven't bothered
>> to work out whether dd actually wants that. "It's in pending for a
>> reason", after all.
>>
>> diff --git a/lib/lib.c b/lib/lib.c
>> index c16cffe..2f80be3 100644
>> --- a/lib/lib.c
>> +++ b/lib/lib.c
>> @@ -866,23 +866,22 @@ void names_to_pid(char **names, int (*callback)(pid_t pid, char *name))
>> closedir(dp);
>> }
>>
>> -// display first few digits of number with power of two units, except we're
>> -// actually just counting decimal digits and showing mil/bil/trillions.
>> int human_readable(char *buf, unsigned long long num, int style)
>> {
>> - int end, len;
>> + double amount = num;
>> + const char *unit = (style&HR_SI) ? " kMGTPE" : " KMGTPE";
>> + int divisor = (style&HR_SI) ? 1000 : 1024;
>> + int end;
>
> config TOYBOX_FLOAT
> bool "Floating point support"
> default y
> help
> Include floating point support infrastructure and commands that
> require it.
>
> Not relevant to android, but I've been trying to let toybox work on
> systems without even emulated floating point support. (If I'm trying
> to displace busybox, leaving it obvious niches that busybox does
> but toybox doesn't do is unfair to those niches.)

i did see TOYBOX_FLOAT but not all the existing floating point code is
guarded by it so assumed it was bitrotting to death.

> I'll take a stab at making it work with long long, and if it's nontrivial
> I'll bite the bullet and merge the double version.

it would be nice if you could keep the double versions for builds that
allow floating point. i suspect that coreutils isn't using floating
point under the hood, because some of its rounding seems wrong. it
would be sad to pessimize toybox on real hardware to support such a
minority use case.

(even though they're easy enough to find, i didn't list examples above
because i didn't want to encourage you to break things :-) )

> (What would be really
> nice would be test cases, but I'll see what I can come up with.)

this is why you need unit tests for the library rather than relying on
testing the final tools...

> It's too bad du --apparent-size hasn't got a short option or I'd add
> that so I could hijack du.test to test human_readable() in various
> rounding conditions via truncate -s. (Might still anyway, but that's
> a really stupid longopt name that gnu came up with there...)
>
> Rob



--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
Rob Landley
2015-08-18 04:24:35 UTC
On 08/17/2015 03:38 PM, enh wrote:
> On Mon, Aug 17, 2015 at 12:02 PM, Rob Landley <***@landley.net> wrote:
>> On 08/15/2015 05:20 PM, enh wrote:
>>> Add support for 1024 as well as 1000 to human_readable.
>>>
>>> This fixes the issue found with du, and paves the way for ls -lh (in a
>>> separate patch). In manual testing this produces similar results to
>>> coreutils, but better in some cases, presumably due to bad rounding in
>>> coreutils. (toybox's 5675-byte main.c, for example, should be rounded
>>> to 5.5Ki, not the 5.6 that coreutils reports.)
>>>
>>> I've preserved support for SI multiples of 1000, but haven't bothered
>>> to work out whether dd actually wants that. "It's in pending for a
>>> reason", after all.
>>>
>>> diff --git a/lib/lib.c b/lib/lib.c
>>> index c16cffe..2f80be3 100644
>>> --- a/lib/lib.c
>>> +++ b/lib/lib.c
>>> @@ -866,23 +866,22 @@ void names_to_pid(char **names, int (*callback)(pid_t pid, char *name))
>>> closedir(dp);
>>> }
>>>
>>> -// display first few digits of number with power of two units, except we're
>>> -// actually just counting decimal digits and showing mil/bil/trillions.
>>> int human_readable(char *buf, unsigned long long num, int style)
>>> {
>>> - int end, len;
>>> + double amount = num;
>>> + const char *unit = (style&HR_SI) ? " kMGTPE" : " KMGTPE";
>>> + int divisor = (style&HR_SI) ? 1000 : 1024;
>>> + int end;
>>
>> config TOYBOX_FLOAT
>> bool "Floating point support"
>> default y
>> help
>> Include floating point support infrastructure and commands that
>> require it.
>>
>> Not relevant to android, but I've been trying to let toybox work on
>> systems without even emulated floating point support. (If I'm trying
>> to displace busybox, leaving it obvious niches that busybox does
>> but toybox doesn't do is unfair to those niches.)
>
> i did see TOYBOX_FLOAT but not all the existing floating point code is
> guarded by it so assumed it was bitrotting to death.

More that it was added well into the process and not everything got
converted over. (One of my proposed aboriginal test environments has no
floating point, so that's a test case once I've got it working, but
that's queued up after the nommu support.)

>> I'll take a stab at making it work with long long, and if it's nontrivial
>> I'll bite the bullet and merge the double version.
>
> it would be nice if you could keep the double versions for builds that
> allow floating point. i suspect that coreutils isn't using floating
> point under the hood, because some of its rounding seems wrong. it
> would be sad to pessimize toybox on real hardware to support such a
> minority use case.

Can the long long not round properly?

> (even though they're easy enough to find, i didn't list examples above
> because i didn't want to encourage you to break things :-) )

Alas, I swing between almost neurotic levels of testing and "I need to do
a testing pass before the release, no time for it now"...

(I have a whole "properly fill in the test suite" pass scheduled, but
it's about as big as implementing vi, and I've been holding off so I
don't have to do it multiple times. In theory I could declare individual
commands finished and do their tests (and I've done a few), but I keep
thinking I'm finished with stuff like ls and being wrong...)

Todo list runneth over. This is indeed important, as is documentation,
as is clearing out pending, as is a regular release schedule...

>> (What would be really
>> nice would be test cases, but I'll see what I can come up with.)
>
> this is why you need unit tests for the library rather than relying on
> testing the final tools...

Usually I try to find a tool that exposes the library functionality, but
there are some cases I can't easily do that for, yes.

Possibly I need to add some more commands under toys/examples just so I
can have the test suite call them? Hmmm...

Rob
Rob Landley
2015-08-24 01:20:15 UTC
On 08/17/2015 03:38 PM, enh wrote:
> On Mon, Aug 17, 2015 at 12:02 PM, Rob Landley <***@landley.net> wrote:
>> I'll take a stab at making it work with long long, and if it's nontrivial
>> I'll bite the bullet and merge the double version.
>
> it would be nice if you could keep the double versions for builds that
> allow floating point. i suspect that coreutils isn't using floating
> point under the hood, because some of its rounding seems wrong. it
> would be sad to pessimize toybox on real hardware to support such a
> minority use case.

It looks like the double version can say "1023M", which isn't right
either. Yes, ubuntu can too:

$ truncate blah -s $((1023*1024))
$ du -h --apparent-size blah
1023K blah

You're already saying you don't want to copy them getting the rounding
wrong, but this is the wrong number of _digits_. The base 10 one is 3
digits max, either via "9.8" or "321". This one can go to 4 digits. I
thought the reason for the "double" was to take care of that, but
apparently not?

I was halfway through doing an if (CFG_TOYBOX_FLOAT) one with the double
and an else case with the long long, but I _think_ what we need to do is
test against 999 each time and if we overshoot slightly it should round
up as 1.0 of the next unit? So decimal test always, but binary divide
where appropriate.

The ubuntu behavior is...

$ truncate blah -s $((3000*1024))
$ du -h --apparent-size blah
3.0M blah

That got rounded up (3000 < 3*1024) so 1001KB being 1.0MB is also
legitimately something you can round up? I think?

/me wists for a specification. Oh well. I hate when I have to guess at
what the right behavior _is_...

> (even though they're easy enough to find, i didn't list examples above
> because i didn't want to encourage you to break things :-) )
>
>> (What would be really
>> nice would be test cases, but I'll see what I can come up with.)
>
> this is why you need unit tests for the library rather than relying on
> testing the final tools...

I added a test command to toys/examples and a very simple one to tests/
but I need the *.test file waaaay fluffed out. :)

Rob
enh
2015-08-24 20:10:32 UTC
On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
> On 08/17/2015 03:38 PM, enh wrote:
>> On Mon, Aug 17, 2015 at 12:02 PM, Rob Landley <***@landley.net> wrote:
>>> I'll take a stab at making it work with long long, and if it's nontrivial
>>> I'll bite the bullet and merge the double version.
>>
>> it would be nice if you could keep the double versions for builds that
>> allow floating point. i suspect that coreutils isn't using floating
>> point under the hood, because some of its rounding seems wrong. it
>> would be sad to pessimize toybox on real hardware to support such a
>> minority use case.
>
> It looks like the double version can say "1023M", which isn't right
> either. Yes, ubuntu can too:
>
> $ truncate blah -s $((1023*1024))
> $ du -h --apparent-size blah
> 1023K blah
>
> You're already saying you don't want to copy them getting the rounding
> wrong, but this is the wrong number of _digits_. The base 10 one is 3
> digits max, either via "9.8" or "321". This one can go to 4 digits. I
> thought the reason for the "double" was to take care of that, but
> apparently not?

no, i was just trying to get the rounding "obviously right".

the behavior matches what google usually does, and matched the
coreutils outputs in my ubuntu box's /, /tmp, and a checked-out toybox
repository. i've not seen the coreutils code, but i do know the BSD
code has a bunch of extra heuristics.

> I was halfway through doing an if (CFG_TOYBOX_FLOAT) one with the double
> and an else case with the long long, but I _think_ what we need to do is
> test against 999 each time and if we overshoot slightly it should round
> up as 1.0 of the next unit? So decimal test always, but binary divide
> where appropriate.
>
> The ubuntu behavior is...
>
> $ truncate blah -s $((3000*1024))
> $ du -h --apparent-size blah
> 3.0M blah
>
> That got rounded up (3000 < 3*1024) so 1001KB being 1.0MB is also
> legitimately something you can round up? I think?
>
> /me wists for a specification. Oh well. I hate when I have to guess at
> what the right behavior _is_...

yeah, i was actually trying to avoid ending up with all the heuristics
the BSD implementation has.

the BSD man page says:

If the formatted number (including suffix) would be too long to fit into
buf, then divide number by 1024 until it will.
...
The len argument must be at least 4 plus the length of suffix, in order
to ensure a useful result is generated into buf.

so it certainly seems they follow the "no more than three digits/two
digits plus '.'" rule.

>> (even though they're easy enough to find, i didn't list examples above
>> because i didn't want to encourage you to break things :-) )
>>
>>> (What would be really
>>> nice would be test cases, but I'll see what I can come up with.)
>>
>> this is why you need unit tests for the library rather than relying on
>> testing the final tools...
>
> I added a test command to toys/examples and a very simple one to tests/
> but I need the *.test file waaaay fluffed out. :)
>
> Rob



Rob Landley
2015-08-25 01:47:03 UTC
On 08/24/2015 03:10 PM, enh wrote:
> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
>> /me wists for a specification. Oh well. I hate when I have to guess at
>> what the right behavior _is_...
>
> yeah, i was actually trying to avoid ending up with all the heuristics
> the BSD implementation has.
>
> the BSD man page says:
>
> If the formatted number (including suffix) would be too long to fit into
> buf, then divide number by 1024 until it will.

That's just "test against 999, divide by 1024". Easy enough.

> The len argument must be at least 4 plus the length of suffix, in order
> to ensure a useful result is generated into buf.

That constraint's already implicit. I should make sure it's explicit.

> so it certainly seems they follow the "no more than three digits/two
> digits plus '.'" rule.

I can work with this.

Thanks,

Rob
James McMechan
2015-08-29 02:47:52 UTC
> Date: Mon, 24 Aug 2015 20:47:03 -0500
> From: ***@landley.net
> To: ***@google.com
> CC: ***@lists.landley.net
> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>
> On 08/24/2015 03:10 PM, enh wrote:
>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>> what the right behavior _is_...

Well, checking back with my copy of "Engineering Fundamentals and Problem Solving" (A. Eide et al., 1979, Ch. 5):
Engineering units are 0.1 to 999 followed by a space, prefix and SI unit.

I am of the opinion that gratuitous loss of precision should be avoided.
Since a one-character prefix and a decimal point take two character spaces, the natural
breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
Using the IEC two-character binary prefixes Ki/Mi/Gi takes three spaces with the '.'.
This would, however, yield a breakpoint at 100 000 (or 10 000 if we use a thousands separator),
which seems to me a bit large.

>> yeah, i was actually trying to avoid ending up with all the heuristics
>> the BSD implementation has.
>>
>> the BSD man page says:
>>
>> If the formatted number (including suffix) would be too long to fit into
>> buf, then divide number by 1024 until it will.
>
> That's just "test against 999, divide by 1024". Easy enough.
>
>> The len argument must be at least 4 plus the length of suffix, in order
>> to ensure a useful result is generated into buf.
>
> That constraint's already implicit. I should make sure it's explicit.
>
>> so it certainly seems they follow the "no more than three digits/two
>> digits plus '.'" rule.
>
> I can work with this.
>
> Thanks,
>
> Rob

Attached is a patch that should allow for 0..9999, 10 k..999 k, 1.0 M..999 M in SI units, and
0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi... in IEC binary units; note the 9999 -> 9.8 Ki transition.
I have tested this on LE32, BE32 and LE64. I have a BE64 sparc but no BE64 userspace,
and my other BE64 system is still on order.

You can also set flags to drop the space between number and prefix, or to use the Ubuntu 0..1023 style;
you can also request the limited range 0..999, 1.0 k..999 k style in either SI or IEC.

This is pure integer. I could also open-code the printf, as it can only have 4 digits maximum at the moment.
If you want, I could make it autosizing rather than just one decimal between 0.1..9.9.
Also, if any of the symbols are defined to 0, the capability will drop out.
Perhaps I should make it default to IEC "Ki" style? Getting it right vs. bug compatibility.

I made a testing command e.g. toybox_human_readable_test to allow me to test it.

I hope this is interesting.

Jim McMechan
Rob Landley
2015-09-04 01:52:44 UTC
On 08/28/2015 09:47 PM, James McMechan wrote:
>> Date: Mon, 24 Aug 2015 20:47:03 -0500
>> From: ***@landley.net
>> To: ***@google.com
>> CC: ***@lists.landley.net
>> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>>
>> On 08/24/2015 03:10 PM, enh wrote:
>>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
>>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>>> what the right behavior _is_...
>
> Well, checking back with my copy of "Engineering Fundamentals and Problem Solving" (A. Eide et al., 1979, Ch. 5):
> Engineering units are 0.1 to 999 followed by a space, prefix and SI unit.
>
> I am of the opinion that gratuitous loss of precision should be avoided.
> Since a one-character prefix and a decimal point take two character spaces, the natural
> breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
> Using the IEC two-character binary prefixes Ki/Mi/Gi takes three spaces with the '.'.
> This would, however, yield a breakpoint at 100 000 (or 10 000 if we use a thousands separator),
> which seems to me a bit large.

I already fixed it a different way (just took me a while to debug and
check it in), but I see you added a couple more options.

Are these options we actually need? (I.E. expand 1023 and the force use
of units immediately?) They probably wouldn't be hard to add, but do we
have anything that actually needs them yet? (Is this compatible with the
bsd version and thus something we could push the posix guys to
standardize circa 2030 or so? Ok, more like sometime in the late 2040's.
Ok, let's face it: I don't engage with the Posix committee much because
interacting with Jorg Schilling is not something I'm willing to do in a
hobbyist capacity.)

>>> yeah, i was actually trying to avoid ending up with all the heuristics
>>> the BSD implementation has.
>>>
>>> the BSD man page says:
>>>
>>> If the formatted number (including suffix) would be too long to fit into
>>> buf, then divide number by 1024 until it will.
>>
>> That's just "test against 999, divide by 1024". Easy enough.
>>
>>> The len argument must be at least 4 plus the length of suffix, in order
>>> to ensure a useful result is generated into buf.
>>
>> That constraint's already implicit. I should make sure it's explicit.
>>
>>> so it certainly seems they follow the "no more than three digits/two
>>> digits plus '.'" rule.
>>
>> I can work with this.
>>
>> Thanks,
>>
>> Rob
>
> Attached is a patch that should allow for 0..9999, 10 k..999 k, 1.0 M..999 M in SI units, and
> 0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi... in IEC binary units; note the 9999 -> 9.8 Ki transition.
> I have tested this on LE32, BE32 and LE64. I have a BE64 sparc but no BE64 userspace,
> and my other BE64 system is still on order.

If this behaves differently on big or little endian, your compiler is at
fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
a bit creepy, but nobody's actually done that yet that I'm aware of. I
should probably use uint64_t but the name is horrid and PRIu64 stuff in
printf is just awkward, and it's a typedef not a real type the way
"int", "long", and "long long" are...)

> You can also set flags to drop the space between number and prefix, or to use the Ubuntu 0..1023 style;
> you can also request the limited range 0..999, 1.0 k..999 k style in either SI or IEC.

Yes, but why would we want to?

> This is pure integer. I could also open-code the printf, as it can only have 4 digits maximum at the moment.
> If you want, I could make it autosizing rather than just one decimal between 0.1..9.9.
> Also, if any of the symbols are defined to 0, the capability will drop out.
> Perhaps I should make it default to IEC "Ki" style? Getting it right vs. bug compatibility.
>
> I made a testing command e.g. toybox_human_readable_test to allow me to test it.

I had toys/examples/test_human_readable.c which I thought I'd checked in
a couple weeks ago but apparently forgot to "git add".

(If you git add a file, git diff shows no differences, mercurial diff
shows it diffed against /dev/null. I'm STILL getting used to the weird
little behavioral divergences.)

> I hope this is interesting.

It's very interesting and I'm keeping it around in case it's needed. I'm
just trying to figure out if the extra flags are something any command
is actually going to use. (And that's an Elliott question more than a me
question, I never use -h and it's not in posix or LSB.)

Rob
enh
2015-09-04 03:57:20 UTC
On Thu, Sep 3, 2015 at 6:52 PM, Rob Landley <***@landley.net> wrote:
> On 08/28/2015 09:47 PM, James McMechan wrote:
>>> Date: Mon, 24 Aug 2015 20:47:03 -0500
>>> From: ***@landley.net
>>> To: ***@google.com
>>> CC: ***@lists.landley.net
>>> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>>>
>>> On 08/24/2015 03:10 PM, enh wrote:
>>>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
>>>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>>>> what the right behavior _is_...
>>
>> Well, checking back with my copy of "Engineering Fundamentals and Problem Solving" (A. Eide et al., 1979, Ch. 5):
>> Engineering units are 0.1 to 999 followed by a space, prefix and SI unit.
>>
>> I am of the opinion that gratuitous loss of precision should be avoided.
>> Since a one-character prefix and a decimal point take two character spaces, the natural
>> breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
>> Using the IEC two-character binary prefixes Ki/Mi/Gi takes three spaces with the '.'.
>> This would, however, yield a breakpoint at 100 000 (or 10 000 if we use a thousands separator),
>> which seems to me a bit large.
>
> I already fixed it a different way (just took me a while to debug and
> check it in), but I see you added a couple more options.
>
> Are these options we actually need? (I.E. expand 1023 and the force use
> of units immediately?) They probably wouldn't be hard to add, but do we
> have anything that actually needs them yet? (Is this compatible with the
> bsd version and thus something we could push the posix guys to
> standardize circa 2030 or so? Ok, more like sometime in the late 2040's.
> Ok, let's face it: I don't engage with the Posix committee much because
> interacting with Jorg Schilling is not something I'm willing to do in a
> hobbyist capacity.)

BSD has (https://www.freebsd.org/cgi/man.cgi?query=humanize_number&sektion=3):

The following flags may be passed in scale:

HN_AUTOSCALE Format the buffer using the lowest multiplier pos-
sible.

HN_GETSCALE Return the prefix index number (the number of
times number must be divided to fit) instead of
formatting it to the buffer.

The following flags may be passed in flags:

HN_DECIMAL If the final result is less than 10, display it
using one decimal place.

HN_NOSPACE Do not put a space between number and the prefix.

HN_B Use `B' (bytes) as prefix if the original result
does not have a prefix.

HN_DIVISOR_1000 Divide number with 1000 instead of 1024.

HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
Gi...). This flag has no effect when
HN_DIVISOR_1000 is also specified.

in the entire tree, there's only one use of HN_GETSCALE
(/usr/bin/procstat, and it doesn't look like that's actually
necessary).

HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
really used in df (it's also used once each in "edquota" and
"camcontrol").

HN_IEC_PREFIXES isn't used at all. not even a test.

so until we find a place where we want to turn off HN_DECIMAL, we're
good. (that's a harder thing to grep for, but i couldn't find an
instance in FreeBSD.)

>>>> yeah, i was actually trying to avoid ending up with all the heuristics
>>>> the BSD implementation has.
>>>>
>>>> the BSD man page says:
>>>>
>>>> If the formatted number (including suffix) would be too long to fit into
>>>> buf, then divide number by 1024 until it will.
>>>
>>> That's just "test against 999, divide by 1024". Easy enough.
>>>
>>>> The len argument must be at least 4 plus the length of suffix, in order
>>>> to ensure a useful result is generated into buf.
>>>
>>> That constraint's already implicit. I should make sure it's explicit.
>>>
>>>> so it certainly seems they follow the "no more than three digits/two
>>>> digits plus '.'" rule.
>>>
>>> I can work with this.
>>>
>>> Thanks,
>>>
>>> Rob
>>
>> Attached is a patch that should allow for 0..9999, 10 k..999 k, 1.0 M..999 M in SI units, and
>> 0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi... in IEC binary units; note the 9999 -> 9.8 Ki transition.
>> I have tested this on LE32, BE32 and LE64. I have a BE64 sparc but no BE64 userspace,
>> and my other BE64 system is still on order.
>
> If this behaves differently on big or little endian, your compiler is at
> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
> a bit creepy, but nobody's actually done that yet that I'm aware of. I
> should probably use uint64_t but the name is horrid and PRIu64 stuff in
> printf is just awkward, and it's a typedef not a real type the way
> "int", "long", and "long long" are...)
>
>> You can also set flags to drop the space between number and prefix, or to use the Ubuntu 0..1023 style;
>> you can also request the limited range 0..999, 1.0 k..999 k style in either SI or IEC.
>
> Yes, but why would we want to?
>
>> This is pure integer. I could also open-code the printf, as it can only have 4 digits maximum at the moment.
>> If you want, I could make it autosizing rather than just one decimal between 0.1..9.9.
>> Also, if any of the symbols are defined to 0, the capability will drop out.
>> Perhaps I should make it default to IEC "Ki" style? Getting it right vs. bug compatibility.
>>
>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>
> I had toys/examples/test_human_readable.c which I thought I'd checked in
> a couple weeks ago but apparently forgot to "git add".
>
> (If you git add a file, git diff shows no differences, mercurial diff
> shows it diffed against /dev/null. I'm STILL getting used to the weird
> little behavioral divergences.)
>
>> I hope this is interesting.
>
> It's very interesting and I'm keeping it around in case it's needed. I'm
> just trying to figure out if the extra flags are something any command
> is actually going to use. (And that's an Elliott question more than a me
> question, I never use -h and it's not in posix or LSB.)
>
> Rob
> _______________________________________________
> Toybox mailing list
> ***@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net



James McMechan
2015-09-04 15:43:43 UTC
> From: ***@google.com
> Date: Thu, 3 Sep 2015 20:57:20 -0700
> To: ***@landley.net
> CC: ***@hotmail.com; ***@lists.landley.net
> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>
> On Thu, Sep 3, 2015 at 6:52 PM, Rob Landley <***@landley.net> wrote:
>> On 08/28/2015 09:47 PM, James McMechan wrote:
>>>> Date: Mon, 24 Aug 2015 20:47:03 -0500
>>>> From: ***@landley.net
>>>> To: ***@google.com
>>>> CC: ***@lists.landley.net
>>>> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>>>>
>>>> On 08/24/2015 03:10 PM, enh wrote:
>>>>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <***@landley.net> wrote:
>>>>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>>>>> what the right behavior _is_...
>>>
>>> Well, checking back with my copy of "Engineering Fundamentals and Problem Solving" (A. Eide et al., 1979, ch. 5):
>>> engineering units are 0.1 to 999 followed by a space, prefix and SI unit.
>>>
>>> I am of the opinion that gratuitous loss of precision should be avoided.
>>> Since a one-character prefix and a decimal point take two character spaces, the natural
>>> breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
>>> Using the IEC two-character binary prefixes Ki/Mi/Gi takes three spaces with the '.'.
>>> This would, however, yield a breakpoint at 100 000, or 10 000 if we use a thousands separator,
>>> which seems to me a bit large.
>>
>> I already fixed it a different way (just took me a while to debug and
>> check it in), but I see you added a couple more options.
>>
>> Are these options we actually need? (I.E. expand 1023 and the force use
>> of units immediately?) They probably wouldn't be hard to add, but do we
>> have anything that actually needs them yet? (Is this compatible with the
>> bsd version and thus something we could push the posix guys to
>> standardize circa 2030 or so? Ok, more like sometime in the late 2040's.
>> Ok, let's face it: I don't engage with the Posix committee much because
>> interacting with Jorg Schilling is not something I'm willing to do in a
>> hobbyist capacity.)

Apparently the answer is yes, or at least BSD did; I have not run a BSD system in years.
The 1023 was because Rob had mentioned that that is what Ubuntu did :)
It is, however, a consistent choice, so I included it in case we needed Ubuntu's way for some reason.
Not a care in the world about dropping it.

> BSD has (https://www.freebsd.org/cgi/man.cgi?query=humanize_number&sektion=3):
>
> The following flags may be passed in scale:
>
> HN_AUTOSCALE Format the buffer using the lowest multiplier pos-
> sible.

If this does what I think it says, you can end up with 1000000 KiB if the buffer is big enough?
Also, the scale factor can be a number to force a particular prefix:
int humanize_number(char *buf, size_t len, int64_t number, const char *suffix, int scale, int flags);

Interesting: they use a len to prevent buffer overflow, and it looks like they may display a signed number?
Also, they pass in the suffix. I had a comment about that but had guessed we could keep 'B'.

> HN_GETSCALE Return the prefix index number (the number of
> times number must be divided to fit) instead of
> formatting it to the buffer.

This is where you get the number to pass in as scale. Not hard; can anyone see a use for it, though?
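Here is a rough sketch of what an HN_GETSCALE-style query computes. It assumes the target is at most three digits before the unit letter. hr_scale() is a hypothetical name, not a BSD or toybox function. It only counts divisions, so a real formatter would still need to round.

```c
#include <assert.h>

/* Hypothetical helper (not a BSD or toybox function): count how many
 * times num must be divided by divisor before it is 999 or less. The
 * result indexes a unit string such as " KMGTPE". Division truncates,
 * so a real formatter would still have to round the displayed value. */
static int hr_scale(unsigned long long num, unsigned divisor)
{
  int scale = 0;

  assert(divisor == 1000 || divisor == 1024);
  while (num > 999) {
    num /= divisor;
    scale++;
  }
  return scale;
}
```

The returned index picks a letter out of a string like " KMGTPE", which is the one used in the patch at the start of this thread. With 64-bit input the index never goes past 6 ('E'), whichever divisor is used.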

> The following flags may be passed in flags:
>
> HN_DECIMAL If the final result is less than 10, display it
> using one decimal place.

I would expect that this is only for prefixed scales, but who knows; certain groups might have done it for small integers.

> HN_NOSPACE Do not put a space between number and the prefix.

Yes, this one I can see using, e.g. for ls, which is also a place where the 'B' might not be present.
It would, however, be consistent to include the space and the B.

> HN_B Use `B' (bytes) as prefix if the original result
> does not have a prefix.

Is it just me, or do you also find this weird? If you have an explicit prefix setting, why not use it...
If you don't want to use it, why is it there in the first place?

> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.

Yep, I think network speeds are measured in SI units, for example.
I could live with 1024 units everywhere, esp. if we also used the IEC prefixes.

> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
> Gi...). This flag has no effect when
> HN_DIVISOR_1000 is also specified.

Err yes, but it is not that it has no effect, but that if you are using 1000s there should not be the 'i'.
For my two cents, I would suggest we go for IEC prefixes by default; yes, they are so-so,
but there is a standard and it does make things noticeably clearer. Might as well do it right instead
of the usual customary CompSci notation, where it is notoriously ambiguous.

> in the entire tree, there's only one use of HN_GETSCALE
> (/usr/bin/procstat), and it doesn't look like that's actually
> necessary).
>
> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B

I did not have an HN_DECIMAL since I expect 0-9 to have a decimal point for a second
digit of precision; the range is to 999 anyway, so it will not use more characters.

> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
> really used in df (it's also used once each in "edquota" and
> "camcontrol").

I would have no problem with df using units of 1024 instead and displaying IEC units.

> HN_IEC_PREFIXES isn't used at all. not even a test.

Yeah, I have noticed that myself. Following the standard, and even making it the default
so that you know what everything is in, would be good; alas, it is somewhat incompatible
with custom. But are scripts using -h and then parsing it? Something is likely that dumb.
But it would be nice to actually do the right thing.

> so until we find a place where we want to turn off HN_DECIMAL, we're
> good. (that's a harder thing to grep for, but i couldn't find an
> instance in FreeBSD.)

I would hope not; I would regard it as a useless loss of precision.
9.9 will fit in the same space as 999 just fine.
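To make the "9.9 fits where 999 does" point concrete, here is a hypothetical formatter. hr_short() is an illustrative name, not toybox's human_readable(). Once a unit is shown, it keeps one decimal below 10, and it rescales anything that would round up to 1000. It uses double, as the patch at the start of the thread does, and takes its unit strings from that patch. The rounding thresholds are assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not the toybox implementation: the number part is
 * at most three characters ("9.9" or "999"), followed by one unit letter.
 * Requires len >= 5 ("999K" plus the terminating NUL). */
static size_t hr_short(char *buf, size_t len, unsigned long long num,
                       unsigned divisor)
{
  /* Unit letters as in the patch at the start of this thread. */
  const char *units = (divisor == 1000) ? " kMGTPE" : " KMGTPE";
  double v = num;
  int scale = 0;
  size_t n;

  assert(len >= 5 && (divisor == 1000 || divisor == 1024));

  /* Anything that "%.0f" would print as 1000 or more gets scaled again. */
  while (v >= 999.5 && scale < 6) {
    v /= divisor;
    scale++;
  }

  /* One decimal place only when a unit is shown and the value is < 10. */
  if (scale && v < 9.95) snprintf(buf, len, "%.1f", v);
  else snprintf(buf, len, "%.0f", v);

  n = strlen(buf);
  if (scale) {
    buf[n++] = units[scale];
    buf[n] = 0;
  }
  return n;
}
```

With this rule, 5675 bytes comes out as "5.5K", which matches the rounding enh describes for main.c at the start of the thread. 1023999 bytes becomes "1.0M" rather than "1000K".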

>>>>> yeah, i was actually trying to avoid ending up with all the heuristics
>>>>> the BSD implementation has.
>>>>>
>>>>> the BSD man page says:
>>>>>
>>>>> If the formatted number (including suffix) would be too long to fit into
>>>>> buf, then divide number by 1024 until it will.
>>>>
>>>> That's just "test against 999, divide by 1024". Easy enough.
>>>>
>>>>> The len argument must be at least 4 plus the length of suffix, in order
>>>>> to ensure a useful result is generated into buf.
>>>>
>>>> That constraint's already implicit. I should make sure it's explicit.
>>>>
>>>>> so it certainly seems they follow the "no more than three digits/two
>>>>> digits plus '.'" rule.

That is what I was going for also

>>>> I can work with this.
>>>>
>>>> Thanks,
>>>>
>>>> Rob
>>>
>>> Attached is a patch that should allow for 0..9999, 10 k..999 k, 1.0 M..999 M SI units
>>> 0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi... IEC binary units, note the 9999 -> 9.8 Ki transition
>>> I have tested this with LE32 BE32 LE64 while I have BE64 sparc I do not have a BE64 userspace
>>> and my other BE64 system is still on order.
>>
>> If this behaves differently on big or little endian, your compiler is at
>> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
>> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
>> a bit creepy, but nobody's actually done that yet that I'm aware of. I
>> should probably use uint64_t but the name is horrid and PRIu64 stuff in
>> printf is just awkward, and it's a typedef not a real type the way
>> "int", "long", and "long long" are...)

I have developed paranoia over BE/LE & 32/64 over the years; subtle assumptions about
size or byte ordering can creep in and break things. One I can remember was in the ext2 code:
they had a bitmap in LE order but accessed it using longs rather than bytes, so it had to have
the byteswap even though the code using bytes was just as simple and completely agnostic
about word size and BE/LE.

I could argue that long should be 128 bit on 64 bit computers, but LP64 was a hack to work
around poorly written software; long long /should/ be 256 bits :) not merely 128 bit.

Yes, uint64_t is a bit of a mess, but if the compiler puts some other size in there I would
feel fully justified in bitching about it. int, long and long long are compiler dependent and can
be whatever they desire and are per-arch, so I try to use it where I want a particular size.

For example, int was the size to store pointers in, as it was the machine word; K & R explicitly stated to store pointers in int.
Now it is long, or better yet void *.
I did find a couple of uint128_t references on my system.

>>> You can also set flags to drop the space between number and prefix or use the Ubuntu 0..1023 style
>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>
>> Yes, but why would we want to?

Strict conformance to the standard? Avoiding the 9999->9.8Ki transition.
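The 9999 -> 9.8Ki transition falls straight out of the 4-digit rule. 9999 still prints as a plain number, but 10000 / 1024 = 9.77, so the next value shown is 9.8Ki. Below is a minimal sketch of that rule. hr_iec4() is a hypothetical name, and the rounding thresholds are assumptions rather than details taken from the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: plain numbers up to 9999, then IEC units with one
 * decimal below 10. Requires len >= 6 ("999Ki" plus NUL). */
static size_t hr_iec4(char *buf, size_t len, unsigned long long num)
{
  const char *units = "KMGTPE";
  double v = num;
  int scale = -1;

  assert(len >= 6);
  if (num <= 9999) snprintf(buf, len, "%llu", num);
  else {
    /* Divide at least once; keep going while the value would round to 1000+. */
    do {
      v /= 1024;
      scale++;
    } while (v >= 999.5 && scale < 5);
    snprintf(buf, len, v < 9.95 ? "%.1f%ci" : "%.0f%ci", v, units[scale]);
  }
  return strlen(buf);
}
```

Under this rule, the output goes from "9999" to "9.8Ki" as the input goes from 9999 to 10000. That is the apparent step backwards that the 0..999 range option would avoid.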

>>> This is pure integer, I could open code the printf also as it can only have 4 digits maximum at the moment.
>>> If you want I could make it autosizing rather than just one decimal between 0.1..9.9
>>> Also if any of the symbols are defined to 0 the capability will drop out.
>>> Perhaps I should make it default to IEC "Ki" style? getting it right vs bug compatibility.
>>>
>>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>>
>> I had toys/examples/test_human_readable.c which I thought I'd checked in
>> a couple weeks ago but apparently forgot to "git add".

I was thinking maybe it needs a better name, outputting info for humans would be nice
to be able to do from the shell, so it could be actually used in production.

>> (If you git add a file, git diff shows no differences, mercurial diff
>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>> little behavioral divergences.)
>>
>>> I hope this is interesting.
>>
>> It's very interesting and I'm keeping it around in case it's needed. I'm
>> just trying to figure out if the extra flags are something any command
>> is actually going to use. (And that's an Elliott question more than a me
>> question, I never use -h and it's not in posix or LSB.)

Odd, it has been in common usage for years, but I guess it was just whatever
people felt a human would like to see rather than one of the standards.

>> Rob
> --
> Elliott Hughes - http://who/enh - http://jessies.org/~enh/
> Android native code/tools questions? Mail me/drop by/add me as a reviewer.
enh
2015-09-04 16:21:56 UTC
On Fri, Sep 4, 2015 at 8:43 AM, James McMechan
<***@hotmail.com> wrote:
>> HN_B Use `B' (bytes) as prefix if the original result
>> does not have a prefix.
>
> Is it just me, or do you also find this weird? If you have an explicit prefix setting, why not use it...
> If you don't want to use it, why is it there in the first place?

no, i think you misunderstand HN_B versus suffix. they're not
interchangeable. HN_B says "if the number is so small that you don't
need an SI/IEC multiplier, do you want me to output 'B' for you?". that
is: do you want "100KB" and "2B" or "100KB" and "2"?

(but, yes, i was answering the narrower interpretation of rob's
question, and only addressing the flags BSD has that toybox doesn't.
the 'suffix' parameter might turn out to be useful when we implement
more commands, but it makes sense to wait until we have an actual use
case.)
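Here is a sketch of the distinction enh draws. It follows the man page wording quoted above ("Use `B' (bytes) as prefix if the original result does not have a prefix"): the B fills the unit slot only when no multiplier was needed. SKETCH_B and hn_b_sketch() are hypothetical names, not BSD's code, and integer division stands in for real rounding. The sketch leaves out the separate suffix argument and only shows that the flag changes the unscaled case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag standing in for BSD's HN_B; not a toybox or libc name. */
#define SKETCH_B 1

/* 'B' goes in the unit slot only when the value needed no multiplier.
 * Scaled values print the same with or without the flag. len >= 8. */
static size_t hn_b_sketch(char *buf, size_t len, unsigned long long num,
                          int flags)
{
  int scale = 0;

  assert(len >= 8);
  while (num > 999 && scale < 6) {
    num /= 1024;  /* truncating division: no rounding in this sketch */
    scale++;
  }
  if (scale) snprintf(buf, len, "%llu%c", num, " KMGTPE"[scale]);
  else snprintf(buf, len, (flags & SKETCH_B) ? "%lluB" : "%llu", num);
  return strlen(buf);
}
```

With the flag set, 102400 and 2 print as "100K" and "2B". Without it, they print as "100K" and "2".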

>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>
> Yep, I think network speeds are measured in SI units, for example.
> I could live with 1024 units everywhere, esp. if we also used the IEC prefixes.
>
>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
>> Gi...). This flag has no effect when
>> HN_DIVISOR_1000 is also specified.
>
> Err yes, but it is not that it has no effect, but that if you are using 1000s there should not be the 'i'.
> For my two cents, I would suggest we go for IEC prefixes by default; yes, they are so-so,
> but there is a standard and it does make things noticeably clearer. Might as well do it right instead
> of the usual customary CompSci notation, where it is notoriously ambiguous.

the point here isn't to write great API for new code. from the use of
ISO date format in ls i assume rob would prefer to be clear and
consistent too, but we're here reimplementing existing idiocy. sadly
we can't go back in time and make ls/dd/du/df/et cetera sensible and
consistent.




--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
Rob Landley
2015-09-04 23:24:49 UTC
On 09/04/2015 10:43 AM, James McMechan wrote:
>> From: ***@google.com
...
>> HN_B Use `B' (bytes) as prefix if the original result
>> does not have a prefix.
>
> Is it just me, or do you also find this weird? If you have an explicit prefix setting, why not use it...
> If you don't want to use it, why is it there in the first place?

Why is the _caller_ not appending B when they printf() the result? The
space is before the units but the B isn't, and this is a string that
gets put into a buffer and then used by something else. Further editing
is kinda _normal_...

>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>
> Yep, I think network speeds are measured in SI units, for example.
> I could live with 1024 units everywhere, esp. if we also used the IEC prefixes.

I object to the word "kibibyte" on general principles, and disks are
also sold in decimal sizes (for historical marketing reasons).

(Of course "512 gigs" is mixing decimal and binary when you _do_ use
binary gigs, since the 512 is decimal and all. But let's be honest,
"kibibytes" is a stupid name, all else is details for me.)

>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,

Mebibytes. *shudder*

Huh, I thought the i was the second character in "binary", but this
implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
back to "binary" again...

>> Gi...). This flag has no effect when
>> HN_DIVISOR_1000 is also specified.
>
> Err yes, but it is not that it has no effect, but that if you are using 1000s there should not be the 'i'.

The B is already a separate flag from the 1024. If the caller wants to
append the unicode character for "clown nose" to the returned string,
that's not really human_readable()'s business.

> For my two cents, I would suggest we go for IEC prefixes by default; yes, they are so-so,
> but there is a standard and it does make things noticeably clearer. Might as well do it right instead
> of the usual customary CompSci notation, where it is notoriously ambiguous.

The function is called human_readable().

You want to default to binary units.

What exactly is our goal here again?

(Keeping the thundering hordes of android users happy. Right. Trying not
to get emotionally invested in an aesthetic decision which hasn't _got_
a right answer and just needs to be consistent. That said, if I can help
kill the term "mebibytes" it is worth MUCH EFFORT on my part...)

>> in the entire tree, there's only one use of HN_GETSCALE
>> (/usr/bin/procstat), and it doesn't look like that's actually
>> necessary).
>>
>> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
>
> I did not have an HN_DECIMAL since I expect 0-9 to have a decimal point for a second
> digit of precision; the range is to 999 anyway, so it will not use more characters.
>
>> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
>> really used in df (it's also used once each in "edquota" and
>> "camcontrol").
>
> I would have no problem with df using units of 1024 instead and displaying IEC units.

Disks are sold in decimal measurements. People are going to ask why your
horribly inefficient file format is eating so much of their disk space.

(What, did they stop doing that with flash? I'd be surprised if they did...)
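Here is a quick illustration of that complaint, using a made-up 500 GB drive (the figure is only an example). 500 × 10^9 bytes divided by 1024^3 is about 465.7, so a 1024-based display reports roughly 7% less than the label. in_binary_units() is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: express a byte count in units of 1024^scale. */
static double in_binary_units(unsigned long long bytes, int scale)
{
  double v = bytes;

  assert(scale >= 0 && scale <= 6);
  while (scale--) v /= 1024;
  return v;
}
```

in_binary_units(500000000000ULL, 3) is about 465.66. The gap grows with each step up the scale, since every step multiplies the ratio by another 1000/1024.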

>> HN_IEC_PREFIXES isn't used at all. not even a test.
>
> Yeah, I have noticed that myself. Following the standard, and even making it the default
> so that you know what everything is in, would be good; alas, it is somewhat incompatible
> with custom. But are scripts using -h and then parsing it? Something is likely that dumb.
> But it would be nice to actually do the right thing.

Nothing extending the usage of the word "gibibytes" is the right thing.

>> so until we find a place where we want to turn off HN_DECIMAL, we're
>> good. (that's a harder thing to grep for, but i couldn't find an
>> instance in FreeBSD.)
>
> I would hope not; I would regard it as a useless loss of precision.
> 9.9 will fit in the same space as 999 just fine.

human_readable() _IS_ a useless loss of precision. That's what it's _for_.

And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
each other. 999k and 1.0M can be from a rounding perspective, but "loss
of precision" is the reason rounding _exists_...

>>> If this behaves differently on big or little endian, your compiler is at
>>> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
>>> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
>>> a bit creepy, but nobody's actually done that yet that I'm aware of. I
>>> should probably use uint64_t but the name is horrid and PRIu64 stuff in
>>> printf is just awkward, and it's a typedef not a real type the way
>>> "int", "long", and "long long" are...)
>
> I have developed paranoia over BE/LE & 32/64 over the years; subtle assumptions about
> size or byte ordering can creep in and break things.

Oh sure. But I've been doing Aboriginal Linux in various forms since
1999 and started caring about cross compiling it in 2005, so I'm fairly
familiar with where the sharp edges are by now.

> One I can remember was in the ext2 code:
> they had a bitmap in LE order but accessed it using longs rather than bytes, so it had to have
> the byteswap even though the code using bytes was just as simple and completely agnostic
> about word size and BE/LE.

Not my code. :)

(That said, my code's currently back on the todo heap because I have to
read about ext4. Although really if it can upconvert on the fly maybe I
should just genext2fs an ext2, stamp an ext3 journal on it, and let the
filesystem driver handle the rest...)

> I could argue that long should be 128 bit on 64 bit computers

Then there would be no 64 bit integer type.

char = 8 bit
short = 16 bit
int = 32 bit
long = 64 bit (on 64 bit)
long long = 64 bit on both 32 and 64 bit (de-facto).

The uint99_t stuff are typedefs that have to resolve to an underlying
integer type.
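The size table above can be pinned down at build time. The following C11 sketch is not something toybox is claimed to contain. It refuses to compile unless those de-facto sizes hold (8-bit char, ILP32 or LP64). An LLP64 target such as 64-bit Windows, where long stays 32 bits while pointers are 64, fails here rather than misbehaving later.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* These document the assumption; they do not make the code more portable. */
static_assert(CHAR_BIT == 8, "char is 8 bits");
static_assert(sizeof(short) == 2, "short is 16 bits");
static_assert(sizeof(int) == 4, "int is 32 bits");
static_assert(sizeof(long) == sizeof(void *), "long is pointer-sized (ILP32/LP64)");
static_assert(sizeof(long long) == 8, "long long is 64 bits on 32 and 64 bit");
```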

> but LP64 was a hack to work
> around poorly written software; long long /should/ be 256 bits :) not merely 128 bit.

You know how people went to great lengths to avoid using uint64_t on 32
bit machines because it introduced libgcc_s.so calls and sucked in
_deeply_ crappy code to do FOIL multiplies and divides from high school
algebra?

You're saying "64 bit should have this problem too".

Bignum libraries exist. A 256 byte integer type doesn't let you do
cryptography or implement standards-compliant bc without using them.

(Heck, Posix and LSB are hacks to work around poorly written software.
Kinda both's reason d' et cetera.)

> Yes, uint64_t is a bit of a mess, but if the compiler puts some other size in there I would
> feel fully justified in bitching about it.

It would be a standards violation.

> int, long and long long are compiler dependent and can
> be whatever they desire and are per-arch,

LP64 says what int and long should be, and specifies at least a minimum
size for long long. Linux, BSD, and MacOS X depend on LP64. As does
toybox (in design.html I believe).

> so I try to use it where I want a particular size.

Good for you...?

> For example, int was the size to store pointers in,
> as it was the machine word; K & R explicitly stated to store pointers
> in int.
> Now it is long, or better yet void *.

Ah, the days when char could be 18 bits because some machines were just
crazy and we hadn't weeded out the weak hardware designs yet.

That went away.

> I did find a couple of uint128_t references on my system.

gcc of course added a __int128 compiler extension which is two 64 bit
integers glued together just like 32 bit mode. How you printf() them is
left as an exercise to the reader apparently?

I'm not going there. I did a sizeof(long long) on every aboriginal linux
target to check what the size actually _was_, but as far as I know the
limited number of units here are the first thing that might actually
care about the size being larger. (Because it could overflow the string
buffer allocation since we're not passing in a length. 64 bit input
won't produce more than ~6 bytes of output depending on flags.)
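On the question of how to printf() an __int128: printf has no conversion for it, so the usual workaround is to peel off decimal digits by hand. This sketch assumes GCC or Clang on a 64-bit target, since 32-bit targets generally don't provide unsigned __int128. u128_to_dec() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert v to decimal in buf. 2^128-1 has 39 digits, so buf needs at
 * least 40 bytes including the terminating NUL. Returns buf. */
static char *u128_to_dec(char *buf, size_t len, unsigned __int128 v)
{
  char tmp[40];
  int i = sizeof(tmp);

  assert(len >= sizeof(tmp));
  tmp[--i] = 0;
  do {                      /* emit digits least significant first */
    tmp[--i] = '0' + (int)(v % 10);
    v /= 10;
  } while (v);
  memcpy(buf, tmp + i, sizeof(tmp) - i);  /* includes the NUL */
  return buf;
}
```

The result can then be printed with a plain "%s".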

>>>> You can also set flags to drop the space between number and prefix or use the Ubuntu 0..1023 style
>>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>>
>>> Yes, but why would we want to?
>
> Strict conformance to the standard? Avoiding the 9999->9.8Ki transition.

The first I heard of this standard was when you mentioned it. Ubuntu
clearly wasn't doing it.

>>>> This is pure integer, I could open code the printf also as it can only have 4 digits maximum at the moment.
>>>> If you want I could make it autosizing rather than just one decimal between 0.1..9.9
>>>> Also if any of the symbols are defined to 0 the capability will drop out.
>>>> Perhaps I should make it default to IEC "Ki" style? getting it right vs bug compatibility.
>>>>
>>>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>>>
>>> I had toys/examples/test_human_readable.c which I thought I'd checked in
>>> a couple weeks ago but apparently forgot to "git add".
>
> I was thinking maybe it needs a better name, outputting info for humans would be nice
> to be able to do from the shell, so it could be actually used in production.

It defaults to "n" in defconfig. It's a testing command. That's why it
has "test" in the name and lives in the "examples" directory.

This is beyond infrastructure in search of a user, you're letting
infrastructure suggest a use case. "If all you have is a hammer,
everything looks like a nail." Nobody's _asked_ for this.

>>> (If you git add a file, git diff shows no differences, mercurial diff
>>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>>> little behavioral divergences.)
>>>
>>>> I hope this is interesting.
>>>
>>> It's very interesting and I'm keeping it around in case it's needed. I'm
>>> just trying to figure out if the extra flags are something any command
>>> is actually going to use. (And that's an Elliott question more than a me
>>> question, I never use -h and it's not in posix or LSB.)
>
> Odd, it has been in common usage for years, but I guess it was just whatever
> people felt a human would like to see rather than one of the standards.

It's got a dozen flags because everybody who implemented this did it
differently because the machine readable scriptable version is just to
print out the actual NUMBER, thus the aesthetic cleanup is (or at least
should be) just that.

Bringing an international standards body into a purely aesthetic
decision is weird. ANSI vs ISO tea was a _joke_.

(Ok, maybe the aesthetic output has mutated into functional due to
screen scrapers, which is what Elliott was implying by scripts depending
on -h output. In which case either rigorously copying the historical
mistakes or breaking them really loudly is called for. Adding a
standards body to that sort of mess gives me a headache long before we
get into any sort of details.)

Rob
Samuel Holland
2015-09-05 06:04:26 UTC
On 2015-09-04 18:24, Rob Landley wrote:
> Why is the _caller_ not appending B when they printf() the result? The
> space is before the units but the B isn't, and this is a string that
> gets put into a buffer and then used by something else. Further editing
> is kinda _normal_...

Because the caller would then have to worry about the M/MB/MiB problem.
The convention (at least in GNU and util-linux) is that M and MiB both
refer to 2^20 bytes, and MB refers to 10^6 bytes. If the caller appends
the B afterward, it might change the meaning of the number:
10Mi -> 10MiB is fine
10M -> 10MB is wrong

The purpose of the flag is to append B if the number is less than
1000/1024, so (among other reasons) you can have a fixed-width string of
output: 42G, 42M, 42K, 42B, even if there would not normally be a letter
there. In that case, at least, you definitely don't want to "just append
a B", because you only want the B in certain cases.

>>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>>
>> Yep, I think network speeds are measured in SI units for example
>> I could live with 1024 units everywhere esp. if we also used the IEC prefixes
>
> I object to the word "kibibyte" on general principles, and disks are
> also sold in decimal sizes (for historical marketing reasons).

But RAM is sold in binary sizes. "16 gigs" of RAM is 16384MiB, not
16000MB. (Think `free -h`.) And on a more fundamental level, it will
always be measured in binary sizes: pages are 4096 bytes, not 4000.

And so is flash, manufactured in binary. Even though you can buy a
"500GB" SSD, it's really 512GiB on the inside, with the additional space
used as spare flash pages.
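The arithmetic behind that claim can be written as a small C helper. `spare_bytes()` is a hypothetical name.

```c
/* Bytes of raw flash left over when raw_gib binary gigabytes (2^30 each)
 * are sold as sold_gb decimal gigabytes (10^9 each). */
static unsigned long long spare_bytes(unsigned long long raw_gib,
                                      unsigned long long sold_gb)
{
  return (raw_gib << 30) - sold_gb * 1000000000ULL;
}
```

`spare_bytes(512, 500)` is 49755813888 bytes, a bit over 9% of the raw flash. The 256 Gi -> 240 G case mentioned later in the thread works out to 34877906944 bytes, roughly 12.7%.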

> (Of course "512 gigs" is mixing decimal and binary when you _do_ use
> binary gigs, since the 512 is decimal and all. But let's be honest,
> "kibibytes" is a stupid name, all else is details for me.)
>
>>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
>
> Mebibytes. *shudder*
>
> Huh, I thought the i was the second character in "binary", but this
> implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
> back to "binary" again...

Mi -> Mebi -> million binary -> 2^20
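The ambiguity matters more as the prefixes grow. This sketch compares each binary multiplier with its decimal counterpart; the function names are hypothetical.

```c
/* Multiplier for prefix level k (1 = kilo/kibi ... 6 = exa/exbi). */
static unsigned long long si_mult(int k)
{
  unsigned long long m = 1;

  while (k-- > 0) m *= 1000;
  return m;
}

static unsigned long long iec_mult(int k)
{
  return 1ULL << (10 * k);
}

/* How much larger the binary unit is, in tenths of a percent, rounded.
 * Valid for k >= 1; divides by 1000^(k-1) first to avoid overflow. */
static unsigned gap_permille(int k)
{
  unsigned long long si = si_mult(k), step = si / 1000;

  return (unsigned)((iec_mult(k) - si + step / 2) / step);
}
```

The gap is 2.4% at Ki, 4.9% at Mi, 7.4% at Gi, 10.0% at Ti, 12.6% at Pi and 15.3% at Ei.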

>>> Gi...). This flag has no effect when
>>> HN_DIVISOR_1000 is also specified.
>>
>> Err yes, but it is not that it has no effect but that if you are using 1000s there should not be the 'i'
>
> The B is already a separate flag from the 1024. If the caller wants to
> append the unicode character for "clown nose" to the returned string,
> that's not really human_readable()'s business.

See above. You have to have the "i" if you want to append the "B". But
you can't just append both if you want the "B" in the case of <1000,
because then you'll have 1KiB = 1024BiB, or 1KB = 1000BB, and there's no
such thing as a BiB.
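Those suffix rules can be written down as a small lookup. This is a hypothetical helper (`unit_suffix()` is not an existing toybox function). It follows the GNU/util-linux convention as described above, not any standard text.

```c
#include <string.h>

/* Hypothetical suffix builder: "M" and "MiB" mean 2^20, "MB" means 10^6,
 * and a bare byte count is "B", never "BiB".
 *   level: 0 = no prefix, 1..6 = k/Ki ... E/Ei
 *   iec:   nonzero for 1024-based units
 *   b:     nonzero to append 'B'
 * From M upward an SI value without 'B' prints the same letter as a
 * binary one, so SI callers probably want b set. */
static void unit_suffix(char *out, int level, int iec, int b)
{
  int len = 0;

  if (level > 0) {
    out[len++] = (iec ? " KMGTPE" : " kMGTPE")[level];
    if (iec && b) out[len++] = 'i';   /* only "KiB"-style needs the i */
  }
  if (b) out[len++] = 'B';
  out[len] = 0;
}
```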

>> For my two cents I would suggest we go for IEC prefixes by default, yes they are so-so
>> but there is a standard and it does make things noticeably clearer, might as well do it right instead
>> of the usual customary ComSci notation where it is Notoriously ambiguous
>
> The function is called human_readable().
>
> You want to default to binary units.
>
> What exactly is our goal here again?

Using binary powers is quite important for some human-readable cases.
Take, for example, SSDs. For performance and longevity, you have to
align data access to flash erase block sizes, which get up to 128KiB or
256KiB. It's important then to align partitions on MiB (not MB)
boundaries. cfdisk and Debian's partitioner get this horribly wrong
(especially because you specify partition sizes in MB when creating
them, and the partitioner then shows them to you in MiB).

> (Keeping the thundering hordes of android users happy. Right. Trying not
> to get emotionally invested in an aesthetic decision which hasn't _got_
> a right answer and just needs to be consistent. That said, if I can help
> kill the term "mebibytes" it is worth MUCH EFFORT on my part...)
>
>>> in the entire tree, there's only one use of HN_GETSCALE
>>> (/usr/bin/procstat), and it doesn't look like that's actually
>>> necessary).
>>>
>>> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
>>
>> I did not have a HN_DECIMAL since I expect 0-9 to have a decimal point for a second
>> digit of precision, the range is to 999 anyway so it will not use more characters.
>>
>>> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
>>> really used in df (it's also used once each in "edquota" and
>>> "camcontrol").
>>
>> I would have no problem with df using units 1024 instead and displaying IEC Units
>
> Disks are sold in decimal measurements. People are going to ask why your
> horribly inefficient file format is eating so much of their disk space.

Even Windows shows disk free space in binary units.

> (What, did they stop doing that with flash? I'd be surprised if they did...)

No, SSDs are still sold in decimal sizes. But you have to _use_ them in
binary sizes.

>>> HN_IEC_PREFIXES isn't used at all. not even a test.
>>
>> Yeah, I have noticed for myself, following the standard and even making it the default
>> so that you know what everything is in would be good, alas somewhat incompatible
>> with custom, but are scripts using -h and then parsing it... something is likely that dumb.
>> But it would be nice to actually do the right thing.
>
> Nothing extending the usage of the word "gibibytes" is the right thing.

Then just do as util-linux does and use "G" instead of "GiB".

>>> so until we find a place where we want to turn off HN_DECIMAL, we're
>>> good. (that's a harder thing to grep for, but i couldn't find an
>>> instance in FreeBSD.)
>>
>> I would hope not, I would regard it as a useless loss of precision.
>> 9.9 will fit in the same space as 999 just fine.
>
> human_readable() _IS_ a useless loss of precision. That's what it's _for_.
>
> And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
> each other. 999k and 1.0M can be from a rounding perspective, but "loss
> of precision" is the reason rounding _exists_...
>

>>>>> You can also set flags to drop the space between number and prefix or use the ubuntu 0..1023 style
>>>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>>>
>>>> Yes, but why would we want to?
>>
>> Strict conformance to the standard? avoiding the 9999->9.8Ki transition.
>
> The first I heard of this standard was when you mentioned it. Ubuntu
> clearly wasn't doing it.
>

>>>> (If you git add a file, git diff shows no differences, mercurial diff
>>>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>>>> little behavioral divergences.)

git diff --cached

That will show your staged changes (including added/removed file diffs).

>>>>> I hope this is interesting.
>>>>
>>>> It's very interesting and I'm keeping it around in case it's needed. I'm
>>>> just trying to figure out if the extra flags are something any command
>>>> is actually going to use. (And that's an Elliott question more than a me
>>>> question, I never use -h and it's not in posix or LSB.)
>>
>> Odd, it has been in common usage for years, but I guess it was just whatever
>> people felt a human would like to see rather than one of the standards.
>
> It's got a dozen flags because everybody who implemented this did it
> differently because the machine readable scriptable version is just to
> print out the actual NUMBER, thus the aesthetic cleanup is (or at least
> should be) just that.

And because different quantities are measured with different units.
Network speeds use decimal; memory sizes use binary; and disk sizes use
both.

> Bringing an international standards body into a purely aesthetic
> decision is weird. ANSI vs ISO tea was a _joke_.
>
> (Ok, maybe the aesthetic output has mutated into functional due to
> screen scrapers, which is what Elliott was implying by scripts depending
> on -h output. In which case either rigorously copying the historical
> mistakes or breaking them really loudly is called for. Adding a
> standards body to that sort of mess gives me a headache long before we
> get into any sort of details.)
>
> Rob
--
Regards,
Samuel Holland <***@sholland.net>
James McMechan
2015-09-05 17:10:15 UTC
> From: ***@landley.net
> Date: Fri, 4 Sep 2015 18:24:49 -0500
>
> On 09/04/2015 10:43 AM, James McMechan wrote:
>>> From: ***@google.com
> ...
>>> HN_B Use `B' (bytes) as prefix if the original result
>>> does not have a prefix.
>>
>> Is it just me or do you find this weird also, if you have an explicit prefix setting why not use it...
>> If you don't want to use it why is it there in the first place?
>
> Why is the _caller_ not appending B when they printf() the result? The
> space is before the units but the B isn't, and this is a string that
> gets put into a buffer and then used by something else. Further editing
> is kinda _normal_...
>
>>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>>
>> Yep, I think network speeds are measured in SI units for example
>> I could live with 1024 units everywhere esp. if we also used the IEC prefixes
>
> I object to the word "kibibyte" on general principles, and disks are
> also sold in decimal sizes (for historical marketing reasons).
>
> (Of course "512 gigs" is mixing decimal and binary when you _do_ use
> binary gigs, since the 512 is decimal and all. But let's be honest,
> "kibibytes" is a stupid name, all else is details for me.)

Over-abbreviation will do that. I still think in terms of the long form, kilobinarybytes.

>>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
>
> Mebibytes. *shudder*
>
> Huh, I thought the i was the second character in "binary", but this
> implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
> back to "binary" again...

The IEC standardized it first, then IEEE, ISO, and NIST joined in.
Think of it as Megab[i]nary[B]ytes; it is less shudder-worthy.
http://physics.nist.gov/cuu/Units/binary.html

>>> Gi...). This flag has no effect when
>>> HN_DIVISOR_1000 is also specified.
>>
>> Err yes, but it is not that it has no effect but that if you are using 1000s there should not be the 'i'
>
> The B is already a separate flag from the 1024. If the caller wants to
> append the unicode character for "clown nose" to the returned string,
> that's not really human_readable()'s business.
>
>> For my two cents I would suggest we go for IEC prefixes by default, yes they are so-so
>> but there is a standard and it does make things noticeably clearer, might as well do it right instead
>> of the usual customary ComSci notation where it is Notoriously ambiguous
>
> The function is called human_readable().
>
> You want to default to binary units.

Computers are binary machines ;)
But hey, you are in charge, so you have the final call.
That is why it was a two-cent bid ;)

> What exactly is our goal here again?
>
> (Keeping the thundering hordes of android users happy. Right. Trying not
> to get emotionally invested in an aesthetic decision which hasn't _got_
> a right answer and just needs to be consistent. That said, if I can help
> kill the term "mebibytes" it is worth MUCH EFFORT on my part...)
>
>>> in the entire tree, there's only one use of HN_GETSCALE
>>> (/usr/bin/procstat), and it doesn't look like that's actually
>>> necessary).
>>>
>>> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
>>
>> I did not have a HN_DECIMAL since I expect 0-9 to have a decimal point for a second
>> digit of precision, the range is to 999 anyway so it will not use more characters.
>>
>>> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
>>> really used in df (it's also used once each in "edquota" and
>>> "camcontrol").
>>
>> I would have no problem with df using units 1024 instead and displaying IEC Units
>
> Disks are sold in decimal measurements. People are going to ask why your
> horribly inefficient file format is eating so much of their disk space.

Well, Linux, Windows, and Mac OS X (before 10.6) all display disk sizes using 1024 units.

> (What, did they stop doing that with flash? I'd be surprised if they did...)

Well yes, sort of... 256 Gi -> 240 G flash + hidden wear-leveling spare sectors.
Almost every flash drive I have noticed uses a hard power of two, rounded down some,
likely because 1/2/4 chips is much easier and 3 or 5-7 chips mostly does not give enough gain
to make it marketable for the cost of the work.

>>> HN_IEC_PREFIXES isn't used at all. not even a test.
>>
>> Yeah, I have noticed for myself, following the standard and even making it the default
>> so that you know what everything is in would be good, alas somewhat incompatible
>> with custom, but are scripts using -h and then parsing it... something is likely that dumb.
>> But it would be nice to actually do the right thing.
>
> Nothing extending the usage of the word "gibibytes" is the right thing.
>
>>> so until we find a place where we want to turn off HN_DECIMAL, we're
>>> good. (that's a harder thing to grep for, but i couldn't find an
>>> instance in FreeBSD.)
>>
>> I would hope not, I would regard it as a useless loss of precision.
>> 9.9 will fit in the same space as 999 just fine.
>
> human_readable() _IS_ a useless loss of precision. That's what it's _for_.

I would argue it is both useful and an increase in density:
it packs the maximum data into a compact form,
scaled to units that are easier to think about.

> And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
> each other. 999k and 1.0M can be from a rounding perspective, but "loss
> of precision" is the reason rounding _exists_...

I must not have been clear. Prefixes increase density by consuming less
display space: 1.0 to 9.9 each use 3 characters, just like 100 to 999,
so by absolute count they are both 3 characters. That yields 2 characters
only for 10-99, and 3 characters in all other prefixed cases.
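Those character counts can be checked against the formatting rule from the original patch (`%.1f` below 10, `%.0f` otherwise). `display_width()` is a hypothetical helper that measures the output with `snprintf(NULL, 0, ...)`.

```c
#include <stdio.h>

/* Characters the patch's rule prints for a scaled amount, not counting
 * the unit letter: "%.1f" below 10, "%.0f" otherwise. */
static int display_width(double amount)
{
  return amount < 10 ? snprintf(NULL, 0, "%.1f", amount)
                     : snprintf(NULL, 0, "%.0f", amount);
}
```

There is one edge case. With this simple rule an amount like 9.96 prints as "10.0", which is four characters, so a real implementation would want to re-check the width after rounding.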

>>>> If this behaves differently on big or little endian, your compiler is at
>>>> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
>>>> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
>>>> a bit creepy, but nobody's actually done that yet that I'm aware of. I
>>>> should probably use uint64_t but the name is horrid and PRI_U64 stuff in
>>>> printf is just awkward, and it's a typedef not a real type the way
>>>> "int", "long", and "long long" are...)
>>
>> I have developed paranoia over BE/LE & 32/64 over the years, subtle assumptions about
>> size or byte ordering can creep in and break things.
>
> Oh sure. But I've been doing Aboriginal Linux in various forms since
> 1999 and started caring about cross compiling it in 2005, so I'm fairly
> familiar with where the sharp edges are by now.

I expect you are; I picked up the same painful lessons.
One of the first groups of systems I worked on (~1986 to ~2010) used a BE user interface
with shared memory to 1..4 LE I/O processor boards; everything seemed designed
to maximize the number of places to have BE/LE troubles.

>> One I can remember was in the ext2 code
>> they had a bit map in LE order but accessed it using longs rather than bytes so it had to have
>> the byteswap even though the code using bytes was just as simple and completely agnostic
>> about wordsize and BE/LE.
>
> Not my code. :)

Ok, not your code. You did, however, write a bunch on one of my favorites, the initramfs.
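The ext2 bitmap point quoted above can be shown as a sketch. ext2 numbers bitmap bits little-endian, so testing them one byte at a time needs no byte swapping on any host. The function names here are illustrative, not the kernel's.

```c
#include <stdint.h>

/* ext2 bitmaps number bits little-endian: bit n lives in byte n/8, at
 * bit position n%8. Byte-at-a-time access gives the same answer on any
 * host, whatever its byte order or word size. */
static int bitmap_test(const uint8_t *map, unsigned n)
{
  return (map[n >> 3] >> (n & 7)) & 1;
}

static void bitmap_set(uint8_t *map, unsigned n)
{
  map[n >> 3] |= (uint8_t)(1u << (n & 7));
}
```

Reading the same bytes through an `unsigned long *` and shifting only gives matching bit numbers on a little-endian host. A big-endian host needs the byteswap mentioned above.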

> (That said, my code's currently back on the todo heap because I have to
> read about ext4. Although really if it can upconvert on the fly maybe I
> should just genext2fs an ext2, stamp an ext3 journal on it, and let the
> filesystem driver handle the rest...)

In some respects the ext4 stuff could degenerate to a simple case:
deferred initialization of the unused structures, and extents instead of
index blocks.
If you don't need efficiency at the outset or an optimal disk layout initially,
I believe you can cheat outrageously and then let the filesystem
worry about the layout of all the other data you did not initialize.
Every file could be laid out as a contiguous extent of blocks,
with the inodes, superblock and bitmaps given only minimal initialization,
and either no journal or an empty one (hey, it's not like you need to do a
replay when you have just built it from scratch).
The filesystem would just see all the partly initialized data in a big
clump at the start of the disk and start background allocation of the rest.
Meanwhile everything can be together; the parts created before use
may not be optimal, but could be very, very simple. In use, the filesystem
may prefer to allocate new files in other locations with a better layout.
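Here is a rough sketch of the "every file is one contiguous extent" idea, using the field layout of ext4's on-disk extent header and extent entries. It makes three assumptions. The structs are filled in host byte order, whereas the real on-disk fields are little-endian. Only the four extents that fit in an inode's 60-byte i_block are used. And `map_contiguous()` is a hypothetical helper, not part of any existing tool.

```c
#include <stdint.h>

/* Field layout follows ext4's extent header and leaf entries (12 bytes
 * each); byte order is ignored in this sketch. */
struct extent_header {
  uint16_t eh_magic, eh_entries, eh_max, eh_depth;
  uint32_t eh_generation;
};
struct extent {
  uint32_t ee_block;      /* first logical block covered */
  uint16_t ee_len;        /* block count; > 32768 marks uninitialized */
  uint16_t ee_start_hi;   /* high 16 bits of the physical block */
  uint32_t ee_start_lo;   /* low 32 bits of the physical block */
};

#define EXTENT_MAGIC  0xF30A
#define MAX_INIT_LEN  32768u
#define INODE_EXTENTS 4       /* 60-byte i_block: header + 4 extents */

struct inode_extents {
  struct extent_header hdr;
  struct extent ext[INODE_EXTENTS];
};

/* Map nblocks contiguous blocks starting at physical block start.
 * Returns the number of extents used, or -1 if the file would need an
 * extent tree (more than 4 extents). */
static int map_contiguous(struct inode_extents *ie, uint64_t start,
                          uint32_t nblocks)
{
  uint32_t logical = 0;
  int n = 0;

  ie->hdr = (struct extent_header){ EXTENT_MAGIC, 0, INODE_EXTENTS, 0, 0 };
  while (logical < nblocks) {
    uint32_t len = nblocks - logical;
    uint64_t phys = start + logical;

    if (n == INODE_EXTENTS) return -1;
    if (len > MAX_INIT_LEN) len = MAX_INIT_LEN;
    ie->ext[n].ee_block = logical;
    ie->ext[n].ee_len = (uint16_t)len;
    ie->ext[n].ee_start_hi = (uint16_t)(phys >> 32);
    ie->ext[n].ee_start_lo = (uint32_t)phys;
    logical += len;
    n++;
  }
  ie->hdr.eh_entries = (uint16_t)n;
  return n;
}
```

With 4 KiB blocks, each initialized extent covers at most 32768 blocks (128 MiB). A contiguous file of up to 512 MiB therefore needs no extent tree at all.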

>> I could argue that long should be 128 bit on 64 bit computers
>
> Then there would be no 64 bit integer type.
>
> char = 8 bit
> short = 16 bit
> int = 32 bit
> long = 64 bit (on 64 bit)
> long long = 64 bit on both 32 and 64 bit (de-facto).

Sure you could; it would just be a different mess ;)
8 bit char/int8_t
16 bit long char/short short int/int16_t
32 bit long long char/short int/int32_t
64 bit int/int64_t
128 bit long int/int128_t
256 bit long long int/int256_t

The longs appear to act as multiplicative modifiers on the
base types char/int/float; oddly, it now only seems to work on int and double.
gcc will also complain that "long long long" is too long for GCC.

Short used to be the inverse of long, so I still expect it to work the same way.

Short short does not work at the moment, but I seem to remember long
also used to be allowed only once, where long long int would be the same as long
int, or maybe a syntax error.

I think long char used to be how you would get the 16-bit wide char type,
and short char used to be a no-op; both appear to no longer work :(

And I have just found out that gcc-4.8.5 now errors on long float and short double,
both of which I have used in the past.

Bah, it appears I am an old fogey who still expects the syntax to be regular
in the old way, based on learning K&R style and the early squirrely compilers.

I will have to get a cane to wave at these young compiler whippersnappers.

> The uint99_t stuff are typedefs that have to resolve to an underlying
> integer type.

Err, I really don't care all that much how it is implemented, so long as they get
it right; for all I care char/short int/int/long int/long long int could all be typedefs
to the base int8_t/int16_t/int32_t/int64_t types,
iff the compiler makes them work correctly.

>> but LP64 was a hack to work
>> around poorly written software, long long /should/ be 256 bits :) not merely 128 bit.
>
> You know how people went to great lengths to avoid using uint64_t on 32
> bit machines because it introduced libgcc_s.so calls and sucked in
> _deeply_ crappy code to do FOIL multiplies and divides from high school
> algebra?
>
> You're saying "64 bit should have this problem too".

I would argue that that was gcc having a bug/being stupid, not the fault
of the structure of the language; deeply crappy code in gcc is not a problem with
the C language. Did they ever get around to fixing gcc?

> Bignum libraries exist. A 256 byte integer type doesn't let you do
> cryptography or implement standards-compliant BC without using them.
>
> (Heck, Posix and LSB are hacks to work around poorly written software.
> Kinda both's reason d' et cetera.)
>
>> Yes, uint64_t is a bit of a mess, but if the compiler puts some other size in there I would
>> feel fully justified in bitching about it.
>
> It would be a standards violation.
>
>> int, long and long long are compiler dependent and can
>> be whatever they desire and are per-arch,
>
> LP64 says what int and long should be, and specifies at least a minimum
> size for long long. Linux, BSD, and MacOS X depend on LP64. As does
> toybox (in design.html I believe).
>
>> so I try to use it where I want a particular size.
>
> Good for you...?
>
>> For example int was the size to store pointers in,
>> as it was the machine word; K & R explicitly stated to store pointers
>> in int.
>> now it is long, or better yet void *.
>
> Ah, the days when char could be 18 bits because some machines were just
> crazy and we hadn't weeded out the weak hardware designs yet.
>
> That went away.

Also, I would like it to fail loudly at compile time on that 18-bit machine, not helpfully convert it
to an 18-bit word or something. There are other types, like uint_least16_t, defined for things like that,
though I have not used them.
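Here is a minimal sketch of failing loudly at compile time rather than silently adapting. `UINT16_MAX` is only defined when an exact-width `uint16_t` exists, and `_Static_assert` requires C11. `wrap16()` is just an illustrative use of `uint_least16_t`.

```c
#include <limits.h>
#include <stdint.h>

/* Refuse to build, rather than quietly misbehave, on a machine without
 * 8-bit bytes or an exact 16-bit type. */
#if CHAR_BIT != 8
#error "8-bit bytes required"
#endif
#ifndef UINT16_MAX
#error "exact-width uint16_t required"
#endif

_Static_assert(sizeof(long long) * CHAR_BIT >= 64,
               "long long must be at least 64 bits");

/* uint_least16_t always exists: the smallest type with at least 16 bits.
 * Mask explicitly so the wraparound is 16-bit even if the type is wider. */
static uint_least16_t wrap16(uint_least16_t x)
{
  return (uint_least16_t)((x + 1u) & 0xFFFFu);
}
```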

>> I did find a couple of uint128_t references on my system.
>
> gcc of course added a __int128 compiler extension which is two 64 bit
> integers glued together just like 32 bit mode. How you printf() them is
> left as an exercise to the reader apparently?
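As for printing them, one common approach is to peel off decimal digits by hand, since printf() has no conversion for `__int128`. Here is a sketch, assuming GCC or Clang on a 64-bit target where `unsigned __int128` exists. `u128_to_str()` is a hypothetical name.

```c
#include <string.h>

/* GCC/Clang extension; __extension__ silences -pedantic warnings. */
__extension__ typedef unsigned __int128 u128;

/* Write n in decimal into buf, which needs 40 bytes
 * (39 digits for 2^128-1, plus the NUL). Returns buf. */
static char *u128_to_str(char *buf, u128 n)
{
  char tmp[40];
  int i = 0, j = 0;

  do {                      /* digits come out least significant first */
    tmp[i++] = (char)('0' + (int)(n % 10));
    n /= 10;
  } while (n);
  while (i) buf[j++] = tmp[--i];
  buf[j] = 0;
  return buf;
}
```

On many targets the 128-bit `%` and `/` may compile to libgcc helper routines, the same kind of code complained about above for 64-bit math on 32-bit machines.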
>
> I'm not going there. I did a sizeof(long long) on every aboriginal linux
> target to check what the size actually _was_, but as far as I know the
> limited number of units here are the first thing that might actually
> care about the size being larger. (Because it could overflow the string
> buffer allocation since we're not passing in a length. 64 bit input
> won't produce more than ~6 bytes of output depending on flags.)
>
>>>>> You can also set flags to drop the space between number and prefix or use the ubuntu 0..1023 style
>>>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>>>
>>>> Yes, but why would we want to?
>>
>> Strict conformance to the standard? avoiding the 9999->9.8Ki transition.
>
> The first I heard of this standard was when you mentioned it. Ubuntu
> clearly wasn't doing it.

I should have said style, not standard, but now it does have a standard.
Apparently as of Ubuntu 10.10 they try to use 1024 units, but standards compliance was
not the Ubuntu focus.
And now, just to help consistency, as of Mac OS X 10.6 Snow Leopard Apple has gone
to units of 1000 for disk sizes.

>>>>> This is pure integer, I could open code the printf also as it can only have 4 digits maximum at the moment.
>>>>> If you want I could make it autosizing rather than just one decimal between 0.1..9.9
>>>>> Also if any of the symbols are defined to 0 the capability will drop out.
>>>>> Perhaps I should make it default to IEC "Ki" style? getting it right vs bug compatibility.
>>>>>
>>>>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>>>>
>>>> I had toys/examples/test_human_readable.c which I thought I'd checked in
>>>> a couple weeks ago but apparently forgot to "git add".
>>
>> I was thinking maybe it needs a better name, outputting info for humans would be nice
>> to be able to do from the shell, so it could be actually used in production.
>
> It defaults to "n" in defconfig. It's a testing command. That's why it
> has "test" in the name and lives in the "examples" directory.
>
> This is beyond infrastructure in search of a user, you're letting
> infrastructure suggest a use case. "If all you have is a hammer,
> everything looks like a nail." Nobody's _asked_ for this.
>
>>>> (If you git add a file, git diff shows no differences, mercurial diff
>>>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>>>> little behavioral divergences.)
>>>>
>>>>> I hope this is interesting.
>>>>
>>>> It's very interesting and I'm keeping it around in case it's needed. I'm
>>>> just trying to figure out if the extra flags are something any command
>>>> is actually going to use. (And that's an Elliott question more than a me
>>>> question, I never use -h and it's not in posix or LSB.)
>>
>> Odd, it has been in common usage for years, but I guess it was just whatever
>> people felt a human would like to see rather than one of the standards.
>
> It's got a dozen flags because everybody who implemented this did it
> differently because the machine readable scriptable version is just to
> print out the actual NUMBER, thus the aesthetic cleanup is (or at least
> should be) just that.
>
> Bringing an international standards body into a purely aesthetic
> decision is weird. ANSI vs ISO tea was a _joke_.
>
> (Ok, maybe the aesthetic output has mutated into functional due to
> screen scrapers, which is what Elliott was implying by scripts depending
> on -h output. In which case either rigorously copying the historical
> mistakes or breaking them really loudly is called for. Adding a
> standards body to that sort of mess gives me a headache long before we
> get into any sort of details.)
>
> Rob

Hey, ~35 years ago my first engineering course spent quite a bit of time on
stuff like this, and no, this is not for screen scrapers but rather to maximize the
functional usefulness to the human by fitting into our biases and decluttering
the display of values so the useful part can be understood.
Engineering notation was designed to make it easier for humans to understand
the values (https://en.wikipedia.org/wiki/Engineering_notation), and binary prefixes
exist because computers are binary (https://en.wikipedia.org/wiki/Binary_prefix).

Jim
apparently a software curmudgeon ;)