> From: ***@landley.net
> Date: Fri, 4 Sep 2015 18:24:49 -0500
> On 09/04/2015 10:43 AM, James McMechan wrote:
>>> From: ***@google.com
>>> HN_B Use `B' (bytes) as prefix if the original result
>>> does not have a prefix.
>> Is it just me, or do you find this weird too? If you have an explicit prefix setting, why not use it?
>> If you don't want to use it, why is it there in the first place?
> Why is the _caller_ not appending B when they printf() the result? The
> space is before the units but the B isn't, and this is a string that
> gets put into a buffer and then used by something else. Further editing
> is kinda _normal_...
>>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>> Yep, I think network speeds are measured in SI units for example
>> I could live with 1024 units everywhere esp. if we also used the IEC prefixes
> I object to the word "kibibyte" on general principles, and disks are
> also sold in decimal sizes (for historical marketing reasons).
> (Of course "512 gigs" is mixing decimal and binary when you _do_ use
> binary gigs, since the 512 is decimal and all. But let's be honest,
> "kibibytes" is a stupid name, all else is details for me.)
Over-abbreviation will do that; I still think the long form "kilobinarybytes" reads better.
>>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
> Mebibytes. *shudder*
> Huh, I thought the i was the second character in "binary", but this
> implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
> back to "binary" again...
The IEC just standardized it first; IEEE, ISO, and NIST joined in later.
Think of it as Megab[i]nary[B]ytes and it is less shudder-worthy.
>>> Gi...). This flag has no effect when
>>> HN_DIVISOR_1000 is also specified.
>> Err yes, but it is not that it has no effect; rather, if you are using 1000s there should not be an 'i'.
> The B is already a separate flag from the 1024. If the caller wants to
> append the unicode character for "clown nose" to the returned string,
> that's not really human_readable()'s business.
>> For my two cents I would suggest we go for IEC prefixes by default. Yes, they are so-so,
>> but there is a standard, and it does make things noticeably clearer; might as well do it right instead
>> of the usual customary CompSci notation, which is notoriously ambiguous.
> The function is called human_readable().
> You want to default to binary units.
Computers are binary machines ;)
but hey, you are in charge you have the final call
That is why it had a two cent bid ;)
> What exactly is our goal here again?
> (Keeping the thundering hordes of android users happy. Right. Trying not
> to get emotionally invested in an aesthetic decision which hasn't _got_
> a right answer and just needs to be consistent. That said, if I can help
> kill the term "mebibytes" it is worth MUCH EFFORT on my part...)
>>> in the entire tree, there's only one use of HN_GETSCALE
>>> (/usr/bin/procstat), and it doesn't look like that's actually
>>> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
>> I did not have a HN_DECIMAL since I expect 0-9 to have a decimal point for a second
>> digit of precision, the range is to 999 anyway so it will not use more characters.
>>> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
>>> really used in df (it's also used once each in "edquota" and
>> I would have no problem with df using units 1024 instead and displaying IEC Units
> Disks are sold in decimal measurements. People are going to ask why your
> horribly inefficient file format is eating so much of their disk space.
Well Linux, Windows, and Mac OS/X (before 10.6) all display disk size using 1024 units
> (What, did they stop doing that with flash? I'd be surprised if they did...)
Well yes, sort of... 256 Gi of raw flash -> 240 G marketed, plus hidden wear-leveling spare sectors.
Almost every flash drive I have noticed uses a hard power of two, rounded down some,
likely because 1/2/4 chips is much easier to build, and 3 or 5-7 chips mostly does not give
enough gain to make it marketable for the cost of the work.
>>> HN_IEC_PREFIXES isn't used at all. not even a test.
>> Yeah, I have noticed that myself. Following the standard, and even making it the default
>> so that you know what everything is in, would be good; alas, that is somewhat incompatible
>> with custom. But are scripts really using -h and then parsing it? Something out there is likely that dumb.
>> It would be nice to actually do the right thing, though.
> Nothing extending the usage of the word "gibibytes" is the right thing.
>>> so until we find a place where we want to turn off HN_DECIMAL, we're
>>> good. (that's a harder thing to grep for, but i couldn't find an
>>> instance in FreeBSD.)
>> I would hope not; I would regard it as a useless loss of precision.
>> 9.9 will fit in the same space as 999 just fine.
> human_readable() _IS_ a useless loss of precision. That's what it's _for_.
I will argue it is both useful and an increase in density:
it packs the maximum data into a compact form,
scaled to units that are easier to think about.
> And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
> each other. 999k and 1.0M can be from a rounding perspective, but "loss
> of precision" is the reason rounding _exists_...
I must not have been clear: density is increased with prefixes by consuming
less space for display. 1.0 to 9.9 each use 3 characters, just like 100 to 999
each use 3 characters, so by absolute count they are both 3 characters,
yielding 2 characters only from 10-99 and 3 characters in all other prefixed cases.
>>>> If this behaves differently on big or little endian, your compiler is at
>>>> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
>>>> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
>>>> a bit creepy, but nobody's actually done that yet that I'm aware of. I
>>>> should probably use uint64_t but the name is horrid and PRI_U64 stuff in
>>>> printf is just awkward, and it's a typedef not a real type the way
>>>> "int", "long", and "long long" are...)
>> I have developed paranoia over BE/LE & 32/64 over the years, subtle assumptions about
>> size or byte ordering can creep in and break things.
> Oh sure. But I've been doing Aboriginal Linux in various forms since
> 1999 and started caring about cross compiling it in 2005, so I'm fairly
> familiar with where the sharp edges are by now.
I expect you are; I picked up the same painful lessons.
One of the first groups of systems I worked on (~1986-~2010) used a BE user interface
with shared memory to 1..4 LE IO processor boards; everything seemed designed
to maximize the number of places to have BE/LE troubles.
>> One I can remember was in the ext2 code:
>> they had a bitmap in LE order but accessed it using longs rather than bytes, so it had to have
>> the byteswap, even though the byte-based code was just as simple and completely agnostic
>> about word size and BE/LE.
> Not my code. :)
Ok, not your code. You did, however, write a good chunk of one of my favorites, the initramfs.
> (That said, my code's currently back on the todo heap because I have to
> read about ext4. Although really if it can upconvert on the fly maybe I
> should just genext2fs an ext2, stamp an ext3 journal on it, and let the
> filesystem driver handle the rest...)
In some respects the ext4 stuff could degenerate to a simple case:
deferred initialization of the unused structures, and extents instead of block maps.
If you don't need efficiency at the outset or an optimal disk layout initially,
I believe you can cheat outrageously and then let the filesystem
worry about the layout of all the other data you did not initialize.
Lay every file out as a contiguous extent of blocks, give
the inodes, super block, and bitmaps only minimal initialization, and use
either no journal or an empty one (hey, it's not like you need to do a replay
when you have just built it from scratch).
The filesystem would just see all the partly initialized data in a big
clump at the start of the disk and start background allocation of the rest.
Meanwhile everything can be together; the parts created before use
may not be optimal but could be very, very simple. In use the filesystem
may prefer to allocate new files in other locations with a better layout.
>> I could argue that long should be 128 bit on 64 bit computers
> Then there would be no 64 bit integer type.
> char = 8 bit
> short = 16 bit
> int = 32 bit
> long = 64 bit (on 64 bit)
> long long = 64 bit on both 32 and 64 bit (de-facto).
Sure you could; it would just be a different mess ;)
8 bit char/int8_t
16 bit long char/short short int/int16_t
32 bit long long char/short int/int32_t
64 bit int/int64_t
128 bit long int/int128_t
256 bit long long int/int256_t
Long appears to act as a multiplicative modifier to the base types
char/int/float; oddly, it now only seems to work on int and double.
gcc will also tell you "'long long long' is too long for GCC".
Short used to be the inverse of long, so I still expect it to work the same way.
Short short does not work at the moment, but I seem to remember long
used to be allowed only once as well, where long long int would be the same
as long int, or maybe a syntax error.
I think long char used to be how you would get the 16-bit wide char type.
Short char used to be a no-op; both appear to no longer work :(
And I have just found out that gcc-4.8.5 now errors on long float and short double,
both of which I have used in the past.
Bah, it appears I am an old fogie who still expects the syntax to be regular
in the old way, based on learning K&R style and the early squirrelly compilers.
I will have to get a cane to wave at these young compiler whippersnappers.
> The uint99_t stuff are typedefs that have to resolve to an underlying
> integer type.
Err, I really don't care how it is implemented all that much, so long as they get
it right; for all I care, char/short int/int/long int/long long int could all be typedefs
to the base int8_t/int16_t/int32_t/int64_t types,
iff the compiler makes them work correctly.
>> but LP64 was a hack to work
>> around poorly written software; long long /should/ be 256 bits :) not merely 128.
> You know how people went to great lengths to avoid using uint64_t on 32
> bit machines because it introduced libgcc_s.so calls and sucked in
> _deeply_ crappy code to do FOIL multiplies and divides from high school
> You're saying "64 bit should have this problem too".
I would argue that that was gcc having a bug / being stupid, not a fault
of the structure of the language; deeply crappy code in gcc is not a problem with
the C language. Did they ever get around to fixing gcc?
> Bignum libraries exist. A 256 bit integer type doesn't let you do
> cryptography or implement standards-compliant BC without using them.
> (Heck, Posix and LSB are hacks to work around poorly written software.
> Kinda both's reason d' et cetera.)
>> Yes, uint64_t is a bit of a mess, but if the compiler puts some other size in there I would
>> feel fully justified in bitching about it.
> It would be a standards violation.
>> int, long and long long are compiler dependent and can
>> be whatever they desire and are per-arch,
> LP64 says what int and long should be, and specifies at least a minimum
> size for long long. Linux, BSD, and MacOS X depend on LP64. As does
> toybox (in design.html I believe).
>> so I try to use it where I want a particular size.
> Good for you...?
>> For example, int was the size to store pointers in,
>> as it was the machine word; K&R explicitly stated to store pointers in int.
>> Now it is long, or better yet void *.
> Ah, the days when char could be 18 bits because some machines were just
> crazy and we hadn't weeded out the weak hardware designs yet.
> That went away.
Also, I would like it to fail loudly at compile time on that 18-bit machine, not helpfully convert
to an 18-bit word or something; there are other types like uint_least16_t defined for things like that.
Though I have not used them.
>> I did find a couple of uint128_t references on my system.
> gcc of course added a __int128 compiler extension which is two 64 bit
> integers glued together just like 32 bit mode. How you printf() them is
> left as an exercise to the reader apparently?
> I'm not going there. I did a sizeof(long long) on every aboriginal linux
> target to check what the size actually _was_, but as far as I know the
> limited number of units here are the first thing that might actually
> care about the size being larger. (Because it could overflow the string
> buffer allocation since we're not passing in a length. 64 bit input
> won't produce more than ~6 bytes of output depending on flags.)
>>>>> You can also set a flag to drop the space between number and prefix, or use the ubuntu 0..1023 style;
>>>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC.
>>>> Yes, but why would we want to?
>> Strict conformance to the standard? Avoiding the 9999 -> 9.8Ki transition.
> The first I heard of this standard was when you mentioned it. Ubuntu
> clearly wasn't doing it.
I should have said style, not standard, but now it does have a standard.
Apparently as of Ubuntu 10.10 they try to use 1024 units, but standards compliance was
not the Ubuntu focus.
And now, just to help consistency, as of Mac OS/X 10.6 Snow Leopard Apple has gone
to units of 1000 for disk sizes.
>>>>> This is pure integer, I could open code the printf also as it can only have 4 digits maximum at the moment.
>>>>> If you want I could make it autosizing rather than just one decimal between 0.1..9.9
>>>>> Also if any of the symbols are defined to 0 the capability will drop out.
>>>>> Perhaps I should make it default to IEC "Ki" style? getting it right vs bug compatibility.
>>>>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>>>> I had toys/examples/test_human_readable.c which I thought I'd checked in
>>>> a couple weeks ago but apparently forgot to "git add".
>> I was thinking maybe it needs a better name; outputting info for humans would be nice
>> to be able to do from the shell, so it could actually be used in production.
> It defaults to "n" in defconfig. It's a testing command. That's why it
> has "test" in the name and lives in the "examples" directory.
> This is beyond infrastructure in search of a user, you're letting
> infrastructure suggest a use case. "If all you have is a hammer,
> everything looks like a nail." Nobody's _asked_ for this.
>>>> (If you git add a file, git diff shows no differences, mercurial diff
>>>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>>>> little behavioral divergences.)
>>>>> I hope this is interesting.
>>>> It's very interesting and I'm keeping it around in case it's needed. I'm
>>>> just trying to figure out if the extra flags are something any command
>>>> is actually going to use. (And that's an Elliott question more than a me
>>>> question, I never use -h and it's not in posix or LSB.)
>> Odd, it has been in common usage for years, but I guess it was just whatever
>> people felt a human would like to see, rather than one of the standards.
> It's got a dozen flags because everybody who implemented this did it
> differently because the machine readable scriptable version is just to
> print out the actual NUMBER, thus the aesthetic cleanup is (or at least
> should be) just that.
> Bringing an international standards body into a purely aesthetic
> decision is weird. ANSI vs ISO tea was a _joke_.
> (Ok, maybe the aesthetic output has mutated into functional due to
> screen scrapers, which is what Elliott was implying by scripts depending
> on -h output. In which case either rigorously copying the historical
> mistakes or breaking them really loudly is called for. Adding a
> standards body to that sort of mess gives me a headache long before we
> get into any sort of details.)
Hey, ~35 years ago my first engineering course spent quite a bit of time on
stuff like this, and no, this is not for screen scrapers but rather to maximize the
functional usefulness to the human, by fitting into our biases and decluttering the
display of values so the useful part can be understood.
Engineering notation was designed to make it easier for humans to understand
values (https://en.wikipedia.org/wiki/Engineering_notation), and binary prefixes
because computers are binary (https://en.wikipedia.org/wiki/Binary_prefix).
apparently a software curmudgeon ;)