Discussion:
[PATCH] fix newline on stdin for
Robert Thompson
2014-12-12 23:19:21 UTC
I ran across a variance between toybox factor and coreutils factor.

Coreutils factor will accept numbers on stdin separated by any whitespace
(including newlines and tabs) between integers, but toybox factor was only
accepting one integer per line.

I added a test for this, and hacked factor to give the expected behavior.
It's not properly indented, and it depends on isspace(), but it seems to be
doing the job.
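
The stdin behavior described above comes down to splitting the input on any whitespace before parsing each token. A minimal sketch of that splitting, assuming a hypothetical helper named parse_ints (it is not part of toybox or of the patch below) built on the standard strtol() and isspace(), with base 0 as in the patch:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper, not toybox code: walks a buffer holding integers
// separated by any whitespace and stores up to max values in out[].
// Returns the number parsed, or -1 on a token that is not an integer
// (or when more than max tokens are present).
static int parse_ints(const char *s, long *out, int max)
{
  int count = 0;

  for (;;) {
    char *end;

    // Skip spaces, tabs and newlines between tokens.
    while (isspace((unsigned char)*s)) s++;
    if (!*s) return count;
    if (count == max) return -1;

    errno = 0;
    out[count] = strtol(s, &end, 0);
    // Reject tokens with no digits, trailing junk, or overflow.
    if (end == s || errno || (*end && !isspace((unsigned char)*end)))
      return -1;
    count++;
    s = end;
  }
}
```

Given the string "3 6\n\t10" this fills the array with 3, 6 and 10 and returns 3, while a token such as "32x" makes it return -1.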



diff -r 51b7d1af353b tests/factor.test
--- a/tests/factor.test Thu Dec 11 20:17:28 2014 -0600
+++ b/tests/factor.test Fri Dec 12 17:10:49 2014 -0600
@@ -16,3 +16,7 @@
"10000000018: 2 131 521 73259\n" "" ""
testing "factor 10000000019" "factor 10000000019" \
"10000000019: 10000000019\n" "" ""
+
+testing "factor 3 6 from stdin" "factor" "3: 3\n6: 2 3\n" "" "3 6"
+testing "factor stdin newline" "factor" "3: 3\n6: 2 3\n" "" "3\n6\n"
+
diff -r 51b7d1af353b toys/other/factor.c
--- a/toys/other/factor.c Thu Dec 11 20:17:28 2014 -0600
+++ b/toys/other/factor.c Fri Dec 12 17:10:49 2014 -0600
@@ -20,9 +20,11 @@
static void factor(char *s)
{
long l, ll;
+ while( *s && s[0] && ! isspace(s[0]) ) {
+ printf("->: %s\n",s);

l = strtol(s, &s, 0);
- if (*s) {
+ if (*s && s[0] > 32 ) {
error_msg("%s: not integer");
return;
}
@@ -35,10 +37,10 @@
l *= -1;
}

- // Deal with 0 and 1 (and 2 since we're here)
- if (l < 3) {
+ // Deal with 0..3
+ if (l < 4) {
printf(" %ld\n", l);
- return;
+ continue;
}

// Special case factors of 2
@@ -61,6 +63,7 @@
}
}
xputc('\n');
+ }
}

void factor_main(void)
Rob Landley
2014-12-25 03:47:35 UTC
Post by Robert Thompson
I ran across a variance between toybox factor and coreutils factor.
Coreutils factor will accept numbers on stdin separated by any whitespace
(including newlines and tabs) between integers, but toybox factor was only
accepting one integer per line.
Really?

$ factor ""
factor: `' is not a valid positive integer
$ factor "32 "
factor: `32 ' is not a valid positive integer
$ factor "32 7"
factor: `32 7' is not a valid positive integer

Must be newer than Ubuntu 12.04... Ah, on _stdin_. Right. Confirmed.

Hmmm... might as well make it take both anyway.
Post by Robert Thompson
I added a test for this, and hacked factor to give the expected behavior.
It's not properly indented, and it depends on isspace(), but it seems to be
doing the job.
I think you left a debug printf in there, it's making all the tests fail,
including the ones you submitted:

$ VERBOSE=fail scripts/test.sh factor
scripts/make.sh
Generate headers from toys/*/*.c...
Make generated/config.h from .singleconfig.
generated/flags.h generated/help.h
Compile toybox.....
FAIL: factor -32
echo -ne '' | factor -32
--- expected 2014-12-23 20:48:38.689595406 -0600
+++ actual 2014-12-23 20:48:38.693595406 -0600
@@ -1 +1,2 @@
+->: -32
-32: -1 2 2 2 2 2

A couple other issues:

@@ -20,9 +20,11 @@
static void factor(char *s)
{
long l, ll;
+ while( *s && s[0] && ! isspace(s[0]) ) {
+ printf("->: %s\n",s);

l = strtol(s, &s, 0);

*s and s[0] are the same thing.

@@ -61,6 +63,7 @@
}
}
xputc('\n');
+ }
}

void factor_main(void)

As you mentioned, you added a curly bracket level without indenting the code.
I could do a tail call and expect the compiler to turn the recursion into
iteration, but reindenting the code properly is worth the noise in the diff.
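
For comparison, the factoring step itself can be sketched as a plain loop. This is only an illustration under stated assumptions, not the version that was checked in: factorize is a hypothetical name, and it returns the factors in a caller-supplied array (at least 64 entries) instead of printing them.

```c
// Hypothetical sketch, not the checked-in toybox code: stores the trial-
// division factors of n in out[] and returns how many were stored.
// A negative n gets a leading -1 factor, and 0 through 3 are emitted
// as-is, mirroring the patch discussed in this thread.
// out[] must have room for 64 entries (the worst case for a 64-bit long).
static int factorize(long n, long *out)
{
  // Work in unsigned so that negating the most negative long is defined.
  unsigned long u = n < 0 ? 0UL - (unsigned long)n : (unsigned long)n;
  unsigned long f;
  int count = 0;

  if (n < 0) out[count++] = -1;

  // 0 through 3 have no smaller factors to find.
  if (u < 4) {
    out[count++] = (long)u;
    return count;
  }

  // Strip factors of 2 first, so the main loop only tries odd divisors.
  while (!(u & 1)) {
    out[count++] = 2;
    u >>= 1;
  }

  // f <= u / f avoids the overflow that f * f <= u could hit.
  for (f = 3; f <= u / f; f += 2) {
    while (!(u % f)) {
      out[count++] = (long)f;
      u /= f;
    }
  }

  // Whatever remains above 1 is a single prime factor.
  if (u > 1) out[count++] = (long)u;
  return count;
}
```

For -32 this yields -1 2 2 2 2 2, the same factors as the expected line in the failing test output above.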

The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?

Let me know if there are more things to fix.

Thanks,

Rob
enh
2014-12-25 07:46:15 UTC
Post by Rob Landley
Post by Robert Thompson
I ran across a variance between toybox factor and coreutils factor.
Coreutils factor will accept numbers on stdin separated by any whitespace
(including newlines and tabs) between integers, but toybox factor was only
accepting one integer per line.
Really?
$ factor ""
factor: `' is not a valid positive integer
$ factor "32 "
factor: `32 ' is not a valid positive integer
$ factor "32 7"
factor: `32 7' is not a valid positive integer
Must be newer than Ubuntu 12.04... Ah, on _stdin_. Right. Confirmed.
Hmmm... might as well make it take both anyway.
Post by Robert Thompson
I added a test for this, and hacked factor to give the expected behavior.
It's not properly indented, and it depends on isspace(), but it seems to be
doing the job.
I think you left a debug printf in there, it's making all the tests fail,
$ VERBOSE=fail scripts/test.sh factor
scripts/make.sh
Generate headers from toys/*/*.c...
Make generated/config.h from .singleconfig.
generated/flags.h generated/help.h
Compile toybox.....
FAIL: factor -32
echo -ne '' | factor -32
--- expected 2014-12-23 20:48:38.689595406 -0600
+++ actual 2014-12-23 20:48:38.693595406 -0600
@@ -1 +1,2 @@
+->: -32
-32: -1 2 2 2 2 2
@@ -20,9 +20,11 @@
static void factor(char *s)
{
long l, ll;
+ while( *s && s[0] && ! isspace(s[0]) ) {
+ printf("->: %s\n",s);
l = strtol(s, &s, 0);
*s and s[0] are the same thing.
@@ -61,6 +63,7 @@
}
}
xputc('\n');
+ }
}
void factor_main(void)
As you mentioned, you added a curly bracket level without indenting the code.
I could do a tail call and expect the compiler to turn the recursion into
iteration, but reindenting the code properly is worth the noise in the diff.
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
Post by Rob Landley
Let me know if there are more things to fix.
Thanks,
Rob
Robert Thompson
2014-12-25 11:48:26 UTC
I've only seen it used in shell scripts. The most interesting example was
an ancient script I saw a few years ago that used it with seq. It split a
single large directory into a blah/$X/Y/filename nested-cache directory.

Not sure if that is representative, though.
Rob Landley
2014-12-26 00:04:50 UTC
Post by enh
Post by Rob Landley
Post by Robert Thompson
I ran across a variance between toybox factor and coreutils factor.
...
Post by enh
Post by Rob Landley
As you mentioned, you added a curly bracket level without indenting the code.
I could do a tail call and expect the compiler to turn the recursion into
iteration, but reindenting the code properly is worth the noise in the diff.
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
long bus ride, because I probably have to implement TLS someday (by
which I mean https not thread local storage) because wget can't talk to
the world without encryption anymore (thanks NSA), and the section I
linked to above used "factor", and I went "that's a command? Apparently
so. This is probably like a dozen lines to implement"... and had it
working before the end of the bus ride.

So the answer is "you use it when writing tutorials about cryptographic
implementation details".

Really I just needed a break from reading mind-bending math stuff so I
implemented a small self-contained thing for fun, and then checked it in
because it came preinstalled in ubuntu and was tiny. I wouldn't have
bothered if you couldn't switch it off in the config when optimizing for
size.

I was surprised as anybody else to get a patch for it, but that means
it's apparently useful to somebody. By all means, switch it off in
Android. :)

Rob

P.S. At a design level I thought about defaulting it "n" but the
defconfig y/n signalling primarily indicates "is this done or not" and
it was finished and worked fine, so... (Well, the examples directory
also has stuff that defaults to "n" but factor isn't really a
demonstration of how to use the toybox infrastructure either.) And
defaulting "n" for other reasons is editorializing, where does it stop?
rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
me to add most of those, because they needed them. If somebody wants to
make a .config file selecting a subset of the commands, you can do that.
It's not my job to guess how people will use generic tools.
enh
2014-12-26 18:46:50 UTC
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by Robert Thompson
I ran across a variance between toybox factor and coreutils factor.
...
Post by enh
Post by Rob Landley
As you mentioned, you added a curly bracket level without indenting the code.
I could do a tail call and expect the compiler to turn the recursion into
iteration, but reindenting the code properly is worth the noise in the diff.
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
long bus ride, because I probably have to implement TLS someday (by
which I mean https not thread local storage) because wget can't talk to
the world without encryption anymore (thanks NSA), and the section I
linked to above used "factor", and I went "that's a command? Apparently
so. This is probably like a dozen lines to implement"... and had it
working before the end of the bus ride.
speaking of which (and going back to "simple is complex"), i have an
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)

i know one of your goals is to minimize dependencies, but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation. (and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)

in this specific case the openssl API is reasonable enough your
implementations could be a drop-in replacement, but i suspect in other
cases part of your motivation for writing your own will have been the
awful API. also in this specific case there's almost no sharing
between the implementations anyway because 99% of the code is the hash
implementation itself. but if you can, keeping API compatibility with
the library you're trying to replace would be good.

anyway, let me know whether you'd like to merge stuff like this into
the main codebase. otherwise i can just "git rm" locally and add the
alternative version to toys/android. i'll get a delete/merge conflict
if you change anything in your version so i'll be able to track
changes, so it's only really a loss if you think you have other users
who'd prefer to use openssl.
Post by Rob Landley
So the answer is "you use it when writing tutorials about cryptographic
implementation details".
Really I just needed a break from reading mind-bending math stuff so I
implemented a small self-contained thing for fun, and then checked it in
because it came preinstalled in ubuntu and was tiny. I wouldn't have
bothered if you couldn't switch it off in the config when optimizing for
size.
I was surprised as anybody else to get a patch for it, but that means
it's apparently useful to somebody. By all means, switch it off in
Android. :)
Rob
P.S. At a design level I thought about defaulting it "n" but the
defconfig y/n signalling primarily indicates "is this done or not" and
it was finished and worked fine, so... (Well, the examples directory
also has stuff that defaults to "n" but factor isn't really a
demonstration of how to use the toybox infrastructure either.) And
defaulting "n" for other reasons is editorializing, where does it stop?
rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
me to add most of those, because they needed them. If somebody wants to
make a .config file selecting a subset of the commands, you can do that.
It's not my job to guess how people will use generic tools.
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.

it sucks to be me though. the best i can aim for is to try to ensure
that there are roughly the same number of people complaining i put too
much in as people complaining i left too much out :-)

it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.

to work out which options are important for the commands that toolbox
and toybox have in common, i've been relying on my command-line
history, what i can find in scripts, and whether someone cared enough
to add/fix something. but i don't yet have a plan for all the stuff in
toys/pending. i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.

and getting back to factor, i can't decide whether having it paints a
target on my back or gives me something i don't care about to throw
under the bus as a gesture of goodwill :-)
stephen Turner
2014-12-26 22:36:08 UTC
including list.
Post by enh
speaking of which (and going back to "simple is complex"), i have an
Post by enh
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)
i know one of your goals is to minimize dependencies, but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation. (and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)
in regards to openssl, tls, and the like, would these be reusable
implementations for webkit and other web browser backends? If other
programs will be able to use them, then i would say to implement it to its
fullest needed to support the majority of applications. otherwise, if it's
only for internal support, the bare minimum needed would be fine.
Post by enh
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.
If we're referring to defconfig i would hope it would include all intended
to be included toybox apps that are currently stable or at least have an
option that behaves in the same way.
Rob Landley
2014-12-29 21:07:22 UTC
Post by stephen Turner
including list.
On Fri, Dec 26, 2014 at 5:35 PM, stephen Turner
speaking of which (and going back to "simple is complex"), i have an
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)
i know one of your goals is to minimize dependencies, but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation. (and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)
in regards to openssl, tls, and the like, would these be reusable
implementations for webkit and other web browser backends?
No. And I'm really leaning towards not doing it if I can avoid it. I
just need the functionality, and don't want to link against external
libraries adding unbounded complexity to the project.

What I'd like is a command I can run and pipe an http:// session through
to turn it into an https:// session, and have _that_ thing worry about
what that means. Unfortunately, the stunnel project appears to be crap
(haven't looked at it in a while, but at the time it wasn't something I
wanted to get on me), and I haven't found a decent small clone of it.
(Tried to talk dropbear into adding one and they didn't want to expand
their scope. Can't blame 'em.)

The actual _math_ of doing the encryption doesn't seem so bad,
especially since "bc" requires a bignum library (thank you posix) and
Peter "let's complicate everything" Anvin swapped out my patch to remove
perl from the kernel build with a version that calls "bc" instead.
(Which busybox doesn't implement.)

Unfortunately, A) this is encryption code so the full cryptographic
paranoia kicks in, and I really dowanna go there, B) the constellation
of certificates needed to verify site identity is just horrific and I DO
NOT WANT TO GO THERE.

This is why "looking at" has not translated into "I'm writing", and it
is NOT in scope for the 1.0 release. (Which means "wget" is bordering on
useless at the moment, but eh...)
Post by stephen Turner
If other
programs will be able to use them, then i would say to implement it to its
fullest needed to support the majority of applications. otherwise, if
it's only for internal support, the bare minimum needed would be fine.
I'm only interested in the bare minimum. Unfortunately, the bare minimum
is enormous. (Certificates!)

(I did about half the work once to make landley.net work with https and
hit the "I don't have root access on the server, how do I tell my ISP
where to install a certificate" and it went on the todo list. You know
about the todo list...)
Post by stephen Turner
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.
If we're referring to defconfig i would hope it would include all
intended to be included toybox apps that are currently stable or at
least have an option that behaves in the same way.
Defconfig is the maximum sane configuration. It's all the stuff that
works without requiring strange build-time prerequisites like selinux.
Some things in the toybox sub-menu (debug options, the unnecessary
memory freeing to make valgrind and such happy. etc), the "examples"
directory, and the "pending" directory default n. Stuff doesn't get
promoted out of pending until it can default y.

Rob
stephen Turner
2014-12-29 22:58:13 UTC
Forgot to include the list.
Post by Rob Landley
(I did about half the work once to make landley.net work with https and
hit the "I don't have root access on the server, how do I tell my ISP
where to install a certificate" and it went on the todo list. You know
about the todo list...)
Runneth over? :-p
Post by Rob Landley
Defconfig is the maximum sane configuration. It's all the stuff that
works without requiring strange build-time prerequisites like selinux.
Some things in the toybox sub-menu (debug options, the unnecessary
memory freeing to make valgrind and such happy. etc), the "examples"
directory, and the "pending" directory default n. Stuff doesn't get
promoted out of pending until it can default y.
Rob
Cool. all sounds good to me. what about building the source on a musl
system, do you know if that is still buggy or not? I am going to be testing
it anyways regardless of buggy or not but just curious what to look for.
Also about your todo list, i heard that mawk was fastest among awks but
sadly it's GPL2, so i'm not sure if you could preserve the code/structure that
makes it fast but i'm hoping. I don't know awk..... guess i'm buying a book.
thanks,
stephen
Rob Landley
2014-12-30 02:22:32 UTC
Post by stephen Turner
Forgot to include the list.
(I did about half the work once to make landley.net work with https and
hit the "I don't have root access on the server, how do I tell my ISP
where to install a certificate" and it went on the todo list. You know
about the todo list...)
Runneth over?
More a compost heap where ideas moulder until they sprout, really...

Today I found out that the 3.18 kernel build sprouted perl again (and
bisected it but haven't fixed it yet), and got over a dozen new emails
to answer (not just here, also stuff like being cc'd on yet another
person going "If I specify root= on the kernel command line I get
initramfs instead of initmpfs"), and had house repair stuff crop up
(which meant I rescheduled with the guy who wants me to review his
raspberry pi project design)...

If I start tomorrow with the same list of todo items I started today,
I've actually come out _ahead_.
Post by stephen Turner
Defconfig is the maximum sane configuration. It's all the stuff that
works without requiring strange build-time prerequisites like selinux.
Some things in the toybox sub-menu (debug options, the unnecessary
memory freeing to make valgrind and such happy. etc), the "examples"
directory, and the "pending" directory default n. Stuff doesn't get
promoted out of pending until it can default y.
Rob
Cool. all sounds good to me. what about building the source on a musl
system, do you know if that is still buggy or not?
I periodically regression test, but I don't do extensive testing of the
result. (It compiles and a couple things run.)

I'm aware of a disagreement between me and Rich where he says the linux man
pages are wrong and I say musl fails to run code that glibc and uClibc
both run fine:

http://landley.net/hg/toybox/rev/1512

And I'm aware that musl's regex engine doesn't implement \| in
nonextended regexes, so grep.c using those to glue together multiple
targets (again, works on glibc and uClibc) doesn't work on musl. About
when I wrote a patch to implement it in musl they refactored their
regex engine, so the patch didn't apply (but the new one didn't implement
\| either).
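
The difference at issue can be probed with the POSIX regcomp()/regexec() calls. A minimal sketch, assuming a hypothetical helper named regex_matches (it is not taken from grep.c): in an extended regex, alternation is spelled with a bare |, while in a basic regex the \| spelling is an extension that, as far as I know, POSIX leaves undefined and glibc accepts.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: returns 1 if pattern matches text, 0 if it does
// not, and -1 if the pattern fails to compile. cflags is handed to
// regcomp(), so 0 selects a basic regex and REG_EXTENDED an extended one.
static int regex_matches(const char *pattern, const char *text, int cflags)
{
  regex_t re;
  int rc;

  // REG_NOSUB: only a yes/no answer is needed, not match offsets.
  if (regcomp(&re, pattern, cflags | REG_NOSUB)) return -1;
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);

  return rc == 0;
}
```

Calling regex_matches("foo|bar", "xbarx", REG_EXTENDED) reports a match on any conforming libc. The call in dispute is regex_matches("foo\\|bar", "bar", 0), whose result depends on whether the libc implements \| in basic regexes.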

Other than that, I think all the musl stuff should work? If it doesn't,
ping me and I'll try to fix it.
Post by stephen Turner
I am going to be
testing it anyways regardless of buggy or not but just curious what to
look for.
I need to attack the testing directory to at least make everything
that's there PASS so I can use it to regression test. Unfortunately I
left stuff half-done and got several submissions that included failing
tests, so I can't easily use the test suite to find regressions...

It's on the todo list!
Post by stephen Turner
Also about your todo list, i heard that mawk was fastest among awks but
sadly it's GPL2, so i'm not sure if you could preserve the code/structure that
makes it fast but i'm hoping.
If it's GPL I don't actually plan to _read_ its code. (We're not quite to
the "SCO contamination theory" level of frivolous GPL enforcement suits,
but give it 5-10 years...)
Post by stephen Turner
I don't know awk..... guess i'm buying a book.
There's a POSIX spec, and a PDF of the original AWK book is online at
http://books.cat-v.org/computer-science/awk-programming-language/The_AWK_Programming_Language.pdf

(Whether legally or not, I couldn't tell you...)
Post by stephen Turner
thanks,
stephen
Rob
Isaac Dunham
2014-12-30 04:09:57 UTC
Post by Rob Landley
Also about your todo list, i heard that mawk was fastest among awks but
sadly it's GPL2, so i'm not sure if you could preserve the code/structure that
makes it fast but i'm hoping.
If it's GPL I don't actually plan to _read_ its code. (We're not quite to
the "SCO contamination theory" level of frivolous GPL enforcement suits,
but give it 5-10 years...)
I've read that mawk uses a JIT.
It also has fixed limits, unlike gawk.

The current maintainer is Thomas Dickey, maintainer of xterm, byacc,
dialog, and at least a half-dozen other things. He relicensed dialog
from GPL to LGPL after finding that he replaced all the code except
the function declarations and brackets, so I don't think he's in favor
of that view.
Post by Rob Landley
I don't know awk..... guess i'm buying a book.
There's a posix spec, a PDF of the original AWK book is online at
http://books.cat-v.org/computer-science/awk-programming-language/The_AWK_Programming_Language.pdf
The "One True AWK" (the latest iteration of the implementation that goes
with the book) is under a BSD-ish license, (c) Lucent.
http://www.cs.princeton.edu/~bwk/btl.mirror/

And OpenBSD uses a patched version of it, which can be found on the
usual dozen+ mirrors (under src/usr.bin/awk).

Locally, I generally use github.com/iguleder/lok, which is OpenBSD awk
ported to Linux.

HTH,
Isaac Dunham
stephen Turner
2014-12-30 04:43:53 UTC
Post by Isaac Dunham
Post by Rob Landley
Also about your todo list, i heard that mawk was fastest among awks but
sadly it's GPL2, so i'm not sure if you could preserve the code/structure that
makes it fast but i'm hoping.
If it's GPL I don't actually plan to _read_ its code. (We're not quite to
the "SCO contamination theory" level of frivolous GPL enforcement suits,
but give it 5-10 years...)
I've read that mawk uses a JIT.
It also has fixed limits, unlike gawk.
The current maintainer is Thomas Dickey, maintainer of xterm, byacc,
dialog, and at least a half-dozen other things. He relicensed dialog
from GPL to LGPL after finding that he replaced all the code except
the function declarations and brackets, so I don't think he's in favor
of that view.
Post by Rob Landley
I don't know awk..... guess i'm buying a book.
There's a posix spec, a PDF of the original AWK book is online at
http://books.cat-v.org/computer-science/awk-programming-language/The_AWK_Programming_Language.pdf
The "One True AWK" (the latest iteration of the implementation that goes
with the book) is under a BSD-ish license, (c) Lucent.
http://www.cs.princeton.edu/~bwk/btl.mirror/
And OpenBSD uses a patched version of it, which can be found on the
usual dozen+ mirrors (under src/usr.bin/awk).
Locally, I generally use github.com/iguleder/lok, which is OpenBSD awk
ported to Linux.
HTH,
Isaac Dunham
i was just tickled by the article i read where mawk was compared against
other languages and awks. All this started by a family member who worked(s)
on IBM unix systems and mentioned i should checkout nawk.

http://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/

thanks for the pdf, i will add it to my ebook. being posix i assume this
will be the most standard across awks.
Rob Landley
2014-12-28 01:02:14 UTC
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
long bus ride, because I probably have to implement TLS someday (by
which I mean https not thread local storage) because wget can't talk to
the world without encryption anymore (thanks NSA), and the section I
linked to above used "factor", and I went "that's a command? Apparently
so. This is probably like a dozen lines to implement"... and had it
working before the end of the bus ride.
speaking of which (and going back to "simple is complex"), i have an
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)
Actually the main reason I don't include external code is licensing.

Last year I gave two talks about how I went from GPL fanboy to advocate
of the public domain. I didn't quite fit either of them in the assigned
timeslot, but the more coherent of the two is probably:

https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3

The current toybox license places the code into the public domain. It
_looks_ like a BSD license, and I sometimes call it "zero clause BSD"
because of this, but the requirement to copy this specific license text
into derivative works is absent. This means it's a permission grant that
allows reusing the code without even attributing it. (Attribution is
_polite_, but it's possible to plagiarize Shakespeare. That's not a
licensing issue, and these days Google makes it pretty easy for teachers
to catch all those recycled term papers anyway.)

The problem with BSD-style licenses is that there are a lot of them (2
clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
that all try to do the same thing but all of them say "you must copy
this specific wording into your derived work", so if you combine code
from two sources under different BSD variants you wind up concatenating
the licenses, and this can get epically silly (the kindle paperwhite's
about->licenses thing is over 300 pages of concatenated license
boilerplate.)

I respect BSD/ISC/Apache license terms enough _not_ to treat them as
public domain. I would like toybox to provide a source of reusable
public domain code, it's one of the goals of the project.

Toybox has included explicitly public domain code from external sources
(such as the xz implementation for toys/pending/xzcat.c), and I've
looked at the libtom bignum library for implementing bc (haven't managed
to make much sense of it, to be honest). But I recently turned down a
ping.c submission that was based on BSD ping, in favor of writing my own.
Post by enh
i know one of your goals is to minimize dependencies,
I'm juggling an awful lot of conflicting goals. (Most of them listed on
the roadmap or design pages.)

Because of this, toybox is probably going to implement more than a lot
of users need, but as long as the commands are self-contained you can
switch off any command in your config that you don't want to ship.

If, for auditing reasons, you don't want to use toybox's sha1sum but
instead want to use an openssl derived version that shares code with
other sha1sum instances you've already cleared and are using elsewhere,
then that's what makes sense for your deployment. (If you grow to trust
toybox's version later, you can switch to it then after everybody else
has looked at it longer.)
Post by enh
but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation.
Indeed, and I agree. I don't _want_ to write TLS, I think it's out of
scope for toybox... except that I need the functionality to do basic web
transactions that _are_ in scope. (The internet's changing out from
under me. Two years ago you could talk to github, kernel.org, and
twitter without https. Now if you try they redirect.)

What I really want is an "stunnel" variant that works, so I can pipe an
https session through something that encrypts it for me.

https://www.stunnel.org/index.html

I tried to convince dropbear to add one years ago, but their reply was
more or less "patches welcome".

http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2007q1/000506.html
http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2008q4/000859.html

I prefer not to link toybox against external libraries (I could give a
long talk about why, but not here), and sucking in nontrivial amounts of
external code to maintain a local copy has its own large downsides. But
calling reasonably standardized external commands and piping stuff
through them? I'm all for it.

In fact toybox commands are designed to be able to call external
versions of commands even when toybox has its own implementation. That's
why mount.c doesn't check if CFG_LOSETUP is enabled before trying to
xpopen("losetup"), if it's there in the $PATH but not in this binary, ok
then.
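
A minimal sketch of that "just try the command from $PATH" idea, written
in plain POSIX C rather than with toybox's own xpopen() helper. The
name run_external() is made up for illustration and is not a toybox
function. The point it shows is that the caller never asks whether the
command was compiled in: execvp() searches $PATH, and a missing command
simply shows up as exit status 127.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not toybox's xpopen): run argv[0] from $PATH.
// Returns the child's exit status, 127 if the command could not be
// executed, or -1 if fork/wait failed or the child died from a signal.
int run_external(char *const argv[])
{
  pid_t pid;
  int status;

  assert(argv && argv[0]);

  pid = fork();
  if (pid < 0) return -1;
  if (!pid) {
    // Child: execvp() does the $PATH search for us.
    execvp(argv[0], argv);
    // Only reached if exec failed (command missing, not executable...).
    _exit(127);
  }

  // Parent: wait for the child and report how it exited.
  if (waitpid(pid, &status, 0) < 0) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argv of {"losetup", "-f", 0} would run whichever
losetup is first in $PATH, whether that's a symlink back to the same
multiplexer binary or a separate package.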

As for the md5/sha1/sha256/sha3, they're easy to test (their failures
tend to be really obvious), and the two I implemented are inherently
timing invariant and don't have obvious sidechannel attacks. And I _can_
find existing public domain implementations of these to start from, such as:

http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c

So adding the other hashing functions to toybox makes sense to me,
especially since I need them for a traditional /etc/shadow login.c. (I
need to research android's user database and how to access it.)

That said, I _do_ care that they're slower than other implementations.
That's a simple vs fast balance that's... I took the first speedup
patch, didn't take the second speedup patch, and I need to go back and
look at it...
Post by enh
(and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)
One of the goals I'm juggling is "busybox replacement", and they have
this stuff. But again that's just a weighting, busybox already contains a
lot of stuff we're _not_ implementing.

If I was starting from scratch today I might leave them out, but I have
a history with both bzip2 and gzip which makes it easier for me to keep
both of them in scope.

The one we really _need_ is deflate/inflate, because we should have a
compression algorithm and that's the simplest and most lightweight one.
The extract side of the other two are there because tarballs come in
that format and a build environment needs to be able to extract them
(another goal I'm juggling. The strace source is _only_ available as .xz
these days, for example.)

But I probably won't bother with the compression side of bzip2 or xz. If
you want to create a new tarball we support gzip and if you want it in
those other formats you can install the other package.

To explain my "history with bzip2 and gzip" above:

I reimplemented bunzip2 years ago because the original was horrid, and
my implementation got sucked up into a bunch of places. (I think the
kernel uses it if you select bzip compression, although these days gzip
or xz are the dominant ones.)

I also wrote 90% of bzip2 compression side support a decade back for
busybox, but got distracted near the end and never got back to it
because the bzip2 compression algorithm is WEIRD:

http://lists.busybox.net/pipermail/busybox/2004-February/010859.html

Even _with_ most of the work done I probably won't bother with bzip2
compression side unless somebody really wants it, both because it's
semi-obsolete these days and because its compression is based on weird
heuristics for the string sorting that I've never managed to clean up
into something understandable. (The "crap.c" above, which is a series of
fallbacks between different sorting algorithms with no explanation of
_why_.) I _can't_ simplify this into something easy to understand that
somebody might want to use as example code in a middle school
programming class, the algorithm is just inherently nuts.

I already did gunzip a few months ago, and I'm working on gzip
compression side support now. I wrote a java implementation of that back
when Java 1.0 didn't include it in the base library. (Java 1.1 came out
before I did the decompression side that time, so I moved on to other
things.) I took info-zip apart back when I was programming for OS/2, I
actually know that one pretty well. So that's probably the only
compressor I'll implement, when it's done it should be less than 500
lines of code. Also, Ashwini Sharma asked me to prioritize that so they
can use it in a product.

As for xz: I received an external contribution based on the public
domain decompressor. The "fetch tarball, extract, configure, make,
install" codepath needs to be able to extract them, and the code's
already in. (And is horrible, there's built-in knowledge of various
processor machine language formats, which strongly implies upgrades will
need more of this filigree for new processor variants.)

But I don't particularly want to do the compression side for that.

If you decide to switch off our bunzip2 and use the external version
instead, toybox "tar" should call out to it and pipe stuff through it
just fine. (I dunno if it currently _does_, but once I've cleaned it up...)
Post by enh
in this specific case the openssl API is reasonable enough your
implementations could be a drop-in replacement, but i suspect in other
cases part of your motivation for writing your own will have been the
awful API.
Part, yes. But only part. I mentioned licensing above. There's also the
fact that I can often come up with objectively better code.

In the case of bunzip2, back in 2003 I replaced this:

http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=6fe55ae93983

With this:

http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=0d6d88a2058d

That's not just replacing 1658 lines with 531 lines: try actually reading
the old code. Contemplate the "save state" and the big switch/case in
the main function starting at line 395. (They copy all the local
variables out of a structure, each call, and copy them back before
returning. They use a switch/case with labels covering the whole
function so they can jump back into the middle of nested loops.
That's so it could return when it ran out of data and be called to
resume decompressing once the buffer was filled. I replaced that with a
get_bits() call that had the filehandle stored away and could read more
data if it needed to.)
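
For anyone who hasn't seen the pattern, here is a rough sketch of a
get_bits() that keeps the file descriptor stashed in a struct and
refills its own buffer. This is not the busybox or toybox source: the
struct layout, the bitreader_init() name, and the buffer size are all
assumptions made up for the example. It reads most-significant bit
first, which is the order bzip2 uses.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

// Hypothetical bit reader state (illustrative, not the real code).
struct bitreader {
  int fd;                   // input file descriptor, stored away once
  unsigned char buf[4096];  // refill buffer
  int len, pos;             // valid bytes in buf, next unread byte
  uint32_t bits;            // bits fetched from buf but not yet consumed
  int bitcount;             // how many low bits of "bits" are valid
};

void bitreader_init(struct bitreader *br, int fd)
{
  // Zero everything except the descriptor.
  *br = (struct bitreader){ .fd = fd };
}

// Return the next "count" bits, most significant bit first, or -1 on
// EOF/read error. The caller never sees "out of data, call me again":
// when the buffer runs dry we just read() more right here.
long get_bits(struct bitreader *br, int count)
{
  assert(count > 0 && count <= 24);

  while (br->bitcount < count) {
    if (br->pos == br->len) {
      ssize_t n = read(br->fd, br->buf, sizeof(br->buf));

      if (n <= 0) return -1;
      br->len = n;
      br->pos = 0;
    }
    // Shift in one more byte; at most 23+8 = 31 bits are ever live.
    br->bits = (br->bits << 8) | br->buf[br->pos++];
    br->bitcount += 8;
  }

  br->bitcount -= count;

  return (br->bits >> br->bitcount) & ((1UL << count) - 1);
}
```

Because the refill happens inside get_bits(), the decoder's nested
loops can stay ordinary nested loops with ordinary local variables,
which is where most of the line count savings come from.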

Yes, that's Julian Seward's bunzip2 code. That wasn't something toybox
did to it, that's what the upstream package they copied had always been
like.

A more recent case where I shrank a codebase to 1/3 of its original
size/complexity was ifconfig. I described what I did at length here:

http://landley.net/toybox/cleanup.html#ifconfig

The "old" and "new" lines with the totals are links to the original and
changed file. I described each change on the mailing list, and collected
links to all the descriptions on that page. You might want to read just
the first description here:

http://lists.landley.net/pipermail/toybox-landley.net/2013-April/000882.html

Note: the ifconfig I received was a professional contribution from a
team of experienced coders, and what they sent me did work. I'm just...
picky.
Post by enh
also in this specific case there's almost no sharing
between the implementations anyway because 99% of the code is the hash
implementation itself. but if you can, keeping API compatibility with
the library you're trying to replace would be good.
I've pondered adding zlib bindings for deflate/inflate when I get them
done, but that's a post-1.0 thing.

I note that one of my first interactions with Rich Felker (the musl
maintainer) was him explaining to me what would be involved in making an
executable also be a shared library (so you can have libz.so be a
symlink to busybox so -lz was satisfied with the busybox code). Google
finds the old thread at:

http://lists.uclibc.org/pipermail/busybox/2006-April/054373.html

Busybox never did that, but toybox might. Not in the 1.0 release,
though. (I _think_ it's worth the complexity? Obviously only if there's
a config option to not do that...)

However, when researching deflate I read the zlib source, and the
info-zip source, and the plan 9 source, and three different "tiny"
implementations (the _least_ useful of which was miniz.c, classic
example of the kind of code shrinkage tricks I'm trying to _avoid_...)
Post by enh
anyway, let me know whether you'd like to merge stuff like this into
the main codebase. otherwise i can just "git rm" locally and add the
alternative version to toys/android.
Toybox commands can all be switched off. Any command you've got a better
implementation of (for any metric of better), feel free to switch them
off. I'd very much like to _improve_ toybox's version until you feel
it's the better one, but "we audited this other codebase already"
Post by enh
i'll get a delete/merge conflict
if you change anything in your version so i'll be able to track
changes, so it's only really a loss if you think you have other users
who'd prefer to use openssl.
Um, issue to be aware of: the subdirectories are just a developer
convenience, the command namespace is actually flat. So if you have a
NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
the build will break when it hits the duplicate command name.

(Actually since you're not using our build infrastructure you can
probably just ignore that, and point your .mk files at the right .c
files for what you're building... :)
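
To illustrate why a flat namespace trips over duplicates, here's a toy
X-macro version of a command table. It is only a sketch of the general
technique: the COMMANDS() list, the CMD_ prefix, and find_command() are
invented for this example and are not toybox's actual NEWTOY()
machinery or generated headers. The directory each command lives in
only appears in a comment, because nothing in the table records it.

```c
#include <assert.h>
#include <string.h>

// Hypothetical flat command list. Directories are comments only.
#define COMMANDS(X) \
  X(factor)   /* toys/other */ \
  X(md5sum)   /* toys/lsb   */ \
  X(sha1sum)  /* toys/lsb   */

// One enumerator per command. Adding a second X(sha1sum) line (say,
// for a toys/android version) would define CMD_sha1sum twice, and the
// compiler rejects the duplicate enumerator.
#define AS_ENUM(name) CMD_##name,
enum command_id { COMMANDS(AS_ENUM) CMD_COUNT };

// Matching table of names, generated from the same list.
#define AS_STRING(name) #name,
static const char *command_names[] = { COMMANDS(AS_STRING) };

// Look up a command by name; returns its enum value or -1.
int find_command(const char *name)
{
  int i;

  assert(name);
  for (i = 0; i < CMD_COUNT; i++)
    if (!strcmp(name, command_names[i])) return i;

  return -1;
}
```

The lookup is by bare name, so there's no way to ask for "the lsb
sha1sum" versus "the android sha1sum": one of them has to be switched
off or left out of the build.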
Post by enh
Post by Rob Landley
P.S. At a design level I thought about defaulting it "n" but the
defconfig y/n signalling primarily indicates "is this done or not" and
it was finished and worked fine, so... (Well, the examples directory
also has stuff that defaults to "n" but factor isn't really a
demonstration of how to use the toybox infrastructure either.) And
defaulting "n" for other reasons is editorializing, where does it stop?
rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
me to add most of those, because they needed them. If somebody wants to
make a .config file selecting a subset of the commands, you can do that.
It's not my job to guess how people will use generic tools.
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.
Indeed. Something I learned back when I maintained busybox: don't try to
guess how people will use a hammer. You'll only get in the way.
Post by enh
it sucks to be me though. the best i can aim for is to try to ensure
that there are roughly the same number of people complaining i put too
much in as people complaining i left too much out :-)
Oh I've still got that, just at a different level. "Should include this
command or not". (You're entirely right "factor" was a questionable call
there. It was sort of on the line even after I wrote it. I just cleaned
up "mix.c" which is another one. Deciding whether to merge that I was
looking at the aumix man page and going "this is simpler, but that's
more standard, but nobody's _asked_ for the bigger one yet and that's
mostly about curses mode instead of command line, and this seems to do
the minimum you need...")

So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)

Always happy to have another viewpoint to rejuggle the weightings...

When I get my darn server reinstalled and get AOSP on it, I want to run
the AOSP build with the toybox commands. (Aboriginal Linux is using an
old version of linux from scratch as a bootstrapping test, but android's
build needs more commands than that. And may use command line options
that toybox doesn't implement yet. I know _you_ aren't trying to get
android self-hosting anytime soon, but I still am. :)
Post by enh
to work out which options are important for the commands that toolbox
and toybox have in common, i've been relying on my command-line
history, what i can find in scripts, and whether someone cared enough
to add/fix something. but i don't yet have a plan for all the stuff in
toys/pending.
My plan is to clean them up (the way I did the other cleanup.html
things) and get them out of pending.

It's surprisingly time consuming, but if you read through the history of
one of the cleanups I documented there, you can see why...
Post by enh
i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.
I don't understand the distinction here? (Is your build making
standalone binaries for the toybox commands ala scripts/single.sh? It
didn't look like it was but I have to stare at makefiles a lot to beat
any sense out of 'em...)
Post by enh
and getting back to factor, i can't decide whether having it paints a
target on my back or gives me something i don't care about to throw
under the bus as a gesture of goodwill :-)
Politics, I can't help you with. :)

Rob
enh
2014-12-30 06:24:17 UTC
Permalink
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
long bus ride, because I probably have to implement TLS someday (by
which I mean https not thread local storage) because wget can't talk to
the world without encryption anymore (thanks NSA), and the section I
linked to above used "factor", and I went "that's a command? Apparently
so. This is probably like a dozen lines to implement"... and had it
working before the end of the bus ride.
speaking of which (and going back to "simple is complex"), i have an
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)
Actually the main reason I don't include external code is licensing.
Last year I gave two talks about how I went from GPL fanboy to advocate
of the public domain. I didn't quite fit either of them in the assigned
timeslot, but the more coherent of the two is probably:
https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3
The current toybox license places the code into the public domain. It
_looks_ like a BSD license, and I sometimes call it "zero clause BSD"
because of this, but the requirement to copy this specific license text
into derivative works is absent. This means it's a permission grant that
allows reusing the code without even attributing it. (Attribution is
_polite_, but it's possible to plagiarize Shakespeare. That's not a
licensing issue, and these days Google makes it pretty easy for teachers
to catch all those recycled term papers anyway.)
The problem with BSD-style licenses is that there are a lot of them (2
clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
that all try to do the same thing but all of them say "you must copy
this specific wording into your derived work", so if you combine code
from two sources under different BSD variants you wind up concatenating
the licenses, and this can get epically silly (the kindle paperwhite's
about->licenses thing is over 300 pages of concatenated license
boilerplate.)
tell me about it.
https://android.googlesource.com/platform/bionic/+/master/libc/NOTICE
Post by Rob Landley
I respect BSD/ISC/Apache license terms enough _not_ to treat them as
public domain. I would like toybox to provide a source of reusable
public domain code, it's one of the goals of the project.
Toybox has included explicitly public domain code from external sources
(such as the xz implementation for toys/pending/xzcat.c), and I've
looked at the libtom bignum library for implementing bc (haven't managed
to make much sense of it, to be honest). But I recently turned down a
ping.c submission that was based on BSD ping, in favor of writing my own.
Post by enh
i know one of your goals is to minimize dependencies,
I'm juggling an awful lot of conflicting goals. (Most of them listed on
the roadmap or design pages.)
Because of this, toybox is probably going to implement more than a lot
of users need, but as long as the commands are self-contained you can
switch off any command in your config that you don't want to ship.
If, for auditing reasons, you don't want to use toybox's sha1sum but
instead want to use an openssl derived version that shares code with
other sha1sum instances you've already cleared and are using elsewhere,
then that's what makes sense for your deployment. (If you grow to trust
toybox's version later, you can switch to it then after everybody else
has looked at it longer.)
yeah, my question really is whether you want me to send patches like
that to the list, or just keep them downstream.
Post by Rob Landley
Post by enh
but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation.
Indeed, and I agree. I don't _want_ to write TLS, I think it's out of
scope for toybox... except that I need the functionality to do basic web
transactions that _are_ in scope. (The internet's changing out from
under me. Two years ago you could talk to github, kernel.org, and
twitter without https. Now if you try they redirect.)
What I really want is an "stunnel" variant that works, so I can pipe an
https session through something that encrypts it for me.
https://www.stunnel.org/index.html
I tried to convince dropbear to add one years ago, but their reply was
more or less "patches welcome".
http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2007q1/000506.html
http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2008q4/000859.html
I prefer not to link toybox against external libraries (I could give a
long talk about why, but not here), and sucking in nontrivial amounts of
external code to maintain a local copy has its own large downsides. But
calling reasonably standardized external commands and piping stuff
through them? I'm all for it.
In fact toybox commands are designed to be able to call external
versions of commands even when toybox has its own implementation. That's
why mount.c doesn't check if CFG_LOSETUP is enabled before trying to
xpopen("losetup"), if it's there in the $PATH but not in this binary, ok
then.
As for the md5/sha1/sha256/sha3, they're easy to test (their failures
tend to be really obvious), and the two I implemented are inherently
timing invariant and don't have obvious sidechannel attacks. And I _can_
find existing public domain implementations of these to start from, such as:
http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c
So adding the other hashing functions to toybox makes sense to me,
especially since I need them for a traditional /etc/shadow login.c. (I
need to research android's user database and how to access it.)
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Post by Rob Landley
That said, I _do_ care that they're slower than other implementations.
That's a simple vs fast balance that's... I took the first speedup
patch, didn't take the second speedup patch, and I need to go back and
look at it...
Post by enh
(and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)
One of the goals I'm juggling is "busybox replacement", and they have
this stuff. But again that's just a weighting, busybox already contains a
lot of stuff we're _not_ implementing.
If I was starting from scratch today I might leave them out, but I have
a history with both bzip2 and gzip which makes it easier for me to keep
both of them in scope.
The one we really _need_ is deflate/inflate, because we should have a
compression algorithm and that's the simplest and most lightweight one.
The extract side of the other two are there because tarballs come in
that format and a build environment needs to be able to extract them
(another goal I'm juggling. The strace source is _only_ available as .xz
these days, for example.)
But I probably won't bother with the compression side of bzip2 or xz. If
you want to create a new tarball we support gzip and if you want it in
those other formats you can install the other package.
I reimplemented bunzip2 years ago because the original was horrid, and
my implementation got sucked up into a bunch of places. (I think the
kernel uses it if you select bzip compression, although these days gzip
or xz are the dominant ones.)
I also wrote 90% of bzip2 compression side support a decade back for
busybox, but got distracted near the end and never got back to it
because the bzip2 compression algorithm is WEIRD:
http://lists.busybox.net/pipermail/busybox/2004-February/010859.html
Even _with_ most of the work done I probably won't bother with bzip2
compression side unless somebody really wants it, both because it's
semi-obsolete these days and because its compression is based on weird
heuristics for the string sorting that I've never managed to clean up
into something understandable. (The "crap.c" above, which is a series of
fallbacks between different sorting algorithms with no explanation of
_why_.) I _can't_ simplify this into something easy to understand that
somebody might want to use as example code in a middle school
programming class, the algorithm is just inherently nuts.
I already did gunzip a few months ago, and I'm working on gzip
compression side support now. I wrote a java implementation of that back
when Java 1.0 didn't include it in the base library. (Java 1.1 came out
before I did the decompression side that time, so I moved on to other
things.) I took info-zip apart back when I was programming for OS/2, I
actually know that one pretty well. So that's probably the only
compressor I'll implement, when it's done it should be less than 500
lines of code. Also, Ashwini Sharma asked me to prioritize that so they
can use it in a product.
As for xz: I received an external contribution based on the public
domain decompressor. The "fetch tarball, extract, configure, make,
install" codepath needs to be able to extract them, and the code's
already in. (And is horrible, there's built-in knowledge of various
processor machine language formats, which strongly implies upgrades will
need more of this filigree for new processor variants.)
But I don't particularly want to do the compression side for that.
If you decide to switch off our bunzip2 and use the external version
instead, toybox "tar" should call out to it and pipe stuff through it
just fine. (I dunno if it currently _does_, but once I've cleaned it up...)
Post by enh
in this specific case the openssl API is reasonable enough your
implementations could be a drop-in replacement, but i suspect in other
cases part of your motivation for writing your own will have been the
awful API.
Part, yes. But only part. I mentioned licensing above. There's also the
fact that I can often come up with objectively better code.
In the case of bunzip2, back in 2003 I replaced this:
http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=6fe55ae93983
With this:
http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=0d6d88a2058d
That's not just replacing 1658 lines with 531 lines: try actually reading
the old code. Contemplate the "save state" and the big switch/case in
the main function starting at line 395. (They copy all the local
variables out of a structure, each call, and copy them back before
returning. They use a switch/case with labels covering the whole
function so they can jump back into the middle of nested loops.
That's so it could return when it ran out of data and be called to
resume decompressing once the buffer was filled. I replaced that with a
get_bits() call that had the filehandle stored away and could read more
data if it needed to.)
Yes, that's Julian Seward's bunzip2 code. That wasn't something toybox
did to it, that's what the upstream package they copied had always been
like.
A more recent case where I shrank a codebase to 1/3 of its original
size/complexity was ifconfig. I described what I did at length here:
http://landley.net/toybox/cleanup.html#ifconfig
The "old" and "new" lines with the totals are links to the original and
changed file. I described each change on the mailing list, and collected
links to all the descriptions on that page. You might want to read just
the first description here:
http://lists.landley.net/pipermail/toybox-landley.net/2013-April/000882.html
Note: the ifconfig I received was a professional contribution from a
team of experienced coders, and what they sent me did work. I'm just...
picky.
Post by enh
also in this specific case there's almost no sharing
between the implementations anyway because 99% of the code is the hash
implementation itself. but if you can, keeping API compatibility with
the library you're trying to replace would be good.
I've pondered adding zlib bindings for deflate/inflate when I get them
done, but that's a post-1.0 thing.
I note that one of my first interactions with Rich Felker (the musl
maintainer) was him explaining to me what would be involved in making an
executable also be a shared library (so you can have libz.so be a
symlink to busybox so -lz was satisfied with the busybox code). Google
finds the old thread at:
http://lists.uclibc.org/pipermail/busybox/2006-April/054373.html
Busybox never did that, but toybox might. Not in the 1.0 release,
though. (I _think_ it's worth the complexity? Obviously only if there's
a config option to not do that...)
However, when researching deflate I read the zlib source, and the
info-zip source, and the plan 9 source, and three different "tiny"
implementations (the _least_ useful of which was miniz.c, classic
example of the kind of code shrinkage tricks I'm trying to _avoid_...)
Post by enh
anyway, let me know whether you'd like to merge stuff like this into
the main codebase. otherwise i can just "git rm" locally and add the
alternative version to toys/android.
Toybox commands can all be switched off. Any command you've got a better
implementation of (for any metric of better), feel free to switch them
off. I'd very much like to _improve_ toybox's version until you feel
it's the better one, but "we audited this other codebase already"
Post by enh
i'll get a delete/merge conflict
if you change anything in your version so i'll be able to track
changes, so it's only really a loss if you think you have other users
who'd prefer to use openssl.
Um, issue to be aware of: the subdirectories are just a developer
convenience, the command namespace is actually flat. So if you have a
NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
the build will break when it hits the duplicate command name.
(Actually since you're not using our build infrastructure you can
probably just ignore that, and point your .mk files at the right .c
files for what you're building... :)
yes, but i think some of the scripts get confused, plus it's good for
me to be able to build and test the desktop version of toybox too if
i'm sending you patches. plus removing it locally means git will
complain loudly if something changes upstream, so i'll be able to keep
track of what's going on.
Post by Rob Landley
Post by enh
Post by Rob Landley
P.S. At a design level I thought about defaulting it "n" but the
defconfig y/n signalling primarily indicates "is this done or not" and
it was finished and worked fine, so... (Well, the examples directory
also has stuff that defaults to "n" but factor isn't really a
demonstration of how to use the toybox infrastructure either.) And
defaulting "n" for other reasons is editorializing, where does it stop?
rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
me to add most of those, because they needed them. If somebody wants to
make a .config file selecting a subset of the commands, you can do that.
It's not my job to guess how people will use generic tools.
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.
Indeed. Something I learned back when I maintained busybox: don't try to
guess how people will use a hammer. You'll only get in the way.
Post by enh
it sucks to be me though. the best i can aim for is to try to ensure
that there are roughly the same number of people complaining i put too
much in as people complaining i left too much out :-)
Oh I've still got that, just at a different level. "Should include this
command or not". (You're entirely right "factor" was a questionable call
there. It was sort of on the line even after I wrote it. I just cleaned
up "mix.c" which is another one. Deciding whether to merge that I was
looking at the aumix man page and going "this is simpler, but that's
more standard, but nobody's _asked_ for the bigger one yet and that's
mostly about curses mode instead of command line, and this seems to do
the minimum you need...")
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)
Post by Rob Landley
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)
one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.
Post by Rob Landley
Always happy to have another viewpoint to rejuggle the weightings...
i'm assuming we (Android) will get some feedback eventually.
Post by Rob Landley
When I get my darn server reinstalled and get AOSP on it, I want to run
the AOSP build with the toybox commands. (Aboriginal Linux is using an
old version of linux from scratch as a bootstrapping test, but android's
build needs more commands than that. And may use command line options
that toybox doesn't implement yet. I know _you_ aren't trying to get
android self-hosting anytime soon, but I still am. :)
yeah, my builds are more than slow enough even on the fastest desktop
hardware :-)
Post by Rob Landley
Post by enh
to work out which options are important for the commands that toolbox
and toybox have in common, i've been relying on my command-line
history, what i can find in scripts, and whether someone cared enough
to add/fix something. but i don't yet have a plan for all the stuff in
toys/pending.
My plan is to clean them up (the way I did the other cleanup.html
things) and get them out of pending.
It's surprisingly time consuming, but if you read through the history of
one of the cleanups I documented there, you can see why...
Post by enh
i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.
I don't understand the distinction here? (Is your build making
standalone binaries for the toybox commands ala scripts/single.sh? It
didn't look like it was but I have to stare at makefiles a lot to beat
any sense out of 'em...)
no, there's one binary that contains n toys, and then m symbolic
links, where n != m. think of the ones that don't have links as my
level of "pending"ness :-) though it's ill-defined what not having a
link means. sometimes it's "i think we'll want this, but i haven't
checked it works for us yet", sometimes "i think this probably doesn't
make any sense on Android, but it's in the default set and i haven't
yet been convinced to kick it out".

but my earlier point was that no one is likely to look too closely at
what's in the toybox binary, especially as long as it's roughly the
same size as the toolbox one used to be. but if they start stumbling
across symbolic links for things they think are a waste of space,
they're more likely to get on my case about it.
Post by Rob Landley
Post by enh
and getting back to factor, i can't decide whether having it paints a
target on my back or gives me something i don't care about to throw
under the bus as a gesture of goodwill :-)
Politics, I can't help you with. :)
Rob
Rob Landley
2014-12-30 19:22:57 UTC
Permalink
Post by enh
Post by Rob Landley
The problem with BSD-style licenses is that there are a lot of them (2
clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
that all try to do the same thing but all of them say "you must copy
this specific wording into your derived work", so if you combine code
from two sources under different BSD variants you wind up concatenating
the licenses, and this can get epically silly (the kindle paperwhite's
about->licenses thing is over 300 pages of concatenated license
boilerplate.)
tell me about it.
https://android.googlesource.com/platform/bionic/+/master/libc/NOTICE
I gave a talk about this topic at Ohio LinuxFest last year:

https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3

Public domain licenses collapse together. Whether it's unlicense.org or
http://creativecommons.org/about/cc0 or my "zero clause bsd" or
https://android.googlesource.com/platform/external/dropbear/+/froyo/libtomcrypt/LICENSE
the result is the same: you can take code and use it and your result can
be under _one_ easily understandable license.

None of the BSD variants actually accomplish this, they're so busy
protecting themselves they develop an autoimmune problem.
Post by enh
Post by Rob Landley
If, for auditing reasons, you don't want to use toybox's sha1sum but
instead want to use an openssl derived version that shares code with
other sha1sum instances you've already cleared and are using elsewhere,
then that's what makes sense for your deployment. (If you grow to trust
toybox's version later, you can switch to it then after everybody else
has looked at it longer.)
yeah, my question really is whether you want me to send patches like
that to the list, or just keep them downstream.
I'm all for seeing patches on the list. Even if I don't wind up merging
them it means there's an archive of them.

Speaking of which, I just got reminded on twitter that there _is_ a
gmane archive of toybox already, so I added a link to the web page:

http://news.gmane.org/gmane.linux.toybox

Their web view screws up patches (replacing @ with <at>) so I can't cut
and paste 'em, but I'll live.

(One advantage of two space indents instead of tab characters: applying
patches with cut and paste gets much easier and I almost never have to
do "mixed tab and space" policing because you need 4 levels deep
indenting before you _could_ use a tab with default tabstops...)
Post by enh
Post by Rob Landley
As for the md5/sha1/sha256/sha3, they're easy to test (their failures
tend to be really obvious), and the two I implemented are inherently
timing invariant and don't have obvious sidechannel attacks. And I _can_
http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c
So adding the other hashing functions to toybox makes sense to me,
especially since I need them for a traditional /etc/shadow login.c. (I
need to research android's user database and how to access it.)
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Ok. I note that lib/login.c was an external contribution, and I was
uncomfortable with it precisely _because_ I think we've got to delegate
this to libc and punt in the case of android.
Post by enh
Post by Rob Landley
Um, issue to be aware of: the subdirectories are just a developer
convenience, the command namespace is actually flat. So if you have a
NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
the build will break when it hits the duplicate command name.
(Actually since you're not using our build infrastructure you can
probably just ignore that, and point your .mk files at the right .c
files for what you're building... :)
yes, but i think some of the scripts get confused, plus it's good for
me to be able to build and test the desktop version of toybox too if
i'm sending you patches. plus removing it locally means git will
complain loudly if something changes upstream, so i'll be able to keep
track of what's going on.
I have a pending change that I'm trying to figure out how to do in a way
that won't screw you up. (Or just a clean way in general, it's... awkward.)

The FLAG_x macros are auto-generated from the strings, based on the
current configuration. The option string that gets parsed has USE_BLAH()
macros that chop out disabled bits, the bit positions of each macro
shift when options are configured out. (The bit namespace is always
efficiently packed.)

The flag parsing infrastructure handles this, comparing the
"allyesconfig" and current .config versions of the strings at build time
and generating "#define FLAG_x 0" entries for the disabled flags. This
means if you do "if (toys.optflags & FLAG_x) blah();" for a disabled
macro, it becomes & 0, which is always zero due to simple constant
propagation (required by c99!), and dead code elimination (which even
Turbo C for dos got right) kicks in to remove the unused code. (It's
always syntax checked at compile time so you don't have config-dependent
build breaks, but it drops out of the binary when not in use.)
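
A minimal sketch of that pattern, for illustration only. The flag names
and values here are made up (they are not copied from any generated
header), and paths_taken() is a hypothetical helper:

```c
/* Hypothetical flag values in the style of a generated flags header.
 * FLAG_l is 0 to model an option that has been configured out. */
#define FLAG_r 1
#define FLAG_f 2
#define FLAG_l 0

/* Return a bitmask recording which option branches ran for optflags. */
int paths_taken(unsigned optflags)
{
  int taken = 0;

  if (optflags & FLAG_r) taken |= 1;
  if (optflags & FLAG_f) taken |= 2;
  /* (optflags & 0) is the constant 0, so a compiler is free to drop this
   * branch, but the body is still parsed and type checked. */
  if (optflags & FLAG_l) taken |= 4;

  return taken;
}
```

Even with every bit set in optflags, the FLAG_l branch never runs, so
paths_taken(~0u) returns 3.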

Problem: some commands have shared infrastructure outside of lib/*.c,
and some of them (mv/cp, kill/killall5, and lots of others) have
overlapping option ranges where _both_ commands want to support these
options, using common code.

What I _used_ to do was make the MV config symbol "depends on CP" and
then I had the common infrastructure work in CP's flag context. But that
screwed up the standalone builds ("scripts/single.sh COMMAND", which is
also used by scripts/test.sh to test individual commands). ("make tests"
builds the current toybox .config and tests the multiplexer binary, but
the individual tests build standalone.)

The standalone builds do _not_ have any other commands enabled. If you
give them NAME they enable CONFIG_NAME and CONFIG_NAME_* (and a
selection of CONFIG_TOYBOX_* symbols that should be less handcoded than
it currently is). So if you "scripts/test.sh mv" it builds a broken mv
that improperly handles the options shared with cp.

The _fix_ to all this is to redo the option parsing infrastructure to
leave gaps for the zeroed FLAG macros, and have a #define FORCE_FLAGS I
can set when doing "#define FOR_COMMAND\n#include <toys.h>" that will
enable the flags for _all_ options in this command, even the currently
disabled ones.

The problem is, those option strings I mentioned above with the USE()
macros in them still drop out the disabled options, so when the
lib/args.c infrastructure parses them it gets the sparse namespace and
now the bit positions don't line up.

So I have to have scripts/mkflags.c create a _second_ option string,
based on the current configuration, with a marker character (I'm using
ascii 1, I.E. "\001") to show the gaps where it needs to skip a bit. (I
can't use the original string verbatim because I still want "cp -l" to
reject the option and give a help message when you build it in posix
options only mode (without CONFIG_CP_MORE).) This involves tweaking both
mkflags.c and lib/args.c to know about the new signaling, but eh: not a
big deal.
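
A rough sketch of the gap-marker idea, not the actual lib/args.c code.
It assumes single-letter options only (no arguments or longopts) and
assumes bits are handed out starting from the right end of the string;
flag_bit() and GAP are hypothetical names:

```c
#include <string.h>

#define GAP '\001'  /* marker standing in for a configured-out option */

/* Return the flag bit for option letter c in optstring, or 0 if the
 * option is not available. Each character, including a GAP, occupies one
 * bit position counted from the right, but a GAP never matches. */
unsigned flag_bit(const char *optstring, char c)
{
  size_t len = strlen(optstring), i;

  for (i = 0; i < len; i++) {
    char opt = optstring[len - 1 - i];

    if (opt != GAP && opt == c) return 1u << i;
  }

  return 0;
}
```

With the packed string "ac" the bit for 'a' shifts down to 2, while with
the gapped string "a\001c" it stays at 4 (the same value as in "abc") and
'b' is still rejected.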

Next problem: a couple months back I sped up rebuilds by _not_ having
every build be a build all, and one of the ways I did that was by
figuring out which generated/headers.h files wouldn't change based on
config, and not rebuilding them. Both generated/newtoys.h and
generated/oldtoys.h are config invariant, and that's where these option
strings live.

So I need to have the cooked option string data live in a _third_ place
(the logical place is generated/flags.h which is the file
scripts/mkflags.c is already generating), output new macros with a new
prefix, and change main.c to stick that data into toy_list[] instead of
the strings in the actual generated/newtoys.h it's #including when it
generates that list, and _this_ is officially awkward.

Did I mention that automating things so they "just work" is REALLY HARD?
I know projects develop scar tissue as they go along, but I've always
tried to go back and spend the time to clean UP the design even if it
required a large intrusive change. Having option strings occur three
times (even in generated headers) is silly.

But... all three of 'em serve a purpose. The generated/newtoys.h is the
collection of raw data grepped out of toys/*/*.c and used as the master
to generate the other two instances. The generated/oldtoys.h has to be
created by C because I've never managed to get a macro to expand to a
preprocessor directive that gets invoked, so I can't add #defines at
compile time. (Well, I could have scripts/make.sh do something horrible
on the gcc command line spamming -D, but that's not an improvement and
would render both "make V=1" and generated/build.sh just about
unintelligible.)

And trying to redefine USE() macros to convert options strings to
spacers? Even if the preprocessor has some kind of string processing
facility (which it doesn't seem to), USE_BLAH("a(longopt)") is not
trivial to parse.

(Well... if I turned _all_ a-zA-Z into \001 the normal lib/args.c parser
could drop 'em out... except that an option is "anything not parsed as a
control character", I.E. the else case at the end of a big stack. There
are commands that want "-9" (such as gzip), and 9 is a control character
in the right context (ala "a#<9")...)

So yeah, adding yet MORE crap to generated/flags.h and rejiggling the
option parsing infrastructure to take the flag lists from there and
adding a GREAT BIG COMMENT to main.c because "populate toy_list[]"
doesn't cut it anymore, and explain this in code.html on the website...

So yeah, heads up. You probably care about this one.

(When I do these things right nobody notices I've done _anything_. Which
is how it should be, but it's still hard.)
Post by enh
Post by Rob Landley
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)
Fun piece of trivia: the guy who submitted them to toybox is the lead
architect for the Qualcomm Hexagon processor. (As in the main hardware
guy who designs the actual chip.)
Post by enh
Post by Rob Landley
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)
one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.
After the FSF zealots flamed Tim Bird and Sony to a crisp:

https://lwn.net/Articles/478308/

The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.

I tried to correct the record in the comments, ala
https://lwn.net/Articles/480382/ and https://lwn.net/Articles/480836/
but it didn't help. The FSF is a group of political zealots bordering on
a religion, the old line applies about how you can't use rational
argument to talk someone out of a position they didn't arrive at
rationally in the first place. The best I can do is try to prevent the
next generation from following them down the rathole (hence the Ohio
LinuxFest talk, which I really need to redo. I had 3 hours of material
and a little under an hour to speak, and that was _after_ spending a
week trying to edit it down.)

So I've gotten a lot of off the record emails with very interesting
data, and various people going "we really need this, can't say why" and
I do my best with the information I've got to keep the project on track...
Post by enh
Post by Rob Landley
Always happy to have another viewpoint to rejuggle the weightings...
i'm assuming we (Android) will get some feedback eventually.
Sorry, sorry. The mailing list archive being out of whack really screwed
up my normal working style. I wash my email through gmail for the spam
filtering, but that drops out all duplicate messages, so if I'm cc'd on
a message I get _either_ the inbox copy _or_ the mailing list copy with
the list-id tag, so neither folder has the complete conversation and if
I try for threaded view all the threads are broken because the message
they're replying to is in the other folder. Complaining to the gmail
guys was a brick wall, so I started using the web archive to keep track
of stuff.

(Plus I've got a bunch of things like sed debugging and the above
command line redesign that take a lot of careful study to get right, and
when I pop my head up I have this giant backlog. I still haven't dealt
with Ashwini's giant pile from October.)

Yesterday got eaten by kernel issues (they put PERL back in the 3.18
kernel build and I had to rip it out again). I have high hopes for today. :)
Post by enh
Post by Rob Landley
When I get my darn server reinstalled and get AOSP on it, I want to run
the AOSP build with the toybox commands. (Aboriginal Linux is using an
old version of linux from scratch as a bootstrapping test, but android's
build needs more commands than that. And may use command line options
that toybox doesn't implement yet. I know _you_ aren't trying to get
android self-hosting anytime soon, but I still am. :)
yeah, my builds are more than slow enough even on the fastest desktop
hardware :-)
This netbook hasn't even got the free disk space to try.
Post by enh
Post by Rob Landley
Post by enh
i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.
I don't understand the distinction here? (Is your build making
standalone binaries for the toybox commands ala scripts/single.sh? It
didn't look like it was but I have to stare at makefiles a lot to beat
any sense out of 'em...)
no, there's one binary that contains n toys, and then m symbolic
links, where n != m. think of the ones that don't have links as my
level of "pending"ness :-) though it's ill-defined what not having a
link means. sometimes it's "i think we'll want this, but i haven't
checked it works for us yet", sometimes "i think this probably doesn't
make any sense on Android, but it's in the default set and i haven't
yet been convinced to kick it out".
but my earlier point was that no one is likely to look too closely at
what's in the toybox binary, especially as long as it's roughly the
same size as the toolbox one used to be. but if they start stumbling
across symbolic links for things they think are a waste of space,
they're more likely to get on my case about it.
Makes sense.
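
For anyone following along, the "n toys, m symlinks" split can be
sketched like this. It's an illustration under assumptions, not toybox's
real dispatch code: toy_names[] and find_toy() are made-up names, and the
table contents are arbitrary.

```c
#include <string.h>

/* Hypothetical table of commands compiled into the one binary. */
static const char *toy_names[] = {"cat", "factor", "ls"};

/* Return the index of the command selected by the path the binary was
 * invoked as (normally a symlink name), or -1 if the name is not in the
 * table. */
int find_toy(const char *argv0)
{
  const char *name = strrchr(argv0, '/');
  size_t i;

  name = name ? name + 1 : argv0;
  for (i = 0; i < sizeof(toy_names) / sizeof(*toy_names); i++)
    if (!strcmp(name, toy_names[i])) return (int)i;

  return -1;
}
```

A command with no symlink is still in the table, it just can't be reached
by name from the PATH: find_toy("/system/bin/factor") only gets called if
something named "factor" exists there to be run.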

Rob
enh
2014-12-30 20:14:54 UTC
Permalink
Post by Rob Landley
Post by enh
Post by Rob Landley
The problem with BSD-style licenses is that there are a lot of them (2
clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
that all try to do the same thing but all of them say "you must copy
this specific wording into your derived work", so if you combine code
from two sources under different BSD variants you wind up concatenating
the licenses, and this can get epically silly (the kindle paperwhite's
about->licenses thing is over 300 pages of concatenated license
boilerplate.)
tell me about it.
https://android.googlesource.com/platform/bionic/+/master/libc/NOTICE
https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3
Public domain licenses collapse together. Whether it's unlicense.org or
http://creativecommons.org/about/cc0 or my "zero clause bsd" or
https://android.googlesource.com/platform/external/dropbear/+/froyo/libtomcrypt/LICENSE
the result is the same: you can take code and use it and your result can
be under _one_ easily understandable license.
None of the BSD variants actually accomplish this, they're so busy
protecting themselves they develop an autoimmune problem.
Post by enh
Post by Rob Landley
If, for auditing reasons, you don't want to use toybox's sha1sum but
instead want to use an openssl derived version that shares code with
other sha1sum instances you've already cleared and are using elsewhere,
then that's what makes sense for your deployment. (If you grow to trust
toybox's version later, you can switch to it then after everybody else
has looked at it longer.)
yeah, my question really is whether you want me to send patches like
that to the list, or just keep them downstream.
I'm all for seeing patches on the list. Even if I don't wind up merging
them it means there's an archive of them.
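Speaking of which, I just got reminded on twitter that there _is_ a
gmane archive of toybox already, so I added a link to the web page:
http://news.gmane.org/gmane.linux.toybox
Their web view screws up patches (replacing @ with <at>) so I can't cut
and paste 'em, but I'll live.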
(One advantage of two space indents instead of tab characters: applying
patches with cut and paste gets much easier and I almost never have to
do "mixed tab and space" policing because you need 4 levels deep
indenting before you _could_ use a tab with default tabstops...)
Post by enh
Post by Rob Landley
As for the md5/sha1/sha256/sha3, they're easy to test (their failures
tend to be really obvious), and the two I implemented are inherently
timing invariant and don't have obvious sidechannel attacks. And I _can_
http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c
So adding the other hashing functions to toybox makes sense to me,
especially since I need them for a traditional /etc/shadow login.c. (I
need to research android's user database and how to access it.)
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Ok. I note that lib/login.c was an external contribution, and I was
uncomfortable with it precisely _because_ I think we've got to delegate
this to libc and punt in the case of android.
at the moment, pw_passwd is always NULL on Android. if it makes your
life easier for it to point to "*" or whatever, let me know, but one
problem we have in places like this -- and one reason i haven't even
bothered with <shadow.h> or <utmpx.h> -- is that in some ways code
that tries to use this stuff is better off if it just doesn't build,
because at least then the author/builder knows they need to sit down
and think about what they're trying to do and what, if anything, that
means on Android.
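
for code that has to build in both worlds anyway, a defensive sketch
might look like the following. safe_passwd() is a hypothetical helper,
and substituting "*" is just the convention floated above, not something
bionic does:

```c
#include <pwd.h>
#include <stddef.h>

/* Return the password field of a passwd entry, or "*" when the entry or
 * its pw_passwd pointer is NULL, so callers never dereference NULL. */
const char *safe_passwd(const struct passwd *pw)
{
  if (!pw || !pw->pw_passwd) return "*";

  return pw->pw_passwd;
}
```

a caller would wrap a lookup as safe_passwd(getpwnam(name)) and treat "*"
as "no usable password".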
Post by Rob Landley
Post by enh
Post by Rob Landley
Um, issue to be aware of: the subdirectories are just a developer
convenience, the command namespace is actually flat. So if you have a
NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
the build will break when it hits the duplicate command name.
(Actually since you're not using our build infrastructure you can
probably just ignore that, and point your .mk files at the right .c
files for what you're building... :)
yes, but i think some of the scripts get confused, plus it's good for
me to be able to build and test the desktop version of toybox too if
i'm sending you patches. plus removing it locally means git will
complain loudly if something changes upstream, so i'll be able to keep
track of what's going on.
I have a pending change that I'm trying to figure out how to do in a way
that won't screw you up. (Or just a clean way in general, it's... awkward.)
The FLAG_x macros are auto-generated from the strings, based on the
current configuration. The option string that gets parsed has USE_BLAH()
macros that chop out disabled bits, the bit positions of each macro
shift when options are configured out. (The bit namespace is always
efficiently packed.)
The flag parsing infrastructure handles this, comparing the
"allyesconfig" and current .config versions of the strings at build time
and generating "#define FLAG_x 0" entries for the disabled flags. This
means if you do "if (toys.optflags & FLAG_x) blah();" for a disabled
macro, it becomes & 0, which is always zero due to simple constant
propagation (required by c99!), and dead code elimination (which even
Turbo C for dos got right) kicks in to remove the unused code. (It's
always syntax checked at compile time so you don't have config-dependent
build breaks, but it drops out of the binary when not in use.)
Problem: some commands have shared infrastructure outside of lib/*.c,
and some of them (mv/cp, kill/killall5, and lots of others) have
overlapping option ranges where _both_ commands want to support these
options, using common code.
What I _used_ to do was make the MV config symbol "depends on CP" and
then I had the common infrastructure work in CP's flag context. But that
screwed up the standalone builds ("scripts/single.sh COMMAND", which is
also used by scripts/test.sh to test individual commands). ("make tests"
builds the current toybox .config and tests the multiplexer binary, but
the individual tests build standalone.)
The standalone builds do _not_ have any other commands enabled. If you
give them NAME they enable CONFIG_NAME and CONFIG_NAME_* (and a
selection of CONFIG_TOYBOX_* symbols that should be less handcoded than
it currently is). So if you "scripts/test.sh mv" it builds a broken mv
that improperly handles the options shared with cp.
The _fix_ to all this is to redo the option parsing infrastructure to
leave gaps for the zeroed FLAG macros, and have a #define FORCE_FLAGS I
can set when doing "#define FOR_COMMAND\n#include <toys.h>" that will
enable the flags for _all_ options in this command, even the currently
disabled ones.
The problem is, those option strings I mentioned above with the USE()
macros in them still drop out the disabled options, so when the
lib/args.c infrastructure parses them it gets the sparse namespace and
now the bit positions don't line up.
So I have to have scripts/mkflags.c create a _second_ option string,
based on the current configuration, with a marker character (I'm using
ascii 1, I.E. "\001") to show the gaps where it needs to skip a bit. (I
can't use the original string verbatim because I still want "cp -l" to
reject the option and give a help message when you build it in posix
options only mode (without CONFIG_CP_MORE).) This involves tweaking both
mkflags.c and lib/args.c to know about the new signaling, but eh: not a
big deal.
Next problem: a couple months back I sped up rebuilds by _not_ having
every build be a build all, and one of the ways I did that was by
figuring out which generated/headers.h files wouldn't change based on
config, and not rebuilding them. Both generated/newtoys.h and
generated/oldtoys.h are config invariant, and that's where these option
strings live.
So I need to have the cooked option string data live in a _third_ place
(the logical place is generated/flags.h which is the file
scripts/mkflags.c is already generating), output new macros with a new
prefix, and change main.c to stick that data into toy_list[] instead of
the strings in the actual generated/newtoys.h it's #including when it
generates that list, and _this_ is officially awkward.
Did I mention that automating things so they "just work" is REALLY HARD?
I know projects develop scar tissue as they go along, but I've always
tried to go back and spend the time to clean UP the design even if it
required a large intrusive change. Having option strings occur three
times (even in generated headers) is silly.
But... all three of 'em serve a purpose. The generated/newtoys.h is the
collection of raw data grepped out of toys/*/*.c and used as the master
to generate the other two instances. The generated/oldtoys.h has to be
created by C because I've never managed to get a macro to expand to a
preprocessor directive that gets invoked, so I can't add #defines at
compile time. (Well, I could have scripts/make.sh do something horrible
on the gcc command line spamming -D, but that's not an improvement and
would render both "make V=1" and generated/build.sh just about
unintelligible.)
And trying to redefine USE() macros to convert options strings to
spacers? Even if the preprocessor has some kind of string processing
facility (which it doesn't seem to), USE_BLAH("a(longopt)") is not
trivial to parse.
(Well... if I turned _all_ a-zA-Z into \001 the normal lib/args.c parser
could drop 'em out... except that an option is "anything not parsed as a
control character", I.E. the else case at the end of a big stack. There
are commands that want "-9" (such as gzip), and 9 is a control character
in the right context (ala "a#<9")...)
So yeah, adding yet MORE crap to generated/flags.h and rejiggling the
option parsing infrastructure to take the flag lists from there and
adding a GREAT BIG COMMENT to main.c because "populate toy_list[]"
doesn't cut it anymore, and explain this in code.html on the website...
So yeah, heads up. You probably care about this one.
(When I do these things right nobody notices I've done _anything_. Which
is how it should be, but it's still hard.)
while you're there, the one other thing i don't think you can
currently support is the +/- style used by things like lsof. though --
although my usual personal goal is "you shouldn't be able to tell the
difference" -- the world might be a better place if that kind of
nonsense were allowed to die.
Post by Rob Landley
Post by enh
Post by Rob Landley
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)
Fun piece of trivia: the guy who submitted them to toybox is the lead
architect for the Qualcomm Hexagon processor. (As in the main hardware
guy who designs the actual chip.)
huh. do you remember why? i don't see anything on the list except for
the fact that two people arrived with uuencode at the same time.
(probably the majority of Android devices right now have the Hexagon
DSP, so if this is something that's actually useful to those guys i
might want to put it back!)
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)
one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.
https://lwn.net/Articles/478308/
The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.
i've had positive private feedback from an external entity already,
expressed in the form of disappointment that it wasn't in L :-) they
weren't very helpful either when i asked them which toys they
want/need. maybe they're assuming that as long as the system is
expecting toybox, they can just edit the .config themselves?
Post by Rob Landley
I tried to correct the record in the comments, ala
https://lwn.net/Articles/480382/ and https://lwn.net/Articles/480836/
but it didn't help. The FSF is a group of political zealots bordering on
a religion, the old line applies about how you can't use rational
argument to talk someone out of a position they didn't arrive at
rationally in the first place. The best I can do is try to prevent the
next generation from following them down the rathole (hence the Ohio
LinuxFest talk, which I really need to redo. I had 3 hours of material
and a little under an hour to speak, and that was _after_ spending a
week trying to edit it down.)
So I've gotten a lot of off the record emails with very interesting
data, and various people going "we really need this, can't say why" and
I do my best with the information I've got to keep the project on track...
Post by enh
Post by Rob Landley
Always happy to have another viewpoint to rejuggle the weightings...
i'm assuming we (Android) will get some feedback eventually.
Sorry, sorry. The mailing list archive being out of whack really screwed
up my normal working style, I wash my email through gmail for the spam
filtering, but that drops out all duplicate messages so if I'm cc'd on
a message I get _either_ the inbox copy _or_ the mailing list copy with
the list-id tag, so neither folder has the complete conversation and if
I try for threaded view all the threads are broken because the message
they're replying to is in the other folder. Complaining to the gmail
guys was a brick wall, so I started using the web archive to keep track
of stuff.
(Plus I've got a bunch of things like sed debugging and the above
command line redesign that take a lot of careful study to get right, and
when I pop my head up I have this giant backlog. (I still haven't dealt
with Ashwini's giant pile from October.))
Yesterday got eaten by kernel issues (they put PERL back in the 3.18
kernel build and I had to rip it out again). I have high hopes for today. :)
i actually meant we'd get feedback from OEMs/developers about which
toybox commands they would like to see and/or bugs that cause them
pain. but i have been holding off committing changes in Android until
i get feedback here. i can send out another status mail if that would
be helpful.
Post by Rob Landley
Post by enh
Post by Rob Landley
When I get my darn server reinstalled and get AOSP on it, I want to run
the AOSP build with the toybox commands. (Aboriginal Linux is using an
old version of linux from scratch as a bootstrapping test, but android's
build needs more commands than that. And may use command line options
that toybox doesn't implement yet. I know _you_ aren't trying to get
android self-hosting anytime soon, but I still am. :)
yeah, my builds are more than slow enough even on the fastest desktop
hardware :-)
This netbook hasn't even got the free disk space to try.
building for aarch64 or x86-64 pretty much doubles the space
requirements too (since for the time being at least most things need
to be multi-arch) :-(
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.
I don't understand the distinction here? (Is your build making
standalone binaries for the toybox commands ala scripts/single.sh? It
didn't look like it was but I have to stare at makefiles a lot to beat
any sense out of 'em...)
no, there's one binary that contains n toys, and then m symbolic
links, where n != m. think of the ones that don't have links as my
level of "pending"ness :-) though it's ill-defined what not having a
link means. sometimes it's "i think we'll want this, but i haven't
checked it works for us yet", sometimes "i think this probably doesn't
make any sense on Android, but it's in the default set and i haven't
yet been convinced to kick it out".
but my earlier point was that no one is likely to look too closely at
what's in the toybox binary, especially as long as it's roughly the
same size as the toolbox one used to be. but if they start stumbling
across symbolic links for things they think are a waste of space,
they're more likely to get on my case about it.
Makes sense.
Rob
stephen Turner
2014-12-31 06:28:41 UTC
Permalink
Post by enh
building for aarch64 or x86-64 pretty much doubles the space
requirements too (since for the time being at least most things need
to be multi-arch) :-(
What's wrong with compiling only for 64? Not fully supported yet?

Thanks
Stephen
enh
2014-12-31 07:44:08 UTC
Permalink
In L a few things like mediaserver are still 32-bit only. More importantly
for most people though, pretty much all the apps in the store are 32-bit.
Post by stephen Turner
Post by enh
building for aarch64 or x86-64 pretty much doubles the space
requirements too (since for the time being at least most things need
to be multi-arch) :-(
What's wrong with compiling only for 64? Not fully supported yet?
Thanks
Stephen
stephen Turner
2014-12-31 07:49:08 UTC
Permalink
Post by enh
In L a few things like mediaserver are still 32-bit only. More
importantly for most people though, pretty much all the apps in the store
are 32-bit.
Oh, ok so presumably there shouldn't be an issue if linux apps and the like
were compiled as 64. Just looking ahead for possible bumps in the road per
se.
Post by enh
Post by stephen Turner
Post by enh
building for aarch64 or x86-64 pretty much doubles the space
requirements too (since for the time being at least most things need
to be multi-arch) :-(
What's wrong with compiling only for 64? Not fully supported yet?
Thanks
Stephen
enh
2014-12-31 18:01:02 UTC
Permalink
On Tue, Dec 30, 2014 at 11:49 PM, stephen Turner
Post by stephen Turner
Post by enh
In L a few things like mediaserver are still 32-bit only. More importantly
for most people though, pretty much all the apps in the store are 32-bit.
Oh, ok so presumably there shouldn't be an issue if linux apps and the like
were compiled as 64. Just looking ahead for possible bumps in the road per
se.
oh, i see. no, there's no problem there. in particular, toybox is a
64-bit binary on aarch64 and x86-64.
Rob Landley
2014-12-31 20:35:24 UTC
Permalink
Post by enh
Post by Rob Landley
Post by enh
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Ok. I note that lib/login.c was an external contribution, and I was
uncomfortable with it precisely _because_ I think we've got to delegate
this to libc and punt in the case of android.
at the moment, pw_passwd is always NULL on Android. if it makes your
life easier for it to point to "*" or whatever, let me know, but one
problem we have in places like this -- and one reason i haven't even
bothered with <shadow.h> or <utmpx.h> -- is that in some ways code
that tries to use this stuff is better off if it just doesn't build,
because at least then the author/builder knows they need to sit down
and think about what they're trying to do and what, if anything, that
means on Android.
I am not as up to speed on Android development as I should be. (Todo
list runneth over, etc.) I sat through about half a tutorial on it at
CELF a year ago but it was mostly on the java/windowing stuff and how to
make apk files, and I'm mostly interested in the lower levels of the
system at the moment.

I don't know how multiple users in android work. I'm under the vague
impression that each running app gets its own uid/gid through a process
I don't understand, and it runs in something a bit like an lxc
container, only not really.

I'm more up to speed on the guts of lxc than you'd think because I did a
contract at Parallels a couple years ago, and I ran a table for them at
SCALE 2011 where I gave a "why containers are awesome" pitch to
passers-by. The old docs I wrote are at:

http://landley.net/lxc

That said, I don't think android is actually using containers? When I
poked at an android device it seemed to be using selinux and chroots or
something? Except you would probably have to use CLONE_NEWNS and
pivot_root because chroot is broken in Linux:

http://landley.net/notes-2011.html#02-06-2011

(I've done a lot of reading on android, but the pieces don't connect up...)
Post by enh
Post by Rob Landley
So yeah, heads up. You probably care about this one.
And the really _fun_ bit is that once I've got OPTSTRPAD_ versions of
the command names in flags.h with the ctrl-A substitution of disabled
options, that means completely disabled commands went to a string of ^A
when they need to be a constant 0 so the option parsing logic can drop
out (or just not be called) when appropriate.

I tried doing a:

#define FILTOPTS(str, name) (str ? OPTSTRPAD_##name : str)

That main.c could use but for the cases where it is a 0, OPTSTRPAD
doesn't get generated for that name, so there's no #define for it, so
the build breaks because even though the symbol isn't used (0 ? BLAH :
0) still needs BLAH to exist so it can be eliminated.
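
A minimal sketch of that failure and one possible way around it, not the real flags.h or mkflags.c output. The names `OPTSTRPAD_enabled_cmd` and `OPTSTRPAD_disabled_cmd` are hypothetical, and always emitting a define (null for a disabled command) is only an assumption about what "empty entries" could look like:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical generated defines. The enabled command gets its option
// string with one disabled option replaced by ctrl-A (octal 001).
#define OPTSTRPAD_enabled_cmd "a\001c"
// Assumed workaround: emit a define even for a disabled command. Delete
// this line and filter_disabled() below stops compiling, because the
// pasted name must still exist for the compiler to parse the unused arm.
#define OPTSTRPAD_disabled_cmd ((const char *)0)

#define FILTOPTS(str, name) (str ? OPTSTRPAD_##name : str)

const char *filter_enabled(const char *optstr)
{
  return FILTOPTS(optstr, enabled_cmd);
}

const char *filter_disabled(const char *optstr)
{
  return FILTOPTS(optstr, disabled_cmd);
}

// Self-check; call from main() to exercise the sketch.
void filtopts_selfcheck(void)
{
  assert(filter_enabled("abc")[1] == 1);
  assert(filter_enabled(NULL) == NULL);
  assert(filter_disabled(NULL) == NULL);
}
```

The token paste happens in the preprocessor, before any dead-code elimination, which is why the unused arm of ?: cannot simply be dropped.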

So I think I need to reengineer the regex to feed more data _into_
mkflags.c so it can produce _empty_ entries, except it's only doing that
for OLDTOY() macros..?

Grrr. Wrestling with infrastructure.

Before I can get back to _that_, I'm fighting the fact libgen.h in
glibc's includes is #defining basename to __xpg_basename. So
NEWTOY(basename) breaks if NEWTOY has more than one layer of macro
expansion. That's just craptacular. I could #undef basename right before
creating the command_list but that sort of special case bug workaround
I've tried pretty hard to _avoid_.
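
The layering problem can be reproduced without toybox at all. This sketch defines the rename by hand (as a stand-in for what glibc's libgen.h does) so it behaves the same on any libc, and `NAME_ONE_LAYER`/`NAME_TWO_LAYERS` are hypothetical macros, not the real NEWTOY():

```c
#include <assert.h>
#include <string.h>

// Stand-in for the libgen.h rename, defined here after the includes so
// the sketch does not depend on which libc is installed.
#define basename __xpg_basename

// One layer: an argument used with # is stringized before it is expanded.
#define NAME_ONE_LAYER(x) #x
// Two layers: the argument is fully macro-expanded on its way into the
// inner macro, so the rename leaks into the resulting string.
#define NAME_TWO_LAYERS(x) NAME_ONE_LAYER(x)

const char *one_layer_name(void)
{
  return NAME_ONE_LAYER(basename);
}

const char *two_layer_name(void)
{
  return NAME_TWO_LAYERS(basename);
}

// Self-check; call from main() to exercise the sketch.
void macro_layers_selfcheck(void)
{
  assert(!strcmp(one_layer_name(), "basename"));
  assert(!strcmp(two_layer_name(), "__xpg_basename"));
}
```

The same rule applies to ## pasting, so a command table built through a second macro layer would presumably end up looking for the renamed symbol instead of the command name.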

Is there just one or two commands using this so I can #include the
problematic header in just those commands?

$ egrep -l 'basename *[(]' lib/*.c toys/*/*.c
lib/lib.c
toys/other/lspci.c
toys/other/pmap.c
toys/other/rmmod.c
toys/pending/last.c
toys/pending/modprobe.c
toys/pending/netstat.c
toys/posix/basename.c
toys/posix/cp.c
toys/posix/ln.c

Not so much.

Let's see, they did the #define because they wanted to implement a
second basename with different semantics from posix (not overwrite its
source string, return a copy) and they didn't want to give the new
implementation a different name because gnu.

Wow the basename man page is crappy in Ubuntu 12.04. "Returns a copy of
the string..." but they didn't mean new allocation???

I'm all for creating a second implementation that allocates a copy and
returns it without modifying its argument. Maybe not the most efficient
behavior but consistent and usable. And since it mallocs (and malloc can
fail), it would be xbasename(), and it can live in lib/xwrap.c, and I
can include the darn header _there_ so it's not in toys.h polluting the
global namespace with #defines for common unprefixed symbol names.
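
A rough sketch of what such a function could look like, under the assumption that "x" means "exit on failure". `xbasename_sketch` is a hypothetical name and this is not the code that ended up in lib/xwrap.c:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: return a freshly malloc()ed copy of the last path
// component without modifying the argument. Exits if allocation fails.
char *xbasename_sketch(const char *path)
{
  size_t len = strlen(path);
  const char *start;
  char *copy;

  // Ignore trailing slashes, but keep a lone "/" intact.
  while (len > 1 && path[len-1] == '/') len--;

  // Walk back to just after the previous slash, if there is one.
  start = path+len;
  if (len == 1 && *path == '/') start = path;
  else while (start > path && start[-1] != '/') start--;

  // An empty path yields ".", which is what POSIX basename() returns.
  if (!len) {
    start = ".";
    len = 1;
  } else len -= start-path;

  copy = malloc(len+1);
  if (!copy) {
    perror("malloc");
    exit(1);
  }
  memcpy(copy, start, len);
  copy[len] = 0;

  return copy;
}

// Self-check; call from main() to exercise the sketch.
void xbasename_selfcheck(void)
{
  char path[] = "/usr/lib/", *base = xbasename_sketch(path);

  assert(!strcmp(base, "lib"));
  assert(!strcmp(path, "/usr/lib/")); // argument left untouched
  free(base);
}
```

The caller owns the returned string and has to free() it, which is the price of never touching the argument.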

(Meanwhile, dirname() only used in dirname.c, and that can include the
header itself.)

Sorry for the yak trace:

https://twitter.com/jorendorff/status/537290791576936449

Just trying to explain why I'm not done yet. :)
Post by enh
Post by Rob Landley
(When I do these things right nobody notices I've done _anything_. Which
is how it should be, but it's still hard.)
while you're there, the one other thing i don't think you can
currently support is the +/- style used by things like lsof. though --
although my usual personal goal is "you shouldn't be able to tell the
difference" -- the world might be a better place if that kind of
nonsense were allowed to die.
I note that "tail" and "find" have similar +/- weirdness.

There's a couple options:

1) If it's a numeric value with + there's a "x-" option (instead of
"x#") that defaults to negative but can be made positive with "-x +7".

2) You can pass through unknown options with ^ or ? prefixes, and can
skip option parsing entirely and do it yourself in the command (pass in
a null optstring and it won't run, in either case toys.argv[] always has
the original unpermuted arguments).
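
To illustrate the semantics of option 1 only: the sketch below is a standalone parser with a hypothetical name, not toybox's option parsing code, and it assumes the value is negative unless it carries an explicit '+':

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical parser for an argument in the style of "-x 7" vs "-x +7":
// the value counts as negative unless it has an explicit leading '+'.
// Returns 0 and stores the result on success, -1 on a malformed number.
int parse_default_negative(const char *arg, long *result)
{
  int positive = *arg == '+';
  char *end;
  long val;

  // Skip one optional sign character.
  if (*arg == '+' || *arg == '-') arg++;
  // Reject empty input or a second sign such as "+-7".
  if (*arg < '0' || *arg > '9') return -1;
  val = strtol(arg, &end, 10);
  // Reject trailing junk such as "7x".
  if (*end) return -1;
  *result = positive ? val : -val;

  return 0;
}

// Self-check; call from main() to exercise the sketch.
void parse_selfcheck(void)
{
  long val = 0;

  assert(!parse_default_negative("7", &val) && val == -7);
  assert(!parse_default_negative("+7", &val) && val == 7);
  assert(!parse_default_negative("-7", &val) && val == -7);
  assert(parse_default_negative("x", &val) == -1);
}
```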

I agree that any time you can tell the difference between the toybox
command and the one in ubuntu or similar is a cost to the end user, and
any time there's a cost it should _buy_ you something
(simple/fast/small/secure/etc). But if we weight that one too heavily we
just wind up reimplementing GNU, which is a bad idea on many levels.

When there's a standard (like posix) you can say "this extra is not in
posix, that's a deviation _from_ posix, that's also a cost" and then
weigh that. And complexity itself is a cost that needs to buy you
something, "can we get away with not doing that" is its own weighting.

Unfortunately, lsof doesn't have a spec. (There seems to just be one
magic implementation, from purdue.) Does busybox have a second
implementation?

Ooh, yes, just recently added (April 2012, have they been reading the
toybox roadmap? :), so I could check what options they implemented and
use that as a weighting... Let's see, build defconfig busybox from
current git and:

$ ./busybox --help lsof
BusyBox v1.24.0.git (2014-12-31 13:31:27 CST) multi-call binary.

Usage: lsof

Show all open files

That's... not helpful. Right, throw that on the todo list and move on...
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)
Fun piece of trivia: the guy who submitted them to toybox is the lead
architect for the Qualcomm Hexagon processor. (As in the main hardware
guy who designs the actual chip.)
huh. do you remember why?
Because I worked a 6 month contract at Qualcomm circa 2010 and we kept
in touch afterwards?

As to whether his employer's interested, I plead the... (spins the wheel
of amendments...) twelfth.

(Huh, so that's why the president and vice president can't be from the
same state. Good to know.)
Post by enh
i don't see anything on the list except for
the fact that two people arrived with uuencode at the same time.
It's a posix command, and people were looking for low hanging fruit
nobody was working on yet. It's useful enough (mime encoding does it so
if you manually dismantle an mbox file...) that I didn't shoot it down
out of the posix command list. And Busybox had it (although they've got
mkfs.minix too, so again that's just a weighting.)
Post by enh
(probably the majority of Android devices right now have the Hexagon
DSP, so if this is something that's actually useful to those guys i
might want to put it back!)
My contract at Qualcomm involved getting Linux running _on_ the hexagon,
so you didn't also need an arm chip in a theoretical snapdragon
successor. (As part of a team of a dozen or so people. I did userspace
bringup, getting a native compiler running and X11 and everything,
although it was xclients talking to a remote xserver through the network
because the comet boards didn't have graphics hardware.)

It was an awesome chip design, which I later blogged about at:

http://landley.net/notes-2012.html#24-02-2012

(Skip down to "six stage pipeline" if you just want the chip design bit.)

Alas, victim of corporate politics. Their arm "scorpion" guys in
Raleigh felt threatened after my boss hired Thomas Gleixner's Linutronix
to fish the various snapdragon drivers out of the arch/arm directory and
genericize them so Hexagon Linux could use 'em. I still occasionally
poke Richard Kuo and go "get hexagon support into qemu and I'll make it
work on Aboriginal Linux", but even 4 years later they can't get the
resources assigned.

Still, they pushed what we had done upstream and "linux for hexagon" is
in vanilla since like 3.2 or so. (Which would even be useful if you had
a comet board to run it on, but that was a debugging prototype they only
made a few of which never left the company. See the blog entry bits
about bootloaders for why I could never run this on my phone. See also
the bits at the end about the lawyers being in the driver's seat at
Qualcomm so what engineering wants to do and what engineering gets to do
are far apart even when engineering divisions aren't fighting.)
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)
one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.
https://lwn.net/Articles/478308/
The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.
i've had positive private feedback from an external entity already,
expressed in the form of disappointment that it wasn't in L :-) they
weren't very helpful either when i asked them which toys they
want/need. maybe they're assuming that as long as the system is
expecting toybox, they can just edit the .config themselves?
I'm pedaling as fast as I can.

(I've put off this darn flag generation rewrite for a couple months
already because it's big and nasty, but it's now blocking at least three
patch submissions I tried to review over the weekend...)

Sorry it's taking so long.
Post by enh
Post by Rob Landley
Yesterday got eaten by kernel issues (they put PERL back in the 3.18
kernel build and I had to rip it out again). I have high hopes for today. :)
Speaking of which:

http://landley.net/hg/aboriginal/rev/086e1ff5dd19

Yes I replaced 39 lines of perl with 4 lines of shell script _AGAIN_. I
should really post it to linux-kernel but I have to make enough noise
for _magazines_ to start covering the scuffle to get any traction there:

http://www.linuxjournaldigital.com/linuxjournal/june_2013?pg=18#pg18

It's like pulling teeth, last time it took _FIVE_YEARS_ to get the
patches upstream (with lots of "lurkers support me in email" nonsense
and nobody stepping up to admit it, as usual):

http://landey.net/notes-20103.html#28-02-2013

And there are some people who seem to _actively_oppose_ the idea of
simplifying build systems:

http://landley.net/notes-2013.html#28-03-2013

Sorry. Bit of a sore spot...
Post by enh
i actually meant we'd get feedback from OEMs/developers about which
toybox commands they would like to see and/or bugs that cause them
pain. but i have been holding off committing changes in Android until
i get feedback here. i can send out another status mail if that would
be helpful.
I have your previous status mail open in a reply window that I mean to
reply to as soon as I get your patch stack merged. :)

(Although Ashwini Sharma's requests are ahead of yours if there's a
queue. They've been waiting since October...)

But yay more status.

Rob
stephen Turner
2015-01-01 13:52:50 UTC
Permalink
Post by Rob Landley
http://landley.net/hg/aboriginal/rev/086e1ff5dd19
Yes I replaced 39 lines of perl with 4 lines of shell script _AGAIN_. I
should really post it to linux-kernel but I have to make enough noise
for _magazines_ to start covering the scuffle to get any traction there:
http://www.linuxjournaldigital.com/linuxjournal/june_2013?pg=18#pg18
It's like pulling teeth, last time it took _FIVE_YEARS_ to get the
patches upstream (with lots of "lurkers support me in email" nonsense
and nobody stepping up to admit it, as usual):
http://landey.net/notes-20103.html#28-02-2013
And there are some people who seem to _actively_oppose_ the idea of
simplifying build systems:
http://landley.net/notes-2013.html#28-03-2013
Sorry. Bit of a sore spot...
That doesn't make me feel confident I can convince them the kernel should
build on generic systems such as busybox and toybox. I was hoping to have
them remove the gcc/bash dependencies. I don't get this: they made a kernel,
not an OS, so why are they tying themselves to GNU with a death grip? They
should make it as generic as possible, to support the widest array of build
environments and uses they can within reason.
enh
2015-01-01 20:44:46 UTC
Permalink
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Ok. I note that lib/login.c was an external contribution, and I was
uncomfortable with it precisely _because_ I think we've got to delegate
this to libc and punt in the case of android.
at the moment, pw_passwd is always NULL on Android. if it makes your
life easier for it to point to "*" or whatever, let me know, but one
problem we have in places like this -- and one reason i haven't even
bothered with <shadow.h> or <utmpx.h> -- is that in some ways code
that tries to use this stuff is better off if it just doesn't build,
because at least then the author/builder knows they need to sit down
and think about what they're trying to do and what, if anything, that
means on Android.
I am not as up to speed on Android development as I should be. (Todo
list runneth over, etc.) I sat through about half a tutorial on it at
CELF a year ago but it was mostly on the java/windowing stuff and how to
make apk files, and I'm mostly interested in the lower levels of the
system at the moment.
I don't know how multiple users in android work. I'm under the vague
impression that each running app gets its own uid/gid through a process
I don't understand, and it runs in something a bit like an lxc
container, only not really.
uids were used from the beginning to separate apps, so multi-user
support made things complicated.
http://www.cis.syr.edu/~wedu/Research/paper/multi_user_most2014.pdf
seems from skimming to be a reasonable introduction. (section II A talks
about this.)
Post by Rob Landley
I'm more up to speed on the guts of lxc than you'd think because I did a
contract at Parallels a couple years ago, and I ran a table for them at
SCALE 2011 where I gave a "why containers are awesome" pitch to
passers-by. The old docs I wrote are at:
http://landley.net/lxc
That said, I don't think android is actually using containers? When I
poked at an android device it seemed to be using selinux and chroots or
something? Except you would probably have to use CLONE_NEWNS and
pivot_root because chroot is broken in Linux:
http://landley.net/notes-2011.html#02-06-2011
(I've done a lot of reading on android, but the pieces don't connect up...)
SELinux and the sdcard FUSE daemon.
Post by Rob Landley
Post by enh
Post by Rob Landley
So yeah, heads up. You probably care about this one.
And the really _fun_ bit is that once I've got OPTSTRPAD_ versions of
the command names in flags.h with the ctrl-A substitution of disabled
options, that means completely disabled commands went to a string of ^A
when they need to be a constant 0 so the option parsing logic can drop
out (or just not be called) when appropriate.
I tried doing a:
#define FILTOPTS(str, name) (str ? OPTSTRPAD_##name : str)
That main.c could use but for the cases where it is a 0, OPTSTRPAD
doesn't get generated for that name, so there's no #define for it, so
the build breaks because even though the symbol isn't used (0 ? BLAH :
0) still needs BLAH to exist so it can be eliminated.
So I think I need to reengineer the regex to feed more data _into_
mkflags.c so it can produce _empty_ entries, except it's only doing that
for OLDTOY() macros..?
Grrr. Wrestling with infrastructure.
Before I can get back to _that_, I'm fighting the fact libgen.h in
glibc's includes is #defining basename to __xpg_basename. So
NEWTOY(basename) breaks if NEWTOY has more than one layer of macro
expansion. That's just craptacular. I could #undef basename right before
creating the command_list but that sort of special case bug workaround
I've tried pretty hard to _avoid_.
Is there just one or two commands using this so I can #include the
problematic header in just those commands?
$ egrep -l 'basename *[(]' lib/*.c toys/*/*.c
lib/lib.c
toys/other/lspci.c
toys/other/pmap.c
toys/other/rmmod.c
toys/pending/last.c
toys/pending/modprobe.c
toys/pending/netstat.c
toys/posix/basename.c
toys/posix/cp.c
toys/posix/ln.c
Not so much.
Let's see, they did the #define because they wanted to implement a
second basename with different semantics from posix (not overwrite its
source string, return a copy) and they didn't want to give the new
implementation a different name because gnu.
Wow the basename man page is crappy in Ubuntu 12.04. "Returns a copy of
the string..." but they didn't mean new allocation???
no, the GNU one always returns a pointer into the original string.
Android has both, though the opposite way round: basename is the POSIX
version and there's a __gnu_basename that gets in the way if you
#include <string.h> with _GNU_SOURCE defined.
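
For comparison, the GNU-style behavior described here amounts to something like the sketch below. `gnu_style_basename` is a hypothetical stand-in written for illustration, not glibc's or bionic's source:

```c
#include <assert.h>
#include <string.h>

// Hypothetical stand-in for GNU-style behavior: no copy and no writes,
// just a pointer to whatever follows the last slash in the caller's string.
const char *gnu_style_basename(const char *path)
{
  const char *slash = strrchr(path, '/');

  return slash ? slash+1 : path;
}

// Self-check; call from main() to exercise the sketch.
void gnu_style_selfcheck(void)
{
  const char *path = "/usr/lib/libc.so";

  // The result aliases the input rather than being a new allocation.
  assert(gnu_style_basename(path) == path+9);
  // A trailing slash yields an empty string, unlike the POSIX version.
  assert(!*gnu_style_basename("/usr/lib/"));
}
```

Since the result points into the argument, it stays valid only as long as the original string does, and there is nothing for the caller to free.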

the man page would be more helpful if it admitted earlier on that
there are multiple implementations and could include them in the
table.

another way in which their basename is better is that it has const and
non-const overloads for C++. but, yes, it's insane that they reused
the name. at least with strerror_r their version came first and POSIX
standardized something incompatible.
Post by Rob Landley
I'm all for creating a second implementation that allocates a copy and
returns it without modifying its argument. Maybe not the most efficient
behavior but consistent and usable. And since it mallocs (and malloc can
fail), it would be xbasename(), and it can live in lib/xwrap.c, and I
can include the darn header _there_ so it's not in toys.h polluting the
global namespace with #defines for common unprefixed symbol names.
(Meanwhile, dirname() only used in dirname.c, and that can include the
header itself.)
Sorry for the yak trace:
https://twitter.com/jorendorff/status/537290791576936449
Just trying to explain why I'm not done yet. :)
Post by enh
Post by Rob Landley
(When I do these things right nobody notices I've done _anything_. Which
is how it should be, but it's still hard.)
while you're there, the one other thing i don't think you can
currently support is the +/- style used by things like lsof. though --
although my usual personal goal is "you shouldn't be able to tell the
difference" -- the world might be a better place if that kind of
nonsense were allowed to die.
I note that "tail" and "find" have similar +/- weirdness.
There's a couple options:
1) If it's a numeric value with + there's a "x-" option (instead of
"x#") that defaults to negative but can be made positive with "-x +7".
2) You can pass through unknown options with ^ or ? prefixes, and can
skip option parsing entirely and do it yourself in the command (pass in
a null optstring and it won't run, in either case toys.argv[] always has
the original unpermuted arguments).
I agree that any time you can tell the difference between the toybox
command and the one in ubuntu or similar is a cost to the end user, and
any time there's a cost it should _buy_ you something
(simple/fast/small/secure/etc). But if we weight that one too heavily we
just wind up reimplementing GNU, which is a bad idea on many levels.
When there's a standard (like posix) you can say "this extra is not in
posix, that's a deviation _from_ posix, that's also a cost" and then
weigh that. And complexity itself is a cost that needs to buy you
something, "can we get away with not doing that" is its own weighting.
Unfortunately, lsof doesn't have a spec. (There seems to just be one
magic implementation, from purdue.) Does busybox have a second
implementation?
Ooh, yes, just recently added (April 2012, have they been reading the
toybox roadmap? :), so I could check what options they implemented and
use that as a weighting... Let's see, build defconfig busybox from
current git and:
$ ./busybox --help lsof
BusyBox v1.24.0.git (2014-12-31 13:31:27 CST) multi-call binary.
Usage: lsof
Show all open files
That's... not helpful. Right, throw that on the todo list and move on...
there's an lsof in toolbox we wrote ourselves in 2010. it doesn't
support any of the myriad options, but it does take a pid argument
(which is incompatible with the purdue lsof where you have to say
"lsof -p pid").
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)
Fun piece of trivia: the guy who submitted them to toybox is the lead
architect for the Qualcomm Hexagon processor. (As in the main hardware
guy who designs the actual chip.)
huh. do you remember why?
Because I worked a 6 month contract at Qualcomm circa 2010 and we kept
in touch afterwards?
As to whether his employer's interested, I plead the... (spins the wheel
of amendments...) twelfth.
(Huh, so that's why the president and vice president can't be from the
same state. Good to know.)
Post by enh
i don't see anything on the list except for
the fact that two people arrived with uuencode at the same time.
It's a posix command, and people were looking for low hanging fruit
nobody was working on yet. It's useful enough (mime encoding does it so
if you manually dismantle an mbox file...) that I didn't shoot it down
out of the posix command list. And Busybox had it (although they've got
mkfs.minix too, so again that's just a weighting.)
Post by enh
(probably the majority of Android devices right now have the Hexagon
DSP, so if this is something that's actually useful to those guys i
might want to put it back!)
My contract at Qualcomm involved getting Linux running _on_ the hexagon,
so you didn't also need an arm chip in a theoretical snapdragon
successor. (As part of a team of a dozen or so people. I did userspace
bringup, getting a native compiler running and X11 and everything,
although it was xclients talking to a remote xserver through the network
because the comet boards didn't have graphics hardware.)
It was an awesome chip design, which I later blogged about at:
http://landley.net/notes-2012.html#24-02-2012
(Skip down to "six stage pipeline" if you just want the chip design bit.)
Alas, victim of corporate politics. Their arm "scorpion" guys in
Raleigh felt threatened after my boss hired Thomas Gleixner's Linutronix
to fish the various snapdragon drivers out of the arch/arm directory and
genericize them so Hexagon Linux could use 'em. I still occasionally
poke Richard Kuo and go "get hexagon support into qemu and I'll make it
work on Aboriginal Linux", but even 4 years later they can't get the
resources assigned.
Still, they pushed what we had done upstream and "linux for hexagon" is
in vanilla since like 3.2 or so. (Which would even be useful if you had
a comet board to run it on, but that was a debugging prototype they only
made a few of which never left the company. See the blog entry bits
about bootloaders for why I could never run this on my phone. See also
the bits at the end about the lawyers being in the driver's seat at
Qualcomm so what engineering wants to do and what engineering gets to do
are far apart even when engineering divisions aren't fighting.)
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have _reason_ for listing the commands I did. :)
one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.
https://lwn.net/Articles/478308/
The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.
i've had positive private feedback from an external entity already,
expressed in the form of disappointment that it wasn't in L :-) they
weren't very helpful either when i asked them which toys they
want/need. maybe they're assuming that as long as the system is
expecting toybox, they can just edit the .config themselves?
I'm pedaling as fast as I can.
(I've put off this darn flag generation rewrite for a couple months
already because it's big and nasty, but it's now blocking at least three
patch submissions I tried to review over the weekend...)
Sorry it's taking so long.
Post by enh
Post by Rob Landley
Yesterday got eaten by kernel issues (they put PERL back in the 3.18
kernel build and I had to rip it out again). I have high hopes for today. :)
http://landley.net/hg/aboriginal/rev/086e1ff5dd19
Yes I replaced 39 lines of perl with 4 lines of shell script _AGAIN_. I
should really post it to linux-kernel but I have to make enough noise
http://www.linuxjournaldigital.com/linuxjournal/june_2013?pg=18#pg18
It's like pulling teeth, last time it took _FIVE_YEARS_ to get the
patches upstream (with lots of "lurkers support me in email" nonsense
http://landey.net/notes-20103.html#28-02-2013
And there are some people who seem to _actively_oppose_ the idea of
http://landley.net/notes-2013.html#28-03-2013
Sorry. Bit of a sore spot...
Post by enh
i actually meant we'd get feedback from OEMs/developers about which
toybox commands they would like to see and/or bugs that cause them
pain. but i have been holding off committing changes in Android until
i get feedback here. i can send out another status mail if that would
be helpful.
I have your previous status mail open in a reply window that I mean to
reply to as soon as I get your patch stack merged. :)
(Although Ashwini Sharma's requests are ahead of yours if there's a
queue. They've been waiting since October...)
But yay more status.
i'll send out an update to the original thread then.
Rob Landley
2015-01-02 04:40:27 UTC
Permalink
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
Post by enh
i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)
Ok. I note that lib/login.c was an external contribution, and I was
uncomfortable with it precisely _because_ I think we've got to delegate
this to libc and punt in the case of android.
at the moment, pw_passwd is always NULL on Android. if it makes your
life easier for it to point to "*" or whatever, let me know, but one
problem we have in places like this -- and one reason i haven't even
bothered with <shadow.h> or <utmpx.h> -- is that in some ways code
that tries to use this stuff is better off if it just doesn't build,
because at least then the author/builder knows they need to sit down
and think about what they're trying to do and what, if anything, that
means on Android.
I am not as up to speed on Android development as I should be. (Todo
list runneth over, etc.) I sat through about half a tutorial on it at
CELF a year ago but it was mostly on the java/windowing stuff and how to
make apk files, and I'm mostly interested in the lower levels of the
system at the moment.
I don't know how multiple users in android work. I'm under the vague
impression that each running app gets its own uid/gid through a process
I don't understand, and it runs in something a bit like an lxc
container, only not really.
uids were used from the beginning to separate apps, so multi-user
support made things complicated.
http://www.cis.syr.edu/~wedu/Research/paper/multi_user_most2014.pdf
seems from skimming to be a reasonable introduction. (section II A talks
about this.)
Downloaded and reading, thanks.

(I have half-responses composed in various text files for a bunch of the
patches I started reviewing and then got blocked/distracted about, and
one of 'em was the chown patch where you were doing sed on /etc/passwd
and I'm going "but... android hasn't got one?" Now that I've got the
darn flag generation untangled and I'm making progress on getting sed
properly debugged, I hope to drain that queue soonish...)
Post by enh
Post by Rob Landley
I'm more up to speed on the guts of lxc than you'd think because I did a
contract at Parallels a couple years ago, and I ran a table for them at
SCALE 2011 where I gave a "why containers are awesome" pitch to
http://landley.net/lxc
That said, I don't think android is actually using containers? When I
poked at an android device it seemed to be using selinux and chroots or
something? Except you would probably have to use CLONE_NEWNS and
http://landley.net/notes-2011.html#02-06-2011
(I've done a lot of reading on android, but the pieces don't connect up...)
SELinux and the sdcard FUSE daemon.
Can of worms. I should come back to this when I get my patch/pending
backlog dealt with...
Post by enh
Post by Rob Landley
Post by enh
Post by Rob Landley
So yeah, heads up. You probably care about this one.
And the really _fun_ bit is that once I've got OPTSTRPAD_ versions of
the command names in flags.h with the ctrl-A substitution of disabled
options, that means completely disabled commands went to a string of ^A
when they need to be a constant 0 so the option parsing logic can drop
out (or just not be called) when appropriate.
#define FILTOPTS(str, name) (str ? OPTSTRPAD_##name : str)
That main.c could use but for the cases where it is a 0, OPTSTRPAD
doesn't get generated for that name, so there's no #define for it, so
0) still needs BLAH to exist so it can be eliminated.
So I think I need to reengineer the regex to feed more data _into_
mkflags.c so it can produce _empty_ entries, except it's only doing that
for OLDTOY() macros..?
Grrr. Wrestling with infrastructure.
And after all that I figured out a less crappy way to implement it, by
replacing the old generated/oldtoys.h that defined OPTSTR_blah macros so
OLDTOY() calls could refer to the base thing's flags without repeating them.

Now that header's gone, and the OPTSTR macros are instead in flags.h and
_those_ are the ones with the edited currently enabled subset with the
CTRL-A spacers, so toy_list[] (and thus lib/args.c) is also actually
using the OPTSTR macros (instead of directly using the ones passed in to
NEWTOY()), and I also removed the flags argument from OLDTOY() entirely
because there was only one instance where it actually differed from the
NEWTOY() the OLDTOY() was derived from (ftpget supported -c but ftpput
didn't, but they shared common help text so this was nonobvious and it's
in pending anyway so I need to clean it up already).

The above is how I _didn't_ implement it. I need to update code.html...
Post by enh
Post by Rob Landley
Before I can get back to _that_, I'm fighting the fact libgen.h in
glibc's includes is #defining basename to __xpg_basename. So
NEWTOY(basename) breaks if NEWTOY has more than one layer of macro
expansion. That's just craptacular. I could #undef basename right before
creating the command_list but that sort of special case bug workaround
I've tried pretty hard to _avoid_.
Is there just one or two commands using this so I can #include the
problematic header in just those commands?
$ egrep -l 'basename *[(]' lib/*.c toys/*/*.c
lib/lib.c
toys/other/lspci.c
toys/other/pmap.c
toys/other/rmmod.c
toys/pending/last.c
toys/pending/modprobe.c
toys/pending/netstat.c
toys/posix/basename.c
toys/posix/cp.c
toys/posix/ln.c
Not so much.
Let's see, they did the #define because they wanted to implement a
second basename with different semantics from posix (not overwrite its
source string, return a copy) and they didn't want to give the new
implementation a different name because gnu.
Wow the basename man page is crappy in Ubuntu 12.04. "Returns a copy of
the string..." but they didn't mean new allocation???
no, the GNU one always returns a pointer into the original string.
Android has both, though the opposite way round: basename is the POSIX
version and there's a __gnu_basename that gets in the way if you
#include <string.h> with _GNU_SOURCE defined.
the man page would be more helpful if it admitted earlier on that
there are multiple implementations and could include them in the
table.
another way in which their basename is better is that it has const and
non-const overloads for C++. but, yes, it's insane that they reused
the name. at least with strerror_r their version came first and POSIX
standardized something incompatible.
Post by Rob Landley
I'm all for creating a second implementation that allocates a copy and
returns it without modifying its argument. Maybe not the most efficient
behavior but consistent and usable. And since it mallocs (and malloc can
fail), it would be xbasename(), and it can live in lib/xwrap.c, and I
can include the darn header _there_ so it's not in toys.h polluting the
global namespace with #defines for common unprefixed symbol names.
(Meanwhile, dirname() is only used in dirname.c, and that can include the
header itself.)
https://twitter.com/jorendorff/status/537290791576936449
Just trying to explain why I'm not done yet. :)
And I didn't do _that_ either, instead I made a wrapper in
lib/portability.[hc] that provides posix semantics without the define
and only triggers for glibc (which I consider broken).

(Ok, it triggers for uClibc as well because they #define __GLIBC__ which
is their own darn fault for lying to me, and I didn't filter them out
because the workaround works for them too.)
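
To illustrate why the libgen.h #define breaks NEWTOY(basename) once a second layer of macro expansion is involved, here is a minimal sketch. STR, XSTR, one_layer() and two_layers() are made-up names for this example, and the #define is a stand-in for what glibc's header does; none of this is toybox's actual NEWTOY() machinery.

```c
// Stand-in for the line in glibc's <libgen.h> (assumption: reproduced here
// so the example does not depend on which libc headers are installed).
#define basename __xpg_basename

// One layer: the # operator stringifies the argument exactly as written,
// so the name survives untouched.
#define STR(x) #x

// Two layers: the argument is fully macro-expanded before STR() sees it,
// so the #define above leaks into the result.
#define XSTR(x) STR(x)

// Hypothetical helpers that expose the two results.
const char *one_layer(void) { return STR(basename); }
const char *two_layers(void) { return XSTR(basename); }
```

one_layer() returns "basename" while two_layers() returns "__xpg_basename", which is the sort of silent renaming a command table built through nested macros would pick up.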

Amusingly, the workaround broke toys/pending/nsenter.c because they
#define GNU_DAMMIT before #including <toys.h> and thus get the gnu
version of basename which is a separate function and the linker
complains about the redefinition. But that's basically the workaround
catching a bug; I'm ok with that and am fixing it in the copy where I'm
merging it into unshare.c.

So once again, "looks like I didn't do anything because I went down
several blind alleys before finding the properly trivial fix". I just
usually don't bother other people with my false starts, although the
whole "importance of publishing negative results" thing means I should
occasionally describe my average day...

http://curt-rice.com/2011/07/21/negative-results-are-important-research-europe/

http://xkcd.com/882/
Post by enh
Post by Rob Landley
Unfortunately, lsof doesn't have a spec. (There seems to just be one
magic implementation, from purdue.) Does busybox have a second
implementation?
Ooh, yes, just recently added (April 2012, have they been reading the
toybox roadmap? :), so I could check what options they implemented and
use that as a weighting... Let's see, build defconfig busybox from
$ ./busybox --help lsof
BusyBox v1.24.0.git (2014-12-31 13:31:27 CST) multi-call binary.
Usage: lsof
Show all open files
That's... not helpful. Right, throw that on the todo list and move on...
there's an lsof in toolbox we wrote ourselves in 2010. it doesn't
support any of the myriad options, but it does take a pid argument
(which is incompatible with the purdue lsof where you have to say
"lsof -p pid").
You know, if you guys are ok with submitting that under the toybox
license I'll happily take it, toyboxify it, and expand it to do more
stuff. (A todo list would be nice, the existing lsof man page has no
excuse being 2714 lines long...)

Hmmm, looking at it "-p" does make sense because "lsof blah" treats blah
as a filename. (I could check that blah is a number and doesn't exist,
but that's brittle. A filename that _is_ a number in the current
directory would suddenly change its behavior... Meanwhile busybox's new
lsof ignores its arguments, so "lsof" and "lsof walrus" do the same
thing: show all open files. So that's not much help...)

You'll have to tell me what you consider "least surprise" here. :)
Post by enh
Post by Rob Landley
Post by enh
i actually meant we'd get feedback from OEMs/developers about which
toybox commands they would like to see and/or bugs that cause them
pain. but i have been holding off committing changes in Android until
i get feedback here. i can send out another status mail if that would
be helpful.
I have your previous status mail open in a reply window that I mean to
reply to as soon as I get your patch stack merged. :)
(Although Ashwini Sharma's requests are ahead of yours if there's a
queue. They've been waiting since October...)
But yay more status.
i'll send out an update to the original thread then.
Yay!

Rob
Isaac Dunham
2014-12-30 21:35:45 UTC
Permalink
Post by Rob Landley
https://lwn.net/Articles/478308/
The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.
I tried to correct the record in the comments, ala
https://lwn.net/Articles/480382/ and https://lwn.net/Articles/480836/
but it didn't help. The FSF is a group of political zealots bordering on
a religion, the old line applies about how you can't use rational
argument to talk someone out of a position they didn't arrive at
rationally in the first place. The best I can do is try to prevent the
next generation from following them down the rathole (hence the Ohio
LinuxFest talk, which I really need to redo. I had 3 hours of material
and a little under an hour to speak, and that was _after_ spending a
week trying to edit it down.)
So I've gotten a lot of off the record emails with very interesting
data, and various people going "we really need this, can't say why" and
I do my best with the information I've got to keep the project on track...
For the record, I first heard about toybox when groklaw had some article
complaining about toybox (after that whole blow-up), read the project
site instead of the complaints, and decided that I'd look for a chance
to contribute.

Thanks,
Isaac Dunham
stephen Turner
2014-12-30 22:06:16 UTC
Permalink
Post by Isaac Dunham
Post by Rob Landley
https://lwn.net/Articles/478308/
The various corporations interested in toybox have been VERY quiet about
it. They email me off-list and read me the Mission Impossible speech
about the secretary denying all knowledge if we're caught and their
involvement self destructing in 5 seconds if so.
For the record, I first heard about toybox when groklaw had some article
complaining about toybox (after that whole blow-up), read the project
site instead of the complaints, and decided that I'd look for a chance
to contribute.
Thanks,
Isaac Dunham
They fear they will lose their hold on free software. Truly Free software
is unbound. So obviously they are selling a cheap knock off.

Having companies interested is good; having companies that help development
etc. is better. Either way Rob you know you have a good product here and
those emails help reinforce it.
Roy Tam
2014-12-31 05:00:27 UTC
Permalink
2014-12-31 3:22 GMT+08:00 Rob Landley <***@landley.net>:
[snip]
Post by Rob Landley
Speaking of which, I just got reminded on twitter that there _is_ a
http://news.gmane.org/gmane.linux.toybox
and paste 'em, but I'll live.
Well you can use their NNTP service (with your favorite news reader)
to get unmangled version. ;)
Post by Rob Landley
Rob
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net
Roy Tam
2015-01-05 04:29:48 UTC
Permalink
2014-12-31 3:22 GMT+08:00 Rob Landley <***@landley.net>:
[snip]
Post by Rob Landley
I'm all for seeing patches on the list. Even if I don't wind up merging
them it means there's an archive of them.
Speaking of which, I just got reminded on twitter that there _is_ a
http://news.gmane.org/gmane.linux.toybox
and paste 'em, but I'll live.
(One advantage of two space indents instead of tab characters: applying
patches with cut and paste gets much easier and I almost never have to
do "mixed tab and space" policing because you need 4 levels deep
indenting before you _could_ use a tab with default tabstops...)
Rob
Rob Landley
2015-01-05 14:55:50 UTC
Permalink
Post by Roy Tam
[snip]
Post by Rob Landley
I'm all for seeing patches on the list. Even if I don't wind up merging
them it means there's an archive of them.
Speaking of which, I just got reminded on twitter that there _is_ a
http://news.gmane.org/gmane.linux.toybox
and paste 'em, but I'll live.
(One advantage of two space indents instead of tab characters: applying
patches with cut and paste gets much easier and I almost never have to
do "mixed tab and space" policing because you need 4 levels deep
indenting before you _could_ use a tab with default tabstops...)
We've had an active month, mailing-list-wise...

Rob
Roy Tam
2015-01-05 16:05:14 UTC
Permalink
Post by Rob Landley
Post by Roy Tam
[snip]
Post by Rob Landley
I'm all for seeing patches on the list. Even if I don't wind up merging
them it means there's an archive of them.
Speaking of which, I just got reminded on twitter that there _is_ a
http://news.gmane.org/gmane.linux.toybox
and paste 'em, but I'll live.
(One advantage of two space indents instead of tab characters: applying
patches with cut and paste gets much easier and I almost never have to
do "mixed tab and space" policing because you need 4 levels deep
indenting before you _could_ use a tab with default tabstops...)
We've had an active month, mailing-list-wise...
but mail-archive mirror doesn't break lines...
Post by Rob Landley
Rob
Robert Thompson
2014-12-25 11:40:40 UTC
Permalink
Yes, I apparently submitted a diff from the tree I used to find the
problem, not the tree I cleanly patched... (Oh well, I'm really stale at C,
so it wasn't *that* clean)

I literally discovered that problem by accident, and submitted a patch
because (bad) code was quicker than a good explanation.
Post by Rob Landley
Post by Robert Thompson
I ran across a variance between toybox factor and coreutils factor.
Coreutils factor will accept numbers on stdin separated by any whitespace
(including newlines and tabs) between integers, but toybox factor was only
accepting one integer per line.
Really?
$ factor ""
factor: `' is not a valid positive integer
$ factor "32 "
factor: `32 ' is not a valid positive integer
$ factor "32 7"
factor: `32 7' is not a valid positive integer
Must be newer than Ubuntu 12.04... Ah, on _stdin_. Right. Confirmed.
Hmmm... might as well make it take both anyway.
Post by Robert Thompson
I added a test for this, and hacked factor to give the expected behavior.
It's not properly indented, and it depends on isspace(), but it seems to be
doing the job.
I think you left a debug printf in there, it's making all the tests fail,
$ VERBOSE=fail scripts/test.sh factor
scripts/make.sh
Generate headers from toys/*/*.c...
Make generated/config.h from .singleconfig.
generated/flags.h generated/help.h
Compile toybox.....
FAIL: factor -32
echo -ne '' | factor -32
--- expected 2014-12-23 20:48:38.689595406 -0600
+++ actual 2014-12-23 20:48:38.693595406 -0600
@@ -1 +1,2 @@
+->: -32
-32: -1 2 2 2 2 2
@@ -20,9 +20,11 @@
static void factor(char *s)
{
long l, ll;
+ while( *s && s[0] && ! isspace(s[0]) ) {
+ printf("->: %s\n",s);
l = strtol(s, &s, 0);
*s and s[0] are the same thing.
@@ -61,6 +63,7 @@
}
}
xputc('\n');
+ }
}
void factor_main(void)
As you mentioned, you added a curly bracket level without indenting the code.
I could do a tail call and expect the compiler to turn the recursion into
iteration, but reindenting the code properly is worth the noise in the diff.
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
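
For reference, the behavior under discussion (integers separated by any whitespace, with trailing whitespace and empty input tolerated) can be sketched roughly as below. This is not the version that was checked in: parse_longs() is a hypothetical helper written for this example, and it only collects the numbers rather than factoring them.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper (not toybox's factor()): parse every integer in s,
// separated by spaces, tabs or newlines, into out[].
// Returns how many were stored, or -1 on a non-integer token, overflow,
// or more than max numbers.
int parse_longs(const char *s, long *out, int max)
{
  int count = 0;

  for (;;) {
    char *end;

    // Skip whitespace before each number (and after the last one).
    while (isspace((unsigned char)*s)) s++;
    if (!*s) return count;
    if (count == max) return -1;

    errno = 0;
    out[count] = strtol(s, &end, 0);

    // Reject: no digits consumed, out of range, or junk glued to the number.
    if (end == s || errno == ERANGE
        || (*end && !isspace((unsigned char)*end)))
      return -1;

    count++;
    s = end;
  }
}
```

With this sketch both "3 6" and "3\n6\n" yield two numbers, "36 " yields one, and "" yields zero without an error, which matches the leniency described above rather than coreutils' stricter command-line handling.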
Let me know if there are more things to fix.
Thanks,
Rob