Discussion:
[Nommu] Week ending June 27ish.
Rob Landley
2015-06-28 23:15:26 UTC
I should probably check in here at least weekly to give a general idea
of what I'm working on and plans going forward.
1) I yanked busybox ash and replaced it with a combination of hush and
bash. (This broke stuff, and I'm fixing it.)
Does bash actually have nommu support?
Sigh. You're right, it doesn't. (I thought my old 2.05b version did, but
nope.)
I asked about mksh a while back
and sadly it doesn't. I've got hush working fine at the moment but I
haven't tried much in the way of scripts, so I'm not sure what its
coverage is like.
Alas, hush is very limited.
Sigh. Time to restart work on toysh I guess.
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.

That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
2) I'm teaching toybox to probe for the existence of fork(), set a nommu
symbol if it's not there, and use sort of a fakeroot version of fork
(re-exec yourself with some way of transferring state to the child,
maybe an environment variable, maybe a pipe...).
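As an illustration of the re-exec idea, here is a minimal sketch that assumes the state travels in an environment variable. The variable name DEMO_STATE and the helper spawn_with_state() are hypothetical, not toybox code. The child's environment is built before vfork() because the child borrows the parent's memory until it calls exec, so it should do nothing but exec or _exit.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

#define STATE_VAR "DEMO_STATE="  /* hypothetical variable name */

/* Run `path` with `argv`, passing `state` to the child through the
 * environment. Returns the child's exit status, or -1 on failure. */
int spawn_with_state(const char *path, char *const argv[], const char *state)
{
    size_t count = 0, used = 0, i;
    size_t prefix = sizeof(STATE_VAR) - 1;
    char **envp, *entry;
    pid_t pid;
    int status = 0, ok;

    /* Everything is allocated up front, in the parent. */
    while (environ[count]) count++;
    envp = malloc((count + 2) * sizeof(*envp));
    entry = malloc(prefix + strlen(state) + 1);
    if (!envp || !entry) {
        free(envp);
        free(entry);
        return -1;
    }
    memcpy(entry, STATE_VAR, prefix);
    strcpy(entry + prefix, state);

    /* New entry first, then the old environment minus any stale copy. */
    envp[used++] = entry;
    for (i = 0; i < count; i++)
        if (strncmp(environ[i], STATE_VAR, prefix) != 0)
            envp[used++] = environ[i];
    envp[used] = NULL;

    pid = vfork();
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);  /* only reached if exec failed */
    }
    ok = pid > 0 && waitpid(pid, &status, 0) == pid;
    free(envp);
    free(entry);
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

To re-exec the running binary on Linux, pass "/proc/self/exe" as the path (this needs /proc mounted); the child then reads the state back with getenv("DEMO_STATE").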
BTW toybox does a lot of expensive probing at startup already (before
even entering the applet's main function) that makes simple commands
take about twice the time of their busybox versions.
Could you please be a little more vague?
I'd prefer to respond to this on the toybox list, but since you felt it
should be asked here instead...
Sorry, it just came up as part of the overhead for self-exec topic. I
agree the toybox list would have been a better place.
Or do you mean something other by "probing" than "system calls"? (Memory
usage? CPU usage? Could I have some sort of metric here, or at least the
axis of interest?)
# strace toybox true
set_tid_address(0x15c9eeb4) = 161
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x15c4cb48}, {SIG_DFL, [], 0}, 8) = 0
getuid32() = 0
geteuid32() = 0
umask(0) = 0755
umask(0755) = 0
getuid32() = 0
geteuid32() = 0
umask(0) = 0755
umask(0755) = 0
Ah, that's two instances of setup. The setup for toybox_main() and the
setup for true.

It can normally skip one of those, but you're explicitly calling
"toybox" so it treats it as a command and does setup for it.
exit_group(0) = ?
+++ exited with 0 +++
It looks like they're all repeated twice when the toybox command is
used to invoke a command by name (rather than with symlinks).
Yes. If you call toybox as a command name, it gets the normal command
setup done for it. I'm not optimizing for that case. Pilot error.

I note that toysh already does manual init stuff because it has to
implement nofork and nothing else does. We know what context we're in
there, and I could optimize stuff. But since it's already dropped
privileges in the suid case, we _have_ to re-exec to re-acquire that for
TOYBOX_STAYROOT commands. (I was thinking about this at the design level
when I implemented it...)

As I said, I can put a TOYBOX_UMASK flag check around the umask() calls
but I didn't bother because A) they're really trivial, B) toysh is an
example of a command that wants to use toys.old_umask without having set
TOYFLAG_UMASK in its newtoy(). (Admittedly it's re-applying the old
value because it's trying to do longjmp(rebound) cleanup for that nofork
stuff that I said probably isn't worth doing, so... I need to revisit
that and work out what tradeoffs I want to make. There's pending design
work to be done in toysh.)
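For reference, the paired umask() calls in the trace are the usual way to read the current mask: umask() only reports the old value as a side effect of setting a new one, so reading means setting and then restoring. A minimal sketch (read_umask() is a hypothetical name, not the toybox function):

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return the process umask without changing it. */
mode_t read_umask(void)
{
    mode_t old = umask(0);  /* first syscall: fetch old mask, set 0 */

    umask(old);             /* second syscall: put the old mask back */
    return old;
}
```

That is two system calls per read, which matches the umask(0)/umask(0755) pairs shown above.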
It's not
as bad without the duplication, but that's still 4 unnecessary
round-trips to kernelspace for a lot of commands. Maybe the get[e]uid
stuff is hard to remove when the suid support is compiled-in, but
getauxval() could be used instead on systems that have it.
There was talk a few years back of putting getuid and friends in vdso
and the result was nobody bothered because the overhead wasn't high
enough for anybody to care. (People tend to call gettimeofday() in tight
loops because it constantly changes.)
Can you make the
probe lazy (at first fork) or so it only happens in toys that need
fork, rather than unconditionally at start?
I don't understand the request here? The vfork/exec probe I was
referring to is after exec, to see if toybox is re-execing itself. The
test _is_ to see if we're the first exec or not...
OK, I misunderstood then. I thought you were probing to see if you can
fork, and otherwise switching to nommu mode.
Yes, at compile time.
Obviously such a probe
would be very expensive (relative to what you do need to do) if
performed when you won't need it. That was probably the start of my
misunderstanding.
No, it's a compile time probe to see if toysh can call fork() or has to
do something elaborately silly instead to make & and job control work in
toysh. (Let alone making cpio -p work, and the task pool stuff if I did
decide to have a compressor work in parallel or similar.)
I don't want to replace fork _entirely_ because fork takes about 5% as
long as exec on systems that _do_ support it, so the toybox shell being
able to fork and run commands internally is a big potential speedup for
shell scripts. But codepaths that are _not_ performance critical should
use the long way round so it gets testing.
This seems like a good approach, but I have a suggestion that will
perform even better: use pthread_create instead of fork for
implementing shell builtins
Suppressing the gag reflex for involving pthreads in something for no
reason, I actually looked at that a long time ago (using clone()
directly to create shared process contexts), and didn't like it. Random
http://landley.net/notes-2007.html#20-01-2007
The difference with using clone() directly is that __thread variables
are not going to work and it might not even be safe to call any libc
functions. glibc doesn't document what is or isn't safe; I could go
into detail on the topic with musl if anyone cares to hear.
Feel free, but if threads need to do something non-obvious that clone()
doesn't, I don't want to get threading on me.
Commands that _can't_ run without fork() generally have a reason, such
as they don't free all their memory on error paths, they've changed
signal handlers, they may leave memory mappings, there may be opendir()
state, there could be exit handlers... And no, beefing up the error and
exit handling to make TOYBOX_CLEANUP_TO_HUMOR_VALGRIND be mandatory
doesn't fix it because ctrl-c can come in anywhere.
The obvious case where you need fork not to work around implementation
limitations like the above, but for an actual semantic reason, is for
pipelines and () subshells.
You can't intercept sigstop so you'd have to write some sort of filter
intercepting I/O and faking your own PTY to implement ctrl-z and I am
so, so, so not going there. Plus fg, bg, &.
For many (but not all) such usages, if
there is a builtin version of the commands involved, they could be
implemented in threads without forking.
I did threading for about 5 years under OS/2 back in the 90's. I got the
desire to work with it out of my system.
I could probably come up with
some nice examples if you want to see them, but since the toys don't
seem to be written to support this kind of usage, it's probably not
practical anyway for toybox.
I can make a surprising number of things work if I decide to, but
convincing me opening that can of worms is a good idea in the first
place is a different matter.

One of my early toybox todo items was to parallelize bunzip2 (see
http://landley.net/notes-2007.html##26-12-2007 and commit d3236c1fd785
for example) and I did seriously look at -lpthread for this... and then
decided _so_ not to go there. (If I wanted to parallelize a compressor I
could use a task pool and pipes, but said compressor wouldn't be bunzip2
at this point because the algorithm's essentially deprecated. Gzip is a
streaming compressor, and lzma/xz are the Giant Horrible Algorithm to
make things as small as reasonably achievable. This leaves bunzip2
without much of a niche.)

I could probably actually parallelize gzip (compression of large data
sets is trivial if you don't mind inserting dictionary resets,
decompression is sort of heuristic but still doable, or decompression in
parallel could rely on the compression in parallel to determine the
reset stride... :)
I looked into having xexit() do a longjmp and then cleaning stuff up
reliably, inspecting our own heap and everything, and the answer is
reimplementing that much of the operating system is giant bloat and
complexity and a great big Not Going There.
I agree this is a bad approach -- things like that should be relegated
to busybox and not carried into toybox.
http://lists.busybox.net/pipermail/busybox/2006-March/053270.html
http://lists.busybox.net/pipermail/busybox/2009-January/068158.html
http://lists.busybox.net/pipermail/busybox/2011-February/074626.html
OK.
And then I benchmarked it and noticed that exec is the expensive part by
an order of magnitude, and decided not to bother. Yes vfork() makes that
problematic again, but penalizing android to make nommu work would not
be my first choice, and exec() from a thread has always been
problematic. (Did this change recently?)
exec replaces the calling _process_, not thread, but it works fine if
that's what you want to do.
Which means that a toysh that wanted to use threads for builtins would
need to use vfork for non-builtins meaning it would need two codepaths
to do largely the same thing meaning so not going there...

(I put two codepaths into "tail" because one can't work on pipes and the
other is pathological on large seekable files. I agonized about it, and
am still unhappy, and it's one of the few command config suboptions
remaining. That's about my threshold and comfort level for this sort of
thing.)
Rich
Rob
Rich Felker
2015-06-29 02:57:00 UTC
Post by Rob Landley
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.
That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
I'm pretty sure pdksh conforms to POSIX (at least the 1992 version) so
it's probably sufficient for running portable scripts including all
configure scripts. I know your goal is much higher (bash
compatibility) but it sounds like pdksh could get you a working
environment until you finish toysh, with little or no effort spent on
the temporary solution.
Post by Rob Landley
2) I'm teaching toybox to probe for the existence of fork(), set a nommu
symbol if it's not there, and use sort of a fakeroot version of fork
(re-exec yourself with some way of transferring state to the child,
maybe an environment variable, maybe a pipe...).
BTW toybox does a lot of expensive probing at startup already (before
even entering the applet's main function) that makes simple commands
take about twice the time of their busybox versions.
Could you please be a little more vague?
I'd prefer to respond to this on the toybox list, but since you felt it
should be asked here instead...
Sorry, it just came up as part of the overhead for self-exec topic. I
agree the toybox list would have been a better place.
Or do you mean something other by "probing" than "system calls"? (Memory
usage? CPU usage? Could I have some sort of metric here, or at least the
axis of interest?)
# strace toybox true
set_tid_address(0x15c9eeb4) = 161
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x15c4cb48}, {SIG_DFL, [], 0}, 8) = 0
getuid32() = 0
geteuid32() = 0
umask(0) = 0755
umask(0755) = 0
getuid32() = 0
geteuid32() = 0
umask(0) = 0755
umask(0755) = 0
Ah, that's two instances of setup. The setup for toybox_main() and the
setup for true.
It can normally skip one of those, but you're explicitly calling
"toybox" so it treats it as a command and does setup for it.
exit_group(0) = ?
+++ exited with 0 +++
It looks like they're all repeated twice when the toybox command is
used to invoke a command by name (rather than with symlinks).
Yes. If you call toybox as a command name, it gets the normal command
setup done for it. I'm not optimizing for that case. Pilot error.
OK. This was the source of the duplicates. I would still like to see
no extra syscalls at all (much like my feeling about glibc startup),
but the situation isn't nearly as bad as I thought.

BTW what happens with the suid check/drop being done twice? Do
commands that need suid work when you call them as "toybox cmdname"?
Are they supposed to?
Post by Rob Landley
It's not
as bad without the duplication, but that's still 4 unnecessary
round-trips to kernelspace for a lot of commands. Maybe the get[e]uid
stuff is hard to remove when the suid support is compiled-in, but
getauxval() could be used instead on systems that have it.
There was talk a few years back of putting getuid and friends in vdso
and the result was nobody bothered because the overhead wasn't high
enough for anybody to care. (People tend to call gettimeofday() in tight
loops because it constantly changes.)
I could cache it in libc too, but I feel the risk of returning a stale
value outweighs the performance benefits, so I didn't do it. But if
you're checking it at startup (without having done uid changes),
getauxval is reliable.
Post by Rob Landley
I don't want to replace fork _entirely_ because fork takes about 5% as
long as exec on systems that _do_ support it, so the toybox shell being
able to fork and run commands internally is a big potential speedup for
shell scripts. But codepaths that are _not_ performance critical should
use the long way round so it gets testing.
This seems like a good approach, but I have a suggestion that will
perform even better: use pthread_create instead of fork for
implementing shell builtins
Suppressing the gag reflex for involving pthreads in something for no
reason, I actually looked at that a long time ago (using clone()
directly to create shared process contexts), and didn't like it. Random
http://landley.net/notes-2007.html#20-01-2007
The difference with using clone() directly is that __thread variables
are not going to work and it might not even be safe to call any libc
functions. glibc doesn't document what is or isn't safe; I could go
into detail on the topic with musl if anyone cares to hear.
Feel free, but if threads need to do something non-obvious that clone()
doesn't, I don't want to get threading on me.
What they do is specifically setting up TLS, both visible TLS that
belongs to the application/libraries and libc-internal stuff. In
principle the former could be skipped if you know your code is not
using TLS, but the latter could be dangerous not to have setup right.
I *think* (this needs checking) the new thread created manually by
clone will use the same TLS pointer as the thread that called clone.
In this case, libc internals will potentially be reading and writing
the same data, without any synchronization. The most obvious case that
_necessarily_ breaks is that they're both using the same errno, so you
can never reliably check errno, and anything libc-internal that checks
errno as part of its operation could get the wrong value and badly
break.
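The per-thread errno that pthread_create() sets up can be observed directly. The following is a minimal sketch, with hypothetical helper names, that asks a thread for the address of its errno and compares it with the caller's. It only shows the pthread side; it does not demonstrate what a raw clone() would do. Older glibc versions need -pthread when linking.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: report where this thread's errno lives. */
static void *report_errno_address(void *unused)
{
    (void)unused;
    return &errno;
}

/* Returns 1 if a new pthread gets its own errno, 0 if it shares the
 * caller's, -1 if the thread could not be created or joined. */
int thread_has_private_errno(void)
{
    pthread_t thread;
    void *theirs = NULL;

    if (pthread_create(&thread, NULL, report_errno_address, NULL) != 0)
        return -1;
    if (pthread_join(thread, &theirs) != 0)
        return -1;
    return theirs != (void *)&errno;
}
```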

I know glibc's pthread library is big and heavy, but in musl there's
really nothing you "don't want to get on you". Even if you use all the
threads functions, it's under half the size of stdio, and typical
usage (like what I suggested toybox could do) will just be a few kb.
The real cost, as you've noted, would be making the toys so that they
clean up sufficiently, and sufficiently avoid breaking global state
like fd 0/1/2, to be usable as threads, and I agree that's probably
prohibitive without a significantly different design.
Post by Rob Landley
Commands that _can't_ run without fork() generally have a reason, such
as they don't free all their memory on error paths, they've changed
signal handlers, they may leave memory mappings, there may be opendir()
state, there could be exit handlers... And no, beefing up the error and
exit handling to make TOYBOX_CLEANUP_TO_HUMOR_VALGRIND be mandatory
doesn't fix it because ctrl-c can come in anywhere.
The obvious case where you need fork not to work around implementation
limitations like the above, but for an actual semantic reason, is for
pipelines and () subshells.
You can't intercept sigstop so you'd have to write some sort of filter
intercepting I/O and faking your own PTY to implement ctrl-z and I am
so, so, so not going there. Plus fg, bg, &.
^Z sends SIGTSTP, not SIGSTOP. I agree it would be some work to make
fake job control where no processes exist, but I would just use real
processes for interactive shells with job control. Performance
difference is not noticeable there anyway. The case where you get a
measurable benefit from avoiding forking and execing is in scripts,
where you don't have job control to worry about.

Rich
Rob Landley
2015-06-29 20:26:13 UTC
Post by Rich Felker
Post by Rob Landley
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.
That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
I should have clarified these were other shells to look at with
licensing I didn't mind distributing, not that I'd confirmed they work
with nommu. (Except the uclinux shells, which are _crap_.)

Those are other things I could go look at instead of writing something I
mean to write anyway, spending the effort advancing a project I _know_
would do what I need it to do instead of debugging another dead end.
Post by Rich Felker
I'm pretty sure pdksh conforms to POSIX (at least the 1992 version) so
Uh-huh. 1992 was 23 years ago. That would mean the shell's active
development predates Linux. That standard is a decade older than the
stale version of bash I'm trying to move off of because stuff keeps
breaking when I use it.

And on those grounds you expect it to be better than hush?
Post by Rich Felker
it's probably sufficient for running portable scripts including all
configure scripts.
You think all configure scripts are portable?

Really?
Post by Rich Felker
I know your goal is much higher (bash compatibility)
Well, some bash extensions. Full bug for bug bash compatibility is a
moving target. I'm basically thinking bash 2.x plus stuff people
actually use as they complain about it. (-o pipefail for one thing, and
the =~ comparator for another; portage used that).
Post by Rich Felker
but it sounds like pdksh could get you a working
environment until you finish toysh, with little or no effort spent on
the temporary solution.
Having never put together your own distro build system, you believe
there's no effort involved. Having done this sort of thing repeatedly
for about 15 years, I don't even want to go there.

But sure, let's take a look at the actual pdksh code, in jobs.c it says:

* The interface to the rest of the shell should probably be changed
* to allow use of vfork() when available but that would be way too much
* work :)

...

/* create child process */
forksleep = 1;
while ((i = fork()) < 0 && errno == EAGAIN && forksleep < 32) {
    if (intrsig) /* allow user to ^C out... */
        break;
    sleep(forksleep);
    forksleep <<= 1;
}

So no, that doesn't work with nommu either. (And I have no idea what
weird historical bug that was trying to work around back in 1992.)
Post by Rich Felker
Post by Rob Landley
Yes. If you call toybox as a command name, it gets the normal command
setup done for it. I'm not optimizing for that case. Pilot error.
OK. This was the source of the duplicates. I would still like to see
no extra syscalls at all (much like my feeling about glibc startup),
but the situation isn't nearly as bad as I thought.
BTW what happens with the suid check/drop being done twice? Do
commands that need suid work when you call them as "toybox cmdname"?
Are they supposed to?
in scripts/make.sh:

echo "USE_TOYBOX(NEWTOY(toybox, NULL, TOYFLAG_STAYROOT))" > \
generated/newtoys.h

The stayroot tells the multiplexer not to drop privs. (Commit 15a8d71674b4.)
Post by Rich Felker
Post by Rob Landley
It's not
as bad without the duplication, but that's still 4 unnecessary
round-trips to kernelspace for a lot of commands. Maybe the get[e]uid
stuff is hard to remove when the suid support is compiled-in, but
getauxval() could be used instead on systems that have it.
There was talk a few years back of putting getuid and friends in vdso
and the result was nobody bothered because the overhead wasn't high
enough for anybody to care. (People tend to call gettimeofday() in tight
loops because it constantly changes.)
I could cache it in libc too, but I feel the risk of returning a stale
value outweighs the performance benefits, so I didn't do it. But if
you're checking it at startup (without having done uid changes),
getauxval is reliable.
$ man getauxval | grep CONFORMING -A 1
CONFORMING TO
This function is a nonstandard glibc extension.

So you just suggested I replace a posix function with a nonstandard
glibc extension in the name of saving a system call that does an
unlocked fetch on a single integer.

Well that's certainly a point of view.
Post by Rich Felker
Post by Rob Landley
The difference with using clone() directly is that __thread variables
are not going to work and it might not even be safe to call any libc
functions. glibc doesn't document what is or isn't safe; I could go
into detail on the topic with musl if anyone cares to hear.
Feel free, but if threads need to do something non-obvious that clone()
doesn't, I don't want to get threading on me.
What they do is specifically setting up TLS, both visible TLS that
belongs to the application/libraries and libc-internal stuff.
And activate cancellation point stuff and locking in libc that was a
_fun_ source of subtle bugs in uClibc, and then make errno stop being a
global variable, and they open a can of worms of asynchronous race
conditions for debugging...
Post by Rich Felker
In principle the former could be skipped if you know your code is not
using TLS,
Which would mean not using errno?

Me, I just declared a structure in main() so it was at the base of the
stack and passed a darn "that" pointer around to it (or bouncing off
linked list of pid/pointer pairs to look it up if I hadn't wanted to
pass the pointer around, which could be a hash if there were enough
threads but there never were in anything I played with back when I
cared) so get_that() was a lightweight enough function not to hugely care...
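A rough sketch of that pattern, under the assumption that each context carries its own error field instead of relying on errno. Only the name get_that() comes from the text above; the struct layout and register_context() are hypothetical, and the list has no locking, so registration would have to happen before the contexts start running concurrently.

```c
#include <stddef.h>
#include <sys/types.h>

/* Per-context state that would otherwise live in globals. */
struct context {
    int error;  /* private stand-in for errno */
};

/* One node in the linked list of pid/pointer pairs. */
struct registry_entry {
    pid_t id;
    struct context *that;
    struct registry_entry *next;
};

static struct registry_entry *registry;

/* Link a caller-owned entry into the list (no allocation needed). */
void register_context(struct registry_entry *entry, pid_t id,
                      struct context *that)
{
    entry->id = id;
    entry->that = that;
    entry->next = registry;
    registry = entry;
}

/* Look up the context for `id`, or NULL if none was registered. */
struct context *get_that(pid_t id)
{
    struct registry_entry *entry;

    for (entry = registry; entry; entry = entry->next)
        if (entry->id == id)
            return entry->that;
    return NULL;
}
```

The linear walk is fine for a handful of contexts; as noted above, a hash would only matter with many of them.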

I have built my own infrastructure for this sort of thing from first
principles more than once over the years, but my first question these
days is "do we really need to go there".
Post by Rich Felker
but the latter could be dangerous not to have setup right.
I *think* (this needs checking) the new thread created manually by
clone will use the same TLS pointer as the thread that called clone.
In this case, libc internals will potentially be reading and writing
the same data, without any synchronization.
See "decided to just use fork(), which means re-exec /proc/self/exe as
necessary because exec(NULL) doesn't re-exec yourself despite multiple
proposals over the years that the kernel just DO that since the kernel
knows the right inode even if proc isn't mounted"...

Ahem.
Post by Rich Felker
The most obvious case that
_necessarily_ breaks is that they're both using the same errno, so you
can never reliably check errno,
Actually, there are horrible, horrible, horrible ways to make it work.
(Did you know you can implement your own page fault handler in userspace?
Or you can suspend all the other clones with SIGSTOP, do your thing and
check errno, and resume all the other clones. There are a _bunch_ of bad
ways to do this.) But I'm already deep into "not going there"...
Post by Rich Felker
and anything libc-internal that checks
errno as part of its operation could get the wrong value and badly
break.
I know glibc's pthread library is big and heavy, but in musl there's
really nothing you "don't want to get on you".
Did you miss the part where I DID THREADED PROGRAMMING FOR YEARS IN
COLLEGE AND LATER FOR MY DAY JOB? (And then did Java as my main
programming language for about 3 more years, which was all threads
because they didn't wrap poll() or select(), but that's another story.)

Ok, I'm throwing this topic in the "religion" bucket and doing the
hiding behind the sofa pretending to not be home when you ring the
doorbell thing on the topic from now on.
Post by Rich Felker
Post by Rob Landley
Commands that _can't_ run without fork() generally have a reason, such
as they don't free all their memory on error paths, they've changed
signal handlers, they may leave memory mappings, there may be opendir()
state, there could be exit handlers... And no, beefing up the error and
exit handling to make TOYBOX_CLEANUP_TO_HUMOR_VALGRIND be mandatory
doesn't fix it because ctrl-c can come in anywhere.
The obvious case where you need fork not to work around implementation
limitations like the above, but for an actual semantic reason, is for
pipelines and () subshells.
You can't intercept sigstop so you'd have to write some sort of filter
intercepting I/O and faking your own PTY to implement ctrl-z and I am
so, so, so not going there. Plus fg, bg, &.
^Z sends SIGTSTP, not SIGSTOP. I agree it would be some work to make
fake job control where no processes exist, but I would just use real
processes for interactive shells with job control.
As would I. Not threads.
Post by Rich Felker
Performance difference is not noticeable there anyway. The case where
you get a measurable benefit from avoiding forking and execing is in
scripts, where you don't have job control to worry about.
You're once again recommending I have two codepaths implementing the
same functionality. (And possibly basing it on the assumption that shell
scripts aren't going to do crazy asynchronous stuff like scripts/make.sh
does, let alone try to implement "disown"...)
Post by Rich Felker
Rich
Rob
Rich Felker
2015-06-30 02:25:00 UTC
Post by Rob Landley
Post by Rich Felker
Post by Rob Landley
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.
That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
I should have clarified these were other shells to look at with
licensing I didn't mind distributing, not that I'd confirmed they work
with nommu. (Except the uclinux shells, which are _crap_.)
Yes, I noticed. Whatever that shell from uclinux I was using at first
was, it was utterly awful.
Post by Rob Landley
Post by Rich Felker
I'm pretty sure pdksh conforms to POSIX (at least the 1992 version) so
Uh-huh. 1992 was 23 years ago. That would mean the shell's active
development predates Linux. That standard is a decade older than the
stale version of bash I'm trying to move off of because stuff keeps
breaking when I use it.
I'm not aware of important changes in the shell spec since then, but I
may be mistaken.
Post by Rob Landley
And on those grounds you expect it to be better than hush?
I recall reading that hush is known to be missing some important
functionality, though I don't remember what. It worked fine for my
interactive use and simple scripts though.
Post by Rob Landley
Post by Rich Felker
it's probably sufficient for running portable scripts including all
configure scripts.
You think all configure scripts are portable?
Really?
Anything produced by autotools that's not using custom shell script
level code (just the standard m4 macros, etc.) is theoretically
completely portable, and in practice very much so. I've run these
scripts on all kinds of crazy proprietary unices and when I've
encountered trouble, it hasn't been at the shell interpreter level.
Post by Rob Landley
* The interface to the rest of the shell should probably be changed
* to allow use of vfork() when available but that would be way too much
* work :)
...
/* create child process */
forksleep = 1;
while ((i = fork()) < 0 && errno == EAGAIN && forksleep < 32) {
    if (intrsig) /* allow user to ^C out... */
        break;
    sleep(forksleep);
    forksleep <<= 1;
}
So no, that doesn't work with nommu either. (And I have no idea what
weird historical bug that was trying to work around back in 1992.)
OK, so that idea is out. Guess you need to write toysh. :-)
Post by Rob Landley
Post by Rich Felker
Post by Rob Landley
Yes. If you call toybox as a command name, it gets the normal command
setup done for it. I'm not optimizing for that case. Pilot error.
OK. This was the source of the duplicates. I would still like to see
no extra syscalls at all (much like my feeling about glibc startup),
but the situation isn't nearly as bad as I thought.
BTW what happens with the suid check/drop being done twice? Do
commands that need suid work when you call them as "toybox cmdname"?
Are they supposed to?
echo "USE_TOYBOX(NEWTOY(toybox, NULL, TOYFLAG_STAYROOT))" > \
generated/newtoys.h
The stayroot tells the multiplexer not to drop privs. (Commit 15a8d71674b4.)
Looks good.
Post by Rob Landley
Post by Rich Felker
Post by Rob Landley
It's not
as bad without the duplication, but that's still 4 unnecessary
round-trips to kernelspace for a lot of commands. Maybe the get[e]uid
stuff is hard to remove when the suid support is compiled-in, but
getauxval() could be used instead on systems that have it.
There was talk a few years back of putting getuid and friends in vdso
and the result was nobody bothered because the overhead wasn't high
enough for anybody to care. (People tend to call gettimeofday() in tight
loops because it constantly changes.)
I could cache it in libc too, but I feel the risk of returning a stale
value outweighs the performance benefits, so I didn't do it. But if
you're checking it at startup (without having done uid changes),
getauxval is reliable.
$ man getauxval | grep CONFORMING -A 1
CONFORMING TO
This function is a nonstandard glibc extension.
So you just suggested I replace a posix function with a nonstandard
glibc extension in the name of saving a system call that does an
unlocked fetch on a single integer.
Well that's certainly a point of view.
While glibc was the first to add it, it's conceptually something nice
to have on any ELF-based platform or platform with ELF-like process
startup semantics, and I worked with the glibc team on making it suck
less (inability to distinguish no-value from a value equal to the
no-value return code, etc.) and once we agreed on behavior,
implemented the same in musl.

You can take or leave it, but if you're doing Linux-specific stuff in
toybox already, it doesn't 'feel' terribly 'dirty' to me, and it cuts
down on the syscall overhead.
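
A minimal sketch of the startup check described above, assuming a Linux
libc whose getauxval() sets errno to ENOENT for a missing entry (the
behavior agreed between glibc and musl; older glibc releases may not do
this). The helper names startup_id() and started_setid() are made up for
illustration and are not toybox code.

```c
#include <errno.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch one of the ids the kernel recorded in the
 * ELF auxiliary vector at exec time (AT_UID, AT_EUID, AT_GID, AT_EGID).
 * Returns 0 and stores the id in *out, or -1 if the entry is missing.
 */
int startup_id(unsigned long type, unsigned int *out)
{
  unsigned long val;

  errno = 0;
  val = getauxval(type);
  /* 0 is a legitimate id (root), so only errno tells "absent" from 0. */
  if (!val && errno == ENOENT) return -1;
  *out = (unsigned int)val;
  return 0;
}

/*
 * Hypothetical startup check: nonzero when the process was started
 * set-id. Falls back to the ordinary syscalls if any auxv entry is
 * missing. Only meaningful before the process changes any of its ids.
 */
int started_setid(void)
{
  unsigned int uid, euid, gid, egid;

  if (startup_id(AT_UID, &uid) || startup_id(AT_EUID, &euid)
      || startup_id(AT_GID, &gid) || startup_id(AT_EGID, &egid))
  {
    uid = getuid();
    euid = geteuid();
    gid = getgid();
    egid = getegid();
  }
  return uid != euid || gid != egid;
}
```

The auxv values are a snapshot taken at exec time, which is why this is
only a substitute for get[e]uid() when it runs before any uid changes.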
Post by Rob Landley
Post by Rich Felker
Post by Rob Landley
The difference with using clone() directly is that __thread variables
are not going to work and it might not even be safe to call any libc
functions. glibc doesn't document what is or isn't safe; I could go
into detail on the topic with musl if anyone cares to hear.
Feel free, but if threads need to do something non-obvious that clone()
doesn't, I don't want to get threading on me.
What they do is specifically setting up TLS, both visible TLS that
belongs to the application/libraries and libc-internal stuff.
And activate cancellation point stuff and locking in libc that was a
_fun_ source of subtle bugs in uClibc,
That's active even without pthread_create being called, but it won't
be acted upon unless your program calls pthread_cancel somewhere.
Post by Rob Landley
and then make errno stop being a
global variable,
errno is thread-local whether or not there's more than one thread, but
accessing it won't work right if you call clone() yourself.
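
A minimal sketch of the thread-local errno point, assuming a libc with
POSIX threads (older glibc needs -pthread at link time). The names
record_errno_addr() and errno_is_per_thread() are hypothetical. It only
covers the pthread_create() side, where libc sets up TLS for the new
thread; it does not attempt the raw clone() case discussed here.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Runs in the second thread: record where that thread's errno lives. */
static void *record_errno_addr(void *arg)
{
  *(uintptr_t *)arg = (uintptr_t)&errno;
  return 0;
}

/*
 * Hypothetical demo: returns 1 if a thread made by pthread_create() has
 * its own errno location, 0 if it shares the caller's, -1 if the thread
 * could not be created or joined.
 */
int errno_is_per_thread(void)
{
  pthread_t th;
  uintptr_t theirs = 0, mine = (uintptr_t)&errno;

  if (pthread_create(&th, 0, record_errno_addr, &theirs)) return -1;
  if (pthread_join(th, 0)) return -1;
  /* The caller's TLS stays live throughout, so equal addresses would
     mean the two threads really were sharing one errno. */
  return theirs != mine;
}
```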
Post by Rob Landley
and they open a can of worms of asynchronous race
conditions for debugging...
Only if you're accessing the same data. Most of the good uses of
threads have no shared data at all.
Post by Rob Landley
Post by Rich Felker
In principle the former could be skipped if you know your code is not
using TLS,
Which would mean not using errno?
Yes, that was in the text that immediately followed where you split
the quote. :-)
Post by Rob Landley
Post by Rich Felker
The most obvious case that
_necessarily_ breaks is that they're both using the same errno, so you
can never reliably check errno,
Actually, there are horrible, horrible, horrible ways to make it work.
(Did you know you can implement your own page fault handler in userspace?
Or you can suspend all the other clones with SIGSTOP, do your thing and
check errno, and resume all the other clones. There are a _bunch_ of bad
ways to do this.) But I'm already deep into "not going there"...
None of this is safe or reliable. Even if you manage to make it work
on one particular setup, there are all kinds of ways it can and will
break under changes to libc that you shouldn't be poking at.

BTW you can't suspend threads (CLONE_THREAD) separately with SIGSTOP.
You could do it with CLONE_VM without CLONE_THREAD, making processes
that share memory, but these don't work at all under qemu-user.
Post by Rob Landley
Post by Rich Felker
Performance difference is not noticeable there anyway. The case where
you get a measurable benefit from avoiding forking and execing is in
scripts, where you don't have job control to worry about.
You're once again recommending I have two codepaths implementing the
same functionality. (And possibly basing it on the assumption that shell
scripts aren't going to do crazy asynchronous stuff like scripts/make.sh
does, let alone try to implement "disown"...)
disown is for job control, and thus irrelevant to scripts.

I'm not pressing you to do this, and it's quite clear that you don't
want to, so that's fine. But I think it would be possible (and IMO
interesting and useful) to make a nommu-friendly shell that uses
threads and thereby is fast at running complex scripts that would
otherwise involve A LOT of re-exec on nommu. It's certainly an
interesting task to leave open for the future of nommu, and that's
part of why I brought it up for discussion here. But please don't
think I'm trying to pressure you to implement it -- I know you don't
want to.

Rich
Christopher Covington
2015-07-02 14:44:26 UTC
Hi Rob,
Post by Rob Landley
Post by Rich Felker
but the latter could be dangerous not to have setup right.
I *think* (this needs checking) the new thread created manually by
clone will use the same TLS pointer as the thread that called clone.
In this case, libc internals will potentially be reading and writing
the same data, without any synchronization.
See "decided to just use fork(), which means re-exec /proc/self/exe as
necessary because exec(NULL) doesn't re-exec yourself despite multiple
proposals over the years that the kernel just DO that since the kernel
knows the right inode even if proc isn't mounted"...
Is the following much different from the desired exec(NULL) behavior you're
describing?
Post by Rob Landley
For example, if you have an open file descriptor on an executable file, you
can execute it by calling execveat(), passing the file descriptor, an empty
path, and the AT_EMPTY_PATH flag.
https://lwn.net/Articles/649115/

Chris
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Rich Felker
2015-07-02 16:13:37 UTC
Post by Christopher Covington
Hi Rob,
Post by Rob Landley
Post by Rich Felker
but the latter could be dangerous not to have setup right.
I *think* (this needs checking) the new thread created manually by
clone will use the same TLS pointer as the thread that called clone.
In this case, libc internals will potentially be reading and writing
the same data, without any synchronization.
See "decided to just use fork(), which means re-exec /proc/self/exe as
necessary because exec(NULL) doesn't re-exec yourself despite multiple
proposals over the years that the kernel just DO that since the kernel
knows the right inode even if proc isn't mounted"...
Is the following much different from the desired exec(NULL) behavior you're
describing?
Post by Rob Landley
For example, if you have an open file descriptor on an executable file, you
can execute it by calling execveat(), passing the file descriptor, an empty
path, and the AT_EMPTY_PATH flag.
https://lwn.net/Articles/649115/
It doesn't help, because a program does not have an open file
descriptor to its executable file. Obtaining one is just as hard as
obtaining a pathname (or using /proc/self/exe).

Rich
Rob Landley
2015-07-02 22:08:19 UTC
Post by Christopher Covington
Hi Rob,
Post by Rob Landley
Post by Rich Felker
but the latter could be dangerous not to have setup right.
I *think* (this needs checking) the new thread created manually by
clone will use the same TLS pointer as the thread that called clone.
In this case, libc internals will potentially be reading and writing
the same data, without any synchronization.
See "decided to just use fork(), which means re-exec /proc/self/exe as
necessary because exec(NULL) doesn't re-exec yourself despite multiple
proposals over the years that the kernel just DO that since the kernel
knows the right inode even if proc isn't mounted"...
Is the following much different from the desired exec(NULL) behavior you're
describing?
Post by Rob Landley
For example, if you have an open file descriptor on an executable file, you
can execute it by calling execveat(), passing the file descriptor, an empty
path, and the AT_EMPTY_PATH flag.
https://lwn.net/Articles/649115/
The problem is you need to get a file descriptor to your currently
running executable (it's not one of the file descriptors you inherit as
part of your environment), which means you need to find your executable,
which means /proc/self/exe (which may not be mounted) or a search of
$PATH looking for argv[0] (which is just a heuristic, your calling
program can pass you anything in argv and envp; a fun corner case is
that if you didn't define PATH bash will set a default one but as a
_local_ variable, not exported to the child process).

The kernel already knows this information, it has to be able to provide
/proc/self/exe for you from its internal data about your process. But
there's no way to ask it to use what it's already got, you have to
reconstruct it and feed it back to it.
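
A minimal sketch of the workaround being described, assuming /proc is
mounted and a kernel recent enough to have execveat(). The helper names
open_self_exe() and reexec_self() are hypothetical, not toybox
functions, and the raw syscall() is used because not every libc ships
an execveat() wrapper.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* Linux value, for headers that hide it */
#endif

/* Returns an fd on the running binary, or -1 (e.g. /proc not mounted). */
int open_self_exe(void)
{
  return open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
}

/*
 * Re-exec the current binary with the given argv/envp. Does not return
 * on success; returns -1 if the fd could not be obtained or the exec
 * failed. Close-on-exec is fine here because the target is a binary,
 * not a script that an interpreter would need to reopen through the fd.
 */
int reexec_self(char *const argv[], char *const envp[])
{
  int fd = open_self_exe();

  if (fd < 0) return -1;
  syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
  close(fd);
  return -1;
}
```

Note that the first step still depends on /proc/self/exe, which is
exactly the dependency being complained about: execveat() only helps
once something has already produced the file descriptor.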

It would be lovely if this showed up, I've been asking for it on and off
forever: http://lkml.iu.edu/hypermail/linux/kernel/0612.3/0238.html

But then people were going "it would be nice if there was a way to punch
a hole in an existing file" years before we got a way to do that...

Sigh. The commit that added fexecve() last year
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=51f39a1f0cea
links to a _reply_ to my above post, while missing the point actually
raised in the first message of the thread.

Rob

Isaac Dunham
2015-06-29 22:55:03 UTC
Post by Rich Felker
Post by Rob Landley
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.
That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
I'm pretty sure pdksh conforms to POSIX (at least the 1992 version) so
it's probably sufficient for running portable scripts including all
configure scripts. I know your goal is much higher (bash
compatibility) but it sounds like pdksh could get you a working
environment until you finish toysh, with little or no effort spent on
the temporary solution.
Some comments:
- pdksh is a lot closer to bash compatibility than "conforms to the 1992
edition of POSIX" would imply.
This is partly because bash copied many of its features from ksh.

- there *are* configure scripts that need bash--I don't remember the
software, but I have run into them. Of course, I've also seen configure
scripts that used ed and software that required the old csh (the one
that got "&&" and "||" backwards) to build.
I never looked into exactly what was required, though: I just rm'd
anything like that.
configure scripts used to be portable, but nowadays they often just try.

- pdksh works almost everywhere, but this is portability the hard way:
there are special cases for every *nix of the time, as well as *every*
other OS (yes, DOS and EMX support included...and probably VMS as well).
And the code is every bit as ugly as one would infer from that, and
quite possibly more so.

So theoretically, there probably is a code path that works with nommu.
But it will almost certainly be nontrivial to even identify it, let
alone get nommu properly detected and support automatically built.

- OpenBSD ships a minimally altered version of ksh as their /bin/sh;
this is the basis for loksh.

I think it's entirely sensible for Rob to not poke at pdksh; getting it
to work right on all Aboriginal's platforms is likely to be a large
drain. If someone else wants to fix it up and patch Aboriginal until
they can pass the smoketests on all platforms, accepting their patch
to use pdksh in Aboriginal might be sensible; but before that, it's
too big a project for a temporary gain.


Thanks,
Isaac Dunham
Roy Tam
2015-06-30 00:11:26 UTC
Hi all,
Post by Isaac Dunham
Post by Rich Felker
Post by Rob Landley
Are there perhaps any other shells that work on nommu? It would be
nice to be able to put off toysh until you have time to do a really
good job on it rather than rushing it because there's nothing else you
can use...
Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.
That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.
I'm pretty sure pdksh conforms to POSIX (at least the 1992 version) so
it's probably sufficient for running portable scripts including all
configure scripts. I know your goal is much higher (bash
compatibility) but it sounds like pdksh could get you a working
environment until you finish toysh, with little or no effort spent on
the temporary solution.
- pdksh is a lot closer to bash compatibility than "conforms to the 1992
edition of POSIX" would imply.
This is partly because bash copied many of its features from ksh.
- there *are* configure scripts that need bash--I don't remember the
software, but I have run into them. Of course, I've also seen configure
scripts that used ed and software that required the old csh (the one
that got "&&" and "||" backwards) to build.
I never looked into exactly what was required, though: I just rm'd
anything like that.
configure scripts used to be portable, but nowadays they often just try.
there are special cases for every *nix of the time, as well as *every*
other OS (yes, DOS and EMX support included...and probably VMS as well).
And the code is every bit as ugly as one would infer from that, and
quite possibly more so.
So theoretically, there probably is a code path that works with nommu.
But it will almost certainly be nontrivial to even identify it, let
alone get nommu properly detected and support automatically built.
- OpenBSD ships a minimally altered version of ksh as their /bin/sh;
this is the basis for loksh.
I think it's entirely sensible for Rob to not poke at pdksh; getting it
to work right on all Aboriginal's platforms is likely to be a large
drain. If someone else wants to fix it up and patch Aboriginal until
they can pass the smoketests on all platforms, accepting their patch
to use pdksh in Aboriginal might be sensible; but before that, it's
too big a project for a temporary gain.
Regarding pdksh, IIRC it uses fork() for subshells (just like mksh).
I wonder how it can work in a NoMMU environment.
Post by Isaac Dunham
Thanks,
Isaac Dunham
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net