Post by Rob Landley
The version I checked in won't error out for 'factor ""' or 'factor "36 "'
the way Ubuntu's will, but I think I'm ok with that...?
Post by enh
out of curiosity, what practical use is there for factor? even the
coreutils version gives up around 38 decimal digits, and it's pretty
slow even with numbers that small.
Post by Rob Landley
I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
long bus ride, because I probably have to implement TLS someday (by
which I mean https not thread local storage) because wget can't talk to
the world without encryption anymore (thanks NSA), and the section I
linked to above used "factor", and I went "that's a command? Apparently
so. This is probably like a dozen lines to implement"... and had it
working before the end of the bus ride.
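For scale, here is a minimal sketch of the kind of trial-division loop that fits in about a dozen lines. It is not the code that was checked in: factor_u64 is a hypothetical name, it assumes the input fits in 64 bits, and it leaves out argument parsing entirely.

```c
#include <stddef.h>

// Hypothetical helper, not the toybox source: writes the prime factors of n
// (smallest first, with repeats) into out[] and returns how many there are.
// A 64-bit value has at most 63 prime factors (2^63), so 64 slots suffice.
size_t factor_u64(unsigned long long n, unsigned long long out[64])
{
  size_t count = 0;
  unsigned long long d;

  // Strip factors of two so the main loop can skip even candidates.
  while (n > 1 && !(n & 1)) {
    out[count++] = 2;
    n >>= 1;
  }

  // Trial division by odd candidates up to sqrt(n). Writing the bound as
  // d <= n/d avoids overflowing d*d for large n.
  for (d = 3; d <= n / d; d += 2) {
    while (!(n % d)) {
      out[count++] = d;
      n /= d;
    }
  }

  // Anything left over is a single prime factor.
  if (n > 1) out[count++] = n;

  return count;
}
```

Calling factor_u64(36, out) fills out[] with 2 2 3 3. A loop like this has to run all the way up to the square root of n when n is prime, which is one reason plain trial division gets slow on large inputs.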
Post by enh
speaking of which (and going back to "simple is complex"), i have an
openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
all the other shas too. (a toybox built with all these is actually a
couple of hundred bytes larger than the one with just md5/sha1sum, but
that's because of the duplicated help strings.)
Actually the main reason I don't include external code is licensing.
Last year I gave two talks about how I went from GPL fanboy to advocate
of the public domain. I didn't quite fit either of them in the assigned
timeslot, but the more coherent of the two is probably:
https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3
The current toybox license places the code into the public domain. It
_looks_ like a BSD license, and I sometimes call it "zero clause BSD"
because of this, but the requirement to copy this specific license text
into derivative works is absent. This means it's a permission grant that
allows reusing the code without even attributing it. (Attribution is
_polite_, but it's possible to plagiarize Shakespeare. That's not a
licensing issue, and these days Google makes it pretty easy for teachers
to catch all those recycled term papers anyway.)
The problem with BSD-style licenses is that there are a lot of them (2
clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
that all try to do the same thing but all of them say "you must copy
this specific wording into your derived work", so if you combine code
from two sources under different BSD variants you wind up concatenating
the licenses, and this can get epically silly (the kindle paperwhite's
about->licenses thing is over 300 pages of concatenated license
boilerplate.)
I respect BSD/ISC/Apache license terms enough _not_ to treat them as
public domain. I would like toybox to provide a source of reusable
public domain code, it's one of the goals of the project.
Toybox has included explicitly public domain code from external sources
(such as the xz implementation for toys/pending/xzcat.c), and I've
looked at the libtom bignum library for implementing bc (haven't managed
to make much sense of it, to be honest). But I recently turned down a
ping.c submission that was based on BSD ping, in favor of writing my own.
Post by enh
i know one of your goals is to minimize dependencies,
I'm juggling an awful lot of conflicting goals. (Most of them listed on
the roadmap or design pages.)
Because of this, toybox is probably going to implement more than a lot
of users need, but as long as the commands are self-contained you can
switch off any command in your config that you don't want to ship.
If, for auditing reasons, you don't want to use toybox's sha1sum but
instead want to use an openssl derived version that shares code with
other sha1sum instances you've already cleared and are using elsewhere,
then that's what makes sense for your deployment. (If you grow to trust
toybox's version later, you can switch to it then after everybody else
has looked at it longer.)
Post by enh
but for us the
goal of minimizing duplication (and thus amount of code to audit) is
probably stronger. i suspect no one really cares that the toybox
hashes are slower than the openssl ones, but the security folks
probably will care about having another TLS implementation.
Indeed, and I agree. I don't _want_ to write TLS, I think it's out of
scope for toybox... except that I need the functionality to do basic web
transactions that _are_ in scope. (The internet's changing out from
under me. Two years ago you could talk to github, kernel.org, and
twitter without https. Now if you try they redirect.)
What I really want is an "stunnel" variant that works, so I can pipe an
https session through something that encrypts it for me.
https://www.stunnel.org/index.html
I tried to convince dropbear to add one years ago, but their reply was
more or less "patches welcome".
http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2007q1/000506.html
http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2008q4/000859.html
I prefer not to link toybox against external libraries (I could give a
long talk about why, but not here), and sucking in nontrivial amounts of
external code to maintain a local copy has its own large downsides. But
calling reasonably standardized external commands and piping stuff
through them? I'm all for it.
In fact toybox commands are designed to be able to call external
versions of commands even when toybox has its own implementation. That's
why mount.c doesn't check if CFG_LOSETUP is enabled before trying to
xpopen("losetup"), if it's there in the $PATH but not in this binary, ok
then.
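The calling pattern described above can be sketched with plain POSIX calls. This is a hypothetical stand-in, not toybox's xpopen() (whose real signature isn't shown here): spawn_with_stdout is an invented name, and the only point is that execvp() searches $PATH, so the caller doesn't need to know whether the command lives in the same binary.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for illustration; this is not toybox's xpopen().
// Runs argv[0] via a $PATH search and returns a file descriptor connected
// to the child's stdout, or -1 on failure. *pid receives the child's pid.
int spawn_with_stdout(char **argv, pid_t *pid)
{
  int fds[2];

  if (pipe(fds)) return -1;
  *pid = fork();
  if (*pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (!*pid) {
    // Child: make the pipe our stdout, then let execvp() search $PATH.
    dup2(fds[1], 1);
    close(fds[0]);
    close(fds[1]);
    execvp(argv[0], argv);
    _exit(127);  // nothing by that name could be executed
  }

  // Parent keeps only the read end.
  close(fds[1]);
  return fds[0];
}
```

The caller reads from the returned descriptor and then collects the child with waitpid(); an exit status of 127 here means nothing by that name could be run from $PATH.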
As for the md5/sha1/sha256/sha3, they're easy to test (their failures
tend to be really obvious), and the two I implemented are inherently
timing invariant and don't have obvious sidechannel attacks. And I _can_
find existing public domain implementations of these to start from, such as:
http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c
So adding the other hashing functions to toybox makes sense to me,
especially since I need them for a traditional /etc/shadow login.c. (I
need to research android's user database and how to access it.)
That said, I _do_ care that they're slower than other implementations.
That's a simple vs fast balance that's... I took the first speedup
patch, didn't take the second speedup patch, and I need to go back and
look at it...
Post by enh
(and
things like reimplementing zlib and bunzip2 probably fall somewhere in
between.)
One of the goals I'm juggling is "busybox replacement", and they have
this stuff. But again that's just a weighting, busybox already contains a
lot of stuff we're _not_ implementing.
If I was starting from scratch today I might leave them out, but I have
a history with both bzip2 and gzip which makes it easier for me to keep
both of them in scope.
The one we really _need_ is deflate/inflate, because we should have a
compression algorithm and that's the simplest and most lightweight one.
The extract sides of the other two are there because tarballs come in
those formats and a build environment needs to be able to extract them
(another goal I'm juggling. The strace source is _only_ available as .xz
these days, for example.)
But I probably won't bother with the compression side of bzip2 or xz. If
you want to create a new tarball we support gzip and if you want it in
those other formats you can install the other package.
To explain my "history with bzip2 and gzip" above:
I reimplemented bunzip2 years ago because the original was horrid, and
my implementation got sucked up into a bunch of places. (I think the
kernel uses it if you select bzip2 compression, although these days gzip
or xz are the dominant ones.)
I also wrote 90% of bzip2 compression side support a decade back for
busybox, but got distracted near the end and never got back to it
because the bzip2 compression algorithm is WEIRD:
http://lists.busybox.net/pipermail/busybox/2004-February/010859.html
Even _with_ most of the work done I probably won't bother with bzip2
compression side unless somebody really wants it, both because it's
semi-obsolete these days and because its compression is based on weird
heuristics for the string sorting that I've never managed to clean up
into something understandable. (The "crap.c" above, which is a series of
fallbacks between different sorting algorithms with no explanation of
_why_.) I _can't_ simplify this into something easy to understand that
somebody might want to use as example code in a middle school
programming class, the algorithm is just inherently nuts.
I already did gunzip a few months ago, and I'm working on gzip
compression side support now. I wrote a java implementation of that back
when Java 1.0 didn't include it in the base library. (Java 1.1 came out
before I did the decompression side that time, so I moved on to other
things.) I took info-zip apart back when I was programming for OS/2, I
actually know that one pretty well. So that's probably the only
compressor I'll implement, when it's done it should be less than 500
lines of code. Also, Ashwini Sharma asked me to prioritize that so they
can use it in a product.
As for xz: I received an external contribution based on the public
domain decompressor. The "fetch tarball, extract, configure, make,
install" codepath needs to be able to extract them, and the code's
already in. (And is horrible, there's built-in knowledge of various
processor machine language formats, which strongly implies upgrades will
need more of this filigree for new processor variants.)
But I don't particularly want to do the compression side for that.
If you decide to switch off our bunzip2 and use the external version
instead, toybox "tar" should call out to it and pipe stuff through it
just fine. (I dunno if it currently _does_, but once I've cleaned it up...)
Post by enh
in this specific case the openssl API is reasonable enough your
implementations could be a drop-in replacement, but i suspect in other
cases part of your motivation for writing your own will have been the
awful API.
Part, yes. But only part. I mentioned licensing above. There's also the
fact that I can often come up with objectively better code.
In the case of bunzip2, back in 2003 I replaced this:
http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=6fe55ae93983
With this:
http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=0d6d88a2058d
That's not just replacing 1658 lines with 531 lines: try actually reading
the old code. Contemplate the "save state" and the big switch/case in
the main function starting at line 395. (They copy all the local
variables out of a structure, each call, and copy them back before
returning. They use a switch/case with labels covering the whole
function so they can jump back into the middle of nested loops.
That's so it could return when it ran out of data and be called to
resume decompressing once the buffer was filled. I replaced that with a
get_bits() call that had the filehandle stored away and could read more
data if it needed to.)
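A minimal sketch of that pull-style approach, assuming a plain file descriptor as the input: the reader keeps the descriptor and refills its own buffer, so the decoder can sit inside nested loops without saving state and returning. The struct layout and names are illustrative, not copied from the busybox source.

```c
#include <unistd.h>

// Hypothetical bit reader; zero-initialize everything except fd.
struct bitreader {
  int fd;                    // where to fetch more input from
  unsigned char buf[4096];   // raw bytes read so far
  int len, pos;              // valid bytes in buf, next unread byte
  unsigned bits;             // bit accumulator (low 'count' bits are valid)
  int count;                 // number of unconsumed bits in the accumulator
};

// Return the next n bits (1 <= n <= 24), most significant bit first, or -1
// when the input runs out. Refills from br->fd on demand, so the caller
// never has to return to its own caller just to get more data.
int get_bits(struct bitreader *br, int n)
{
  while (br->count < n) {
    if (br->pos == br->len) {
      ssize_t got = read(br->fd, br->buf, sizeof(br->buf));

      if (got <= 0) return -1;
      br->len = got;
      br->pos = 0;
    }
    // Shift in one more byte; already-consumed high bits fall off the top.
    br->bits = (br->bits << 8) | br->buf[br->pos++];
    br->count += 8;
  }
  br->count -= n;

  return (br->bits >> br->count) & ((1u << n) - 1);
}
```

Because the refill happens inside get_bits(), the decoder's loop counters can stay in ordinary local variables instead of being copied in and out of a structure on every call.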
Yes, that's Julian Seward's bunzip2 code. That wasn't something busybox
did to it, that's what the upstream package they copied had always been
like.
A more recent case where I shrank a codebase to 1/3 of its original
size/complexity was ifconfig. I described what I did at length here:
http://landley.net/toybox/cleanup.html#ifconfig
The "old" and "new" lines with the totals are links to the original and
changed file. I described each change on the mailing list, and collected
links to all the descriptions on that page. You might want to read just
the first description here:
http://lists.landley.net/pipermail/toybox-landley.net/2013-April/000882.html
Note: the ifconfig I received was a professional contribution from a
team of experienced coders, and what they sent me did work. I'm just...
picky.
Post by enh
also in this specific case there's almost no sharing
between the implementations anyway because 99% of the code is the hash
implementation itself. but if you can, keeping API compatibility with
the library you're trying to replace would be good.
I've pondered adding zlib bindings for deflate/inflate when I get them
done, but that's a post-1.0 thing.
I note that one of my first interactions with Rich Felker (the musl
maintainer) was him explaining to me what would be involved in making an
executable also be a shared library (so you can have libz.so be a
symlink to busybox so -lz was satisfied with the busybox code). Google
finds the old thread at:
http://lists.uclibc.org/pipermail/busybox/2006-April/054373.html
Busybox never did that, but toybox might. Not in the 1.0 release,
though. (I _think_ it's worth the complexity? Obviously only if there's
a config option to not do that...)
However, when researching deflate I read the zlib source, and the
info-zip source, and the plan 9 source, and three different "tiny"
implementations (the _least_ useful of which was miniz.c, classic
example of the kind of code shrinkage tricks I'm trying to _avoid_...)
Post by enh
anyway, let me know whether you'd like to merge stuff like this into
the main codebase. otherwise i can just "git rm" locally and add the
alternative version to toys/android.
Toybox commands can all be switched off. Any command you've got a better
implementation of (for any metric of better), feel free to switch them
off. I'd very much like to _improve_ toybox's version until you feel
it's the better one, but "we audited this other codebase already" is a
perfectly good reason to use something else.
Post by enh
i'll get a delete/merge conflict
if you change anything in your version so i'll be able to track
changes, so it's only really a loss if you think you have other users
who'd prefer to use openssl.
Um, issue to be aware of: the subdirectories are just a developer
convenience, the command namespace is actually flat. So if you have a
NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
the build will break when it hits the duplicate command name.
(Actually since you're not using our build infrastructure you can
probably just ignore that, and point your .mk files at the right .c
files for what you're building... :)
Post by Rob Landley
P.S. At a design level I thought about defaulting it "n" but the
defconfig y/n signalling primarily indicates "is this done or not" and
it was finished and worked fine, so... (Well, the examples directory
also has stuff that defaults to "n" but factor isn't really a
demonstration of how to use the toybox infrastructure either.) And
defaulting "n" for other reasons is editorializing, where does it stop?
rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
me to add most of those, because they needed them. If somebody wants to
make a .config file selecting a subset of the commands, you can do that.
It's not my job to guess how people will use generic tools.
Post by enh
yeah, i was hoping to abdicate responsibility for subsetting and was
disappointed to find that 'default' didn't mean "you probably want
this". but it makes sense, and the subset that one project needs isn't
necessarily going to be the same as any other project.
Indeed. Something I learned back when I maintained busybox: don't try to
guess how people will use a hammer. You'll only get in the way.
Post by enh
it sucks to be me though. the best i can aim for is to try to ensure
that there are roughly the same number of people complaining i put too
much in as people complaining i left too much out :-)
Oh I've still got that, just at a different level. "Should I include this
command or not". (You're entirely right "factor" was a questionable call
there. It was sort of on the line even after I wrote it. I just cleaned
up "mix.c" which is another one. Deciding whether to merge that I was
looking at the aumix man page and going "this is simpler, but that's
more standard, but nobody's _asked_ for the bigger one yet and that's
mostly about curses mode instead of command line, and this seems to do
the minimum you need...")
So much easier when there's a standards document to blame. (Of course I
vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
2014.)
Post by enh
it's a pity the debian popularity contest only has per-package data
(https://qa.debian.org/popcon.php?package=coreutils). if you ask
people they always tell you they use everything "all the time". even
if you broke it two releases ago and removed it one release ago.
Have you read toybox's roadmap.html page? It may not meet your needs but
at least I have a _reason_ for listing the commands I did. :)
Always happy to have another viewpoint to rejuggle the weightings...
When I get my darn server reinstalled and get AOSP on it, I want to run
the AOSP build with the toybox commands. (Aboriginal Linux is using an
old version of Linux From Scratch as a bootstrapping test, but android's
build needs more commands than that. And may use command line options
that toybox doesn't implement yet. I know _you_ aren't trying to get
android self-hosting anytime soon, but I still am. :)
Post by enh
to work out which options are important for the commands that toolbox
and toybox have in common, i've been relying on my command-line
history, what i can find in scripts, and whether someone cared enough
to add/fix something. but i don't yet have a plan for all the stuff in
toys/pending.
My plan is to clean them up (the way I did the other cleanup.html
things) and get them out of pending.
It's surprisingly time consuming, but if you read through the history of
one of the cleanups I documented there, you can see why...
Post by enh
i also haven't thought much about "in the binary" versus
"gets a symlink"; i suspect that the "too much" camp will be further
subdivided into those who're offended by the binary size and those
who're offended by the number of symlinks in /system/bin.
I don't understand the distinction here? (Is your build making
standalone binaries for the toybox commands ala scripts/single.sh? It
didn't look like it was but I have to stare at makefiles a lot to beat
any sense out of 'em...)
Post by enh
and getting back to factor, i can't decide whether having it paints a
target on my back or gives me something i don't care about to throw
under the bus as a gesture of goodwill :-)
Politics, I can't help you with. :)
Rob