Discussion:
New tests for dirname and wc
Felix Janda
2012-10-27 22:22:52 UTC
Hello,

attached are some simple tests for dirname and wc and
a fix for a small typo in another test script.

Posix specifies an -m option for wc, which toybox does
not implement. Should there be a test for this, too?

Why do the scripts actually use bash?

Regards,
Felix
-------------- next part --------------
diff -r b88859043af2 -r 7f41c9c49509 scripts/test/cmp.test
--- a/scripts/test/cmp.test Mon Sep 17 00:17:16 2012 -0500
+++ b/scripts/test/cmp.test Sat Oct 27 22:29:02 2012 +0200
@@ -1,4 +1,4 @@
-#/bin/bash
+#!/bin/bash

[ -f testing.sh ] && . testing.sh

-------------- next part --------------
diff -r 88fa9133995e -r 929003d8c4f9 scripts/test/dirname.test
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/test/dirname.test Sun Oct 28 00:06:38 2012 +0200
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+[ -f testing.sh ] && . testing.sh
+
+#testing "name" "command" "result" "infile" "stdin"
+
+testing "dirname /-only" "dirname ///////" "/\n" "" ""
+testing "dirname trailing /" "dirname a//////" ".\n" "" ""
+testing "dirname combined" "dirname /////a///b///c///d/////" "/////a///b///c\n" "" ""
+testing "dirname /a/" "dirname /////a///" "/\n" "" ""
-------------- next part --------------
diff -r 7f41c9c49509 -r 88fa9133995e scripts/test/wc.test
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/test/wc.test Sat Oct 27 23:52:43 2012 +0200
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+[ -f testing.sh ] && . testing.sh
+
+#testing "name" "command" "result" "infile" "stdin"
+
+cat >file1 <<EOF
+some words .
+
+some
+lines
+EOF
+
+testing "wc" "wc >/dev/null && echo yes" "yes\n" "" ""
+testing "wc empty file" "wc" "0 0 0\n" "" ""
+testing "wc standard input" "wc" "1 3 5\n" "" "a b\nc"
+testing "wc -c" "wc -c file1" "26 file1\n" "" ""
+testing "wc -l" "wc -l file1" "4 file1\n" "" ""
+testing "wc -w" "wc -w file1" "5 file1\n" "" ""
+testing "wc format" "wc file1" "4 5 26 file1\n" "" ""
+testing "wc multiple files" "wc input - file1" "1 2 3 input\n0 2 3 -\n4 5 26 file1\n5 9 32 total\n" "a\nb" "a b"
+rm file1
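
The test lines in the attachments above follow the argument order given in their comment: "name" "command" "result" "infile" "stdin". As a rough illustration of how such a line could be evaluated, here is a minimal sketch. It is not toybox's actual testing.sh; the function name testing_sketch, the temporary work directory, and the PASS/FAIL output format are assumptions made for this example. The one detail taken from the tests themselves is that the "infile" argument ends up in a file called "input".

```shell
#!/bin/sh
# Hypothetical stand-in for the "testing" helper, for illustration only.
# Usage: testing_sketch "name" "command" "result" "infile" "stdin"
testing_sketch() {
  workdir=$(mktemp -d) || return 2
  (
    cd "$workdir" || exit 2
    # %b expands backslash escapes such as \n in the arguments.
    printf '%b' "$3" > expected
    printf '%b' "$4" > input
    # Feed "stdin" to the command and capture what it prints.
    printf '%b' "$5" | eval "$2" > actual 2>&1
    cmp -s expected actual
  )
  status=$?
  rm -rf "$workdir"
  if [ "$status" -eq 0 ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1"
  fi
  return "$status"
}

testing_sketch "dirname /-only" "dirname ///////" "/\n" "" ""
testing_sketch "dirname trailing /" "dirname a//////" ".\n" "" ""
```

Comparing files rather than captured strings keeps trailing newlines significant, which matters for expected results such as "/\n".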
Rob Landley
2012-10-28 18:04:23 UTC
Post by Felix Janda
Hello,
attached are some simple tests for dirname and wc and
a fix for a small typo in another test script.
Cool.
Post by Felix Janda
Posix specifies an -m option for wc, which toybox does
not implement. Should there be a test for this, too?
That's internationalization support, which I haven't implemented yet.

I think toybox should support utf-8, but am not as interested in
multiple translations and date formats and such. (Those belong at the
GUI/X11 level.)
Post by Felix Janda
Why do the scripts actually use bash?
The tl;dr version is "dash was a mistake on Ubuntu's part".

The long version is that Linux was literally created to run Bash. Linux
evolved out of a terminal program that Linus wrote in i386 assembly
language (booting on the bare metal, from a floppy disk) because the OS
he was using (Minix) couldn't keep up with a 2400 baud modem and he
wanted to dial in to the university unix machine to read usenet message
boards. He added the ability to read and write the minix filesystem so
his term program could download files from usenet, and then he added
the system calls necessary for it to run bash so he didn't have to boot
back into minix to run "rm" and "mkdir" and such between downloads.
(This is all in Linus's biography "Just for Fun", by the way.)

Bash was the first program Linux ever ran, and Bash remained the
standard Linux shell on all Linux distros until Ubuntu made the single
dumbest technical decision in its history. In Ubuntu 6.10, they
redirected #!/bin/sh to point to dash instead of bash, because "the
boot was too slow".

Yes, really: https://wiki.ubuntu.com/DashAsBinSh

Rather than modify the boot scripts to say #!/bin/bash at the top (a
change which was deemed 'too intrusive'), they changed the system
default shell. This broke all sorts of stuff (such as the kernel
build), but Ubuntu was the dominant distro at the time and could force
the change down people's throats.

Note: Ubuntu didn't _stop_ installing bash by default. Bash was still
there. They just added a second shell and redirected /bin/sh to point
to that. So bash is still there if you say #!/bin/bash.

The Ubuntu developers then realized that it hadn't fixed the boot speed
problem, and switched from system V init to upstart to get better
parallelism, something they should have done in the first place and
which rendered the change in #!/bin/sh moot. (Red Hat switched to
systemd instead, and they've been fighting over it ever since. Upstart
probably would have been able to shout down systemd if Ubuntu hadn't
convinced everybody it had horrible technical judgement with dash.)

Meanwhile, dash doesn't support path/{file1,file2} syntax, it doesn't
support "diff -u <(sort -u file1) <(sort -u file2)", and so on. Last I
checked it didn't even support "set -o pipefail" which ksh and ash and
such have supported for years.
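
For readers who have not met these three features, the following sketch runs each one under bash. The run_in helper is hypothetical and exists only for this example; swapping "bash" for another installed shell (dash, for instance) shows how that shell treats the same snippet, which may differ between versions.

```shell
#!/bin/sh
# run_in SHELL SNIPPET: run a snippet under the named shell.
# Hypothetical helper for this illustration, not part of toybox.
run_in() {
  shell=$1
  shift
  "$shell" -c "$1" 2>&1
}

# Brace expansion: one word expands into several.
run_in bash 'echo path/{file1,file2}'

# Process substitution: each <(...) is presented to diff as a file name.
run_in bash 'diff -u <(printf "a\nb\n" | sort -u) <(printf "b\na\n" | sort -u) && echo same'

# pipefail: the pipeline reports failure if any stage failed,
# not just the last one.
run_in bash 'set -o pipefail; false | true; echo "status $?"'
```

Under bash the first call prints "path/file1 path/file2" and the last prints "status 1"; without pipefail the last pipeline would report status 0.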

Toybox's built in shell is not attempting to duplicate dash. It's doing
a posix shell and then adding the bash extensions that make sense.

Rob
Felix Janda
2012-10-30 22:37:40 UTC
Post by Rob Landley
Post by Felix Janda
Posix specifies an -m option for wc, which toybox does
not implement. Should there be a test for this, too?
That's internationalization support, which I haven't implemented yet.
I think toybox should support utf-8, but am not as interested in
multiple translations and date formats and such. (Those belong at the
GUI/X11 level.)
Ok.
Post by Rob Landley
Post by Felix Janda
Why do the scripts actually use bash?
The tl;dr version is "dash was a mistake on Ubuntu's part".
[...]
Toybox's built in shell is not attempting to duplicate dash. It's doing
a posix shell and then adding the bash extensions that make sense.
Rob
Thanks for the interesting read. Skimming testing.sh I see that it uses
bash extensions and therefore each test script should be executed by bash.
Presumably, the extensions used in testing.sh belong to the (not yet
well-defined) set of sane extensions to be implemented in toybox's sh at
some point?

Felix
Rob Landley
2012-11-01 14:49:25 UTC
Post by Felix Janda
Post by Rob Landley
Post by Felix Janda
Posix specifies an -m option for wc, which toybox does
not implement. Should there be a test for this, too?
That's internationalization support, which I haven't implemented yet.
I think toybox should support utf-8, but am not as interested in
multiple translations and date formats and such. (Those belong at the
GUI/X11 level.)
Ok.
I note that adding utf-8 support to wc might be an interesting small
project. It's basically mbrtowc() and possibly with wcswidth() on the
result. (I'd have to check the definition of -m to see if they want
characters output or character positions output).

If not, I should get around to it before too long. :)
Post by Felix Janda
Post by Rob Landley
Post by Felix Janda
Why do the scripts actually use bash?
The tl;dr version is "dash was a mistake on Ubuntu's part".
[...]
Toybox's built in shell is not attempting to duplicate dash. It's doing
a posix shell and then adding the bash extensions that make sense.
Rob
Thanks for the interesting read. Skimming testing.sh I see that it uses
bash extensions and therefore each test script should be executed by bash.
Presumably, the extensions used in testing.sh belong to the (not yet
well-defined) set of sane extensions to be implemented in toybox's sh at
some point?
I'm interested in defining what those extensions are, but it's really
data collection. I know that I use <(command) and >(command), the
{curly,bracket} stuff, and pipefail. Several other things are synonyms:
$[1+2] is more or less $((1+2)), saying "function" before a function
definition is a NOP...
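
The two synonyms mentioned above can be checked directly. This is a small sketch that assumes bash is installed; same_in_bash is a hypothetical helper written for this example.

```shell
#!/bin/sh
# same_in_bash A B: succeed if both snippets print the same thing under bash.
# Hypothetical helper for this illustration only.
same_in_bash() {
  [ "$(bash -c "$1" 2>&1)" = "$(bash -c "$2" 2>&1)" ]
}

# $[...] is the older spelling of $((...)) arithmetic expansion.
same_in_bash 'echo $[1+2]' 'echo $((1+2))' &&
  echo "arithmetic forms agree"

# The "function" keyword in front of a definition changes nothing here.
same_in_bash 'function f { echo hi; }; f' 'f() { echo hi; }; f' &&
  echo "function keyword is optional"
```

Only the $((...)) and name() forms are specified by POSIX, so a shell that wants to run existing bash scripts has to decide whether to accept the other spellings too.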

Aboriginal linux is building bash 2.05b, because last I checked busybox
ash couldn't build LFS. (This may have changed, I haven't rechecked in
a while.) But most packages I tried didn't need the bash stuff
introduced in 3.x or 4.x. Then again, I know this version of bash is
too old to run gentoo's portage package manager (which uses newer bash
features: some quoting rule changed, and it uses the =~ regex thing).
At one point I patched portage to work with older bash, but that's
pretty stale.

I'd like to get toysh to run portage, the aboriginal linux build, and
make it through linux from scratch (what are they up to, 7.2? I've got
an automated 6.8 build that needs updating...)

Rob
Felix Janda
2012-11-04 23:04:13 UTC
Post by Rob Landley
I note that adding utf-8 support to wc might be an interesting small
project. It's basically mbrtowc() and possibly with wcswidth() on the
result. (I'd have to check the definition of -m to see if they want
characters output or character positions output).
If not, I should get around to it before too long. :)
wc -m only cares about counting characters. Attached is an attempt at
implementing it, along with some test cases. The test cases only apply
to UTF-8 locales.
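
To make the byte/character distinction concrete, here is a small sketch using whatever wc is on the PATH. The sample string and the helper names are made up for this example, and the C.UTF-8 locale is an assumption: on systems that lack it, wc -m may simply fall back to counting bytes.

```shell
#!/bin/sh
# "h\303\251llo" is "héllo" encoded as UTF-8; the é occupies two bytes.
sample() { printf 'h\303\251llo\n'; }

# Byte counts do not depend on the locale.
count_bytes() { wc -c | tr -d ' '; }

# Character counts do. C.UTF-8 is assumed to exist here.
count_chars() { LC_ALL=C.UTF-8 wc -m | tr -d ' '; }

echo "bytes: $(sample | count_bytes)"
echo "chars: $(sample | count_chars)"
```

The byte count is 7 (six characters plus the extra byte of the é). Where the UTF-8 locale is available the character count should be 6.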

I think that a config option for internationalization support should be added.
Post by Rob Landley
I'm interested in defining what those extensions are, but it's really
data collection. I know that I use <(command) and >(command), the
{curly,bracket} stuff, and pipefail. Several other things are synonyms:
$[1+2] is more or less $((1+2)), saying "function" before a function
definition is a NOP...
Aboriginal linux is building bash 2.05b, because last I checked busybox
ash couldn't build LFS. (This may have changed, I haven't rechecked in
a while.) But most packages I tried didn't need the bash stuff
introduced in 3.x or 4.x. Then again, I know this version of bash is
too old to run gentoo's portage package manager (which uses newer bash
features: some quoting rule changed, and it uses the =~ regex thing).
At one point I patched portage to work with older bash, but that's
pretty stale.
I'd like to get toysh to run portage, the aboriginal linux build, and
make it through linux from scratch (what are they up to, 7.2? I've got
an automated 6.8 build that needs updating...)
LFS is at 7.2. Now with udev from systemd.

Ok, thanks for the elaboration. Do you recall which parts of LFS required
bash extensions? Now someone just needs to figure out which bash features
portage uses.

Felix
-------------- next part --------------
diff -r 17692bd604a2 toys/posix/wc.c
--- a/toys/posix/wc.c Sun Nov 04 16:42:03 2012 +0100
+++ b/toys/posix/wc.c Sun Nov 04 23:58:50 2012 +0100
@@ -6,22 +6,24 @@
*
* See http://opengroup.org/onlinepubs/9699919799/utilities/wc.html

-USE_WC(NEWTOY(wc, "cwl", TOYFLAG_USR|TOYFLAG_BIN))
+USE_WC(NEWTOY(wc, "mcwl", TOYFLAG_USR|TOYFLAG_BIN))

config WC
bool "wc"
default y
help
- usage: wc -lwc [FILE...]
+ usage: wc -lwcm [FILE...]

Count lines, words, and characters in input.

-l show lines
-w show words
- -c show characters
+ -c show bytes
+ -m show characters

- By default outputs lines, words, characters, and filename for each
- argument (or from stdin if none).
+ By default outputs lines, words, bytes, and filename for each
+ argument (or from stdin if none). Displays only either bytes
+ or characters.
*/

#include "toys.h"
@@ -48,7 +50,8 @@

static void do_wc(int fd, char *name)
{
- int i, len;
+ int i, len, clen=1, space;
+ wchar_t wchar;
unsigned long word=0, lengths[]={0,0,0};

for (;;) {
@@ -58,9 +61,24 @@
toys.exitval = EXIT_FAILURE;
}
if (len<1) break;
- for (i=0; i<len; i++) {
+ for (i=0; i<len; i+=clen) {
+ if(toys.optflags&8) {
+ clen = mbrtowc(&wchar, toybuf+i, len-i, 0);
+ if(clen==(size_t)(-1)) {
+ if(i!=len-1) {
+ clen = 1;
+ continue;
+ }
+ else break;
+ }
+ if(clen==(size_t)(-2)) break;
+ if(clen==0) clen=1;
+ space = iswspace(wchar);
+ }
+ else space = isspace(toybuf[i]);
+
if (toybuf[i]==10) lengths[0]++;
- if (isspace(toybuf[i])) word=0;
+ if (space) word=0;
else {
if (!word) lengths[1]++;
word=1;
@@ -74,6 +92,8 @@

void wc_main(void)
{
+ setlocale(LC_ALL, "");
+ toys.optflags |= (toys.optflags&8)>>1;
loopfiles(toys.optargs, do_wc);
if (toys.optc>1) show_lengths(TT.totals, "total");
}
diff -r 17692bd604a2 scripts/test/wc.test
--- a/scripts/test/wc.test Sun Nov 04 16:42:03 2012 +0100
+++ b/scripts/test/wc.test Sun Nov 04 23:58:57 2012 +0100
@@ -18,5 +18,29 @@
testing "wc -l" "wc -l file1" "4 file1\n" "" ""
testing "wc -w" "wc -w file1" "5 file1\n" "" ""
testing "wc format" "wc file1" "4 5 26 file1\n" "" ""
-testing "wc multiple files" "wc input - file1" "1 2 3 input\n0 2 3 -\n4 5 26 file1\n5 9 32 total\n" "a\nb" "a b"
+testing "wc multiple files" "wc input - file1" \
+ "1 2 3 input\n0 2 3 -\n4 5 26 file1\n5 9 32 total\n" "a\nb" "a b"
+
+#Tests for wc -m
+if printf "%s" "$LANG" | grep -q UTF-8
+then
+
+printf " " > file1
+for i in $(seq 1 8192)
+do
+ printf "?" >> file1
+done
+testing "wc -m" "wc -m file1" "8193 file1\n" "" ""
+printf " " > file1
+for i in $(seq 1 8192)
+do
+ printf "??" >> file1
+done
+testing "wc -m (invalid chars)" "wc -m file1" "8193 file1\n" "" ""
+testing "wc -mlw" "wc -mlw input" "1 2 11 input\n" "hello, ??!\n" ""
+
+else
+printf "skipping tests for wc -m"
+fi
+
rm file1
diff -r 17692bd604a2 toys.h
--- a/toys.h Sun Nov 04 16:42:03 2012 +0100
+++ b/toys.h Sun Nov 04 23:59:04 2012 +0100
@@ -16,6 +16,7 @@
#include <inttypes.h>
#include <limits.h>
#include <libgen.h>
+#include <locale.h>
#include <math.h>
#include <pty.h>
#include <pwd.h>
@@ -46,6 +47,8 @@
#include <unistd.h>
#include <utime.h>
#include <utmpx.h>
+#include <wchar.h>
+#include <wctype.h>

#include "lib/lib.h"
#include "toys/e2fs.h"