The adverb \ operated on subsets of the operand y that were taken in a regular way. Now we will take the next step, and operate on irregular subsets of y . This will finally give us the chance to do interesting work with character strings.
Find Unique Items: Monad ~. and Monad ~:
Monad ~. has infinite rank. ~. y is y with duplicate items removed and is called the nub of y. The items of ~. y are in the order of their first appearance in y :
~. 'Green grow the lilacs'
Gren gowthliacs
]a =. _2 ]\ 0 1 1 2 0 1 2 3 1 3 1 2
0 1
1 2
0 1
2 3
1 3
1 2
~.a
0 1
1 2
2 3
1 3
y can have any rank, and ~. y gives the unique items.
Monad ~: has infinite rank and tells you which items monad ~. would pick. ~: y is a Boolean list where each element is 1 if the corresponding item of y would have been selected for the nub of y :
~: 'Green grow the lilacs'
1 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 0 1 1 1
~: a
1 1 0 1 1 0
(~: a) # a
0 1
1 2
2 3
1 3
~. y is equivalent to (~: y) # y .
Dyad u/. has infinite rank. x u/. y applies u to subsets of y for which the corresponding items of x (called the keys) are identical. I will skip the formal description in favor of a verbal one: First find the nub of x, call it nx . Then, for each item of nx, find all matching items of x; make an array of the corresponding items of y (this array will always have rank one more than the rank of an item of y), and apply u to that array; the result becomes one item of the result of x u/. y . The items of x u/. y thus correspond to items of the nub of x . Note that the subsets of y may have different shapes (they will all have the same rank, being made of items of y, but each subset may have a different number of items). For this reason, we usually have u produce a boxed result to avoid framing fills.
]a =. 2 0 1 0 2 </. 'Hello'
+--+--+-+
|Ho|el|l|
+--+--+-+
The subsets were created, and each one was boxed. Note that each subset is a list, even the one with only one element:
$&.> a
+-+-+-+
|2|2|1|
+-+-+-+
3 0 3 0 3 +//. 100 1 200 2 300
600 3
The summing was performed for each subset. Note that the result is sorted not on the value of the keys, but rather on the smallest index of each key in x .
]a =. _2 ]\ 'Fred';100;'Joe';200;'Fred';50;'Sam';30
+----+---+
|Fred|100|
+----+---+
|Joe |200|
+----+---+
|Fred|50 |
+----+---+
|Sam |30 |
+----+---+
A small database of amounts we owe. We can quickly get a total for each creditor:
]b =. ({."1 a) +/@:>/. ({:"1 a)
150 200 30
({."1 a) gives the list of names, which we use as keys for the data given by ({:"1 a) . For each subset, we open the boxes and add up the results. Note that +/@>/. would not do in place of +/@:>/., and understand why.
It would be nice to have names associated with the totals:
(~. {."1 a) ,. <"0 b
+----+---+
|Fred|150|
+----+---+
|Joe |200|
+----+---+
|Sam |30 |
+----+---+
We box the totals using <"0 before we join items of the totals to the items of the nub. Recall that x ,. y concatenates each item of x with the corresponding item of y .We take advantage of the fact that the results of u/. have the same order as the nub.
Apply On Partitions: Monad u;.1 and u;.2
Unlike dyad u/. which applies u to scattered subsets of y, u;.n applies u to sequential subsets of y . In monad u;.n, the subsets are computed from information in y itself; in dyad u;.n, x specifies the subsets. u;.n is really 4 different cases distinguished by the number n; we will start with the case where n is 1, 2, _1, or _2 .
Avoid the error of thinking that an operation on a subset of y somehow modifies y . Never happens. In J, the result of a verb is always a new noun. You may assign that noun to the same name as one of the operands, but until you do, the old operand is unchanged, and both it and the result of the verb are available for further use.
u;.1 y partitions y by finding the items of y that match the first item of y (i. e. 0{y); each such item, called a fret, is the start of an interval of y which runs from the fret to the last item before the next fret (the last interval runs to the end of y). Monad u is applied to each interval (which is always a list of items of y), and the results of u become the items of the overall result of u;.1 y .
<;.1 ' a list of words '
+--+-----+---+------+-+
| a| list| of| words| |
+--+-----+---+------+-+
Each ' ' character, even the one at the end, started a new interval. We are not restricted to boxing the intervals:
#;.1 ' a list of words '
2 5 3 6 1
Here we report the length of each word.
u;._1 y is like u;.1 y except that the fret itself is omitted from the interval. The interval could be empty in that case:
<;._1 ' a list of words '
+-+----+--+-----++
|a|list|of|words||
+-+----+--+-----++
#;._1 ' a list of words '
1 4 2 5 0
u;.2 y and u;._2 y are like u;.1 y and u;._1 y except that the frets are those items of y that match its last item, and each marks the end of an interval:
<;.2 'Mississippi'
+--+---+---+---+
|Mi|ssi|ssi|ppi|
+--+---+---+---+
Use u;.2 y to split a list when you know the ending marker for the intervals:
;: ;._2 (0 : 0)
Fred 500
Joe 200
Fred 50
Sam 30
)
+----+---+
|Fred|500|
+----+---+
|Joe |200|
+----+---+
|Fred|50 |
+----+---+
|Sam |30 |
+----+---+
Here (0 : 0) creates a noun from the lines that follow. Each line ends with an unseen LF character which provides a convenient fret for splitting the lines. We use ;._2 to apply monad ;: to the text of each line (not including the fret, which is dropped by ;._2). Monad ;: splits each string into words and boxes the words. The result is not the same as the array we created earlier, because the second column is character rather than numeric. We could have written noun define instead of (0 : 0) .
Monad u;.n has rank 1 _ when n is 1, 2, _1, or _2 .
Apply On Specified Partitions: Dyad u;.1 and u;.2
When n is 1, 2, _1, or _2, x u;.n y has infinite rank and, in the one-dimensional case where x is a Boolean list of 0s and 1s, resembles u;.n y . The difference is that the frets are given by the positions of 1s in x rather than by values of the items of y . We can define u;.1 y as (({.y)="_ _1 y) u;.1 y, and u;.n for the other values of n similarly.
0 1 0 1 0 +/;.1 (20 30 40 50 60)
70 110
As in this example, some leading items of y may not be in any interval.
In the general case x is a list of boxed Boolean lists, with j{::x supplying the frets for axis j . The partition is then multidimensional:
(0 1 1;1 0 0 1) <;.1 i. 3 4
+------+--+
|4 5 6 |7 |
+------+--+
|8 9 10|11|
+------+--+
In both the monadic and dyadic cases of u;.1, u;._1, u;.2, and u;._2, the partitions to which u is applied have the same rank as y (but the shapes along the leading axes are reduced by the partitioning).
Find Sequence Of Items: Dyad E.
Dyad E. has infinite rank and is used to slide a pattern across an array and look for positions at which the pattern matches the items of the array. x and y should have the same rank, and the sliding occurs along all axes of y, giving a Boolean result with the same shape as y . We will consider only the cases where y is a list.
'is' E. 'Mississippi'
0 1 0 0 1 0 0 0 0 0 0
The 1s in the result tell where the pattern starts. The result of E. is often used as input to u;.1 :
('is' E. 'Mississippi') <;.1 'Mississippi'
+---+-------+
|iss|issippi|
+---+-------+
A more ambitious example:
html =. 0 : 0
<th><a href='page1.html'>Press here to go back</a></th>
<th><a href='page2.html'>Press here to go home</a></th>
<th><a href='page3.html'>Press here to go away</a></th>
</table>
</center>
)
('<a' E. html) {.@:(<@:(8&}.);._1)@:('>'&,);.1 html
+------------+------------+------------+
|'page1.html'|'page2.html'|'page3.html'|
+------------+------------+------------+
That looks like a useful result, but what a mess to produce it! If you want, you may try to understand that, but a better approach is to break it up:
extracthref =: <@:(8&}.) ;._1 @:('>'&,)
This seems understandable, with effort: the execution order is ((<@:(8&}.)) ;._1) @:('>'&,); we prepend '>' to y, then we use ;._1 on the string. That will make the prepended '>' the fret, and so we will break the string into parts that start with '>', throw away the first 8 characters of each part, and box each trimmed-down part. We can even try it on a test bit:
extracthref 'abcdefghijkl>xxx'
+----++
|ijkl||
+----++
Now we can look again at our original line, rewritten:
('<a' E. html) {.@:extracthref ;.1 html
+------------+------------+------------+
|'page1.html'|'page2.html'|'page3.html'|
+------------+------------+------------+
This makes some sense now. The execution order is ('<a' E. html) ({.@:extracthref ;.1) html . We use E. to find all the starting positions of '<a' tags; then, for each one, we split what follows the '<a' into blocks terminated by '>', and then we take the first one, which will have the data before the '>' that matched the '<a' .
The left argument to the partitioning dyads u;.n can be a list of boxes. In this case the first box contains the partition marks for axis 0, then next box contains the marks for axis 1, and so on. If a set of partition marks is an empty list, the corresponding axis will be unpartitioned.
Dyad u;.0 has rank 2 _ . x u;.0 y uses x to specify a subarray of y; the subarray is extracted, and u is applied to it to produce the final result. We will discuss the simple case where x is a rank-2 array.
The first item of x (call that s) gives the starting corner of the subarray. The second item of x gives the length of each axis of the subarray. If x is shorter than the rank of y, unspecified axes are taken in full. For example:
]a =. a. {~ (a. i. 'a') + i. 4 4
abcd
efgh
ijkl
mnop
A cute little expression in its own right. See why this produces the 4×4 array of characters shown. Remember that a. is the alphabet of all ASCII characters in order. An even more elegant way to produce the same result would be (i. 4 4)&+&.(a.&i.) 'a' .
(0 0 ,: 2 2) ];.0 a
ab
ef
Starting corner 0 0; lengths 2 2; result is a 2×2 subarray, left unchanged by monad ] .
(1 2 ,: 3 2) ];.0 a
gh
kl
op
Starting corner 1 2; lengths 3 2; result is a 3×2 subarray, left unchanged by monad ] . You get the idea.
If an item of s is negative, the corresponding axis of the subarray extends backward from the starting point (and its absolute value is used as the starting position):
(2 _2 ,: 2 2) ];.0 a
jk
no
Starting corner 2 2 (the character 'k'); lengths 2 2, but running backward in axis 1; result is a 2×2 subarray, left unchanged by monad ] . Note that the axis extends backward, but the items retain their normal order—you have merely specified the interval by its endpoint rather than its beginning point. If you want to reverse the order of the axis, you can do that too, by making the corresponding length negative:
(2 _2 ,: 2 _2) ];.0 a
kj
on
Starting corner 2 2 (the character 'k'); lengths 2 2, but running backward in axis 1; result is a 2×2 subarray with axis 1 reversed, left unchanged by monad ] .
The subarray must end at the limits of the array. If the length requests more than that, the subarray will be shorter than was requested. A length of _ or __ will always fetch from the starting point to the limit of the array.
Dyad u;.0 is a great way to pick out a subarray to work on. Always consider using it whenever you find yourself using more than one application of {.and }. to select a portion of an array. Even if you are working with a list, using dyad u;.0 is a good way to work on a portion of the list without copying any part you aren't working on.
Apply On All Subarrays: Dyad u;.3 and u;._3
x u;._3 y applies u to subarrays of y as specified by x . The operation is similar to dyad u/ (infix), but dyad u/ slides a one-dimensional window across y to select sequences of items of y, while dyad u;._3 moves a multidimensional window throughout y to select multidimensional regions. When the window has only one dimension, the operation is like dyad u/ but with a little extra control over the positions of the window.
The second item of x (actually, | 1{x) gives the size of the window (if an item is negative the corresponding axis is reversed before u is applied to the window; an infinite value means 'take all the way to the end of the array'). The first item of x (0{x) is the movement vector: the window is positioned at every point at which each item of the window position is an integral multiple of the corresponding item of the movement vector, as long as the window fits inside y . If an item of the movement vector is smaller than the corresponding item of the size, the windows will overlap along that axis.
Dyad u;._3 is useful in imaging applications. The following example averages 2×2-pixel regions in an image using a simple box filter:
]image =. ? 8 8 $ 100
31 88 65 15 68 38 38 49
14 58 84 59 95 55 14 98
40 14 56 25 48 46 96 12
19 31 62 12 65 62 80 24
47 38 20 2 90 42 14 94
41 13 88 9 16 7 36 25
13 78 45 34 45 80 93 65
21 67 90 25 86 47 50 60
(2 2,:2 2) (*&0.25)@:(+/)@:, ;._3 image
47.75 55.75 64 49.75
26 38.75 55.25 53
34.75 29.75 38.75 42.25
44.75 48.5 64.5 67
The rank of each operand presented to u is the same as the rank of y . The shape of each operand presented to u is the shape of y with the leading items replaced by |1{x (formally, this is (|1{x),(#x)}.$y).
If x has rank less than 2, it is processed as if it were 1,:x which will try the window at all possible locations.
x u;.3 y is similar to x u;._3 y, but the window is positioned at every starting point that is within y, even if the entire window will not fit within y . For those positions at which the window will not fit, the operand to u is truncated.
Extracting Variable-Length Fields Using ^: and ;.1
Breaking a string of variable-length fields into its individual fields seems to be a problem requiring a loop. How can you find the second field until you have examined the length of the first? There happens to be a good way to do this in loopless J. We will use as an example a set of character-string records where each record is preceded by a one-digit length (maximum string length is 9 characters). For example, the string
data =. '5There2is1a4tide2in3the7affairs2of3men'
contains 9 records, each containing a single English word. How do we split the string into its records?
First, we examine each possible starting position and calculate how long a record would be if it started at that position. The calculated length must be the entire record length including that of the length field itself. Obviously, we will be calculating spurious lengths at the places that turn out not to be record-start points, but that is the small price we pay for loopless coding. Here, the data length is given by the difference between the ASCII value of each byte and the ASCII value of '0', and we add one to account for the length of the length field:
]l =. >: (a. i. data) - a. i. '0'
6 37 57 54 67 54 3 58 68 2 50 5 69 58 53 54 3 ...
Next, we calculate, for each position, where the next record will start assuming that a record starts at that position. Clearly, we do this by adding the putative record length to the offset of the position. If the resulting total is past the end of the string, we limit the position to one character past the end of the string:
]n =. (#l) <. l + i. # l
6 38 38 38 38 38 9 38 38 11 ...
Now for the trick. The first record starts at offset 0. The next record starts at offset 0{n . The record after that starts at offset (0{n){n, and so on. To get the list of all the starting positions, we use
]pos =. (n,_1) {~^:a: 0
0 6 9 11 16 19 23 31 34 38 _1
The special power u^:a: means ‘keep applying u until the result stops changing, and return the vector of results’. It is like u^:_ in that it applies u until the result stops changing, but u^:a: returns all the intermediate results from u, not just the last one. The result from {~^:a: is the list of start-of-record positions, and all that remains is to collect the records. We discard record pointers that are off the end, and box the records starting at thepositions given. We do this by converting the list of valid record-start points to a Boolean vector, and using the partitioning functions to box the records:
((i. #data) e. pos) <;._1 data
+-----+--+-+----+--+---+-------+--+---+
|There|is|a|tide|in|the|affairs|of|men|
+-----+--+-+----+--+---+-------+--+---+