Cleanup Issue BIT-ARRAY-FUNCTIONS

Status
failed (with amendments) Jun89 X3J13
Category
ADDITION
References
The binary bit-array functions BIT-AND, BIT-IOR, BIT-XOR, BIT-EQV, BIT-NAND, BIT-NOR, BIT-ANDC1, BIT-ANDC2, BIT-ORC1, and BIT-ORC2 (CLtL p.294). The unary bit-array function BIT-NOT (CLtL p.295). The mapping functions EVERY, MAP, NOTANY, NOTEVERY, and SOME (CLtL pp.249-250). The functions COUNT and POSITION (CLtL p.257).
Related issues
none

Problem Description

Logical operations on bit vectors have been found to be useful in such programs as compiler flow analysis. They are easy to implement in straight Common Lisp, but such an implementation is many times slower than an optimized implementation on most machines. This is partly because many machines have instructions to perform these operations or inner kernels of them, and partly because Common Lisp is not a good language for implementing this type of low-level bit-oriented operation.

Common Lisp provides some logical operations on bit arrays, but the provided set is incomplete. Furthermore, the operations that are provided are only defined for arrays of identical dimensions, making them less useful for bit vectors that represent sets, where trailing zero elements are often omitted. Some of the sequence functions are useful for bit vectors, but users (correctly) fear that their implementation may be optimized for general sequences, not for bit vectors.

CLtL does not specify whether BIT-AND and related functions respect the fill-pointer, however the description of fill-pointers on p.295 implies that they should.

This issue contains two alternative proposals.

Proposal (ADD)

1. Allow the binary bit-array functions referenced above to accept arguments of identical rank but unequal dimensions. Nonexistent elements of bit-array-1 or bit-array-2 are assumed to be zero. If the third argument is T or a bit-array, result elements outside the bounds of the array must be zero or an error should be signalled. If the third argument is NIL or omitted, each dimension of the result array is equal to either the corresponding dimension of bit-array-1 or the corresponding dimension of bit-array-2. The larger of the two dimensions is used when necessary to hold all the nonzero elements of the result, otherwise either the larger or the smaller of the two dimensions is used.

2. Allow BIT-NOT with a bit array as the second argument to accept arguments of identical rank but unequal dimensions. Result elements outside the bounds of the array must be zero or an error should be signalled.

3. Add the following functions:

BIT-SUBSETP bit-array-1 bit-array-2

Returns true if for every element of bit-array-1 that is 1, the element with the same subscripts exists in bit-array-2 and is 1. Bit-array-1 and bit-array-2 must have identical rank but need not have identical dimensions.

BIT-DISJOINTP bit-array-1 bit-array-2

Returns true if for every element of one bit-array that is 1, the element with the same subscripts either does not exist in the other bit-array or is 0. Bit-array-1 and bit-array-2 must have identical rank but need not have identical dimensions.

BIT-EQUAL bit-array-1 bit-array-2

Returns true if for every element of one bit-array that is 1, the element with the same subscripts exists in the other bit-array and is 1. Bit-array-1 and bit-array-2 must have identical rank but need not have identical dimensions.

4. Specify that the binary bit-array functions referenced above, the unary bit-array function referenced above, and the three bit-array functions referenced in point 3 respect the fill-pointer of any argument that is one-dimensional and has a fill-pointer.

5. Suggest in the language specification document that compilers should optimize the following functions when the sequence argument is declared to be a bit-vector, taking advantage of any relevant special machine instructions.

COUNT POSITION

6. Suggest in the language specification document that compilers should optimize the following functions when there are two arguments, the second argument is declared to be a bit-vector, and the predicate argument is #'ZEROP, taking advantage of any relevant special machine instructions.

EVERY NOTANY NOTEVERY SOME

Proposal (NO-NEW-FUNCTIONS)

Points 1, 2, 4, 5, and 6 are the same as BIT-ARRAY-FUNCTIONS:ADD.

Substitute for point 3:

3. Do not add the three new functions. Instead, generalize the mapping functions referenced above (EVERY, MAP, NOTANY, NOTEVERY, and SOME) so that they operate on "mappables" rather than just sequences. Define a mappable to be an array or a list. Specify that the mappable arguments to a mapping function, and the result in the case of MAP with a non-NIL first argument, must all be of the same rank (the rank of a list is considered to be 1). Mapping accesses array elements in row-major order. Generalize the existing specification that a mapping function uses the length of the shortest sequence, to say that a mapping function uses on each axis the minimum of the dimensions on that axis of the mappable arguments.

Additional point 7:

7. Suggest in the language specification document that compilers should optimize the functions EVERY, NOTANY, NOTEVERY, and SOME when there are two arguments, the second argument is declared to be a bit-array, and the predicate argument is #'ZEROP, taking advantage of any relevant special machine instructions. In addition compilers should optimize when the second argument is a call with two arguments to one of the binary bit-array functions referenced above, to avoid consing an intermediate result.

Examples

The equivalents of UNION and INTERSECTION for sets represented as bit vectors, with 1's in positions where set elements are present, are BIT-IOR and BIT-AND respectively.

(COUNT 1 (THE BIT-VECTOR BV)) computes the cardinality of a bit vector (called the population on some computers). This is analogous to LOGCOUNT of an integer.

(POSITION BIT (THE BIT-VECTOR BV)) scans for a 1 or 0 bit, but can likely be implemented using a word-at-a-time scan.

(EVERY #'ZEROP (THE BIT-VECTOR BV)) tests whether a bit vector is entirely zero.

(BIT-SUBSETP bit-array-1 bit-array-2) is equivalent to (EVERY #'ZEROP (BIT-ANDC2 bit-array-1 bit-array-2)) [under the first proposal, only when the arguments are of rank 1.]

(BIT-DISJOINTP bit-array-1 bit-array-2) is equivalent to (EVERY #'ZEROP (BIT-AND bit-array-1 bit-array-2)) [under the first proposal, only when the arguments are of rank 1.] This is analogous to NOT of LOGTEST of two integers.

(BIT-EQUAL bit-array-1 bit-array-2) is equivalent to (EVERY #'ZEROP (BIT-XOR bit-array-1 bit-array-2)) [under the first proposal, only when the arguments are of rank 1.] BIT-EQUAL differs from EQUAL only when the arguments are of unequal dimensions.

Rationale

1,2. Relaxing the requirement that bit arrays must have equal dimensions was requested by users who had tried to use these operations on sets.

3. Three new functions are added by BIT-ARRAY-FUNCTIONS:ADD because EVERY only works on vectors, since issue SEQUENCE-FUNCTIONS-EXCLUDE-ARRAYS was rejected. BIT-ARRAY-FUNCTIONS:NO-NEW-FUNCTIONS includes the minimal portion of that proposal needed to avoid adding any new functions, while omitting all the controversial parts.

4. Respecting the fill-pointer of vectors makes the BIT-xxx functions more consistent with the rest of the language. They can be thought of as sequence functions for bit-vectors (and sequence functions always respect the fill-pointer) that have been generalized to work on multidimensional bit-arrays as well.

5,6,7. The suggestion for compiler optimization is to give users the confidence that they will get good results when using sequence and mapping operations on bit vectors. Otherwise we would feel the need to add additional bit-vector-specific functions to perform these operations in a way that is optimized and specialized for bit-vectors. Recommending optimization of a particular way of performing these operations avoids the problem of each implementation choosing a different idiom to optimize, resulting in performance problems when porting.

Current Practice

Symbolics Genera 7.2 has something like the first proposal, but only for bit vectors, not generalized for bit arrays. Genera has some additional functions (BIT-VECTOR-POSITION, BIT-VECTOR-CARDINALITY, and BIT-VECTOR-ZERO-P) that aren't really necessary since they are equivalent to POSITION, COUNT, or EVERY plus a type declaration. The proposal seems to fit into the rest of Common Lisp better than Genera's current practice.

Symbolics Genera 7.2 does not respect the fill-pointer in BIT-AND.

Cost to Implementors

Implementing these very efficiently may require some clever hand coding. Of course the standard cannot mandate any particular level of efficiency and a simple, low-cost implementation is permissible. Implementing the compiler suggestions requires keeping track of type declarations in the compiler, but most compilers already do that. The second proposal requires slightly more compiler analysis than the first proposal.

A run-time type test and dispatch to code specialized for bit-arrays could be used instead of compiler analysis, at a small efficiency cost.

Cost to Users

None, unless some implementation currently violates point 4 and user programs currently depend on that. It seems quite unlikely that anyone would depend on BIT-xxx functions to access past the fill-pointer.

Cost of Non-Adoption

Less featureful language. Some bit array manipulation will have to be written in nonportable Lisp code or in C or assembly language.

Performance Impact

None on programs that don't use these features. Negligibly small on the binary bit-array functions referenced above when array dimensions are equal. Large improvement for programs that can take advantage of these features when running in an implementation that optimizes them.

Benefits

More featureful language.

Aesthetics

More featureful language.

Discussion

This functionality was suggested on the Common Lisp mailing list 12-Jan-89. The detailed design has evolved from what was suggested and is greatly simplified.

The loose specification of the result dimensions in points 1 and 2 is to allow maximum implementation freedom. This is not essential to the proposal and could be changed to require that the result have the same dimension as the larger of the two arguments.

It has been suggested that points 6 and 7 should specify some other predicates to optimize, such as #'PLUSP or (COMPLEMENT #'ZEROP). Moon doesn't think this is important enough to be worth adding.

Edit History