Cleanup Issue ELIMINATE FORCED CONSING

Category
ADDITION
References
CLtL section 14.1,3,5; 15.2,5; 17.4; 18.3; 21.2

Problem Description

Some sequence, list, and string functions in CLtL encourage a programming style that uses excessive storage allocation compared to libraries of routines with similar functionality in other languages, notably C. The only options available to the Common Lisp programmer who uses these functions are to generate a newly-allocated sequence or to destructively modify the argument sequence(s) given the function. The option of providing a sequence, list, or string into which the result of a sequence operation should be placed is not available.

Proposal

Add mutually exclusive :RECYCLE or :MODIFY keyword arguments to those sequence, list, and string functions where such arguments are useful. :RECYCLE is used to pass a sequence to be recycled to hold the result of the sequence, list, or string operation. :MODIFY is used to pass a sequence that is to be modified in some manner to contain the result of the operation. The distinction lies in that :MODIFY is akin to destructively modifying a useful data structure, while :RECYLE is used to simply recycle a data structure that would otherwise be useless and possibly garbage (i.e., a data structure whose contents are no longer of interest and may be altered arbitrarily). The sequence, list, or string function returns the MODIFY'd or RECYLE'd sequence as specified below. It is an error to pass both a :MODIFY and a :RECYCLE argument.

[a] Support a MODIFY argument:

(1) The sequence to be modified must accomodate elements of the type(s) that would have been stored in a newly-allocated sequence. Thus it would be an error to give a target string argument if the result of the operation would normally have had elements that were not all of type STRING-CHAR.

(2a) A non-list MODIFY argument should have an allocated length long enough to accomodate the normal result of the sequence function. The sequence to be modified may have a fill pointer. The fill pointer remains unchanged by this operation so long as it points after the modified elements of the sequence. If the fill pointer points prior to or in the range of the modified elements, it is set to point immediately after the last element filled in by the sequence computation. If the sequence to be modifed is longer than necessary, the unneeded trailing elements are unchanged.

(2b) A list MODIFY argument is extended with new conses to be as long as necessary to accomodate the resulting sequence, if not enough conses are supplied and the :MODIFY-FROM-END keyword is nil. Two values are returned: the first is the modified list with any unused tail remaining intact. (I.e., The unused conses remain as a tail of the modified list that is returned.) The second is that tail of the original list that was not modified by the operation. This second value is useful for continued modification of the tail.

(3) :MODIFY-START and :MODIFY-END keywords are supported. MODIFY-START and MODIFY-END determine where in sequence the result of the sequence computation is placed. It is an error if the sub-length of the sequence specified by these arguments is not long enough to accomodate the sequence computation. If a longer than necessary subsequence is specified, then the elements in the unneeded part of the specified subsequence are unchanged.

(4) A :MODIFY-FROM-END keyword is supported. If non-nil, the sequence to be modified is filled with new elements starting at MODIFY-END (e.g., the end of the sequence if MODIFY-END is not specified), effectively reversing the order of elements of the as the sequence is modified. It is an error if the supplied sequence is not long enough for the result. If the supplied sequence is longer than necessary, leading elements are unchanged.

Sequence functions changed to include MODIFY, MODIFY-START, MODIFY-END, MODIFY-FROM-END: subseq, copy-seq, reverse, remove, remove-if, remove-if-not, remove-duplicates, substitute, substitute-if, substitute-if-not, merge

List functions changed to include MODIFY, MODIFY-START, MODIFY-END, MODIFY-FROM-END: copy-list, butlast

String functions changed to include MODIFY, MODIFY-START, MODIFY-END, MODIFY-FROM-END: string-trim, string-left-trim, string-right-trim, string-upcase, string-downcase, string-capitalize

[a] Support a RECYLE argument:

(1) The sequence to be recycled must accomodate elements of the type(s) that would have been stored in a newly-allocated sequence. Thus it would be an error to give a RECYLE string argument if the result of the operation would normally have had elements that were not all of type STRING-CHAR.

(2a) A non-list RECYCLE argument should have an allocated length long enough to accomodate the normal result of the sequence function. The sequence to be recycled may have a fill pointer. The fill pointer is set to point immediately after the last element filled in by the sequence computation. If the sequence to be recycled is longer than necessary, the unneeded trailing elements are unchanged.

(2b) A list RECYCLE argument is extended with new conses to be as long as necessary to accomodate the resulting list, if not enough conses are supplied. Two values are returned: the first is the result list terminated with NIL. (I.e., The unused conses are spliced out of the recycled list that is returned.) The second is that tail of the original list that was not used by the recycling operation.

(3) No :RECYCLE-START, :RECYCLE-END, or :RECYCLE-FROM-END arguments are supported.

Changed sequence functions: subseq, copy-seq, reverse, remove, remove-if, remove-if-not, remove-duplicates, substitute, substitute-if, substitute-if-not, merge

Changed list functions: make-list, copy-list, copy-alist, copy-tree, butlast, subst, sublis, adjoin, union, intersection, set-difference, set-exclusive-or

Changed string functions: string-trim, string-left-trim, string-right-trim, string-upcase, string-downcase, string-capitalize

Examples

(copy-seq '(1 2 3) :modify '(a b c d e f g))
	=> (1 2 3 d e f g)

(copy-seq '(1 2 3) :modify '(a b c d e f g) :modify-from-end t :modify-end 5)
	=> (a b 3 2 1 f g)

(remove 'a '(b b a b b a b b a b b) :count 1 :recycle '(3))
	=> (b b b b a b b a b b)   ; EQ to :recycle arg, CDR is new conses.

(substitute #\a #\b "This is a string with 2 a's"
	    :modify (make-array 37 :element-type 'string-char :fill-pointer 0))
	=> "This is b string with 2 b's" ; result EQ to MODIFY arg, 
					 ; result has a fill-pointer set to 27 

(substitute #\a #\b "ababa"
	    :modify (make-array 10 :element-type 'string-char
				   :initial-element #'q
	                           :fill-pointer 10))
	=> "aaaaaqqqqq" ; result EQ to MODIFY arg, 
			; result has a fill-pointer set to 10 

(copy-list '(a b c) :recycle '(1 2 3 4 5 6 7))
	=> (a b c)    ; first value
           (4 5 6 7)  ; second value

(copy-list '(a b c) :modify '(1 2 3 4 5 6 7))
	=> (a b c 4 5 6 7)  ; first value
           (4 5 6 7)        ; second value

In related additions, included here since they address the same problem as above, provide extended versions of concatenate, append, revappend, make-string-output-stream, and read-line as follows:

concatenate-into target &rest sequences Like CONCATENATE, but the result type is determined to be the type of TARGET. The result is TARGET containing as many of the elements of SEQUENCES as can be accomodated by the allocated length of TARGET. TARGET's fill pointer, if present is set according to how many elements of TARGET are filled by this operation.

map-into target function sequence &rest sequences Like MAP, but the result type is determined to be the type of TARGET. The result of MAP-INTO is TARGET such that element j is the result of applying FUNCTION to element j of each of the argument sequences. TARGET must be as long as the shortest of the input sequences. TARGET's fill pointer, if present is set according to how many elements of TARGET are filled by this operation.

append-into target &rest lists Like APPEND, but the copied list arguments are copied into conses taken from TARGET. The last list in LISTS is not copied, as in APPEND, rather, the last cons used from TARGET is given the last list in LISTS as its cdr. The result is EQ to TARGET (unless a single list is appended), but contains only those conses needed to hold the appended lists' elements. The tail of unused conses from TARGET is returned as a second value; new conses are allocated if TARGET supplies an insufficient number of conses.

revappend-into target x y Like REVAPPEND, but the new conses are taken from TARGET. The result is EQ to TARGET, but contains only those conses needed to hold X's elements. The tail of unused conses from TARGET is returned as a second value; new conses are allocated if TARGET supplies an insufficient number of conses.

make-string-output-stream &optional string Like the current MAKE-STRING-OUTPUT-STREAM, except if STRING is provided, it must be a string with a fill pointer. Output to the resulting stream is incrementally appended to STRING, as if using VECTOR-PUSH-EXTEND if the string is adjustable, and otherwise using VECTOR-PUSH.

read-line-into-string string &optional input-stream eof-error-p eof-value recursive-p Like READ-LINE, but reads into STRING. STRING must be a string with a fill pointer. The result of READ-LINE-INTO-STRING is incrementally appended to STRING, as if using VECTOR-PUSH-EXTEND if the string is adjustable,and otherwise using VECTOR-PUSH.

In order to facilitate recycling alists and trees, the following two functions are proposed.

flatten-tree tree FLATTEN-TREE would take a tree and return a linear list made of all the conses in the tree, suitable for recycling via a :RECYLE argument.

flatten-alist alist FLATTEN-ALIST would take an alist and return a linear list made of all the conses that would be copied by COPY-ALIST, suitable for recycling via a :RECYLE argument.

Rationale

It is sometimes better to use a more complex programming construct for sake of efficiency than to use a less complex, more expensive one. Providing a functions that do not require dynamic storage allocation provides a means for writing programs that must avoid storage allocation while running as much as possible. Excessive storage allocation (sometimes expressed as "Lisp garbage collects too often") is one of Lisp's most widely publicized faults.

Current Practice

Similar functionality is provided for bit-vectors, as specified in CLtL 17.4. A related capability is provided in the destructive versions of some of the functions extended above (e.g., REMOVE vs. DELETE).

When functionality similar to this is required, users must "roll their own".

[from Moon] Symbolics Common Lisp already has MAP-INTO.

Cost to Implementors

The cost of implementation of these extra sequence, list, and string function arguments is significant, in that existing code to implement these functions must be changed to use storage passed in as an argument by the user.

Cost to Users

No cost for existing programs, this change is upwards compatible. Users may mis-use the new keyword arguments and end up with buggy code, similar to the problems encountered when using C library functions that re-use data structures.

Cost of Non-Adoption

Programmers will continue to "roll their own" non-standard storage re-using code. Others will not go to this effort, but will write programs that require much garbage collection. Lisp will continue to suffer from a handicap in promoting efficient programs when compared to C.

Benefits

Those programmers choosing to invest the extra effort needed to correctly use such facilities can do so with less overhead than previously required, and will see performance improvement.

Aesthetics

Neutral. This proposal is syntactically consistent with the existing keyword arguments to sequence, list, and string functions. However, it adds to the plethora of keyword arguments already present for many of these functions. Stylistically, it provides stronger support for the programming style embodied by the use of the "destructive" versions of many Common Lisp functions.

Discussion

My experience in several commercial Lisp applications is that avoiding storage allocation by using programming techniques such as this may be critical to the success of a project. The current sequence, list, and string functions encourage an "expensive", albeit easy to use, programming style that invites the creation of programs whose performance suffers greatly in comparison to a C program written to solve the same problem. This applies particularly to string-hcaking programs written using Common Lisp versus those written using the standard string library for C.

A related issue: would adoption of these arguments eliminate the need for parallel destructive versions for some of the affected functions? (E.g., does REMOVE with a :MODIFY argument eliminate the need for DELETE.)

Pierson supports the general ideas of an earlier version of this proposal, but believes the proposal itself should be in the form that would appear in a revised manual.

Several people responded in the negative to Fahlmans suggestion that perhaps the new LOOP facility would relieve the need for these extensions. Fahlman is ready to support the proposal in principle if certain details are fixed up.

Moon does not like returning unused conses as a second value as proposed in previous versions of this proposal. He also does not believe that UNION, INTERSECTION, SET-DIFFERENCE, and SET-EXCLUSIVE-OR derive much benfit from previously suggested extensions similar to those in this proposal. Neither should the functions SUBSEQ, COPY-SEQ, COPY-LIST, and BUTLAST be modified, because the functionality is already available from REPLACE, he claims. He thought :TARGET should be changed to :OVERWRITE.

All of the above comments were made prior to version three of this proposal.

MAPCAR and friends might be considered candidates for modification in this proposal. However, these mapping functions likely will be made obsolete by LOOP if they are not already obsolete.

Edit History