Saturday, May 2, 2015

Erudite: a tool for Literate Programming in Common Lisp

Overview

Literate programming is an approach to programming introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated [1].

Literate programming is certainly controversial, but it can make sense for some projects, particularly long-lived projects [2] with a non-obvious design and fairly complex algorithms.

LP in Common Lisp

There are some interesting tools if you want to do LP in CL.

Org-mode is an Emacs mode for "keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system", but it can also be used for LP via the Babel contrib. Its distinctive features are the ability to show live code examples and to integrate different programming languages; however, it is tied to Emacs and is not designed specifically for Lisp.

Noweb is one of the most popular LP tools. It is language independent and is known for its simplicity (at least compared to WEB, the first LP system).

CLWEB is apparently good if you want to use traditional LP in your CL project.

All these systems suffer from a required tangling phase in which code is extracted from the documents, which then has to be compiled and run. One of the distinctive features of Lisp is the possibility to develop incrementally and on a live system from a Lisp listener. This is not possible any more if a code tangling phase is present.

LP/Lisp is an LP system that solves this problem. LP directives and text are put in standard Lisp comments. That means there is no tangling phase; the source code is a completely normal Lisp program, so working from a Lisp listener is perfectly possible. The downside of this system is that it is not portable (it compiles on Allegro Common Lisp only). A version 2 was planned in which that and other things would be improved, but it hasn't happened yet.

Erudite

Erudite is my implementation of a Literate Programming tool for Common Lisp. It turned out to be similar to LP/Lisp, but more extensible, with support for multiple input syntaxes and multiple output formats. It is also portable (it compiles on several of the open source Lisps).

Similarly to LP/Lisp, there's no code generation involved, so interactive development is possible; and LP directives and text reside in Lisp comments.
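
To make the idea concrete, here is a minimal sketch (not Erudite's actual implementation) of how prose can be recovered from a file that remains ordinary Lisp. The names `comment-line-p` and `split-doc-and-code`, and the rule that any line starting with a semicolon counts as documentation, are assumptions made for illustration only.

```lisp
(defun comment-line-p (line)
  "True when LINE, ignoring leading spaces, starts with a semicolon."
  (let ((trimmed (string-left-trim " " line)))
    (and (plusp (length trimmed))
         (char= (char trimmed 0) #\;))))

(defun split-doc-and-code (source)
  "Return two values: the comment lines of SOURCE (semicolons stripped)
and the remaining lines, which are plain Lisp code."
  (with-input-from-string (in source)
    (loop for line = (read-line in nil)
          while line
          if (comment-line-p line)
            collect (string-left-trim "; " line) into doc
          else
            collect line into code
          finally (return (values doc code)))))

;; Hypothetical input: prose lives in comments, code is untouched.
(defparameter *example-source*
  (format nil ";; Compute the square of X.~%(defun square (x)~%  (* x x))"))
```

Calling `(split-doc-and-code *example-source*)` returns the comment text as the first value and the untouched code lines as the second. Because the input is already valid Lisp, the same file can be loaded or evaluated form by form from the listener without any extraction step.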

Erudite provides its own LP syntax, which can then be converted to the desired output. Erudite syntax looks like this: @operation{parameters}. For instance, @emph{this is emphasized} is transformed to \emph{this is emphasized} in the LaTeX output, and to **this is emphasized** in the Markdown output.
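
The conversion described above can be sketched in a few lines of portable Common Lisp. This is only an illustration of the mapping for the `@emph` directive; the function name `render-emph` and the `:latex` / `:markdown` keywords are assumptions, not Erudite's API, and nested braces are not handled.

```lisp
(defun render-emph (line backend)
  "Replace every @emph{...} in LINE with the emphasis markup of BACKEND,
which must be :LATEX or :MARKDOWN. Nested braces are not supported."
  (with-output-to-string (out)
    (loop with tag = "@emph{"
          with start = 0
          for open = (search tag line :start2 start)
          for close = (and open (position #\} line :start open))
          while close
          do ;; Copy the text before the directive unchanged.
             (write-string line out :start start :end open)
             (let ((text (subseq line (+ open (length tag)) close)))
               (ecase backend
                 (:latex (format out "\\emph{~a}" text))
                 (:markdown (format out "**~a**" text))))
             (setf start (1+ close))
          ;; Copy whatever follows the last directive.
          finally (write-string line out :start start))))
```

For example, `(render-emph "@emph{this is emphasized}" :markdown)` returns the string `**this is emphasized**`, and the `:latex` backend returns `\emph{this is emphasized}`.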

If you want to use raw LaTeX syntax, then you can write LaTeX in comments and then indicate you are using LaTeX as syntax for the parser.

The supported input syntaxes are Erudite, LaTeX, reStructuredText (Sphinx flavour), and Markdown.

The supported outputs (when using Erudite syntax) are LaTeX, reStructuredText (Sphinx flavour), and Markdown.

This is the PDF generated from Erudite source code, and these are the PDF and Markdown outputs of an output test.

References

[1] Knuth, Donald E. (1984). "Literate Programming" (PDF). The Computer Journal (British Computer Society) 27 (2): 97–111. doi:10.1093/comjnl/27.2.97. Retrieved January 4, 2009.
[2] Literate Programming in the Large

Tuesday, September 16, 2014

Embedding Python in Common Lisp

The Common Lisp libraries problem

The Common Lisp libraries problem is well known. The Lisp community is not as big as those of more mainstream languages like Python or Java. Although nowadays the problem is not so severe (there are lots of good libraries, and there's also Quicklisp to gather them all), sometimes there is no Lisp library for some specific task, or the available Lisp library is only partially implemented.

Reusing Python libraries

Python is a mainstream language with a big community and a bunch of solid libraries implemented for it. So, my idea was to try to access those libraries from Lisp.

I tried CLPython first. It is a Python compiler for Common Lisp. Although it is probably good and works for lots of cases, in my experience it didn't work out too well. I had problems compiling Python libraries, and when compilation did succeed I got runtime errors when executing the Python code.

So I looked for an alternative, and found Burgled Batteries. It is a Lisp library for accessing Python via its C API. From the library README: "While a number of other Python-by-FFI options exist, burgled-batteries aims for a CLPython-esque level of integration. In other words, deep integration. You shouldn’t have to care that the library you’re using was written in Python—it should Just Work." And indeed, it just works.

The Burgled Batteries API

At the low level layer, Burgled Batteries implements the whole C API (or almost all of it) via CFFI and makes it accessible as normal Common Lisp functions.

At a higher level, it provides access to Python by passing a string with Python code, and also declaring Python functions and then evaluating them, as shown in the following example from the library README:
(asdf:load-system "burgled-batteries")
(in-package #:burgled-batteries)
(startup-python)

(run "1+1") ; => 2

(import "feedparser")
(defpyfun "feedparser.parse" (thing))
(documentation 'feedparser.parse 'function)
; => "Parse a feed from a URL, file, stream, or string"
(feedparser.parse "http://pinterface.livejournal.com/data/atom")
; => #<HASH-TABLE>

(shutdown-python)
As the example shows, marshalling of data types between Lisp and Python is in place (the result of calling the Python function is a hash table). More about how this works, and about other peculiarities of this Python bridge (like memory handling), can be found in the Burgled Batteries README.

While this high-level API may be enough to access Python libraries with a simple API, it is not so good when what we need to do with Python is more involved. At the moment there is no automatic generation of Lisp functions by introspecting a Python module, and manually defining every Python function one would like to use via defpyfun can be cumbersome.
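
As a rough illustration of how much boilerplate is involved, one could batch the declarations with a small macro. `defpyfuns` is a hypothetical helper, not part of burgled-batteries; it only expands into the `defpyfun` forms shown in the README example, and `mymodule.myfunction` is a placeholder name rather than a real Python function.

```lisp
;; DEFPYFUNS is a hypothetical helper, not part of burgled-batteries.
;; It expands into plain DEFPYFUN forms, so it is meant to be used from
;; a package where DEFPYFUN is accessible (e.g. burgled-batteries).
(defmacro defpyfuns (&body specs)
  "Expand each (NAME ARGLIST) spec into a DEFPYFUN form."
  `(progn
     ,@(loop for (name arglist) in specs
             collect `(defpyfun ,name ,arglist))))

;; The expansion can be inspected without starting Python:
(macroexpand-1 '(defpyfuns
                 ("feedparser.parse" (thing))
                 ("mymodule.myfunction" (arg))))
```

Even with such a helper, every function still has to be listed by hand, which is the limitation described above.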

Embedded Python-like syntax

So I decided to try to improve the way to communicate with the Python interpreter. Instead of generating Lisp functions by introspecting Python modules, I thought that providing an embedded Python-like syntax could be a good idea.

This is how the example above looks when using the embedded syntax:
(asdf:load-system "burgled-batteries.syntax")
(in-package #:burgled-batteries)
(burgled-batteries.syntax:enable-python-syntax)
(startup-python)

(import :feedparser)
[^feedparser.parse('http://pinterface.livejournal.com/data/atom')]
; => #<HASH-TABLE>

(shutdown-python)
Python syntax appears between brackets ([]). Note that it is no longer necessary to declare Python functions in the Lisp world. This is very similar to what Clojure does to access Java. The embedded syntax is implemented as a reader macro (of course), and uses ESRAP to do the parsing. In case you are interested in what is going on behind the scenes, you can inspect which calls to the C API are being made by quoting the Python expression:
PYTHON> '[^.feedparser.parse('http://pinterface.livejournal.com/data/atom')]
(LET ((#:TRANSFORMED2520
       (CFFI:CONVERT-FROM-FOREIGN
        (CALL* (REF* "feedparser") "parse"
               (STRING.FROM-STRING*
                "http://pinterface.livejournal.com/data/atom"))
        'PYTHON.CFFI::OBJECT!)))
  #:TRANSFORMED2520)
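
The reader macro mechanism itself can be sketched with standard Common Lisp only. This is a deliberately naive illustration, not the burgled-batteries.syntax implementation: it collects the raw text up to the first closing bracket and wraps it in a form headed by the hypothetical symbol `run-python-source`, whereas the real library parses the text with ESRAP and emits C API calls. The names `read-bracketed` and `*bracket-readtable*` are also assumptions.

```lisp
(defun read-bracketed (stream char)
  "Reader macro function: collect the characters up to the first ] and
return a form that carries them as a string."
  (declare (ignore char))
  (let ((text (with-output-to-string (out)
                (loop for c = (read-char stream t nil t)
                      until (char= c #\])
                      do (write-char c out)))))
    ;; RUN-PYTHON-SOURCE is a hypothetical operator used as a placeholder.
    `(run-python-source ,text)))

;; Install the macro character on a copy of the standard readtable so
;; the global readtable stays untouched.
(defparameter *bracket-readtable*
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\[ #'read-bracketed nil rt)
    rt))

(defun read-with-brackets (string)
  "Read one form from STRING with the bracket syntax enabled."
  (let ((*readtable* *bracket-readtable*))
    (read-from-string string)))
```

With this in place, `(read-with-brackets "[feedparser.parse('x')]")` yields a list whose second element is the string `feedparser.parse('x')`, and putting a quote in front of the bracket yields a `quote` form around that list, which is why quoting shows the expansion. Stopping at the first `]` breaks on Python subscripts such as `params['cn']`; that is one reason a real grammar is needed.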
It is possible to access Lisp references from the Python syntax; that makes the integration quite easy. For instance:
PYTHON> (let (($url "http://pinterface.livejournal.com/data/atom"))
           [^feedparser.parse($url)])
=>

#<HASH-TABLE :TEST EQUAL :COUNT 12 {1005647793}> 
 
As you can see, Lisp variables start with the $ character.
What's more, the idea is that the control flow is implemented in Lisp, making calls to Python via the embedded syntax. Here is a more involved example to see this in action:

PYTHON> (import :icalendar)
PYTHON> (import :datetime)
PYTHON> (let (($cal [icalendar.Calendar()]))
  [$cal.add('prodid', '-//My calendar product//mxm.dk//')]
  (let (($event [icalendar.Event()]))
    [$event.add('summary', 'Python meeting about calendaring')]
    [$event.add('dtstart', datetime.datetime(2005,4,4,8,0,0))]
    [$event.add('dtend', datetime.datetime(2005,4,4,10,0,0))]
    [$event.add('dtstamp', datetime.datetime(2005,4,4,0,10,0))]
    (let (($organizer [icalendar.vCalAddress('MAILTO: noone@example.com')]))
      [$organizer.params['cn'] = icalendar.vText('Max Rasmussen')]
      [$organizer.params['role'] = icalendar.vText('CHAIR')]
      [$event['organizer'] = $organizer]
      [$event['location'] = icalendar.vText('Odense, Denmark')]

      [$event['uid'] = '20050115T101010/27346262376@mxm.dk']
      [$event.add('priority', 5)]

      (let (($attendee [icalendar.vCalAddress('MAILTO:maxm@example.com')]))
         [$attendee.params['cn'] = icalendar.vText('Max Rasmussen')]
         [$attendee.params['ROLE'] = icalendar.vText('REQ-PARTICIPANT')]
         [$event.add('attendee', $attendee, encode=0)])

      (let (($attendee [icalendar.vCalAddress('MAILTO:the-dude@example.com')]))
         [$attendee.params['cn'] = icalendar.vText('The Dude')]
         [$attendee.params['ROLE'] = icalendar.vText('REQ-PARTICIPANT')]
         [$event.add('attendee', $attendee, encode=0)])

      [$cal.add_component($event)]
      [^$cal.to_ical()])))
=>

"BEGIN:VCALENDAR
PRODID:-//My calendar product//mxm.dk//
BEGIN:VEVENT
SUMMARY:Python meeting about calendaring
DTSTART;VALUE=DATE-TIME:20050404T080000
DTEND;VALUE=DATE-TIME:20050404T100000
DTSTAMP;VALUE=DATE-TIME:20050404T001000Z
UID:20050115T101010/27346262376@mxm.dk
LOCATION:Odense\\, Denmark
MAILTO:MAXM@EXAMPLE.COM:attendee
MAILTO:THE-DUDE@EXAMPLE.COM:attendee
ORGANIZER;CN=\"Max Rasmussen\";ROLE=CHAIR:MAILTO: noone@example.com
PRIORITY:5
END:VEVENT
END:VCALENDAR"


Here we accessed the iCalendar Python library. The result is the calendar serialized as a string in the Lisp world.

A nice feature of using this syntax is that it is quite compact and readable (doing the same via macros is possible, but not as compact and readable as this); it looks a lot like Python with some minor differences, so it is very clear where the Python code is; and last but not least, it is very easy to copy and paste Python code and make it work with just a few modifications.

The syntax is not exactly that of Python, because we need to decide whether we are referring to a Lisp or a Python binding, and there's also some syntax for indicating whether we want the marshalled object [^obj] or just the pointer to the object [obj]. The final syntax is not fully decided yet; I'm still playing with some ideas.

Conclusion

The embedded Python-like language integrates quite well with Lisp, and avoids having to manually define the Python functions we want to access or to generate any glue by introspection. Instead, FFI calls are made on the fly.

It is hard to imagine being able to implement an embedded language like this in a language other than Lisp. Access to the reader and to PEG parsing made the implementation very easy.

The burgled-batteries.syntax contrib library is available here, and it is a work in progress.


Welcome

Hello. I'm a Common Lisp programmer. I will be posting about my development endeavours from time to time here. Stay tuned!