I have written a basic C wrapper for libxml2, which parses an XML document, and prints a list containing the parsed data. At the moment, I save the printed sexp to a file, and READ it from Lisp. I am thinking it may be better to avoid the intermediate printing and READing, and instead save the parsed data to a C struct; I can then try to use Lisp to get the sexp from the C struct, using an FFI. What is a good way to store a sexp in C? Is it common practice to pass through an intermediate file like I am doing now?
TIA, Raghavendra.
PS: In case it's relevant, the C wrapper takes an XML document like this
and prints a sexp, which when READ by CLISP looks like this
--8<---------------cut here---------------start------------->8---
((DRAFT (PROCESSING-INSTRUCTION "omit=\"author\""))
 (DOC (XML-NAMESPACE "urn:example:ns:doc") (ID "sexp") (LANG "en-GB")
  (LATEX (PROCESSING-INSTRUCTION "class=\"article\" fontsize=\"12pt\""))
  (DOCINFO (XML-NAMESPACE "urn:example:ns:doc")
   (AUTHOR (XML-NAMESPACE "urn:example:ns:doc")
    (FIRSTNAME (XML-NAMESPACE "urn:example:ns:doc") "Schöne")
    (SURNAME (XML-NAMESPACE "urn:example:ns:doc") "Grüße"))
   (TITLE (XML-NAMESPACE "urn:example:ns:doc") "An Example Article")
   (TITLEABBREV (XML-NAMESPACE "urn:example:ns:doc") (ROLE "running-title") "Example art")
   (KEYWORDSET (XML-NAMESPACE "urn:example:ns:doc")
    (KEYWORD (XML-NAMESPACE "urn:example:ns:doc") (TYPE (XML-NAMESPACE "urn:example:ns:keyword") "ai") "an articleinfo keyword")
    (KEYWORD (XML-NAMESPACE "urn:example:ns:doc") (TYPE (XML-NAMESPACE "urn:example:ns:keyword") "ai") "another articleinfo keyword")))
  (SECTION
   (SECTIONINFO (KEYWORDSET (KEYWORD (TYPE (XML-NAMESPACE "urn:example:ns:keyword") "si") "a sectioninfo keyword")))
   (TITLE "A Section")
   (PARA "This is a paragraph in a section. It does not quote any one, so we don't need any `\"' characters.")
   (PARA "The canonical URI for this document is " (ULINK (URL "http://www.example.org/sexp")) ". Please check there for the latest updates. Meanwhile, here is Äñ example of character entity references."))))
--8<---------------cut here---------------end--------------->8---
> I have written a basic C wrapper for libxml2, which parses an XML document, and prints a list containing the parsed data. At the moment, I save the printed sexp to a file, and READ it from Lisp. I am thinking it may be better to avoid the intermediate printing and READing, and instead save the parsed data to a C struct; I can then try to use Lisp to get the sexp from the C struct, using an FFI. What is a good way to store a sexp in C? Is it common practice to pass through an intermediate file like I am doing now?
If you want just to transform XML into SEXPS, take a look at sxml and other XML libs available out there.
You may also consider printing the sexp into a string and reading it from Lisp. Printing to a file is not a very good idea, since disk access is much slower than RAM access.
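To make the string suggestion concrete, here is a minimal sketch in C of one way to hold a sexp in memory and print it into a heap-allocated string instead of a file. Every name in it (`sexp`, `sexp_atom`, `sexp_list`, `sexp_append`, `sexp_to_string`, `sexp_free`, `strbuf`) is hypothetical and invented for illustration; none of it comes from libxml2 or from the original wrapper. It assumes atoms are stored already in their printed form (symbols as-is, strings with their quotes and escapes).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: an atom holds its printed form, a list holds children. */
typedef struct sexp {
    int is_list;
    char *text;            /* atom only: printed form, e.g. DOC or "en-GB" */
    struct sexp *first;    /* list only: first child, NULL for () */
    struct sexp *last;     /* list only: last child, for O(1) append */
    struct sexp *next;     /* next sibling inside the parent list */
} sexp;

static sexp *sexp_new(int is_list, const char *text) {
    sexp *s = calloc(1, sizeof *s);
    assert(s != NULL);     /* sketch only: abort on allocation failure */
    s->is_list = is_list;
    if (text != NULL) {
        s->text = malloc(strlen(text) + 1);
        assert(s->text != NULL);
        strcpy(s->text, text);
    }
    return s;
}

sexp *sexp_atom(const char *text) { assert(text != NULL); return sexp_new(0, text); }
sexp *sexp_list(void) { return sexp_new(1, NULL); }

/* Append child to list and return list, so calls can be nested. */
sexp *sexp_append(sexp *list, sexp *child) {
    assert(list->is_list);
    if (list->last != NULL) list->last->next = child;
    else list->first = child;
    list->last = child;
    return list;
}

/* Growable byte buffer that receives the printed form. */
typedef struct { char *data; size_t len, cap; } strbuf;

static void strbuf_add(strbuf *b, const char *s) {
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        b->cap = (b->len + n + 1) * 2;
        b->data = realloc(b->data, b->cap);
        assert(b->data != NULL);
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the terminating NUL too */
    b->len += n;
}

static void sexp_print(const sexp *s, strbuf *out) {
    if (!s->is_list) { strbuf_add(out, s->text); return; }
    strbuf_add(out, "(");
    for (const sexp *c = s->first; c != NULL; c = c->next) {
        if (c != s->first) strbuf_add(out, " ");
        sexp_print(c, out);
    }
    strbuf_add(out, ")");
}

/* Returns a malloc'd, NUL-terminated string; the caller frees it. */
char *sexp_to_string(const sexp *s) {
    strbuf out = {0};
    strbuf_add(&out, "");                 /* guarantees a valid empty string */
    sexp_print(s, &out);
    return out.data;
}

/* Frees a node and, for lists, all of its children. */
void sexp_free(sexp *s) {
    if (s == NULL) return;
    for (sexp *c = s->first; c != NULL; ) {
        sexp *n = c->next;
        sexp_free(c);
        c = n;
    }
    free(s->text);
    free(s);
}
```

A caller would build the tree while walking the XML, call `sexp_to_string` once, and pass the resulting buffer to Lisp (for instance as an FFI string result), where READ-FROM-STRING turns it into a list.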
This discussion thread took place over seven years ago.
> I have written a basic C wrapper for libxml2, which parses an XML document, and prints a list containing the parsed data. At the moment, I save the printed sexp to a file, and READ it from Lisp. I am thinking it may be better to avoid the intermediate printing and READing, and instead save the parsed data to a C struct; I can then try to use Lisp to get the sexp from the C struct, using an FFI. What is a good way to store a sexp in C? Is it common practice to pass through an intermediate file like I am doing now?
Why don't you just use an XML processing library for Common Lisp?
* Kaz Kylheku <20090615105137....@gmail.com> : Wrote on Wed, 3 Jun 2009 20:09:22 +0000 (UTC):
| On 2009-06-03, N. Raghavendra <ra...@mri.ernet.in> wrote:
|> I have written a basic C wrapper for libxml2, which parses an XML document, and prints a list containing the parsed data. At the moment, I save the printed sexp to a file, and READ it from Lisp. I am thinking it may be better to avoid the intermediate printing and READing, and instead save the parsed data to a C struct; I can then try to use Lisp to get the sexp from the C struct, using an FFI. What is a good way to store a sexp in C? Is it common practice to pass through an intermediate file like I am doing now?
|
| Why don't you just use an XML processing library for Common Lisp?
I'm sure there would be valid reasons to avoid depending on the recent offerings of the free-common-lisp-library vendors in this department.
| It seems like your wrapper is a waste of effort.
If you look at his posted example carefully, you will see that libxml2 has done extra processing on his document: libxml2 has expanded this directive
I'd imagine the OP would prefer to use libxml2 for its features.
To answer the OP, I'd expect that it may not be worth the effort to construct lisp sexps directly from C code.
The next easiest thing would be to prepare a stream in C which Lisp could call READ on directly. But CL:READ would likely be the slowest step; I noticed you are dealing with UTF-8 strings, so any clever speedups from munging 8-bit-clean character arrays between C and Lisp will not be possible, and you'd go through the extra copying/stream layers in your Lisp.
Are there any other characteristics of your XML data that make you want to avoid a file layer?
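For the stream approach, the C side mostly has to make sure each string it emits is something READ will accept. The sketch below shows one way to do that, assuming the standard Common Lisp reader syntax, in which only `"` and `\` need a backslash inside a string. `write_lisp_string` is a hypothetical helper name, not a libxml2 function. UTF-8 bytes are passed through untouched, so the Lisp side is assumed to open the stream with a UTF-8 external format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write s to out as a Lisp string literal.
   Only the double quote and the backslash are escaped; all other bytes,
   including multi-byte UTF-8 sequences, are copied unchanged.
   Returns 0 on success, -1 on a write error. */
int write_lisp_string(FILE *out, const char *s) {
    assert(out != NULL && s != NULL);
    if (fputc('"', out) == EOF) return -1;
    while (*s != '\0') {
        size_t run = strcspn(s, "\"\\");   /* length of the run needing no escape */
        if (fwrite(s, 1, run, out) != run) return -1;
        s += run;
        if (*s != '\0') {                  /* *s is either " or \ here */
            if (fputc('\\', out) == EOF || fputc(*s, out) == EOF) return -1;
            s++;
        }
    }
    return fputc('"', out) == EOF ? -1 : 0;
}
```

Because the function takes a `FILE *`, the same code works whether `out` is a regular file, `stdout` connected to a pipe that Lisp reads from, or any other stdio stream.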
At 2009-06-03T10:36:53-07:00, gugamilare wrote:
> If you want just to transform XML into SEXPS, take a look at sxml and other XML libs available out there.
Yes, I have looked at it and at the others mentioned at cliki.
> You may also consider printing the sexp into a string and read it from Lisp.
> This discussion thread took place over seven years ago.
Yes, I knew that.
> Why don't you just use an XML processing library for Common Lisp?
I have looked at the recommended XML libraries at cliki, and prefer libxml2 to them. Partly because of convenience, as libxml2 is already there on my FreeBSD system, due to many ports depending on it. It has good documentation and an excellent tutorial, and is easy to use. Also, it isn't clear to me that the Common Lisp XML libraries mentioned at cliki have all the features that libxml2 has.
So, I decided to use libxml2 for what it seems to be good at, namely parsing XML, and to then use Lisp for working with the parsed tree.
> It seems like your wrapper is a waste of effort.
Given the documentation of libxml2, it didn't take much effort.
At 2009-06-04T09:57:02+05:30, Madhu wrote:
> * Kaz Kylheku <20090615105137....@gmail.com> :
> | Why don't you just use an XML processing library for Common Lisp?
> I'm sure there would be valid reasons to avoid depending on the recent offerings of the free-common-lisp-library vendors in this department.
Indeed, as mentioned in an earlier posting, I have looked at the XML libraries featured at cliki, and decided to use libxml2.
> If you look at his posted example carefully, you will see that libxml2 has done extra processing on his document: libxml2 has expanded this directive
>> This discussion thread took place over seven years ago.
> Yes, I knew that.
>> Why don't you just use an XML processing library for Common Lisp?
> I have looked at the recommended XML libraries at cliki, and prefer libxml2 to them. Partly because of convenience, as libxml2 is already there on my FreeBSD system, due to many ports depending on it. It has good documentation and an excellent tutorial, and is easy to use. Also, it isn't clear to me that the Common Lisp XML libraries mentioned at cliki have all the features that libxml2 has.
> So, I decided to use libxml2 for what it seems to be good at, namely parsing XML, and to then use Lisp for working with the parsed tree.
Indeed, "As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Tests Suite."
>> It seems like your wrapper is a waste of effort.
> Given the documentation of libxml2, it didn't take much effort.
What I would do is use libxml2 directly from Common Lisp through CFFI. That is, implement that "wrapper" in Lisp.
An alternative would be to use cxml, and to patch it to make it pass the 1800+ tests from the OASIS XML Tests Suite.
>> This discussion thread took place over seven years ago.
> Yes, I knew that.
>> Why don't you just use an XML processing library for Common Lisp?
> I have looked at the recommended XML libraries at cliki, and prefer libxml2 to them.
I see. It seems like a good way to solve this problem is to use the intrinsic C functions in CLISP to create the structure and return it as an object. (I noted your mention in your original article that you are using CLISP).
The problem with this is that, I suspect, you can't do this with a dynamic FFI module. This technique needs access to symbols within the CLISP image that are not available for dynamic linkage. I.e. you have to make a statically linked CLISP C module via the linking set mechanism.
Here is an example.
Firstly, since we won't be using the FFI compiler at all, but writing our own intrinsic function, we need an alternate way to generate the C module boilerplate required for CLISP module linking. That alternate way is a utility called modprep.lisp which is in the linkkit directory.
With modprep, we can write in a preprocessed dialect of C. Let's use the suffix "mp" for this language:
Modprep provides considerable conveniences for writing C that interfaces with CLISP. (It is worth reading the modprep.lisp source code in the linkkit directory.)
The #include "clisp.h" is mandatory. The &optional dummy is a workaround for the fact that DEFUN requires a second argument.
Note how I'm ``cheating'' by using the CL-USER package. Also note how the DEFUN syntax is very helpful. We can use Lisp symbol syntax, which is automatically mapped to some C friendly name that we don't have to care about.
Because our function takes an argument, we need to clean it up before returning, otherwise there will be a crash. That is the purpose of the popSTACK() call.
Let's continue. The next step is to know where your CLISP linkkit directory is and install this as an environment variable:
You have to watch out for GC safety when writing intrinsic code. Objects that are only referenced by your C variables are not reachable by the collector. If you call any function within CLISP that may trigger GC, you have to park all your live objects somewhere where they are reachable; usually the stack is used for this purpose.
On Thu, 04 Jun 2009 14:35:43 +0530, <ra...@mri.ernet.in> wrote:
> I have looked at the recommended XML libraries at cliki, and prefer libxml2 to them. Partly because of convenience, as libxml2 is already there on my FreeBSD system, due to many ports depending on it. It has good documentation and an excellent tutorial, and is easy to use. Also, it isn't clear to me that the Common Lisp XML libraries mentioned at cliki have all the features that libxml2 has.
You are parsing the entire file first, which means that any error in the xml file probably kills you. S-XML has an event interface, I just used it to handle data mined from a site with a fair potential for error (lots of hand entered data). That event interface replaced a perl complete file parser, and the lisp version is much easier to maintain, plus I get all the valid data up to any error, and the data beyond it. I was able to correct some minor bugs in the perl version that were difficult to handle in that mode of parsing.
It's nice to have features [and maybe libxml2 has error-recovery], but unless you are really using them, being able to refine or correct the code seems a better option. S-XML's event interface was a little funny, but worked out fine.
Back when I interfaced to a C program, I used a stream to feed lisp. That was faster and stayed away from the R/W speed of the disk subsystem, etc. For simple files, using the file system is simple and OK, but sooner or later the stream solution will prove to be better and using it on a simple problem is a good step.
-- Humans are allergic to change. "We've always done it that way" is not a good reason to continue to do so. That's why I have a clock on my office wall that runs backwards. It forces visitors to think. They hate me for that. - Admiral Hopper.
> It seems like a good way to solve this problem is to use the intrinsic C functions in CLISP to create the structure and return it as an object. (I noted your mention in your original article that you are using CLISP).
> [snip]
> Here is an example.
> [snip]
> Hope this helps.
Yes, it indeed does. Thank you for taking the time to write an example. I have been trying to understand the section on "External Modules" in the CLISP manual, and not making much progress with it. I think your example will be very useful.
> You are parsing the entire file first, which means that any error in the xml file probably kills you. S-XML has an event interface, I just used it to handle data mined from a site
Actually, I am using the XmlTextReader API of libxml2, and not the tree API which first parses the entire document into a tree in memory. The XmlTextReader enables one to process each part of the document as it is visited by the reader,
> with a fair potential for error (lots of hand entered data). It's nice to have features [and maybe libxml2 has error-recovery], but unless you are really using them, being able to refine or correct the code seems a better option.
The documents I process are almost always created by myself, are at the least well-formed, and usually known to be valid with respect to some schema. So, I didn't bother to check the facilities for recovering from parsing errors, in libxml2 or in any of the Common Lisp libraries I had read about.
> For simple files, using the file system is simple and OK, but sooner or later the stream solution will prove to be better and using it on a simple problem is a good step.