
Pike, while not primarily a text-parsing language in the way that Perl or REXX are, still has a rich set of text manipulation facilities.

The sscanf and sprintf functions are capable of far more than their C counterparts, and can in fact handle quite complex translations:

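//Sample data from /etc/passwd
string input=#"stan:x:1004:1003:,,,:/home/stan:/bin/true
";
//%{...%} repeats the enclosed format once per line and collects one sub-array per record
sscanf(input,"%{%s:%s:%d:%d:%s:%s:%s\n%}",array(array(string|int)) users);
//%[n]s and %[n]d pick field n of each record
string output=sprintf("%{User %[0]s (uid %[2]d) uses shell %[6]s in %[5]s.\n%}",users);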
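
To make the intermediate data structure visible, here is a minimal sketch that uses array_sscanf (which returns the matched values rather than assigning them to variables) on a single record. The helper name parse_passwd and the sample record are assumptions made for illustration only.

```pike
// Hypothetical helper: parse passwd-style text into one sub-array per record.
array(array(string|int)) parse_passwd(string text)
{
    // array_sscanf returns an array of results; the %{...%} block is the
    // first (and only) result, holding one sub-array per matched line.
    return array_sscanf(text, "%{%s:%s:%d:%d:%s:%s:%s\n%}")[0];
}

int main()
{
    // Assumed sample record in /etc/passwd layout.
    string line = "stan:x:1004:1003:,,,:/home/stan:/bin/true\n";
    array(array(string|int)) users = parse_passwd(line);

    write("%O\n", users[0][0]);   // login name, a string
    write("%O\n", users[0][2]);   // uid, converted to an int by %d
    return 0;
}
```

Each record is an array indexed by field position, which is what the %[0]s and %[2]d selectors in the sprintf call rely on.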

More advanced functions can be found in the Parser module (http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Parser.html). One relatively new class is Parser.Tabular, which can handle quite complex translations, though it requires a format definition. The format string can be generated by code, as in the following:

array(mapping(string:string)) parsecsv(string filename)
{
    Stdio.FILE f=Stdio.FILE(filename);
    //The header row (read with gets) supplies the column names for the format definition
    return Parser.Tabular(f,sprintf("[Tabular description begin]\ncsv\n%{:%s      [,]\n%}[Tabular description end]",array_sscanf(f->gets(),"%{%*[\"]%s%*[\"]%*[,]%}")[0]))->fetch()->csv;
}
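
The format definition built inside that one-line call is easier to follow when printed on its own. The following is a minimal sketch, not the function above: build_template is a hypothetical helper, it takes the column names as a ready-made array instead of extracting them with sscanf, and it assembles the text with a plain loop. The column names are taken from the example output on this page.

```pike
// Hypothetical helper: build a Parser.Tabular format definition for a
// comma-separated file with the given column names.
string build_template(array(string) columns)
{
    string body = "";
    foreach (columns, string name)
        body += sprintf(":%s      [,]\n", name);   // one field line per column
    return "[Tabular description begin]\ncsv\n" + body +
           "[Tabular description end]";
}

int main()
{
    // Prints the definition, one field line per column name.
    write("%s\n", build_template(({ "Asdf", "Qwer", "Zxcv" })));
    return 0;
}
```

Printing the template this way is also a convenient check when a generated definition does not parse a file as expected.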

The first line of the file is parsed by sscanf and sprintf to produce a template that will parse the rest of the file. With an input file whose first row contains the column names (here Asdf, Qwer and Zxcv), followed by three data rows, the last of which has a quoted field containing a comma, this will produce an array of mappings:

({ /* 3 elements */
   ([ /* 3 elements */
     "Asdf": "Hello",
     "Qwer": "World",
     "Zxcv": "!"
   ]),
   ([ /* 3 elements */
     "Asdf": "1234",
     "Qwer": "2345",
     "Zxcv": "3456"
   ]),
   ([ /* 3 elements */
     "Asdf": "This",
     "Qwer": "has",
     "Zxcv": "a,comma"
   ])
})

Many other parser functions are also available. Explore the Parser module and have fun!


XML to Mapping

TODO: A simple XML parser demo would be handy.
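
As one possible starting point for such a demo, here is a minimal sketch built on Parser.XML.Tree. The helper name xml_to_mapping and the sample document are assumptions for illustration. The sketch only handles a flat document (one top-level element whose children hold text), and a repeated child name overwrites the earlier value.

```pike
// Hypothetical helper: flatten the child elements of the top-level
// element into a mapping of element name to text content.
mapping(string:string) xml_to_mapping(string xml)
{
    // simple_parse_input builds a node tree from an XML string.
    object root = Parser.XML.Tree.simple_parse_input(xml);
    object top = root->get_first_element();

    mapping(string:string) result = ([]);
    foreach (top->get_elements(), object child)
        // value_of_node concatenates the text found below the element.
        result[child->get_any_name()] = child->value_of_node();
    return result;
}

int main()
{
    // Assumed sample document.
    string xml = "<user><name>stan</name><shell>/bin/true</shell></user>";
    write("%O\n", xml_to_mapping(xml));
    return 0;
}
```

Nested structures would need a recursive variant that stores a sub-mapping (or an array, for repeated names) instead of a string.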


gotpike.org | Copyright © 2004 - 2009 | Pike is a trademark of Department of Computer and Information Science, Linköping University