TextFileParsing



Pike, while not predominantly a text parsing language in the way that Perl or REXX is, still has a rich set of text manipulation facilities.

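The sscanf and sprintf functions are capable of far more than their C counterparts. A %{...%} group repeats its contents (across the input in sscanf, across an array in sprintf), and %[n] in sprintf selects an argument by index, so a single pair of calls can handle quite complex translations: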

//Sample data from /etc/passwd
string input=#"stan:x:1004:1003:,,,:/home/stan:/bin/true
dewey:x:1005:1004:,,,:/home/dewey:/bin/bash
gilberta:x:1006:1005:,,,:/home/gilberta:/bin/bash
";
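//Each line becomes one array of seven fields; uid and gid are parsed as integers
sscanf(input,"%{%s:%s:%d:%d:%s:%s:%s\n%}",array(array(string|int)) users);
//%[n] picks field n of each user, so fields can be reordered or skipped
string output=sprintf("%{User %[0]s (uid %[2]d) uses shell %[6]s in %[5]s.\n%}",users);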
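
The same two features can be seen in isolation in a smaller sketch. The function name swap_fields and the fruit data are made up for illustration; only array_sscanf, sprintf and write are standard Pike.

```pike
// Hypothetical helper: parses "name:count" lines and prints them as "count x name".
string swap_fields(string input)
{
    // %{...%} repeats "%s:%d\n" until the input stops matching.
    // array_sscanf returns the matches instead of storing them in variables;
    // element 0 of its result is the array built by the %{...%} group.
    array(array(string|int)) pairs = array_sscanf(input, "%{%s:%d\n%}")[0];
    // pairs now holds one ({ name, count }) array per input line.

    // In sprintf, %{...%} runs once per element of pairs, and %[n] selects
    // item n of that element (counting from 0), so the order can be changed.
    return sprintf("%{%[1]d x %[0]s\n%}", pairs);
}

int main()
{
    write(swap_fields("apple:3\npear:12\n"));
    // prints:
    // 3 x apple
    // 12 x pear
    return 0;
}
```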

More advanced functions can be found in the Parser module. One relatively new class is Parser.Tabular, which is capable of quite complex translations, though it requires a format definition. The format string can be code-generated, as in the following:

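array(mapping(string:string)) parsecsv(string filename)
{
    Stdio.FILE f=Stdio.FILE(filename);
    //Column names from the header row, one single-element array per column
    array columns=array_sscanf(f->gets(),"%{%*[\"]%s%*[\"]%*[,]%}")[0];
    //Generate one field line per column name
    string format=sprintf("[Tabular description begin]\ncsv\n%{:%s      [,]\n%}[Tabular description end]",columns);
    return Parser.Tabular(f,format)->fetch()->csv;
}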

The first line of the file is parsed by array_sscanf, and sprintf turns the resulting column names into a format definition that Parser.Tabular uses to parse the rest of the file. With an input file such as the following, where the first row contains column names:

"Asdf","Qwer","Zxcv"
"Hello","World","!"
1234,2345,3456
This,has,"a,comma"

this will produce an array of mappings:

({ /* 3 elements */
   ([ /* 3 elements */
     "Asdf": "Hello",
     "Qwer": "World",
     "Zxcv": "!"
   ]),
   ([ /* 3 elements */
     "Asdf": "1234",
     "Qwer": "2345",
     "Zxcv": "3456"
   ]),
   ([ /* 3 elements */
     "Asdf": "This",
     "Qwer": "has",
     "Zxcv": "a,comma"
   ])
})
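
To see the format definition that parsecsv generates for this header row, the sprintf step can be run on its own. This is only a sketch: csv_description is a hypothetical helper, not part of Pike, and the column names are hard-coded rather than read from a file.

```pike
// Hypothetical helper: builds the same format definition as parsecsv,
// but from a plain list of column names.
string csv_description(array(string) names)
{
    // Wrap each name in its own array, matching the shape that
    // array_sscanf's %{...%} group returns for the header row.
    array(array(string)) columns = map(names, lambda(string n) { return ({ n }); });
    return sprintf("[Tabular description begin]\ncsv\n%{:%s      [,]\n%}[Tabular description end]", columns);
}

int main()
{
    write(csv_description(({ "Asdf", "Qwer", "Zxcv" })) + "\n");
    // prints:
    // [Tabular description begin]
    // csv
    // :Asdf      [,]
    // :Qwer      [,]
    // :Zxcv      [,]
    // [Tabular description end]
    return 0;
}
```

Printing the definition this way is a convenient check when a file fails to parse, since it shows exactly which column names were extracted from the header.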

Many other parser functions are also available. Explore the Parser module and have fun!

1 XML

XML to Mapping

TODO: A simple XML parser demo would be handy.


gotpike.org | Copyright © 2004 - 2009 | Pike is a trademark of Department of Computer and Information Science, Linköping University