TextFileParsing

Pike is not primarily a text-parsing language in the way that Perl or REXX is, but it still has a rich set of text manipulation facilities.

The {link:sscanf|http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/sscanf.html} and {link:sprintf|http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/sprintf.html} functions are capable of far more than their C counterparts. Both accept %{...%} blocks that repeat a sub-format over an array, so a single call can parse or format a whole list of records:

{code:pike}
//Sample data from /etc/passwd
string input=#"stan:x:1004:1003:,,,:/home/stan:/bin/true
dewey:x:1005:1004:,,,:/home/dewey:/bin/bash
gilberta:x:1006:1005:,,,:/home/gilberta:/bin/bash
";
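//%{...%} repeats the enclosed format once per line, yielding one array of fields per user
sscanf(input,"%{%s:%s:%d:%d:%s:%s:%s\n%}",array(array(string|int)) users);
//%[n] selects field n of each user, so the fields can be printed in any order
string output=sprintf("%{User %[0]s (uid %[2]d) uses shell %[6]s in %[5]s.\n%}",users);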
{code}
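
For a single record the same directives work without the %{...%} wrapper. The following is a minimal sketch, assuming only the name, uid, and shell are wanted; the variable names are illustrative. A * after the % matches a field without storing it.

{code:pike}
int main()
{
    string line="dewey:x:1005:1004:,,,:/home/dewey:/bin/bash";
    string name,shell;
    int uid;
    //%*s and %*d match a field but discard it, so only three variables are needed
    sscanf(line,"%s:%*s:%d:%*d:%*s:%*s:%s",name,uid,shell);
    write("User %s (uid %d) uses shell %s.\n",name,uid,shell);
    return 0;
}
{code}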

More advanced functions can be found in the {link:Parser|http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Parser.html} module. One relatively new class is Parser.Tabular, which parses tabular text such as CSV according to a format definition. The format definition can itself be generated by code, as in the following:

{code:pike}
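array(mapping(string:string)) parsecsv(string filename)
{
    Stdio.FILE f=Stdio.FILE(filename);
    //Read the header row and extract the column names
    array columns=array_sscanf(f->gets(),"%{%*[\"]%s%*[\"]%*[,]%}")[0];
    //Generate one comma-delimited field definition per column
    string format=sprintf("[Tabular description begin]\ncsv\n%{:%s      [,]\n%}[Tabular description end]",columns);
    //Parse the rest of the file with the generated format
    return Parser.Tabular(f,format)->fetch()->csv;
}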
{code}

The first line of the file is split into column names by array_sscanf, and sprintf turns those names into a format definition that Parser.Tabular uses to parse the rest of the file. With an input file such as the following, where the first row contains the column names:

{code}
"Asdf","Qwer","Zxcv"
"Hello","World","!"
1234,2345,3456
This,has,"a,comma"
{code}

parsecsv returns an array of mappings, one per data row, keyed by column name:

{code}
({ /* 3 elements */
   ([ /* 3 elements */
     "Asdf": "Hello",
     "Qwer": "World",
     "Zxcv": "!"
   ]),
   ([ /* 3 elements */
     "Asdf": "1234",
     "Qwer": "2345",
     "Zxcv": "3456"
   ]),
   ([ /* 3 elements */
     "Asdf": "This",
     "Qwer": "has",
     "Zxcv": "a,comma"
   ])
})
{code}
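
To inspect the format definition that parsecsv generates, the header-parsing and template-building steps can be run on their own. This is a sketch that assumes the quoted header row from the sample file above; the variable names are illustrative. It prints the description block with one field line per column.

{code:pike}
int main()
{
    //Header row from the sample file above
    string header="\"Asdf\",\"Qwer\",\"Zxcv\"";
    //Each %{...%} pass skips the quotes and the trailing comma around one name,
    //yielding one single-element array per column
    array columns=array_sscanf(header,"%{%*[\"]%s%*[\"]%*[,]%}")[0];
    //The same sprintf template as in parsecsv, written to standard output
    write("[Tabular description begin]\ncsv\n%{:%s      [,]\n%}[Tabular description end]\n",columns);
    return 0;
}
{code}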

Many other parser functions are also available. Explore the Parser module and have fun!

1 XML

[PikeHints/XML to Mapping]

__TODO__: A simple {link:XML parser|http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Parser/XML.html} demo would be handy.
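
As a starting point, here is a minimal sketch using Parser.XML.Tree. The XML string is made-up sample data, and the approach (walking the tree and reading attributes) is only one of several the module allows.

{code:pike}
int main()
{
    //Made-up sample data for illustration
    string xml="<users><user name=\"stan\" uid=\"1004\"/><user name=\"dewey\" uid=\"1005\"/></users>";
    object root=Parser.XML.Tree.parse_input(xml);
    //Visit every node in document order and report the <user> elements
    root->walk_preorder(lambda(object node) {
        if(node->get_node_type()==Parser.XML.Tree.XML_ELEMENT && node->get_tag_name()=="user")
        {
            mapping attrs=node->get_attributes();
            write("User %s (uid %s)\n",attrs->name,attrs->uid);
        }
    });
    return 0;
}
{code}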

gotpike.org | Copyright © 2004 - 2009 | Pike is a trademark of Department of Computer and Information Science, Linköping University