Saturday, June 15, 2013

[Python] read formatted input

I have come to the same conclusion as this blog and this "physics forum thread" (is this a real forum? or a copy of another forum?).
There is no Python equivalent of C/C++ fscanf or MATLAB fscanf, sscanf or textscan.
Here are some alternatives that I have found.
  1. numpy.genfromtxt() does more or less exactly the same thing. It reads from strings (via StringIO) and from files. There is a nice section on importing data in the NumPy User Guide.
    • instead of format specifiers like '%8f%4s%2d' use delimiter=(8, 4, 2) and set dtype=(float, str, int). Voila!
    • But genfromtxt() does so much more! Using dtypes you can also set field names. There are options for skipping headers and footers; see the documentation.
  2. parse 1.6.1 on PyPI offers parse(), the opposite of format(). I haven't tried it, and I wish there were more documentation, specifically examples of multiple parsed tokens, but it does seem to be a Python version of textscan, though for strings only.
  3. The re module in the Python standard library is an obvious choice for parsing tokens from strings. Its documentation even has a section on simulating scanf that offers recipes for %f and other format specifiers.
  4. For simple delimiters, one can use either of the following:
    • the csv module from the Python standard library
    • numpy.loadtxt() which has the added advantage of reading in data as NumPy arrays.
    • str.split() obviously
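
To make the first option concrete, here is a minimal sketch of reading fixed-width fields with numpy.genfromtxt(). The sample text and the field names (value, label, count) are made up for illustration. I let dtype=None infer the column types rather than spelling them out, and I pass encoding explicitly so the string column comes back as text rather than bytes.

```python
import io

import numpy as np

# Two made-up fixed-width records: 8 chars float, 4 chars text, 2 chars int,
# roughly what '%8f%4s%2d' would describe.
text = " 3.14159abcd42\n 2.71828wxyz 7\n"

data = np.genfromtxt(
    io.StringIO(text),                  # a file name would work here as well
    delimiter=(8, 4, 2),                # field widths instead of a separator
    dtype=None,                         # infer a type for each column
    names=("value", "label", "count"),  # hypothetical field names
    encoding="utf-8",
)

# The result is a structured array; columns are addressed by field name.
values = data["value"]
labels = data["label"]
counts = data["count"]
```

Each row is a record, so data[0] gives the first line as a tuple-like object and data["count"] gives the whole integer column.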
There are probably many other methods, but for MATLAB converts, once they move from disbelief and denial on to acceptance, it's a pretty straightforward issue to resolve.
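
For the re route (option 3 above), here is a rough sketch of what a stand-in for sscanf with the format "%f %s %d" might look like. The regex fragments are adapted from the "simulating scanf" recipes in the re documentation; the helper name scan_record and the sample line are my own inventions, not part of any library.

```python
import re

# Regex fragments standing in for scanf conversions, adapted from the
# "simulating scanf" recipes in the re documentation.
FLOAT = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"  # %f, %e, %g
WORD = r"\S+"                                             # %s
INT = r"[-+]?\d+"                                         # %d

# Rough counterpart of the C format string "%f %s %d".
PATTERN = re.compile(rf"\s*({FLOAT})\s+({WORD})\s+({INT})")


def scan_record(line):
    """Hypothetical helper: return (float, str, int), or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    value, label, count = match.groups()
    # Unlike scanf, re hands back strings, so convert explicitly.
    return float(value), label, int(count)


record = scan_record("3.14159 abcd 42")
```

The main difference from scanf is that the type conversion is a separate, explicit step, which is easy to forget when coming from MATLAB.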
