Parsing unevenly spaced columns from text in PHP

If you need to parse unevenly (jaggedly) spaced columns out of a text file in PHP, below are two functions that will help you do it.

Let's say that you have the following text output:


col1           col2        col3
====           ====        ====
1 a b c d e       103 14          as d9
2 a      103 14       as d9
3 a    103 14    as d9


And you want to transform the data into the following table:


col1         col2    col3
1 a b c d e  103 14  as d9
2 a          103 14  as d9
3 a          103 14  as d9

We need an algorithm that distinguishes large gaps (between columns) from small gaps (inside a cell) and makes a weighted estimate of which column each piece of data belongs to. This helps when your data source frequently changes the number of spaces between columns, which would break a parser that relies on fixed positions. It is more robust than slicing each line at the positions of the column headers, and it tolerates a small degree of noise or error in the text you are parsing.
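As a minimal sketch of the large-gap versus small-gap idea, one simple heuristic is to rank every run of spaces in a line by width and treat the widest ones as column boundaries. Note that this ranking is a stand-in and not necessarily the weighted estimate described above; the function names `splitByWidestGaps` and `parseJaggedTable`, and the assumption that the number of columns is known in advance, are hypothetical choices made for illustration.

```php
<?php
/**
 * Split one line into $numCols cells by treating the ($numCols - 1) widest
 * runs of whitespace as column separators. Hypothetical helper.
 */
function splitByWidestGaps(string $line, int $numCols): array
{
    $line = trim($line);

    // Collect every run of whitespace as [gapText, byteOffset].
    preg_match_all('/\s+/', $line, $matches, PREG_OFFSET_CAPTURE);
    $gaps = $matches[0];

    // Rank gaps widest first; equally wide gaps keep left-to-right order.
    usort($gaps, function (array $a, array $b): int {
        $byWidth = strlen($b[0]) <=> strlen($a[0]);
        return $byWidth !== 0 ? $byWidth : ($a[1] <=> $b[1]);
    });

    // Keep the widest gaps, then put them back in left-to-right order.
    $separators = array_slice($gaps, 0, max(0, $numCols - 1));
    usort($separators, function (array $a, array $b): int {
        return $a[1] <=> $b[1];
    });

    // Cut the line at each chosen separator.
    $cells = [];
    $start = 0;
    foreach ($separators as [$gapText, $offset]) {
        $cells[] = substr($line, $start, $offset - $start);
        $start = $offset + strlen($gapText);
    }
    $cells[] = substr($line, $start);

    // Pad short rows so every row has the same number of cells.
    return array_pad($cells, $numCols, '');
}

/**
 * Parse a whole block of text, skipping blank lines and rule lines
 * such as "==== ====". Hypothetical helper.
 */
function parseJaggedTable(string $text, int $numCols): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (trim($line) === '' || preg_match('/^[\s=\-]+$/', $line)) {
            continue;
        }
        $rows[] = splitByWidestGaps($line, $numCols);
    }
    return $rows;
}
```

Calling `splitByWidestGaps('2 a      103 14       as d9', 3)` returns `['2 a', '103 14', 'as d9']`, because the two widest gaps are chosen as separators and the single spaces stay inside their cells. The heuristic breaks down when a cell is empty or when a gap inside a cell is as wide as a gap between columns, so treat it as a starting point only.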





Here is a more efficient algorithm you can use if your columns always start at the same positions as their headers.
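A minimal sketch of header-position parsing might look like the following. It assumes every cell starts at exactly the same offset as its header word; the names `headerOffsets` and `sliceByOffsets` are hypothetical and not taken from the original listing.

```php
<?php
/**
 * Return the starting byte offset of each header word.
 * Hypothetical helper.
 */
function headerOffsets(string $headerLine): array
{
    preg_match_all('/\S+/', $headerLine, $matches, PREG_OFFSET_CAPTURE);
    // Each match is [word, offset]; keep only the offsets.
    return array_column($matches[0], 1);
}

/**
 * Cut a data line at the given offsets and trim each cell.
 * Hypothetical helper.
 */
function sliceByOffsets(string $line, array $offsets): array
{
    $cells = [];
    $count = count($offsets);
    for ($i = 0; $i < $count; $i++) {
        $start = $offsets[$i];
        if ($i + 1 < $count) {
            // A middle column ends where the next header starts.
            $cell = substr($line, $start, $offsets[$i + 1] - $start);
        } else {
            // The last column runs to the end of the line.
            $cell = substr($line, $start);
        }
        // Cast guards against a false return on short lines in older PHP.
        $cells[] = trim((string) $cell);
    }
    return $cells;
}

// Usage: build an aligned header and row with str_pad, then slice the row.
$header  = str_pad('col1', 13) . str_pad('col2', 10) . 'col3';
$row     = str_pad('2 a', 13) . str_pad('103 14', 10) . 'as d9';
$offsets = headerOffsets($header);
$cells   = sliceByOffsets($row, $offsets); // ['2 a', '103 14', 'as d9']
```

This does one regular-expression match on the header and then only `substr` calls per row, which is why it is cheaper than inspecting every gap. The trade-off is that a cell that drifts past the next header's offset gets cut in the wrong place.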






