Sunday, March 27, 2011

C#: How do I count lines in a text file?

Any problems with doing this?

int i = new StreamReader("file.txt").ReadToEnd().Split(new char[] {'\n'}).Length;

From Stack Overflow
  • Well, the problem is that this allocates a lot of memory on large files: the whole file is read into one string, and Split then builds an array holding every line.

    I would rather read the file line by line and manually increment a counter. This may not be a one-liner but it's much more memory-efficient.

    Alternatively, you may load the data in even-sized chunks and count the line breaks in these. This is probably the fastest way.
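
    A minimal sketch of the chunked approach, assuming the file is in an encoding where the byte 0x0A can only be a line feed (ASCII or UTF-8, for example). The class name ChunkedLineCounter, the method name and the 64 KB default buffer size are illustrative choices, not something from the thread:

    ```csharp
    using System;
    using System.IO;

    public static class ChunkedLineCounter
    {
        // Counts '\n' bytes by reading the file in fixed-size blocks.
        // Assumption: the byte 0x0A never occurs inside another character,
        // which holds for ASCII and UTF-8 but not for UTF-16.
        public static long CountLineFeeds(string path, int bufferSize = 64 * 1024)
        {
            byte[] buffer = new byte[bufferSize];
            long count = 0;

            using (FileStream stream = File.OpenRead(path))
            {
                int read;
                // Read returns 0 once the end of the file is reached.
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    for (int i = 0; i < read; i++)
                    {
                        if (buffer[i] == (byte)'\n')
                        {
                            count++;
                        }
                    }
                }
            }

            return count;
        }
    }
    ```

    Called as ChunkedLineCounter.CountLineFeeds("file.txt"), this counts line feeds, so a last line with no trailing newline is not included. Whether it really is the fastest option would need measuring.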

  • Sure: it reads the entire stream into memory. It's terse, but I could create a file today that would make it fail hard.

    Read a character at a time and increment your count on newline.

    EDIT (after some quick research): if you want terse and want that shiny new generic feel, consider this:

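    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;

    // Exposes the characters of a stream as an IEnumerable<char>.
    // The single StreamReader is shared, so the sequence can only be enumerated once.
    public class StreamEnumerator : IEnumerable<char>
    {
        private readonly StreamReader _reader;

        public StreamEnumerator(Stream stm)
        {
            if (stm == null)
                throw new ArgumentNullException("stm");
            if (!stm.CanSeek)
                throw new ArgumentException("stream must be seekable", "stm");
            if (!stm.CanRead)
                throw new ArgumentException("stream must be readable", "stm");

            _reader = new StreamReader(stm);
        }

        public IEnumerator<char> GetEnumerator()
        {
            int c;
            // Read() returns -1 at the end of the stream.
            while ((c = _reader.Read()) >= 0)
            {
                yield return (char)c;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }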
    

    This defines a new class that lets you enumerate over the characters of a stream. Your counting code can then look like this:

    // Count(predicate) is a LINQ extension method, so this needs "using System.Linq;".
    StreamEnumerator chars = new StreamEnumerator(stm);
    int lines = chars.Count(c => c == '\n');
    

    This gives you a nice terse lambda expression to do (more or less) what you want.

    I still prefer the Old Skool:

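        public static int CountLines(Stream stm)
        {
            // Not disposed here, since disposing the reader would also close the caller's stream.
            StreamReader reader = new StreamReader(stm);
            int c, count = 0;
            while ((c = reader.Read()) != -1)   // Read() returns -1 at end of stream
            {
                if (c == '\n')
                {
                    count++;
                }
            }
            return count;
        }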
    

    NB: Environment.NewLine version left as an exercise for the reader

    spoulson : This wouldn't work when searching Environment.NewLine, which is usually a two character string (CrLf).
    JMD : He's got the right idea though. So how about using a RegEx to search for instances of Environment.NewLine?
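
    One possible way to handle this without a regex is to treat \r\n, a lone \n and a lone \r each as a single line break, which is the same convention StreamReader.ReadLine follows. A sketch, with the class name NewLineAwareCounter and the TextReader parameter being illustrative assumptions:

    ```csharp
    using System.IO;

    public static class NewLineAwareCounter
    {
        // Counts lines one character at a time, accepting "\r\n", "\n" and "\r"
        // as line terminators. A final line without a terminator is counted too.
        public static int CountLines(TextReader reader)
        {
            int count = 0;
            bool previousWasCr = false;   // last character read was '\r'
            bool lineHasContent = false;  // current line has characters but no terminator yet
            int c;

            while ((c = reader.Read()) != -1)
            {
                if (c == '\n')
                {
                    // A '\n' right after '\r' completes a CRLF pair that was already counted.
                    if (!previousWasCr) count++;
                    previousWasCr = false;
                    lineHasContent = false;
                }
                else if (c == '\r')
                {
                    count++;
                    previousWasCr = true;
                    lineHasContent = false;
                }
                else
                {
                    previousWasCr = false;
                    lineHasContent = true;
                }
            }

            // Text left over after the last terminator is still a line.
            if (lineHasContent) count++;
            return count;
        }
    }
    ```

    Because it takes a TextReader, it can be fed a StreamReader for a file or a StringReader for a quick check, e.g. NewLineAwareCounter.CountLines(new StringReader("a\r\nb\nc")).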
  • Assuming the file exists and you can open it, that will work.

    It's not very readable or safe...

  • If you're looking for a short solution, I can give you a one-liner that at least saves you from having to split the result:

    int i = File.ReadAllLines("file.txt").Length;
    

    But that has the same problem as your original: it reads the whole file into memory. You should really use a StreamReader and count the line breaks as you read them until you reach the end of the file.
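
    If .NET 4 or later is an option, File.ReadLines returns a lazily evaluated IEnumerable<string>, so a one-liner no longer has to load the whole file. A small sketch (the class name LazyLineCounter is made up for illustration):

    ```csharp
    using System.IO;
    using System.Linq;

    public static class LazyLineCounter
    {
        // File.ReadLines yields lines one at a time instead of building an array,
        // so Count() only ever holds the current line in memory.
        public static int CountLines(string path)
        {
            return File.ReadLines(path).Count();
        }
    }
    ```

    Like the ReadLine loop, this counts a final line even when it has no trailing newline.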

  • The method you posted isn't particularly good. Let's break this apart:

    // new StreamReader("file.txt").ReadToEnd().Split(new char[] {'\n'}).Length
    //     becomes this:
    var file = new StreamReader("file.txt").ReadToEnd(); // big string
    var lines = file.Split(new char[] {'\n'});           // big array
    var count = lines.Length;
    

    You're actually holding the file's contents in memory twice: once as the single string returned by ReadToEnd, and once more as the array of substrings produced by Split. The garbage collector hates that.

    If you like one-liners, you can write System.IO.File.ReadAllLines(filePath).Length, but that still loads the entire file into an array. There's no point doing that if you aren't going to hold onto the array.

    A faster solution would be:

    int TotalLines(string filePath)
    {
        using (StreamReader r = new StreamReader(filePath))
        {
            int i = 0;
            while (r.ReadLine() != null) { i++; }
            return i;
        }
    }
    

    The code above holds (at most) one line of text in memory at any given time. It's going to be efficient as long as the lines are relatively short.
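
Note that the approaches in this thread do not all return the same number for the same input. The sketch below (all names are illustrative) applies the three counting rules to an in-memory string so the difference is easy to see:

```csharp
using System.IO;
using System.Linq;

public static class LineCountComparison
{
    // Mirrors the question's approach: split on '\n' and take the array length.
    public static int BySplit(string text)
    {
        return text.Split(new char[] { '\n' }).Length;
    }

    // Mirrors the character-counting answers: the number of '\n' characters.
    public static int ByLineFeeds(string text)
    {
        return text.Count(c => c == '\n');
    }

    // Mirrors the ReadLine loop from the last answer.
    public static int ByReadLine(string text)
    {
        using (StringReader reader = new StringReader(text))
        {
            int count = 0;
            while (reader.ReadLine() != null) { count++; }
            return count;
        }
    }
}
```

For the text "a\nb\n" these return 3, 2 and 2 respectively; for "a\nb" (no trailing newline) they return 2, 1 and 2. Which answer is "right" depends on whether a trailing newline and an unterminated last line should count.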
