Saturday, February 19, 2011

Interactive Statistical Analysis tool

I'm looking for basic statistical analysis software. Most important is simple, intuitive use that works "right out of the box". At least the basic operations should be interactive. Free would be a bonus :)

The purpose is analysis of data dumps and logs of various processes.

  • Importing a comma/tab separated file
  • sorting and filtering rows on conditions
  • basic aggregates: count, average, deviation, regression, trend
  • visualization: plotting the data, bin distributions, etc.
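To make the list concrete, here is a minimal sketch of the first three items using only Python's standard library. It is one possible approach, not a recommendation from the thread. The column names (`step`, `process`, `duration_ms`) and the sample rows are hypothetical; a real dump would be read from a file instead of the inline string.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical tab-separated dump; in practice use open("dump.tsv", newline="").
SAMPLE = (
    "step\tprocess\tduration_ms\n"
    "5\tA\t18\n"
    "2\tB\t40\n"
    "1\tA\t10\n"
    "4\tB\t35\n"
    "3\tA\t14\n"
)

def load_rows(text):
    """Parse tab-separated text whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rows = load_rows(SAMPLE)

# Filter rows on a condition, then sort them by a numeric column.
selected = sorted(
    (r for r in rows if r["process"] == "A"),
    key=lambda r: int(r["step"]),
)

steps = [float(r["step"]) for r in selected]
durations = [float(r["duration_ms"]) for r in selected]

count = len(durations)
average = statistics.mean(durations)
deviation = statistics.stdev(durations)  # sample standard deviation
slope, intercept = linear_trend(steps, durations)

# Bin distribution: count values per 5 ms wide bin, keyed by the bin's lower edge.
bins = Counter(int(d // 5) * 5 for d in durations)

print(count, average, deviation, slope, intercept, dict(bins))
```

Plotting is left out because it needs a third-party library; the `bins` counter is the data a histogram would draw.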

Excel fails (at least for me) at filtering and re-combining data; I guess something like "Excel with SQL" would be nice. I've been using MS Access + Excel and copying data around until now, but that's a pain.
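One way to picture the "Excel with SQL" idea is to load the tab-separated file into an in-memory SQLite database and query it. The sketch below assumes Python's standard `sqlite3` and `csv` modules; the table name `log`, the column names, and the sample data are all hypothetical, and the helper `load_tsv` is illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical tab-separated log; the first line holds the column names.
SAMPLE = (
    "process\tstatus\tduration_ms\n"
    "A\tok\t10\n"
    "B\tfail\t40\n"
    "A\tok\t14\n"
    "B\tok\t20\n"
)

def load_tsv(conn, table, text):
    """Create `table` from the header line and insert every data row as text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)

conn = sqlite3.connect(":memory:")
load_tsv(conn, "log", SAMPLE)

# Filter and aggregate in one query; values were loaded as text, so cast before averaging.
query = """
    SELECT process, COUNT(*) AS n, AVG(CAST(duration_ms AS REAL)) AS avg_ms
    FROM log
    WHERE status = 'ok'
    GROUP BY process
    ORDER BY process
"""
result = conn.execute(query).fetchall()
print(result)
```

Because every column is loaded as text, numeric comparisons and aggregates need an explicit `CAST`, as in the query above.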

Do you have any recommendations?

Clarification: I am not looking for a specific tool for IIS/web server logs, but for a tool that handles various data and event logs (mostly from custom applications) with tab-separated values.

From stackoverflow
  • Specifically for log file analysis, I would recommend Microsoft's Log Parser (free), which lets you run queries with basic aggregation against all types of text-based files (and across sets of files): XML, CSV, the Event Log, the Registry, the file system, Active Directory, etc.

    There is also a free GUI built on top of it, called Log Parser Lizard GUI, which makes it more user-friendly and can do basic graphing, etc.

    peterchen : That looks like a good start, but I can't get Lizard to accept my columns (first line in a tab-separated file) - do you know how?
    peterchen : I gave up on the lizard GUI for now, but OMG the Log parser is, like, sooo awesome!!!
  • I used Tableau Software at a previous gig, and it's pretty amazing - extremely intuitive and easy to use.

    Unfortunately it's also pricey.

  • You might find gretl very useful (thanks GNU !!!)

    http://gretl.sourceforge.net/win32/

  • I would consider looking at R:

    • Free
    • Widely used by statisticians
    • Fairly easy to use.
    • Can easily do everything you mention in your post
    Tal Galili : Hi Carlos, Although I love R a lot, I have yet to see a dedicated package in it for log file analysis. Therefore, I am not so sure it will fit the bill... Tal
    Carlos Rendon : @Tal, Why would you need a dedicated package? The whole point of R is to make statistical analysis easy. Just about every R program does the 4 things mentioned by peterchen, and R has easy-to-use, built-in support for all of them.
    Tal Galili : Hi Carlos, As I wrote - I really love R, BUT... to do log file analysis at the level of well-developed software, one needs to do a lot of preprocessing (for example, taking out bot IPs). I have no doubt it can be done in R. But without a package from someone who has created a framework for it, I am not sure how much work this might take (again, it will depend on the depth of the analysis being done). Cheers, Tal
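
As a rough illustration of the kind of preprocessing Tal mentions, the sketch below drops rows whose client address appears in a list of known bots before any analysis is run. It is written in Python rather than R, and everything in it is an assumption for illustration: the `client_ip` column, the sample rows, the `drop_bots` helper, and the blocklist, which uses addresses from the ranges reserved for documentation.

```python
import csv
import io

# Hypothetical blocklist; a real one would come from a maintained source.
KNOWN_BOT_IPS = {"203.0.113.7"}

# Hypothetical tab-separated access log with a header line.
SAMPLE = (
    "client_ip\tpath\n"
    "198.51.100.4\t/index\n"
    "203.0.113.7\t/index\n"
    "198.51.100.9\t/about\n"
)

def drop_bots(rows, bot_ips):
    """Return only the rows whose client_ip is not in the blocklist."""
    return [r for r in rows if r["client_ip"] not in bot_ips]

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
clean = drop_bots(rows, KNOWN_BOT_IPS)
print(len(rows), len(clean))
```

Real bot detection usually needs more than a fixed list, which is the extra work the comment is pointing at.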
