Monday, March 28, 2011

Free tools to speed up web development

As a little side project, I am starting to build a new website for a certain organization I am affiliated with, whose current site is simply outdated.

I am normally a Java developer, and the last time I really did any web development was back in the late 90s, when <p> was still more popular than <div> and JavaScript was cutting-edge technology (jQuery is for lazy bums these days :) ).

Anyway, I feel really outdated. The website basically is going to be:

  • Django based
  • mostly serve static information pages
  • it will have a dynamic news and updates page (based on Django admin capabilities)
  • and some basic apps I'll develop myself (polls, small registration app, etc...)

My problem is designing the whole thing. I found some nice web-based CSS layout generators that got me going, but I still feel I'm wasting my time smoothing out the CSS files and aligning <div>s.

Are there any tools - the simpler and faster, the better - that you recommend I can use to speed up the design part of the site so I can concentrate on the real work?

I don't need anything fancy, just a nice looking layout and design that I can tweak a bit so the site will look presentable.

From stackoverflow
  • No idea what the site is going to be, but have you thought of using a pre-built CMS like Drupal, Joomla etc.? You can then tweak templates rather than worrying about making it from scratch.

    Yuval A : I think a CMS is a little bit of an overkill. I updated the question a bit, but like I said most of the site will be static content. I don't need a huge CMS framework just to get a nice template. Isn't there a better way to get that?
    AnonJr : Not all frameworks are huge, and they will save you a significant chunk of time. sitepoint.com had a contest a while back that illustrated the difference. Can't find the link at the moment.
    Steven Robbins : Possibly something like Drupal/Joomla would be overkill, but there are lighter ones. Big advantage is they let you just work on the important stuff.. the content.. without having to worry about browser compatibility, SEO etc etc
  • - http://patterntap.com/
    - http://www.dotemplate.com/ (interesting concept of customizing template)
    - http://www.templatemonster.com/
    - http://www.freelayouts.com/websites/html-templates
    - http://www.templateyes.com/

  • A quick google search led me to this website. Perhaps this is along the lines of what you are looking for?

  • In my previous employment I created dozens of templates for websites.
    The most useful tool I ever discovered is the Firefox Web Developer Toolbar.
    It has a wealth of small useful tools. My favorite feature is the ability to edit the CSS and see the results in real-time. This saves on the whole edit - upload - refresh cycle. Watch out for IE CSS inconsistencies though! Off the top of my head, these are the most important gotchas.

    double margin bug [google: double margin bug]
    incorrect (but more intuitive) box model [google: box model]
    incorrect (but more intuitive) float clearing [google: clearfix]

    FireBug is another really useful Firefox plugin for more in-depth analysis.

    AnonJr : There's a developer toolbar for IE as well - and it's built into IE8. :D Both toolbars are wonderful for developing and troubleshooting, but I don't think they answer the question...
    Yuval A : This does not answer my question. I do not need debugging tools. I need methods to minimize time I spend on designing.
    annakata : Firebug allows for rapid dev of organic CSS because you can see live edits in the browser rather than being forced into code/refresh/code/refresh cycles - it's *extremely* effective at minimising design time
  • There's one piece of advice that saves more time than any other when it comes to rapid development of CSS styled sites and that's KEEP IT SIMPLE

    Use an attractive simple layout that doesn't require pixel perfection and that can 'gracefully' degrade in less compliant browsers (IE6). Minimise the amount of CSS and fix the basic bugs mentioned by meouw above. Then get on and concentrate on content and functionality...the real work

  • A colleague of mine has been trying to convince me all week that Dreamweaver, 5 years after I was last forced to use it at gunpoint, is actually now worthwhile for knocking up a design quickly and painlessly, and is also now competent at producing the HTML for that design.

    I refuse to invest the 10 minutes it would take to find out based on my previous experiences of it, but you might like to give the demo a quick run around the block :)

  • I'd recommend finding a CMS package, since you're using Django, look into django-cms. It has TinyMCE and Markdown Support so updating your pages should be easy. Also django-cms integrates well with the Django admin interface.

  • Have you looked into any CSS frameworks? If you are competent enough with CSS something like a framework could help speed things up.

  • I second Brandon's suggestion to use a CSS framework. It won't give you 100% freedom to design anything you like, but it can speed up your design process greatly and free up your hands to do the coding you really want.

    Suggestions:

    Tom Deleu : In particular, I like 960 grid system the best. really simple to use, and very clean XHTML...
    Tom : blueprint is recommended, it's very flexible with less bloat.
    Jens Roland : Ah, I forgot the YUI grid. I tend to forget YUI because it's so incredibly big (not saying it's slow, it's just big), and I'm such a sucker for lean little packages.
    Yuval A : Jens thanks for your answer, I think I will go with one of those. Enjoy the bounty ;)
    Jens Roland : Wow, cool - my first bounty! Thanks!
  • I've always found Open Source Web Design to be a good resource when looking to get started trying to design something.

  • Maybe a system such as phpNuke or something similar?

    lpfavreau : Seems the OP wants to use Django (Python) and not PHP though.
  • Don't forget firebug :) if you're worried about tweaking the design it's really great. With the inspect feature allowing you to real-time edit the CSS of your page.

    https://addons.mozilla.org/en-US/firefox/addon/1843

  • I won't lie to you. This website isn't the best place to go if you're looking for reliable Web Design advice. Stack Overflow is a programming community and programmers rarely know anything about design. If you want to get some real advice then I would strongly recommend the main Web Design/Development forums on the Internet, especially SitePoint.

    That being said, as a former freelance Web Designer/Developer I'll offer my input on the issue. Not that you should value it, of course. After all, this is a programming website.

    NEVER EVER SAY CSS LAYOUT GENERATOR EVER AGAIN! If you're going to seriously get into designing web pages then you need to learn semantic XHTML and CSS first. Whilst many people tout W3Schools as the definitive resource, I see it as a programmer's answer (i.e. not very good) and would prefer that you read up on the subject using...Google. There are so many great websites for picking up the basics of Web Design/Development that Google is probably the best tool for finding them. Also, with a plethora of new websites offering this information, you know it's going to be more up to date than W3Schools. You'll seriously want to get clued-up on writing your design because it'll be much harder to fix things later on in the project.

    If you're going to be designing web pages it would be a good idea to learn what actually makes a good design. Check out CSS Vault for a fantastic resource of some of the best-designed web pages around, of course with all the source code intact so you can have a play around with their code and see how they've managed some of the wonderful effects they've produced. I've learnt more than a thing or two from websites that have been featured on CSS Vault. On top of that you should read up on Web Design from the big Web Design/Development sites. Two of my favourites are SitePoint and A List Apart, two names that you'll hear time and time again when people talk about resources. Browse those websites, check their forums, see what REAL Web Designers/Developers are using, not what programmers are using.

    On the subject of CSS Frameworks; they do help! The problem with using them is that you'll often spend so much time looking for a worthwhile framework that you could have finished most of the CSS for your website yourself. You'll either love them or hate them, but many people will say that they're not necessary.

    Once you've got your mind set on what a good design looks like and you've got the resources you need to make something of value I suggest that you get to work! In reality when you're designing a web page all you really need is a text editor with a save function, an image manipulation program, a browser window and FireBug. An IDE helps a lot of people, but if you do use one then you'll definitely want to work in its text mode. I use NotePad++ or Emacs exclusively but a lot of people like to use Aptana Studio, so it may be worth a look.

    When you get to actually building the code behind your website, you can't really go wrong with your favourite IDE/Text Editor and a source control tool. As a Java programmer you're better suited to talk about programming, so I won't lecture you on a subject you already know.

    In the end, Web Design is going to take time and many of the tools that we choose to use that we claim will "save time" save very little in reality. If you're not a design guru then it will take you a substantial amount of time to create a great-looking website. It's a fact of life. Call me old-fashioned (a funny word coming from a 21 year old) but I still think that the quickest way is to sketch a design out on a piece of paper (an image program if really necessary) and to just get out there and make the damn thing! Again, I'll have to take this hunting for the silver bullet mentality as a programmer's trait, one that really won't help that much when designing, because designing a web page is vastly different to writing a Java program.

    In short, ignore everything you read here, read what REAL designers are doing and just make the damn thing!

    lpfavreau : I completely agree with everything you said EnderMB. I would however point out that the OP is asking for a shortcut for a CSS layout as a developer (like most other visitors on this site) and not how to become a designer. He doesn't want something fancy, just not too ugly. But again, good post.
    Yuval A : very nice post indeed, but a bit off from what I was looking for. thanks anyways :)
    • A piece of paper to sketch your design
    • A text editor (preferably set up to save directly to a development server local or otherwise) to write up the initial HTML/CSS
    • FireBug/IEDeveloperToolbar to inspect each element that looks wrong and edit its CSS directly in the browser until it looks fine, followed by commits with the text editor
  • I second Jens Roland's list of CSS frameworks.

    But I would also suggest you have a look at websites such as Smashing Magazine. You might find this article of interest, amongst a lot of others. They tend to do a lot of nice and long mash-ups for the web developers that will at least give you inspiration if not a direct solution.

  • If you're doing any JavaScript development, use an advanced editor that highlights errors and warnings as you type. This kind of functionality has until recently been the province of static language editors only, but the free NetBeans 6 achieves this amazing feat. Traditionally, you first discovered JavaScript typos and simple bugs when first loading the parent page into a browser. The speed-up you get from being able to short-cut these iterations is profound.

    NetBeans 6 also highlights CSS errors and is a more than capable editor for most all languages a web developer is likely to use these days.

Good inflection library for PHP?

I'm looking for a good inflection library for PHP (or rather, a library that can turn plural into singular and vice versa, which is a kind of inflection). It could be part of an existing framework or a standalone library; the only requirement I have is that it's compatible with the MIT license.

From stackoverflow

Mid vs Mid$

According to the documentation in VB6 the Mid function returns a variant, but Mid$ returns a string and apparently this is more efficient.

My questions are :

  1. What simple test can I use to discern the difference in performance ? I tried looking at simple app which did a few string operations, with Perfmon, but there was no discernible difference.

  2. Is it worth worrying about? I've gotten into the habit of using the $-ized functions, but should I recommend everybody on my team to use it as well ?

From stackoverflow
  • Isn't worth worrying about. It's a remnant from Microsoft Basic of 15-20 years ago when a fast processor was orders of magnitude slower than anything today.

    It has a certain esthetic appeal to use Mid$ rather than let VB determine what your datatypes are, though. And if you have any loops that are executing it, say, thousands of times a second, then your curiosity factor might increase. Otherwise, neh.

    Here's a link to someone who measured the difference. Mid$ was about 2.5 times as fast as Mid. Including tests going back to VB4.

    jakdep : Thanks. Interesting link, exactly what I was looking for.
    garykindel : Might want to consider including a 3rd party library like Stamina. Includes many string handling routines written in C that are much faster than VB6. http://www.hallogram.com/stamina/routines.html
  • Honestly, I think it's negligible.

    Maybe you can try something like this. Download the "High-Performance Timer Object" from http://ccrp.mvps.org/, do a long loop (1.000.000 iterations or so) of string operations, and measure the run time difference. By "operations" I mean: Concatenation of Variants as opposed to concatenation of Strings. Mid() and Mid$() will very likely perform the same. OTOH - you can test that as well.

    If you did, I'd be interested if you posted the results.

    jakdep : +1 for the Timer Object link. I compared Mid("ABC") with Mid$("ABC"), as in the link provided by le dorfier, over 100,000,000 iterations and measured the duration with High-Performance stopwatch. Mid() took 35.364 seconds and Mid$() took 13.56 seconds. So, it matches the results shown in the link.
    jakdep : *cough* *cough* Sorry, that is +1 if I could upvote it.
    Tomalak : I am surprised. This really *is* a mentionable difference.
  • Whilst the performance difference between them is negligible, it's not really a differentiator as to which to use anyway.

    There can be some nuances when using a variant when a strong type is required. For example what happens when you pass a variant to a parameter expecting a ByRef string? Nothing bad but something a little more than passing an address happens.

    Hence if you know that you want to work with strings, then go ahead and use the $ versions of these functions; their behaviour and their use in other expressions is simpler and better understood. If you know you need a variant and your inputs are variants, then sure, use the non-$ versions.

    mafutrct : Exactly! Variants should be avoided whenever possible.

Return first match of Ruby regex

I'm looking for a way to perform a regex match on a string in Ruby and have it short-circuit on the first match.

The string I'm processing is long and from what it looks like the standard way (match method) would process the whole thing, collect each match, and return a MatchData object containing all matches.

match = string.match(/regex/)[0].to_s
From stackoverflow
  • You could try variableName[/regular expression/]. This is an example output from irb:

    irb(main):003:0> names = "erik kalle johan anders erik kalle johan anders"
    => "erik kalle johan anders erik kalle johan anders"
    irb(main):004:0> names[/kalle/]
    => "kalle"
    
    Gishu : Is this not doing a match and returning the first result behind the scenes ?
    Daniel Beardsley : After some benchmarking with various length strings and looking at the C source, it turns out Regex.match does short-circuit and only finds the first match.
  • If only an existence of a match is important, you can go with

    /regexp/ =~ "string"
    

    Either way, match should only return the first hit, while scan searches throughout the entire string. For example:

    matchData = "string string".match(/string/)
    matchData[0]    # => "string"
    matchData[1]    # => nil - it's the first capture group not a second match
    

What is the best way to replace or substitute if..else if..else trees in programs?

This question is motivated by something I've lately started to see a bit too often, the if..else if..else structure. While it's simple and has its uses, something about it keeps telling me again and again that it could be substituted with something that's more fine-grained, elegant and just generally easier to keep up-to-date.

To be as specific as possible, this is what I mean:

if (i == 1) {
    doOne();
} else if (i == 2) {
    doTwo();
} else if (i == 3) {
    doThree();
} else {
    doNone();
}

I can think of two simple ways to rewrite that, either by ternary (which is just another way of writing the same structure):

(i == 1) ? doOne() : 
(i == 2) ? doTwo() :
(i == 3) ? doThree() : doNone();

or using Map (in Java and I think in C# too) or Dictionary or any other K/V structure like this:

public interface IFunctor {
    void call();
}

public class OneFunctor implements IFunctor {
    public void call() {
        ref.doOne();
    }
}

/* etc. */    

Map<Integer, IFunctor> methods = new HashMap<Integer, IFunctor>();
methods.put(1, new OneFunctor());
methods.put(2, new TwoFunctor());
methods.put(3, new ThreeFunctor());
/* .. */
(methods.get(i) != null) ? methods.get(i).call() : doNone();

In fact the Map method above is what I ended up doing last time but now I can't stop thinking that there has to be better alternatives in general for this exact issue.
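
For comparison, here is a minimal sketch of the same table-driven dispatch in Python, where functions are ordinary values and no functor classes are needed. The names do_one, do_two, do_three, do_none and dispatch are hypothetical stand-ins for the doOne() family above; they return strings only so the result is easy to inspect.

```python
def do_one():
    return "one"

def do_two():
    return "two"

def do_three():
    return "three"

def do_none():
    return "none"

# Lookup table: key -> function object (stored, not called yet).
ACTIONS = {1: do_one, 2: do_two, 3: do_three}

def dispatch(i):
    """Call the handler registered for i, or do_none if there is none."""
    return ACTIONS.get(i, do_none)()

print(dispatch(2))   # two
print(dispatch(42))  # none
```

Using dict.get with a default keeps the "else" branch of the original chain, which a plain ACTIONS[i]() lookup would turn into a KeyError.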

So, which other -and most likely better- ways to replace the if..else if..else are out there and which one is your favorite?

Your thoughts below this line!


Okay, here are your thoughts:

First, most popular answer was switch statement, like so:

switch (i) {
    case 1:  doOne(); break;
    case 2:  doTwo(); break;
    case 3:  doThree(); break;
    default: doNone(); break;
}

That only works for values which can be used in switches, which at least in Java is quite a limiting factor. Acceptable for simple cases though, naturally.

The other and perhaps a bit fancier way you seem to suggest is to do it using polymorphism. The YouTube lecture linked by CMS is an excellent watch, go see it here: "The Clean Code Talks -- Inheritance, Polymorphism, & Testing" As far as I understood, this would translate to something like this:

public interface Doer {
    void doIt(); // "do" is a reserved word in Java, so it cannot be a method name
}

public class OneDoer implements Doer {
    public void doIt() {
        doOne();
    }
}
/* etc. */

/* some method of dependency injection like Factory: */
public class DoerFactory {
    public static Doer getDoer(int i) {
        switch (i) {
            case 1: return new OneDoer();
            case 2: return new TwoDoer();
            case 3: return new ThreeDoer();
            default: return new NoneDoer();
        }
    }
}

/* in actual code */

Doer operation = DoerFactory.getDoer(i);
operation.doIt();

Two interesting points from the Google talk:

  • Use Null Objects instead of returning nulls (and please throw only Runtime Exceptions)
  • Try to write a small project without if:s.
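
The Null Object point can be sketched briefly. The following is a minimal illustration in Python rather than Java, under the assumption that each "doer" simply returns a label; the class names mirror the Doer example above, while run, _DOERS and get_doer are hypothetical names chosen for this sketch.

```python
from abc import ABC, abstractmethod

class Doer(ABC):
    @abstractmethod
    def run(self):
        """Perform this doer's action and return a label for it."""

class OneDoer(Doer):
    def run(self):
        return "one"

class TwoDoer(Doer):
    def run(self):
        return "two"

class NoneDoer(Doer):
    """Null Object: a real Doer that stands in for 'no match'."""
    def run(self):
        return "none"

# Factory table: the only place that knows which value maps to which class.
_DOERS = {1: OneDoer, 2: TwoDoer}

def get_doer(i):
    # Always returns a Doer instance, never None, so callers need no null check.
    return _DOERS.get(i, NoneDoer)()

operation = get_doer(7)
print(operation.run())  # none
```

Because get_doer never returns None, the calling code can invoke run() unconditionally, which is the conditional the Null Object pattern removes.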

One more post worth mentioning, in my opinion, is CDR's, who shared his perverse habits with us; while they are not recommended for use, they are very interesting to look at.

Thank you all for the answers (so far), I think I might have learned something today!

From stackoverflow
  • Use a switch/case it's cleaner :p

  • The switch statement of course, much prettier than all those if's and else's.

  • In this simple case you could use a switch.

    Otherwise a table-based approach looks fine, it would be my second choice whenever the conditions are regular enough to make it applicable, especially when the number of cases is large.

    Polymorphism would be an option if there are not too many cases, and conditions and behaviour are irregular.

  • These constructs can often be replaced by polymorphism. This will give you shorter and less brittle code.

    vava : not me but I don't think code will be shorter :)
    WolfmanDragon : polymorphism is great, when it can be used. How about inside of loops? I have never seen it used there. can it be done?
    Brian Rasmussen : @Vadim - not necessarily, but if the if/switch blocks are repeated it probably will.
    Brian Rasmussen : @WolfmanDragon - sure it can. It is just calling a method based on the actual runtime type.
    WolfmanDragon : I'm going over and asking this as a question then, I don't see it.
    mouviciel : I downvoted. This guy is fed up with a construct as complex as `if ... else if ... else if ...` and the suggested solution is polymorphism? Sorry, it doesn't sound right for me.
    Brian Rasmussen : @mouviciel, please check out the google tech talk on polymorphism for additional input.
    mouviciel : Excuse my previous rough comment, I thought that in the given example a switch were the simplest solution (as suggested in http://stackoverflow.com/questions/519515 which is the follow-up of this question). Anyway, I will follow your advice and take a look at this google tech talk.
    David Thornley : If the difference is worth subclassing about, polymorphism is the answer. Otherwise, I don't think it's worth worrying about. If trees are only harmful when they're bushy, not when they're long and straggly.
    Brian Rasmussen : @David - I didn't say that polymorphism is the answer to every possible problem. It's not a silver bullet.
    Pavel Feldman : While I also do think that polymorphism can be the right solution here, this comment could be more specific and give some examples. I'm too lazy, so just upvoting :)
    Pavel Feldman : If in one place you use switch(i) { case 1:doOne();break; case 2:doTwo();break; } and in another place switch(i){ case 1:doFirst(); case 2:doSecond() } then it clearly means polymorphism is a must.
  • A switch statement:

    switch(i)
    {
      case 1:
        doOne();
        break;
    
      case 2:
        doTwo();
        break;
    
      case 3:
        doThree();
        break;
    
      default:
        doNone();
        break;
    }
    
  • In Object Oriented languages, it's common to use polymorphism to replace if's.

    I liked this Google Clean Code Talk that covers the subject:

    The Clean Code Talks -- Inheritance, Polymorphism, & Testing

    ABSTRACT

    Is your code full of if statements? Switch statements? Do you have the same switch statement in various places? When you make changes do you find yourself making the same change to the same if/switch in several places? Did you ever forget one?

    This talk will discuss approaches to using Object Oriented techniques to remove many of those conditionals. The result is cleaner, tighter, better designed code that's easier to test, understand and maintain.

  • There are two parts to that question.

    How to dispatch based on a value? Use a switch statement. It displays your intent most clearly.

    When to dispatch based on a value? Only at one place per value: create a polymorphic object that knows how to provide the expected behavior for the value.

  • Outside of using a switch statement, which can be faster, none. If..else is clear and easy to read. Having to look things up in a map obfuscates things. Why make code harder to read?

  • switch (i) {
      case 1:  doOne(); break;
      case 2:  doTwo(); break;
      case 3:  doThree(); break;
      default: doNone(); break;
    }
    

    Having typed this, I must say that there is not that much wrong with your if statement. Like Einstein said: "Make it as simple as possible, but no simpler".

  • In OO paradigm you could do it using good old polymorphism. Too big if - else structures or switch constructs are sometimes considered a smell in the code.

  • Depending on the type of thing you are if..else'ing, consider creating a hierarchy of objects and using polymorphism. Like so:

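    class iBase
    {
    public:
       virtual ~iBase() {}      // virtual destructor so deleting via iBase* is safe
       virtual void Foo() = 0;  // each special case supplies its own behaviour
    };
    
    class SpecialCase1 : public iBase
    {
    public:
       void Foo() { /* do your magic here */ }
    };
    
    class SpecialCase2 : public iBase
    {
    public:
       void Foo() { /* do other magic here */ }
    };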
    

    Then in your code just call p->Foo() and the right thing will happen.

  • A switch statement, or classes with virtual functions as a fancy solution. Or an array of pointers to functions. It all depends on how complex the conditions are; sometimes there's no way around those if's. And again, creating a series of classes to avoid one switch statement is clearly wrong; code should be as simple as possible (but not simpler).

  • Naturally, this question is language-dependent, but a switch statement might be a better option in many cases. A good C or C++ compiler will be able to generate a jump table, which will be significantly faster for large sets of cases.

  • The Map method is about the best there is. It lets you encapsulate the statements and breaks things up quite nicely. Polymorphism can complement it, but its goals are slightly different. It also introduces unnecessary class trees.

    Switches have the drawback of missing break statements and fall through, and really encourage not breaking the problem into smaller pieces.

    That being said: A small tree of if..else's is fine (in fact, I recently argued for days in favor of having 3 if..elses instead of using a Map). It's when you start to put more complicated logic in them that it becomes a problem for maintainability and readability.

  • If you really must have a bunch of if tests and want to do different things whenever a test is true, I would recommend a while loop with only ifs, no else. Each if does a test and calls a method, then breaks out of the loop. No else: there's nothing worse than a bunch of stacked if/else/if/else etc.

  • I would go so far as to say that no program should ever use else. If you do you are asking for trouble. You should never assume if it's not an X it must be a Y. Your tests should test for each individually and fail following such tests.

    mghie : This is far too strong, as can be seen in the question - "else doNone()" may of course be the right thing to do, instead of enumerating all other possible values.
  • In python, I would write your code as:

    actions = {
               1: doOne,
               2: doTwo,
               3: doThree,
              }
    actions[i]()
    
  • I use the following shorthand just for fun! Don't try any of these if code clarity concerns you more than the number of chars typed.

    For cases where doX() always returns true.

    i==1 && doOne() || i==2 && doTwo() || i==3 && doThree()
    

    Of course I try to ensure most void functions return 1 simply to ensure that these shorthands are possible.

    You can also provide assignments.

    i==1 && (ret=1) || i==2 && (ret=2) || i==3 && (ret=3)
    

    For example, instead of writing

    if(a==2 && b==3 && c==4){
        doSomething();
    }else{
        doOtherThings();
    }
    

    Write

    a==2 && b==3 && c==4 && doSomething() || doOtherThings();
    

    And in cases, where not sure what the function will return, add an ||1 :-)

    a==2 && b==3 && c==4 && (doSomething()||1) || doOtherThings();
    

    I still find it faster to type than using all those if-else and it sure scares all new noobs out. Imagine a full page of statement like this with 5 levels of indenting.

    "if" is rare in some of my codes and I have given it the name "if-less programming" :-)

    Esko : This is just perverse :) +1 for that alone.
  • I regard these if-elseif-... constructs as "keyword noise". While it may be clear what it does, it is lacking in conciseness; I regard conciseness as an important part of readability. Most languages provide something like a switch statement. Building a map is a way to get something similar in languages that do not have such, but it certainly feels like a workaround, and there is a bit of overhead (a switch statement translates to some simple compare operations and conditional jumps, but a map first is built in memory, then queried and only then the compare and jump takes place).

    In Common Lisp, there are two switch constructs built in, cond and case. cond allows arbitrary conditionals, while case only tests for equality, but is more concise.

    (cond ((= i 1)
           (do-one))
          ((= i 2)
           (do-two))
          ((= i 3)
           (do-three))
          (t
           (do-none)))

    (case i (1 (do-one)) (2 (do-two)) (3 (do-three)) (otherwise (do-none)))

    Of course, you could make your own case-like macro for your needs.

    In Perl, you can use the for statement, optionally with an arbitrary label (here: SWITCH):

    SWITCH: for ($i) {
        /1/ && do { do_one; last SWITCH; };
        /2/ && do { do_two; last SWITCH; };
        /3/ && do { do_three; last SWITCH; };
        do_none; };
    
  • The example given in the question is trivial enough to work with a simple switch. The problem comes when the if-elses are nested deeper and deeper. They are no longer "clear or easy to read," (as someone else argued) and adding new code or fixing bugs in them becomes more and more difficult and harder to be sure about because you might not end up where you expected if the logic is complex.

    I've seen this happen lots of times (switches nested 4 levels deep and hundreds of lines long--impossible to maintain), especially inside of factory classes that are trying to do too much for too many different unrelated types.

    If the values you're comparing against are not meaningless integers, but some kind of unique identifier (i.e. using enums as a poor man's polymorphism), then you want to use classes to solve the problem. If they really are just numeric values, then I would rather use separate functions to replace the contents of the if and else blocks, and not design some kind of artificial class hierarchy to represent them. In the end that can result in messier code than the original spaghetti.

Textual Irregularities

Does anybody know of a library or piece of software out there that will locate irregularities in text? For example, let's say I have...

1. Name 1, Comment
2. Name 2, Comment
3. Name 3 , Comment
5. Name 10, Comment

This software or library would first split the text into portions that it finds similar (much like compression software encodes repetitive, similar portions of text to shrink it). Using a variable error tolerance, it could match portions that are similar but not identical, and then, much like a text comparison or diff/merge tool, highlight what it sees as different. I'm thinking about making this tool myself, but I do not wish to reinvent the wheel. If there is anything out there remotely capable of this, I would really like to know, either to help with that project or at least to know not to start my own. An answer could also help other people hunting for the same thing; I would think the demand is high enough, which is why it boggles my mind that I can't find anything at all.
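
As a rough illustration of the "similar within a tolerance" idea, here is a minimal sketch using Python's standard difflib module. It is only one possible approach, not an existing tool: the shape and find_irregular helpers and the 0.8 tolerance are assumptions made for this example. Each line is reduced to a coarse "shape", the most common shape is treated as the norm, and lines whose shape is close to the norm but not identical are reported.

```python
import difflib
import re
from collections import Counter

def shape(line):
    """Collapse a line to its shape: digit runs become 9, letter runs become a."""
    line = re.sub(r"\d+", "9", line)
    return re.sub(r"[A-Za-z]+", "a", line)

def find_irregular(lines, tolerance=0.8):
    """Return (line number, line, similarity) for near-misses of the typical shape."""
    shapes = [shape(line) for line in lines]
    typical, _ = Counter(shapes).most_common(1)[0]
    report = []
    for number, (line, s) in enumerate(zip(lines, shapes), start=1):
        ratio = difflib.SequenceMatcher(None, typical, s).ratio()
        # Identical shapes are fine; very different shapes are treated as
        # another kind of line altogether rather than an irregularity.
        if s != typical and ratio >= tolerance:
            report.append((number, line, ratio))
    return report

sample = [
    "1. Name 1, Comment",
    "2. Name 2, Comment",
    "3. Name 3 , Comment",
    "5. Name 10, Comment",
]

# Flags line 3, the one with the stray space before the comma.
for number, line, ratio in find_irregular(sample):
    print(f"line {number}: {line!r} (similarity {ratio:.2f})")
```

Note that this shape comparison does not notice the jump from 3 to 5 in the numbering; that kind of irregularity needs a separate check on the values themselves.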

From stackoverflow
  • If you are into Python, you might try difflib.

    It's not an exact solution to your problem, but it might be helpful.

  • Depending on what sort of real life irregularities you want to find or correct this problem is radically different.

    Here is your example updated with real text:

    1. Lazarus Long, Get the first shot off fast.
    2. Hiro Protagonist, Greatest swordfighter[sic] in the world.
    3. Alice , Down the rabbit hole.
    5. Orem, Sink of power.
    

    In this example the errors could be fixed with a decent text editor with find and replace. Text editors and hex editors can work miracles if you get creative with wildcards. The problem remains simple as long as your delimiters (. or ,) are present. As you probably already know, as soon as one of those is missing the problem becomes much more complex.

    Example of a hard problem:

    1. Lazarus Long, Get the first shot off fast.
     2. Hiro Protagonist  Greatest swordfighter[sic] in the world.
    3. Alice , Down the rabbit hole.
    5 . Orem, , Sink of power.
    

    I would probably attack this in a few steps:

      1. Clean up extra spaces.
      2. Find out key statistics such as the number of delimiters per line and the average number of words or characters per delimited column. Most names are one or two words; comments are unknown or limited by input.
      3. Find lines with a statistically improbable number of key features.
      4. Try your best to correct them.

    I understand that this is not directly solving your problem, but maybe one idea can patch your problem over for a bit. It is possible that past wheelwrights never completed any designs.

  • Sounds basically like you'd want to use Regex to create an "ideal response" then compare the rest of the lines against it.

    Or you could write a more complicated program which would boil each line down into a Regex query, and then compare the queries to each other to see which ones are different.