Friday, March 4, 2011

Refactoring a massive function into many files.

I've been trying to refactor a "bit" of code that I'd previously developed. Basically, the project was my response to not knowing how to use XSLT effectively, so I developed an XML transformation system in PHP. The program reads through the tags of an XML file and does something along these lines to convert it to HTML:

private function getTemplate(...) {
    switch ($nodeName) {
        case "a" :
            // code here to generate a link tag
            break;
        case "box" :
            // code here to generate the divs and whatnot to create a box
            break;
        case "ref" :
            // look up an external reference file and include a bibliography
            break;
        default :
            // do the default thing
    }
}

That was all working great, except that I ended up with 26 branches to my switch, and that once switch block was over 1000 lines of code. Needless to say, it made maintenance slightly more difficult.

What I've done now is to pull the code of each branch out into its own file (named "a.php", "box.php", "ref.php" ...) and include that file each time:

if (file_exists("templates/$nodeName.php")) {
    include "templates/$nodeName.php";
} else {
    // do the default thing
}

Again, this works, but benchmarking it shows that it has slowed down processing times by 50%. I'm assuming that this is because there's now up to 4000 includes being done now.

What I was considering was to put the code for each template into a function, and if the function hasn't been declared then include the file, and then run the function - the only problem with that is that the existing code has been written in the scope of the original function, using $this, etc.

Given that this code is not run in real time (eg: it merely processes the XML into static HTML files which are stored - it's not done on the fly), do you have any advice for me here?

From stackoverflow
  • caveat: i don't know PHP, but I can google and i like function pointers better than huge switch statements, so if you can't just use XSLT...

    ...one option would be to adopt a naming convention for your 'worker' functions that incorporated the html node's tag name, e.g. <tagname>_converter, include all the functions, and replace your switch statement with something like:

    private function getTemplate(...) 
    {
        $func = $nodeName + "_converter";
        if (function_exists($func))    //see if function is defined, somehow...
        {
            $func();    //call conversion function (pass whatever is needed)
        }
    }
    
  • Try something like this:

    //utils.php
    function handle_box($node)
    {
    //...
    }
    
    function handle_link($node)
    {
    //....
    }
    ?\>
    
    

    then:

    
    require_once 'templates/utils.php';
    
    
    function getTemplate()
    {
      switch($node)
      {
          case "a" :
            handle_link($node,$otherParams);
          break;
      }
    }
    
    }
    

    Basically this refactors all the functionality for each node into its own function. From here you can work on a more generalized solution. It is very similar to what you had original but the switch/case statements will be a lot more manageable without a ton of code inside of them. You can then look towards potentially implementing the Stratgy design pattern if you feel it is necessary.

    Kudos : I think that's actually what his original code was, or if anything worse. Fortunately for you, I'm 9 points off downvoting ;)
    nickf : no, i've actually just done a cut and paste of my original code into external files - no changes made to it at all (eg: wrapping each into a function)
    grepsedawk : If that was his original code, it would be pretty amazing to turn 26 case statements into 1000 lines...
  • Learn XSLT.

    annakata : right tool for the right job
    nickf : least. helpful. answer. ever.
    Robert Rossney : You may not think it's helpful, but it's right. Do you want a maintainable, performant solution? Don't use a tool that's obviously wrong for the job, especially when there's one that's obviously right.
  • Without getting into XSLT itself and focusing on the question of refactoring: You already have considered a fair amount of different strategies. I will offer an optimization tip on one of the approaches you already experimented with, and one other approach which might work for you.

    When you said that there was 50% performance decrease when relying on many files. I take it you don't have PHP Op-code cache enabled. You do that, and that 50% lose should vanish.

    On the other approach, I would suggest creating a base class NodeProcessor or so. And then create a new class for every node ANodeProcessor, RefNodeProcessor, ...etc.

    Alternatively, you could only create one class NodeProcessor which contains methods in the form of: processTag. Then:

    1. You can call $this->processNode($tag);
    2. An in that method: $name = 'process'. $tag;
    3. Then: return $this->nodeProcessor->{$name};
    4. NodeProcessor should have a __call() method to do the "default thing" for tags with no processTag method defined.

    Again, don't forget to add op-code caching.

0 comments:

Post a Comment