Friday, February 4, 2011

Library or algorithm to explode an alphanumeric range

I was wondering if there is an open source library or algorithm that can expand a non-numeric range. For example, if you have 1A to 9A you should get 1A, 2A, 3A, 4A, 5A, 6A, 7A, 8A, 9A.

I've tried Googling for this and the best I could come up with were Regex that would expand numerics with dashes (1-3 becoming 1,2,3).

  • The question is not clear.

    For example, if I give E6T to 5G9 as input, what should be the output?

    From Niyaz
  • And what about E6 to 5G9 ?

    From Mendelt
  • I was trying to leave it somewhat open because the number of possibilities is staggering. I believe this one of those questions that could not be answered 100% here without going through a lot of technical detail about is considered a "good" or "bad" range. I'm just trying to find a jumping point for ideas on how other people have tackled this problem. I was hoping that someone wrote a blog post explaining how they went about it solving this problem or created a whole library to handle this.

    From Gariig
  • I would say the first step in the solution will be to define how characters and numbers interact and form a sequence. The given example isn't clear, as you would at least assume it to run 1A, 1B .... 8Y, 8Z, 9A - that's assuming your input is restricted to decimal followed by a single character.

    If you can define a continuous sequence for characters and decimals, then you it will simply be a matter of some recursion / looping to generate part of that sequence.

    For example, you could assume that each character in the input is one of (1-9A-Z), therefore you could easily make that continuous by grabbing the decimal ascii value of the alpha characters and subtracting 55, in effect giving you the range (1-35)

    From iAn
  • If we assume that the start and end ranges will follow the same alternating pattern, and limit the range of digits to 0-9 and A-Z, we can think of each group of digits as a component in a multi-dimensonal coordinate. For example, 1A would correspond to the two-dimensional coordinate (1,A) (which is what Excel uses to label its two-dimensional grid of rows and columns); whereas AA1BB2 would be a four-dimensional coordinate (AA,1,BB,2).

    Because each component is independent, to expand the range between two coordinates we just return all combinations of the expansion of each component. Below is a quick implementation I cooked up this afternoon. It works for an arbitrary number of alternations of normal and alphabetic numbers, and handles large alphabetic ranges (i.e. from AB to CDE, not just AB to CD).

    Note: This is intended as a rough draft of an actual implementation (I'm taking off tomorrow, so it is even less polished than usual ;). All the usual caveats regarding error handling, robustness, (readability ;), etc, apply.

    IEnumerable<string> ExpandRange( string start, string end ) {
      // Split coordinates into component parts.
      string[] startParts = GetRangeParts( start );
      string[] endParts = GetRangeParts( end );
    
      // Expand range between parts 
      //  (i.e. 1->3 becomes 1,2,3; A->C becomes A,B,C).
      int length = startParts.Length;
      int[] lengths = new int[length];
      string[][] expandedParts = new string[length][];
      for( int i = 0; i < length; ++i ) {
        expandedParts[i] = ExpandRangeParts( startParts[i], endParts[i] );
        lengths[i] = expandedParts[i].Length;
      }
    
      // Return all combinations of expanded parts.
      int[] indexes = new int[length];
      do {
          var sb = new StringBuilder( );
          for( int i = 0; i < length; ++i ) {
            int partIndex = indexes[i];
            sb.Append( expandedParts[i][partIndex] );
          }
          yield return sb.ToString( );
      } while( IncrementIndexes( indexes, lengths ) );
    }
    
    readonly Regex RangeRegex = new Regex( "([0-9]*)([A-Z]*)" );
    string[] GetRangeParts( string range ) {
      // Match all alternating digit-letter components of coordinate.
      var matches = RangeRegex.Matches( range );
      var parts =
        from match in matches.Cast<Match>( )
        from matchGroup in match.Groups.Cast<Group>( ).Skip( 1 )
        let value = matchGroup.Value
        where value.Length > 0
        select value;
      return parts.ToArray( );
    }
    
    string[] ExpandRangeParts( string startPart, string endPart ) {
      int start, end;
      Func<int, string> toString;
    
      bool isNumeric = char.IsDigit( startPart, 0 );
      if( isNumeric ) {
        // Parse regular integers directly.
        start = int.Parse( startPart );
        end = int.Parse( endPart );
        toString = ( i ) => i.ToString( );
      }
      else {
        // Convert alphabetic numbers to integers for expansion,
        //  then convert back for display.
        start = AlphaNumberToInt( startPart );
        end = AlphaNumberToInt( endPart );
        toString = IntToAlphaNumber;
      }
    
      int count = end - start + 1;
      return Enumerable.Range( start, count )
        .Select( toString )
        .Where( s => s.Length > 0 )
        .ToArray( );
    }
    
    bool IncrementIndexes( int[] indexes, int[] lengths ) {
      // Increment indexes from right to left (i.e. Arabic numeral order).
      bool carry = true;
      for( int i = lengths.Length; carry && i > 0; --i ) {
        int index = i - 1;
        int incrementedValue = (indexes[index] + 1) % lengths[index];
        indexes[index] = incrementedValue;
        carry = (incrementedValue == 0);
      }
      return !carry;
    }
    
    // Alphabetic numbers are 1-based (i.e. A = 1, AA = 11, etc, mod base-26).
    const char AlphaDigitZero = (char)('A' - 1);
    const int AlphaNumberBase = 'Z' - AlphaDigitZero + 1;
    int AlphaNumberToInt( string number ) {
      int sum = 0;
      int place = 1;
      foreach( char c in number.Cast<char>( ).Reverse( ) ) {
        int digit = c - AlphaDigitZero;
        sum += digit * place;
        place *= AlphaNumberBase;
      }
      return sum;
    }
    
    string IntToAlphaNumber( int number ) {
      List<char> digits = new List<char>( );
      while( number > 0 ) {
        int digit = number % AlphaNumberBase;
        if( digit == 0 )  // Compensate for 1-based alphabetic numbers.
          return "";
    
        char c = (char)(AlphaDigitZero + digit);
        digits.Add( c );
        number /= AlphaNumberBase;
      }
    
      digits.Reverse( );
      return new string( digits.ToArray( ) );
    }
    
  • As noted by others, it would be useful to be more specific. I don't think you can expect there to be a library that will generate ranges according to any arbitrary order on string you can come up with.

    If you can simply define what the successor of any given string is, then the solutions is quite easy. That is, if you have a successor function S on strings (e.g. with S('3A') = '4A'), then something like the following can be used:

    s = initial_string
    while s != final_string do
      output s
      s = S(s)
    output s
    

    Something I have used in the past to generate all strings of a given length l and with given range b to e of characters, is the following piece of (pseudo-)code. It can be easily adapted to a wide range of variations.

    // initialise s with b at every position
    for i in [0..l) do
      s[i] = b
    done = false
    while not done do
      output s
      j = 0
      // if s[j] is e, reset it to b and "add carry"
      while j < l and s[j] == e do
        s[j] = b
        j = j + 1
        if j == l then
          done = true
      if not done then
        s[j] = s[j] + 1
    

    For example, to start at a specific string you need only the change the initialisation. To set the end you only need to change the behaviour for the inner while to separately handle position l (limiting to the character in the end string on that position and if reached decrementing l).

    Emperor XLII : Just wanted to mention that this is an excellent solution if you consider the problem to be linear (i.e. treating `A1B2` as a single number). I still think the problem is multi-dimensional, but I guess we'll just have to wait for the author to clarify :)
    From mweerden

0 comments:

Post a Comment