Wednesday, March 16, 2011

Search for a regexp in a Java ArrayList

ArrayList<String> list = new ArrayList<String>();
list.add("behold");
list.add("bend");
list.add("bet");
list.add("bear");
list.add("beat");
list.add("become");
list.add("begin");

Is there a way to search for the regexp bea.* and get the indexes, as with ArrayList.indexOf?
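
For comparison, a regex counterpart of List.indexOf takes only a few lines to write. The following is a minimal sketch, not a JDK method; the names RegexIndexOf and indexOfMatch are hypothetical. It returns the index of the first element that fully matches the pattern, or -1 if none does, which mirrors the indexOf contract. It is still a linear scan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: a regex analog of List.indexOf.
public class RegexIndexOf {
    // Returns the index of the first element that fully matches p, or -1 if none does.
    static int indexOfMatch(List<String> list, Pattern p) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            int i = it.nextIndex();                  // index of the element next() will return
            if (p.matcher(it.next()).matches()) {    // matches() tests the whole string
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        System.out.println(indexOfMatch(list, Pattern.compile("bea.*"))); // prints 3
        System.out.println(indexOfMatch(list, Pattern.compile("xyz.*"))); // prints -1
    }
}
```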

EDIT: Returning the items instead of the indexes is fine, but I need something that performs better than a linear search.
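
On the performance EDIT: for an arbitrary regex there is, in general, no shortcut. Without some precomputed index, every string has to be tested. One special case does allow a faster lookup, and that is a pattern that is just a literal prefix followed by .*, as with bea.*. The sketch below swaps in a different technique, a sorted index rather than regex matching. It uses a hypothetical PrefixIndex class built on a TreeMap that maps each distinct string to its positions in the original list. Building the index costs O(n log n) once. After that, each prefix query jumps to the first candidate key in O(log n) and then visits only the keys that share the prefix. The sketch assumes Java 8+, no null entries, and strings without line terminators, because . does not match those by default. It does not handle general regular expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper (not a JDK or library class): answers "starts with prefix"
// queries, i.e. regexes of the form <literal prefix>.*, without scanning the whole list.
public class PrefixIndex {
    // Each distinct string -> positions where it occurs in the original list.
    private final NavigableMap<String, List<Integer>> index = new TreeMap<>();

    public PrefixIndex(List<String> list) {
        for (int i = 0; i < list.size(); i++) {
            index.computeIfAbsent(list.get(i), k -> new ArrayList<>()).add(i);
        }
    }

    // Original indexes of all entries that start with prefix, in ascending order.
    public List<Integer> indexesWithPrefix(String prefix) {
        List<Integer> result = new ArrayList<>();
        // tailMap starts at the first key >= prefix; keys sharing the prefix are contiguous from there.
        for (Map.Entry<String, List<Integer>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // first key past the prefix range: stop
            }
            result.addAll(e.getValue());
        }
        Collections.sort(result); // keys come out in string order, not list order
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        PrefixIndex idx = new PrefixIndex(list);
        System.out.println(idx.indexesWithPrefix("bea")); // prints [3, 4]
    }
}
```

The index is a snapshot. If the list is modified afterwards, the index has to be rebuilt or updated alongside it. The speedup only pays off when the list is queried many times.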

From Stack Overflow
  • I do not believe there is a Java API way of doing this, nor is there an Apache Commons way. It would not be difficult to roll your own, however.

  • Is there a built-in method? Not that I know of. However, it should be rather easy to do it yourself. Here's some completely untested code that should give you the basic idea:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.ListIterator;
    import java.util.regex.Pattern;
    
    /**
     * Finds the indexes of all entries in the list that match the regex.
     * @param list The list of strings to check
     * @param regex The regular expression to use
     * @return list containing the indexes of all matching entries
     */
    List<Integer> getMatchingIndexes(List<String> list, String regex) {
      ListIterator<String> li = list.listIterator();
    
      // Type arguments must be reference types, so List<Integer>, not List<int>
      List<Integer> indexes = new ArrayList<Integer>();
    
      while(li.hasNext()) {
        int i = li.nextIndex();     // index of the element next() is about to return
        String next = li.next();
        if(Pattern.matches(regex, next)) {
          indexes.add(i);           // autoboxed to Integer
        }
      }
    
      return indexes;
    }
    

    I might have the usage of the Pattern and ListIterator parts a bit wrong (I've never used either), but that should give the basic idea. You could also use a simple for loop instead of the while loop over the iterator.

    : Personally, I think that API methods should take arguments and return values of the most abstract type possible. Hence, my pedantic correction of your answer would be: public List getMatchingIndices(List<String> list, String regex){..}
    Herms : Good point. I was just throwing it together really quick and didn't pay as much attention to that stuff.
    Alan Moore : FYI, <int> is not a valid type parameter. You would have to make it a List<Integer>. Also, when you use a regex in a loop like this, you should compile it into a Pattern object before entering the loop, like DJClayworth did.
    Herms : Hmm, I thought autoboxing took care of the int->Integer thing. Oh well. Like I said, it was untested and quickly thrown together :)
  • One option is to use the Apache Commons CollectionUtils "select" method. You would need to create a Predicate object (an object with a single "evaluate" method that uses the regular expression to check for a match and returns true or false), and then you can search for items in the list that match. However, it won't return the indexes; it returns a collection containing the items themselves.

  • Herms got the basics right. If you want the Strings and not the indexes, you can improve on it by using the Java 5 foreach loop:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;
    
    /**
     * Finds all entries in the list that match the regex.
     * @param list The list of strings to check
     * @param regex The regular expression to use
     * @return list containing all matching entries
     */
    List<String> getMatchingStrings(List<String> list, String regex) {
    
      List<String> matches = new ArrayList<String>();
    
      // Compile the regex once, before the loop, rather than on every iteration
      Pattern p = Pattern.compile(regex);
    
      for (String s : list) {
        // matches() requires the whole string to match the pattern
        if (p.matcher(s).matches()) {
          matches.add(s);
        }
      }
    
      return matches;
    }
    
    Herms : I thought of returning the actual matching strings, but the question specifically asked for the indices. Returning the matching strings is generally a bit cleaner, though.
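
Herms's index-returning answer can be combined with the two suggestions made in this thread: a plain indexed for loop, and a pattern compiled once before the loop, as Alan Moore recommends. The class name RegexIndexSearch and the main method are illustrative additions. The sketch runs the search on the list from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative class name; getMatchingIndexes follows Herms's answer.
public class RegexIndexSearch {
    // Returns the indexes of all entries whose entire text matches regex.
    static List<Integer> getMatchingIndexes(List<String> list, String regex) {
        Pattern p = Pattern.compile(regex); // compile once, outside the loop
        List<Integer> indexes = new ArrayList<Integer>();
        for (int i = 0; i < list.size(); i++) {
            // matches() requires the whole string to match, like Pattern.matches
            if (p.matcher(list.get(i)).matches()) {
                indexes.add(i); // autoboxed to Integer
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList(
                "behold", "bend", "bet", "bear", "beat", "become", "begin"));
        System.out.println(getMatchingIndexes(list, "bea.*")); // prints [3, 4]
    }
}
```

The indexed loop is fine for an ArrayList, where get(i) takes constant time. For a LinkedList, get(i) walks the list, so the iterator-based version from the answer avoids quadratic behavior.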

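Both answers can also be written with Java 8 streams, which postdate this 2011 thread. The string-returning version is a standard-library analog of the Predicate-based CollectionUtils "select" idea, not the Commons API itself. The class and method names below are hypothetical. One caution: Pattern.asPredicate() uses find(), so it would also accept strings that only contain a match. The sketch therefore uses an explicit matches() lambda. On Java 11+, Pattern.asMatchPredicate() gives whole-string matching directly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical names; requires Java 8+.
public class RegexStreamSearch {
    // Indexes of entries that fully match p.
    static List<Integer> matchingIndexes(List<String> list, Pattern p) {
        return IntStream.range(0, list.size())
                .filter(i -> p.matcher(list.get(i)).matches())
                .boxed()
                .collect(Collectors.toList());
    }

    // The matching entries themselves, in list order.
    static List<String> matchingStrings(List<String> list, Pattern p) {
        return list.stream()
                .filter(s -> p.matcher(s).matches()) // whole-string match, unlike asPredicate()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        Pattern p = Pattern.compile("bea.*");
        System.out.println(matchingIndexes(list, p)); // prints [3, 4]
        System.out.println(matchingStrings(list, p)); // prints [bear, beat]
    }
}
```

These are still linear scans. Streams change the style, not the complexity.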