The power of Groovy and Regular expressions

Recently I came across a Google Code Jam problem called Alien Language. The idea is to figure out how many words in a dictionary match a given pattern. Here is the task description from Google Code Jam:


After years of study, scientists at Google Labs have discovered an alien language transmitted from a faraway planet. The alien language is very unique in that every word consists of exactly L lowercase letters. Also, there are exactly D words in this language.

Once the dictionary of all the words in the alien language was built, the next breakthrough was to discover that the aliens have been transmitting messages to Earth for the past decade. Unfortunately, these signals are weakened due to the distance between our two planets and some of the words may be misinterpreted. In order to help them decipher these messages, the scientists have asked you to devise an algorithm that will determine the number of possible interpretations for a given pattern.

A pattern consists of exactly L tokens. Each token is either a single lowercase letter (the scientists are very sure that this is the letter) or a group of unique lowercase letters surrounded by parenthesis ( and ). For example: (ab)d(dc) means the first letter is either a or b, the second letter is definitely d and the last letter is either d or c. Therefore, the pattern (ab)d(dc) can stand for either one of these 4 possibilities: add, adc, bdd, bdc.


The first line of input contains 3 integers, L, D and N separated by a space. D lines follow, each containing one word of length L. These are the words that are known to exist in the alien language. N test cases then follow, each on its own line and each consisting of a pattern as described above. You may assume that all known words provided are unique.


For each test case, output

Case #X: K

where X is the test case number, starting from 1, and K indicates how many words in the alien language match the pattern.

One way to solve the problem in a reasonable amount of time is to use regular expressions. Groovy provides excellent support and some syntactic sugar for regular expressions.
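To make the idea concrete, here is a minimal sketch of how a pattern from the task could be turned into a regular expression. A regex would treat (ab) as a capturing group that matches the literal text "ab", so the parentheses are mapped to a character class [ab] first. The helper name toRegex is my own and is only for illustration.

```groovy
// Hypothetical helper (name chosen for this sketch): map the task's
// (..) groups to regex character classes [..].
def toRegex = { String alienPattern ->
    alienPattern.replace('(', '[').replace(')', ']')
}

def regex = toRegex('(ab)d(dc)')
assert regex == '[ab]d[dc]'

// ~string is Groovy sugar that compiles a java.util.regex.Pattern
def pattern = ~regex
assert pattern instanceof java.util.regex.Pattern

// ==~ is Groovy's "match the whole string" operator
assert 'add' ==~ pattern
assert 'bdc' ==~ pattern
assert !('cdd' ==~ pattern)
```

The four words the task lists for (ab)d(dc), namely add, adc, bdd and bdc, all match the converted pattern, while a word such as cdd does not.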


import java.util.regex.Matcher
import java.util.regex.Pattern

    // input file (the path is shown as in the original post; it must point to the input file)
    File fp=new File('/Volumes/Backup/Users/edvorkin/Downloads/')
    File fout=new File('/Volumes/Backup/Users/edvorkin/Downloads/A-large-practice.out')
    // read params
    def l, d, n
    // list to hold all words (our alien language dictionary)
    def words=[]
    fp.withReader { reader ->
        // groovy shortcut to read first line of numbers into variables
        (l, d, n) = reader.readLine().split(" ")
        // read the D dictionary words
        (1..Integer.parseInt(d)).each {
            words << reader.readLine()
        }
        println words
        println "l=$l, d=$d, n=$n"
        fout.withWriter { writer ->
            // read each pattern
            (1..Integer.parseInt(n)).each {
                def searchStr = reader.readLine()
                if (searchStr) {
                    // create a regex: a (ab) group becomes the character class [ab]
                    def regex = searchStr.replace('(', '[').replace(')', ']')
                    def pattern = ~regex
                    // count is reset for every test case
                    def count = 0
                    // test each word for pattern match
                    words.each { word ->
                        if (pattern.matcher(word).matches()) {
                            count++
                        }
                    }
                    writer.println("Case #$it: $count")
                } else {  // case for an empty string in the input file
                    writer.println("Case #$it: 0")
                }
            }
        }
    }

Groovy makes it simple to read lines from a file and to iterate over list objects with the .each method and a closure as a parameter.
The regular expression matcher is easy to use and program.
The running time of this implementation is, in my opinion, O(N*D) regex matches, where N is the number of patterns and D is the number of words in the dictionary; each match scans a word of length L.
There are probably other ways of solving this problem.
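To try the approach without any input files, the same logic can be sketched as an in-memory closure. The name solve, its parameters and the small dictionary below are my own illustrative choices, not part of the original program. It uses the collection method count with a closure instead of a manual counter.

```groovy
// Hypothetical helper: builds the "Case #X: K" lines for a dictionary
// and a list of patterns, without touching the file system.
def solve = { List<String> dictionary, List<String> patternList ->
    def lines = []
    patternList.eachWithIndex { String p, int i ->
        // (ab) groups become [ab] character classes
        def regexText = p.replace('(', '[').replace(')', ']')
        def regex = ~regexText
        // count takes a closure and returns how many items satisfy it
        def k = dictionary.count { word -> word ==~ regex }
        lines << "Case #${i + 1}: ${k}".toString()
    }
    lines
}

// Illustrative data, small enough to check by hand
def words = ['abc', 'bca', 'dac', 'dbc', 'cba']
def patterns = ['(ab)(bc)(ca)', 'abc', '(abc)(abc)(abc)', '(zyx)bc']
solve(words, patterns).each { println it }
// prints:
// Case #1: 2
// Case #2: 1
// Case #3: 3
// Case #4: 0
```

Every pattern is still tested against every word, so the number of regex matches is N*D, the same as in the file-based version.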
