Thursday 8 March 2012

Create a list of JAX-RS resources using sed

There are things you can do with sed and things you simply cannot. And there are things you can do but should not. The following script probably belongs to the third category. But it was fun to write, and the final result is quite beautiful:

sed -r 's/\r//g;s/^\s*//g;/@Path.*URI_PATH/d;/final.*URI_PATH/{;/".*"/{;s/.*"(.*)".*/\1/;h;};d;};/^@Path[^P]/{;s/.*@Path\("(.*)"\).*/\1/;H;d;};/public/{;g;/\n@/{;s/^([^\n]*)\n?(.*)?\n(@[^\n]*)\n?(.*)?/\3\t\1\4\n\2/;p;};};/\/\*\*/{;g;s/^([^\n]*)\n.*$/\1/;s/$/\n!/;h;d;};/\*\//{;g;s/\n!$//;h;d;};/^\*/{;/^\*\s*(@.*)?$/{;g;s/\n!$//;h;};x;/\n!$/{;s/\n!$//;x;s/^\*\s*/\t\t/;H;g;s/$/\n!/;};x;d;};/@(GET|PUT|POST|DELETE)/{;H;d;};d' *.java

So what is this piece of cryptic code supposed to do? It takes some Java files, scans them for JAX-RS annotations (@Path, @GET, ...) and prints a small overview of all REST resources defined in these files, together with a short description taken from the Javadoc comments. Take the following example implementation of a REST service:

@Path(BlogPostResource.URI_PATH)
class BlogPostResource {

    public final static String URI_PATH = "/blogposts";

    /**
     * Create a new blog post
     */
    @POST
    public Response postNewPost() {
        //...
    }

    /**
     * Get blog post with given id
     */
    @GET
    @Path("/{id}")
    public Response getPost(@PathParam("id") String id) {
        //...
    }

    /**
     * Update the blogpost with given id.
     * Multiline comments are possible, too.
     * 
     * @param id
     *     the id of the blog post to update
     */
    @PUT
    @Path("/{id}")
    public Response updatePost(@PathParam("id") String id) {
        //...
    }
}

This is what the script will return:

@POST /blogposts
          Create a new blog post
@GET  /blogposts/{id}
          Get blog post with given id
@PUT  /blogposts/{id}
          Update the blogpost with given id.
          Multiline comments are possible, too.

There are some assumptions to be made for the script to work correctly:

  1. The base path of each class implementing a REST resource must be stored in a static class variable "URI_PATH". This variable must be defined before any annotated methods.
  2. The @GET (or @POST, @PUT, @DELETE) annotation must always be above the @Path annotation.
  3. Every method implementing a REST resource must have a Javadoc comment starting with /** (even if it is empty).
  4. The Javadoc comment must always come before the annotations.
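
Assumption 2 is easy to violate by accident. The following is a minimal sketch of a check for it, assuming GNU sed (for -r and \s) and that every annotation sits on its own line. The function name find_misordered is made up for this illustration. It prints every HTTP-method annotation that directly follows a method-level @Path line:

```shell
# hypothetical helper: reads the given files (or stdin if none are given)
# and prints each @GET/@PUT/@POST/@DELETE line that comes directly after
# a method-level @Path("...") line
find_misordered() {
    sed -n -r '/^\s*@Path\("/ { n; /^\s*@(GET|PUT|POST|DELETE)/ p; }' "$@"
}

# wrong order: @Path above @GET, so the @GET line is reported
bad=$(printf '%s\n' '    @Path("/{id}")' '    @GET' \
    '    public Response getPost() {' | find_misordered)

# expected order: @GET above @Path, so nothing is reported
good=$(printf '%s\n' '    @GET' '    @Path("/{id}")' \
    '    public Response getPost() {' | find_misordered)

printf 'bad:  [%s]\ngood: [%s]\n' "$bad" "$good"
```

If the function prints nothing, the annotation order matches what the script expects. It only looks at directly adjacent lines, so it is a rough check and not a parser.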

That said, here is the less cryptic (and commented) version of the sed script:

sed -r '
# before we start, remove all carriage-returns (I really hate these)
s/\r//g

# also remove whitespace at the beginning of lines
s/^\s*//g

# the @Path annotation of the class itself is always used in
# conjunction with the static class-variable URI_PATH, which we catch
# separately, so we can safely ignore this line
/@Path.*URI_PATH/ d 

# the static class-variable holds the global path to this resource. we
# put it in hold-space. this path will stay in hold space (more exactly
# in the first line of the hold space).
/final.*URI_PATH/ {
    /".*"/ {
        s/.*"(.*)".*/\1/
        h
    }
    d
}

# this is a path-annotation of a method receiving rest-requests. the
# path specified is appended to the hold space (the [^P] is to not get
# confused by lines starting with @PathParam).
/^@Path[^P]/ {
    s/.*@Path\("(.*)"\).*/\1/
    H
    d
}

# a new public method (or field) is declared in this line, so we have to
# check if the hold space content is describing a rest-resource. this is
# the case if some line starting with a @ is found (we assume that no
# comment lines start with a @)
/public/ {
    g
    /\n@/ {
        # now we have to print a description of the rest-resource
        # the first line in hold space is the base path (from the
        # @Path annotation of the class itself). the next lines are the
        # description from the comments. the line starting with an @ is the
        # http-method-specifier (for example @GET). and after this
        # line comes the path specified by the @Path-annotation for the
        # method (which is not required for all methods, therefore we put a ?
        # to this part). unfortunately, there exist also methods
        # without a describing comment, so the second part gets a ?, too
        s/^([^\n]*)\n?(.*)?\n(@[^\n]*)\n?(.*)?/\3\t\1\4\n\2/
        p
    }
}

# a new comment starts ...
/\/\*\*/ {
    # in any case, we have to clean up the hold space. only the
    # first line may remain
    g
    s/^([^\n]*)\n.*$/\1/

    # now we add a marker to the hold space, to tell the methods
    # parsing the comments, that we are at the beginning of a comment
    s/$/\n!/
    
    # put everything in the hold-space and delete the pattern space,
    # so that subsequent checks are not disturbed
    h
    d
}

# when we reach the end of a comment, the marker "!" must be removed
/\*\// {
    g
    s/\n!$//
    h
    d
}

# here we are inside a comment
/^\*/{
    # if this comment-line is empty or starting with a @ (which means
    # we are inside the parameter-description part), and the "!"
    # marker is still in the hold space, we remove the marker
    /^\*\s*(@.*)?$/ {
        g
        s/\n!$//
        h
    }

    # now we check if the last character in the hold-space is a
    # "!". this tells us, if we are at the beginning of a (perhaps
    # multiline-)comment, or if we are already in the part describing
    # parameters (which is of no interest to us)
    x
    /\n!$/ {
        # first, we remove the "!" character. it will be re-added,
        # when we are finished reading this comment-line
        s/\n!$//
         
        # next we refetch the current pattern line (which we exchanged
        # with hold-space before)
        x
 
        # remove the leading stars and whitespaces
        s/^\*\s*/\t\t/

        # then add the line to hold space
        H
 
        # and finally re-add our marker "!"
        g
        s/$/\n!/
    }

    # before we finish, we have to re-exchange hold- and pattern
    # space, because we swapped them before the \n! check
    x
    d
}

# this is an easy part: if a http-method annotation is present, push
# it to the hold-space
/@(GET|PUT|POST|DELETE)/ {
    H
    d
}

# we finally delete all lines, so that they are not printed by sed in
# default mode (if the script is called with "sed -n", this is not needed)
d' *.java
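
The core trick of the script is the hold space: h overwrites it with the current line, H appends the current line to it, and g copies it back into the pattern space so that a single s command can rearrange the collected lines. Here is a stripped-down sketch of just that mechanism, assuming GNU sed. The input format (base=, path=) is invented for this illustration and is not what the real script reads:

```shell
overview=$(printf '%s\n' 'base=/blogposts' '@GET' 'path=/{id}' 'public' |
sed -n -r '
# store the base path as the first line of the hold space
/^base=/ { s/^base=//; h; d; }

# append the http-method and the sub-path to the hold space
/^@/ { H; d; }
/^path=/ { s/^path=//; H; d; }

# on the trigger line, fetch the hold space (base, method, sub-path,
# separated by newlines) and rearrange it into "method<TAB>base+sub-path"
/^public/ { g; s/^([^\n]*)\n(@[^\n]*)\n(.*)$/\2\t\1\3/; p; }
')

printf '%s\n' "$overview"
```

The real script does the same thing, only with more lines in the hold space (the comment text) and with optional parts in the final substitution.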

By the way, this is the command I used to create the cryptic version from the rather lengthy commented one:

sed -n '/#/ d; s/\s*//g; /^$/ d; 1h; 1!H; ${g;s/\n/;/g;p}'
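
To see what this does, here is a small sketch that feeds it a tiny made-up sed script (not the one above), again assuming GNU sed. Lines containing a # and blank lines are dropped, all whitespace is removed, and the remaining lines are collected in the hold space and joined with semicolons when the last line is reached:

```shell
# minify a small, hypothetical sed script with the command from above
minified=$(sed -n '/#/ d; s/\s*//g; /^$/ d; 1h; 1!H; ${g;s/\n/;/g;p}' <<'EOF'
s/\r//g
# a comment line that is dropped
/foo/ {
    p
}

d
EOF
)
printf '%s\n' "$minified"

# the minified one-liner is still a working sed script:
# it prints only the lines matching foo
roundtrip=$(printf 'foo\nbar\n' | sed "$minified")
printf '%s\n' "$roundtrip"
```

Note that the command removes all whitespace and every line containing a #, so it only works for scripts in which neither is significant.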