Week - 5 :- Phase-1 Evaluation

Last week concluded the first term of the GSoC coding phase, which I have successfully passed :) ! This week I got to work on the final product: a web application, plus a pipeline that plugs DBpedia Spotlight into the triplet extraction.

DBpedia spotlight

Although extracting triplets is our goal, the final product must be able to produce DBpedia-annotated triplets. This means annotating the subject and the object, and converting the predicate into a DBpedia property. For now we deal only with the annotation of the subject and the object. We therefore create a pipeline class and add this feature to our triplet extraction process.

import spotlight

class Spotlight_Pipeline(object):

    def __init__(self):
        # Address of the Spotlight endpoint, read from the config object
        self.spotlight_config = spotlight.Config()
        self.spotlight_address = self.spotlight_config.spotlight_address

    def read_annotations(self, annotations):
        # Keep only the DBpedia URI of each annotation
        return [annotation['URI'] for annotation in annotations]

    def annotate_word(self, word):
        # Returns a list of URIs on success, or the unchanged word on failure
        try:
            annotations = spotlight.annotate(self.spotlight_address, word)
            return self.read_annotations(annotations)
        except spotlight.SpotlightException:
            print("URI not found")
            return word

spipe = Spotlight_Pipeline()
.
.
triple = textraction.treebank(sentence)
annotated_triple = (
    (spipe.annotate_word(triple[0][0]), triple[0][1]),  # subject
    triple[1],                                          # predicate, left as-is
    (spipe.annotate_word(triple[2][0]), triple[2][1]),  # object
)

A sample of texts annotated by this pipeline can be found at this link.

The WebApp

This is the final product we will be developing. The link to the web app can be found here. The app can easily be deployed to Heroku.

The application runs on Python Flask on the backend, with HTML and JS on the frontend. Key things the web application should be able to do:

Take in a decent-sized text input.
Extract triplets and annotate them as well (as DBpedia properties); this requires calls to the Spotlight API.
The triplet extraction method may also have to make calls to a Stanford CoreNLP server for annotation.
Finally, display these triplets with a confidence value.
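The requirements above can be sketched as a single request handler. This is a minimal sketch, not the app's actual code. It is deliberately free of Flask imports so that it runs on its own; in the app, a Flask route would pass the parsed JSON body to it and return the result. The names handle_extract, fake_extract and MAX_CHARS, the 5000-character limit, the response layout and the sample confidence value are all assumptions made for illustration.

```python
import json

MAX_CHARS = 5000  # assumed cap for a "decent-sized" input; not from the post


def handle_extract(payload, extract):
    """Validate the request body and return (response_dict, http_status).

    `extract` is any callable mapping text to an iterable of
    (subject, predicate, object, confidence) tuples (assumed shape).
    """
    text = payload.get("text", "") if isinstance(payload, dict) else ""
    if not isinstance(text, str) or not text.strip():
        return {"error": "field 'text' is required"}, 400
    if len(text) > MAX_CHARS:
        return {"error": "text is longer than %d characters" % MAX_CHARS}, 413
    triples = [
        {"subject": s, "predicate": p, "object": o, "confidence": round(c, 2)}
        for (s, p, o, c) in extract(text)
    ]
    return {"triples": triples}, 200


def fake_extract(text):
    """Hypothetical extractor; the real one would run the triplet
    extraction and the Spotlight annotation described in this post."""
    return [("Berlin", "is capital of", "Germany", 0.87)]


body, status = handle_extract(
    {"text": "Berlin is the capital of Germany."}, fake_extract
)
print(status, json.dumps(body))
```

Keeping validation and formatting in a plain function means the slow parts (Spotlight and CoreNLP calls) sit behind the extract argument and can be swapped for a stub when testing the frontend.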
