Google Summer of Code

This page is a blog about my Google Summer of Code 2019 project under DBpedia. My project is "Tool to generate RDF triples from DBpedia abstract".

Week 5: Phase 1 Evaluation

30 Jun 2019 » GSoC

Last week concluded the first term of the GSoC coding phase, which I have successfully passed :)! This week I worked on developing the final product, a web application, and on building a pipeline around DBpedia Spotlight.

DBpedia Spotlight

Although extracting triplets is our goal, our final product must be able to produce DBpedia-annotated triplets. This means annotating the subject and the object, and mapping the predicate to a DBpedia property. For now, we deal only with annotating the subject and the object. We therefore create a pipeline class and add this step to our triplet extraction process.

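import spotlight


class Spotlight_Pipeline(object):
    """Wraps the DBpedia Spotlight calls used to annotate words in extracted triples."""

    def __init__(self):
        # configuration object holding the address of the Spotlight endpoint
        self.spotlight_config = spotlight.Config()
        self.spotlight_address = self.spotlight_config.spotlight_address

    def read_annotations(self, annotations):
        # keep only the DBpedia resource URI of each annotation
        return [i['URI'] for i in annotations]

    def annotate_word(self, word):
        try:
            annotations = spotlight.annotate(self.spotlight_address, word)
            return self.read_annotations(annotations)
        except spotlight.SpotlightException:
            # no usable annotation for this word: fall back to the raw word
            print("URI not found")
            return word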
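
spipe = Spotlight_Pipeline()
# ...
# extract a triple from one sentence; triple[0] and triple[2] are (word, ...) pairs
triple = textraction.treebank(sentence)
annotated_triple = (
    (spipe.annotate_word(triple[0][0]), triple[0][1]),  # annotated subject
    triple[1],                                          # predicate, unchanged
    (spipe.annotate_word(triple[2][0]), triple[2][1]),  # annotated object
)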
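The nested indexing in the snippet is easy to get wrong, and `annotate_word` returns a list of URIs on success but the bare word on failure. Below is a minimal, self-contained sketch of the same subject/object annotation step, assuming the triple shape `((subject, tag), predicate, (object, tag))` implied by that indexing (treating the second element of each pair as a tag is itself an assumption). `annotate_triple`, `with_fallback` and `fake_annotate` are hypothetical names, and `fake_annotate` is an offline stand-in for Spotlight so the example runs without a server.

```python
from typing import Callable, List, Tuple

# Assumed triple shape, matching the indexing in the snippet above:
# ((subject, subject_tag), predicate, (object, object_tag))
Triple = Tuple[Tuple[str, str], str, Tuple[str, str]]
Annotator = Callable[[str], List[str]]


def annotate_triple(triple: Triple, annotate: Annotator):
    """Annotate the subject and object words; pass the predicate through unchanged."""
    (subj, subj_tag), predicate, (obj, obj_tag) = triple
    return ((annotate(subj), subj_tag), predicate, (annotate(obj), obj_tag))


def with_fallback(annotate: Annotator) -> Annotator:
    """Wrap an annotator so a failed or empty lookup yields [word].

    Returning a one-element list keeps the result type consistent (always a
    list of strings), unlike returning the bare word.
    """
    def wrapped(word: str) -> List[str]:
        try:
            uris = annotate(word)
        except Exception:  # broad catch for the sketch; the pipeline catches SpotlightException
            return [word]
        return uris or [word]
    return wrapped


# Hypothetical offline stand-in for DBpedia Spotlight that knows a single entity.
_FAKE_INDEX = {"Berlin": ["http://dbpedia.org/resource/Berlin"]}


def fake_annotate(word: str) -> List[str]:
    if word not in _FAKE_INDEX:
        raise LookupError(f"no resource for {word!r}")
    return list(_FAKE_INDEX[word])


if __name__ == "__main__":
    triple = (("Berlin", "NNP"), "is capital of", ("Germany", "NNP"))
    # "Germany" is missing from the fake index, so it falls back to ["Germany"]
    print(annotate_triple(triple, with_fallback(fake_annotate)))
```

Running it prints `((['http://dbpedia.org/resource/Berlin'], 'NNP'), 'is capital of', (['Germany'], 'NNP'))`. Passing the annotator in as a function keeps the triple logic testable offline; in the real pipeline `spipe.annotate_word` (or a wrapped version of it) would take its place.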

A sample of annotated texts from this pipeline can be found at this link.

The WebApp

This is the final product we will be developing. The link to the web app can be found here. The app can easily be deployed to Heroku.

The application uses Python (Flask) on the backend and HTML and JavaScript on the frontend. The key things the web application should be able to do are:

  • take in a decent-sized text input
  • extract triplets and annotate them (as DBpedia properties), which requires calls to the Spotlight API
  • possibly call a Stanford CoreNLP server for annotation during triplet extraction
  • finally, display these triplets with a confidence value.
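The requirements above describe a simple request flow: check the input size, split the text into sentences, extract triplets, annotate them and attach a confidence value. Below is a minimal sketch of that backend logic as a plain Python function, independent of Flask. The extractor contract (a sentence in, `(triple, confidence)` pairs out), the `MAX_CHARS` limit, the output field names and the naive regex sentence splitter are all assumptions; `toy_extract` and `toy_annotate` are hypothetical stand-ins for the real triplet extractor and the Spotlight calls.

```python
import json
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed triple shape: ((subject, tag), predicate, (object, tag))
Triple = Tuple[Tuple[str, str], str, Tuple[str, str]]
# Hypothetical extractor contract: one sentence -> (triple, confidence) pairs
Extractor = Callable[[str], Iterable[Tuple[Triple, float]]]
Annotator = Callable[[str], List[str]]

MAX_CHARS = 10_000  # hypothetical limit on the size of the text input


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def process_text(text: str, extract: Extractor, annotate: Annotator) -> List[Dict]:
    """Extract triplets, annotate subject and object, and keep each confidence value."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"input is longer than {MAX_CHARS} characters")
    results = []
    for sentence in split_sentences(text):
        for ((subj, _), predicate, (obj, _)), confidence in extract(sentence):
            results.append({
                "sentence": sentence,
                "subject": annotate(subj),   # e.g. a Spotlight call
                "predicate": predicate,      # not yet mapped to a DBpedia property
                "object": annotate(obj),
                "confidence": round(confidence, 3),
            })
    return results


def toy_extract(sentence: str):
    """Hypothetical extractor: turns 'X is Y.' into (X, 'is', Y) with confidence 0.5."""
    parts = sentence.rstrip(".!?").split(" is ", 1)
    if len(parts) == 2:
        yield ((parts[0], "NN"), "is", (parts[1], "NN")), 0.5


def toy_annotate(word: str) -> List[str]:
    """Hypothetical annotator: builds a DBpedia-style resource URI from the word."""
    return ["http://dbpedia.org/resource/" + word.replace(" ", "_")]


if __name__ == "__main__":
    triplets = process_text("Berlin is a city. Nothing here!", toy_extract, toy_annotate)
    print(json.dumps(triplets, indent=2))
```

In the Flask backend, a view could read the submitted text (for example from a hypothetical form field `request.form["text"]`), call `process_text` with the real extractor and Spotlight annotator, and return the list with `flask.jsonify` for the frontend to display. Keeping this logic in a plain function means it can be tested without starting the server or calling Spotlight or CoreNLP.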