In this week the dependencies method was improved on and the RDF generation tool was added.
Dependencies Method
Given the results of the baseline test, the dependencies method was improved on in the following week, by better identifying the subjects and objects of the triples. Hypernyms too were checked for validity while processing it. Entites extracted from the text were used for the process
def get_entites(self, sentence):
doc = nlp(sentence)
return [ ent.text for ent in doc.ents ]
def tripletsEntityCheck(self, triplets, sentence):
"""
Replaces main words with the entire entites they reperesent
"""
entity_triples = list()
entities = self.get_entites(sentence)
for triple in triplets:
triple = list(triple)
for entity in entities:
if triple[0] in entity:
triple[0] = entity
break
for entity in entities:
if triple[2] in entity:
triple[2] = entity
break
if triple[0] != triple[2]:
entity_triples.append(triple)
return entity_triples
Dependencies with coreferences too has been added as a method for the configuration page.
Configuration page edits
The configuration pages now has two input text boxes. One for adding hearst patterns and the second one for adding dbpedia properties and lexicalizations. These inputs can be used to override the already existent ones. The about
page was also updated to add context about the hearst patterns, the pre-defined hearst patterns and the DBpedia properties/lexicalizations.
RDF text
The web-app now allows you to get the RDF text for the triples being generated. This option can be found at the bottom of the results page only if spotlight is enabled in the configuration. Python’s rdflib was used for the following. The library was helpful in serializing the text
def writeRDFtoFile(annotated_triples, triples, destination):
"""
returns string with the triples in the rdf 'turtle' format. This can be written to file
"""
rdfGraph = Graph()
for i in range(len(triples)):
subject = None
predicate = None
obj = None
"""
subject checking
"""
if annotated_triples[i][0]:
subject = URIRef(annotated_triples[i][0][0])
else:
subject = BNode()
rdfGraph.add( (subject, FOAF.name, Literal(triples[i][0])) )
.
.
.
rdfGraph.serialize(format='turtle', destination=destination)