Google Summer of Code

The following page is a blog about my Google Summer of Code Project 2019, under DBpedia. My project is "Tool to generate RDF triples from DBpedia abstract".

Week - 10 :- Building the baseline

04 Aug 2019 » GSoC

In this week, we focused on working to create a baseline for our created model.

Additional Configuration

The additional confiugurations added to the web applications, now allows users to set a pre-defined configuration for the web-app. This can be modified to override, hearst patterns and DBpedia property lexicalisation. The configuration for the web application is now stored by the application in a json.

image

{
   "use_parse_tree":null,
   "use_dependencies":null,
   "dependency_level":null,
   "use_dependencies_with_coref":"",
   "use_existing_hearst":null,
   "addn_hearst_patterns":[
      [
         "NP_(\\w+).*?(broadcast).*?on.*?.*?NP_(\\w+)",
         " first",
         "broadcastOn",
         3
      ],
      [
         "NP_(\\w+).*?(air).*?on.*?.*?NP_(\\w+)",
         " first",
         "airOn",
         3
      ]
   ],
   "hearst_pattern_type":"Default",
   "language":"English",
   "addn":{
      "broadcast_on":" http://dbpedia.org/ontology/frequency",
      "air_on":" http://dbpedia.org/ontology/channel"
   },
   "use_spotlight":null
}

This can be changed by setting the configuration in the config page. The web application can then be run from the main/demo page of the website.

Building the baseline

Using a set of configurations we define a baseline for our model. The configurations include the following ones

  1. hearst patterns method
  2. Parse-tree method
  3. Dependencies method.

The baseline was made by using text snippets from the following file.

These were then fed into our web-app with the following configurations. A baseline generated from the following can be found here. A small snippet of the following results created.

A. Spectre
1. (spectre, hypernym, film); (spectre, endIn, SkyFall)
2. (Spectre, produced, James); (Daniel, marking, Daniel); (245 million dollars, madeBy, Bond); (Metro-Goldwyn-Mayer, distributed, Metro-Goldwyn-Mayer)
3. (Waltz, hypernym (low), film); (film, hypernym (low), SkyFall); (film, producedBy, Eon Productions); (film, mark_into, series); (film, hypernym, Spectre)

Using this we can improve upon our results

Improvement to the Dependencies Method.

The dependencies method, can be improved to capture the subject and object better. These include using them to find the complete entity instead a part of it and also including better hypernym detection. This will be worked on during the current week.