Use Cases
This section describes several use cases of the web application and the general effects of some configuration changes.
The example text used for testing is available at https://worksheets.codalab.org/rest/bundles/0xb4ab264671fe4e3bae00e9367a88eaeb/contents/blob/
Using Default Hearst Patterns
Predefined Hearst patterns are a set of Hearst patterns that are already stored in the web application. Enabling them simply means applying
these extra patterns during extraction, which should improve the results. Take the following text, for example:
New York—often called New York City or the City of
New York to distinguish it from the State of
New York, of which it is a part—is the most populous
city in the United States and the center of the New York
metropolitan area, the premier gateway for legal
immigration to the United States and one of the most
populous urban agglomerations in the world. A global
power city, New York exerts a significant impact upon
commerce, finance, media, art, fashion, research,
technology, education, and entertainment, its fast pace
defining the term New York minute. Home to the
headquarters of the United Nations, New York is an
important center for international diplomacy and has
been described as the cultural and financial capital of
the world.
Running this text through the following two configurations, we get the following results (RDF):
Without Using Predefined Hearst Patterns
We get no triples
Using Predefined Hearst Patterns
@prefix ns1: .
@prefix ns2: .
@prefix rdf: .
@prefix rdfs: .
@prefix xml: .
@prefix xsd: .
a ns2:Document ;
ns1:typeOf .
ns1:attribute a rdf:Property ;
ns2:name "attribute" .
ns1:typeOf a rdf:Property ;
ns2:name "typeOf" .
a ns2:Document .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "center" ] ;
ns2:name "headquarters" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "center" ] ;
ns2:name "headquarters" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "part" ] ;
ns2:name "new_york" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "part" ] ;
ns2:name "new_york" .
Enabling the predefined patterns therefore yields triples where none were extracted without them.
Adding additional Hearst Patterns
Adding Hearst patterns, as in the process above, increases the chances of extracting triples from a given text.
This cannot be guaranteed every time, since it depends on the text given, but the chances do increase significantly.
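The effect of adding a pattern can be illustrated with a minimal sketch. It is not the application's implementation: the regular expressions, the `make_extractor` helper, and the choice to emit `typeOf` triples are assumptions made for illustration, and real Hearst pattern matching usually works on noun phrases rather than single words.

```python
import re

# Splits a matched list such as "apples, pears and plums" into its items.
SPLIT = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")

def make_extractor(patterns):
    """Build an extractor from regexes with named groups 'hyper' and 'hypos'."""
    compiled = [re.compile(p) for p in patterns]

    def extract(sentence):
        triples = []
        for pattern in compiled:
            for m in pattern.finditer(sentence):
                for hypo in SPLIT.split(m.group("hypos")):
                    if hypo.strip():
                        triples.append((hypo.strip(), "typeOf", m.group("hyper")))
        return triples

    return extract

# Hypothetical pattern sets: "X such as A, B and C" and "A, B and other X".
DEFAULT = [r"(?P<hyper>\w+) such as (?P<hypos>[^.;]+)"]
EXTRA = [r"(?P<hypos>\w+(?:, \w+)*),? and other (?P<hyper>\w+)"]

sentence = "He likes apples, pears and other fruits."
base = make_extractor(DEFAULT)
extended = make_extractor(DEFAULT + EXTRA)

print(base(sentence))      # [] because no default pattern matches
print(extended(sentence))  # [('apples', 'typeOf', 'fruits'), ('pears', 'typeOf', 'fruits')]
```

The sentence yields nothing until a pattern that covers its structure is added, which is why extra patterns help on some texts and not on others.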
Using Dependencies
The dependency-based triple extraction method relies on the dependencies obtained from the text in order to
extract triples. Unlike Hearst patterns, this process does not depend explicitly or directly on the sentence structure, so a
generalized algorithm can be written for it. Using the text above, we run the app on two configurations:
Predefined Hearst Patterns, Spotlight
Predefined Hearst Patterns, Dependencies, Spotlight
The triples obtained from each configuration are as follows:
Predefined Hearst Patterns
@prefix ns1: .
@prefix ns2: .
@prefix rdf: .
@prefix rdfs: .
@prefix xml: .
@prefix xsd: .
a ns2:Document ;
ns1:typeOf .
ns1:attribute a rdf:Property ;
ns2:name "attribute" .
ns1:typeOf a rdf:Property ;
ns2:name "typeOf" .
a ns2:Document .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "center" ] ;
ns2:name "headquarters" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "center" ] ;
ns2:name "headquarters" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "part" ] ;
ns2:name "new_york" .
[] a ns2:Document ;
ns1:attribute [ a ns2:Document ;
ns2:name "part" ] ;
ns2:name "new_york" .
Predefined Hearst Patterns + Dependencies
@prefix ns1: .
@prefix ns2: .
@prefix ns3: .
@prefix rdf: .
@prefix rdfs: .
@prefix xml: .
@prefix xsd: .
a ns1:Document ;
ns2:hypernym_low_confidence ;
ns3:hypernym .
a ns1:Document ;
ns2:hypernym_low_confidence .
a ns1:Document ;
ns2:hypernym_low_confidence .
a ns1:Document ;
ns2:hypernym_low_confidence .
a ns1:.. ;
ns2:typeOf .
a ns1:Document ;
ns2:hypernym_low_confidence [ a ns1:Document ;
ns1:name "world" ] .
ns2:attribute a rdf:Property ;
ns1:name "attribute" .
ns2:describ_as a rdf:Property ;
ns1:name "describ_as" .
ns2:hypernym_low_confidence a rdf:Property ;
ns1:name "hypernym_low_confidence" .
ns2:typeOf a rdf:Property ;
ns1:name "typeOf" .
ns3:hypernym a ns1:Property .
a ns1:Document .
a ns1:Document ;
ns2:hypernym_low_confidence [ a ns1:Document ;
ns1:name "world" ] .
a ns1:Document .
a ns1:Document ;
ns2:hypernym_low_confidence [ a ns1:Document ;
ns1:name "headquarters" ] .
a ns1:Document ;
ns2:hypernym_low_confidence .
a .. .
a ns1:Document .
a ns1:Document .
a ns1:Document .
[] a ns1:Document ;
ns2:describ_as ;
ns1:name "center" .
[] a ns1:Document ;
ns2:hypernym_low_confidence ;
ns1:name "center" .
[] a ns1:Document ;
ns2:attribute [ a ns1:Document ;
ns1:name "center" ] ;
ns1:name "headquarters" .
[] a ns1:Document ;
ns2:hypernym_low_confidence ;
ns1:name "headquarters" .
[] a ns1:Document ;
ns2:attribute [ a ns1:Document ;
ns1:name "part" ] ;
ns1:name "new_york" .
[] a ns1:Document ;
ns2:hypernym_low_confidence ;
ns1:name "center" .
[] a ns1:Document ;
ns2:attribute [ a ns1:Document ;
ns1:name "center" ] ;
ns1:name "headquarters" .
[] a ns1:Document ;
ns3:hypernym ;
ns1:name "center" .
[] a ns1:Document ;
ns2:attribute [ a ns1:Document ;
ns1:name "part" ] ;
ns1:name "new_york" .
As shown above, the number of triples extracted increases drastically, so the dependency-based method is highly
useful.
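To show why dependency-based extraction generalizes across sentence shapes, here is a minimal sketch that pairs each verb's subject with its direct object. The `Token` structure, the `extract_svo` function, and the hand-written parse are assumptions for illustration; in practice a dependency parser would supply the parse, and the application's actual rules are likely richer.

```python
from collections import namedtuple

# One token of a dependency parse: its position, text, relation label,
# and the position of its head token.
Token = namedtuple("Token", "index text dep head")

def extract_svo(tokens):
    """Return (subject, verb, object) triples for verbs that have both."""
    subjects, objects = {}, {}
    for tok in tokens:
        if tok.dep == "nsubj":
            subjects[tok.head] = tok.text
        elif tok.dep == "dobj":
            objects[tok.head] = tok.text
    text_at = {tok.index: tok.text for tok in tokens}
    return [
        (subjects[head], text_at[head], objects[head])
        for head in sorted(subjects)
        if head in objects
    ]

# Hand-written parse of "New York exerts a significant impact" (assumed labels).
parse = [
    Token(0, "New_York", "nsubj", 1),
    Token(1, "exerts", "ROOT", 1),
    Token(2, "a", "det", 4),
    Token(3, "significant", "amod", 4),
    Token(4, "impact", "dobj", 1),
]

print(extract_svo(parse))  # [('New_York', 'exerts', 'impact')]
```

Because the rule looks only at relation labels, the same function applies to any sentence whose parse contains a subject and an object under the same verb, regardless of word order or intervening modifiers.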
Using Lexicalizations
Lexicalizations mainly determine the property used in the generated RDF. Since an RDF predicate, unlike a subject or object,
must be a valid resource, lexicalizations play a key role here. In the text above, using the dependencies
method, suppose we add lexicalizations for a few properties, such as
describ_as -> http://xyz.org/ontology/Description
The RDF for that property then changes as follows:
ns2:describ_as a rdf:Property ;
ns1:name "describ_as" .
.
.
.
a rdf:Property
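The lookup can be pictured as a simple mapping from extracted predicate names to resources. This is a sketch under stated assumptions: `resolve_predicate` and the fallback namespace `http://example.org/property/` are hypothetical and not taken from the application.

```python
# Hypothetical fallback namespace for predicates without a lexicalization.
DEFAULT_NS = "http://example.org/property/"

def resolve_predicate(name, lexicalizations, default_ns=DEFAULT_NS):
    """Return the mapped resource for a predicate name, or a generated one."""
    return lexicalizations.get(name, default_ns + name)

# The lexicalization from the example above.
lexicalizations = {"describ_as": "http://xyz.org/ontology/Description"}

print(resolve_predicate("describ_as", lexicalizations))
# http://xyz.org/ontology/Description
print(resolve_predicate("typeOf", lexicalizations))
# http://example.org/property/typeOf
```

Predicates with a lexicalization resolve to a meaningful ontology resource, while the rest keep an automatically generated one.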