Use Cases

Here we talk about various use cases of the web application and the general effects of some configuration changes.

The example text used for testing can be found here: https://worksheets.codalab.org/rest/bundles/0xb4ab264671fe4e3bae00e9367a88eaeb/contents/blob/

Using Default Hearst Patterns

Default Hearst patterns are a set of Hearst patterns that are already stored internally in the web application. Enabling them simply means applying extra Hearst patterns, which should improve the results. Take the following text as an example:

New York—often called New York City or the City of 
New York to distinguish it from the State of
New York, of which it is a part—is the most populous
city in the United States and the center of the New York
metropolitan area, the premier gateway for legal
immigration to the United States and one of the most
populous urban agglomerations in the world. A global
power city, New York exerts a significant impact upon
commerce, finance, media, art, fashion, research,
technology, education, and entertainment, its fast pace
defining the term New York minute. Home to the
headquarters of the United Nations, New York is an
important center for international diplomacy and has
been described as the cultural and financial capital of
the world.

Running this text under the following two configurations gives these results (RDF):

Without Using Predefined Hearst Patterns

No triples are extracted.

Using Predefined Hearst Patterns

@prefix ns1: <http://predicateProperty.org/> .
@prefix ns2: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/resource/United_and_uniting_churches> a ns2:Document ;
    ns1:typeOf <http://dbpedia.org...> .

ns1:attribute a rdf:Property ;
    ns2:name "attribute" .

ns1:typeOf a rdf:Property ;
    ns2:name "typeOf" .

<http://dbpedia.org/resource/List_...> a ns2:Document .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "center" ] ;
    ns2:name "headquarters" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "center" ] ;
    ns2:name "headquarters" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "part" ] ;
    ns2:name "new_york" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "part" ] ;
    ns2:name "new_york" .

Enabling the predefined Hearst patterns therefore gives better results: the same text now yields triples where none were extracted before.
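The following minimal Python sketch makes the effect of this toggle concrete. It assumes that a Hearst pattern can be modelled as a regular expression paired with a relation name. The single pattern in `DEFAULT_PATTERNS` and the helpers `normalise` and `extract` are hypothetical illustrations, not the patterns or code stored in the web application. The lower-case, underscore-joined names only mimic the `"new_york"` names seen in the output above.

```python
import re

# Hypothetical stand-in for the app's predefined patterns: each entry pairs a
# regular expression (group 1 = subject noun phrase) with a relation and object.
DEFAULT_PATTERNS = [
    # "X, of which it is a part" -> (X, "attribute", "part")
    (re.compile(r"((?:[A-Z]\w*\s)*[A-Z]\w*), of which it is a part"), "attribute", "part"),
]


def normalise(phrase: str) -> str:
    """Lower-case a noun phrase and join its words with underscores."""
    return "_".join(phrase.lower().split())


def extract(text: str, patterns) -> list:
    """Return a (subject, relation, object) triple for every pattern match."""
    text = " ".join(text.split())  # collapse the line breaks of the input text
    triples = []
    for regex, relation, obj in patterns:
        for match in regex.finditer(text):
            triples.append((normalise(match.group(1)), relation, obj))
    return triples


# An excerpt of the example text above (the \u2014 escapes are its em dashes).
sample = (
    "New York\u2014often called New York City or the City of\n"
    "New York to distinguish it from the State of\n"
    "New York, of which it is a part\u2014is the most populous\n"
    "city in the United States"
)

print(extract(sample, []))                # patterns disabled: []
print(extract(sample, DEFAULT_PATTERNS))  # [('new_york', 'attribute', 'part')]
```

With an empty pattern list, nothing can match, which mirrors the "no triples" configuration. Enabling the pattern set is the only change between the two calls.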

Adding Additional Hearst Patterns

Adding further Hearst patterns, just like the predefined ones above, increases the chances of extracting triples from the given text. This cannot be guaranteed every time, since it depends on the text, but the chances do increase significantly.
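An extra pattern only helps when its cue phrase actually occurs in the text. The classic Hearst pattern "X such as A, B and C" illustrates this. The sketch below is a hypothetical Python illustration: the regex, the `is_a` relation name and the second sample sentence are assumptions made for the example. How the web application actually stores user-added patterns is not shown here.

```python
import re

# Classic Hearst pattern "X such as A, B and C": each listed item is a kind of X.
# The negative lookahead keeps "and"/"or" out of the comma-separated run, so both
# "A, B and C" and "A, B, and C" are captured whole.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!(?:and|or)\b)\w+)*(?:,? (?:and|or) \w+)?)"
)
# Separators between listed items: ", ", ", and ", " and ", " or ", ...
SEPARATOR = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def such_as_triples(text: str) -> list:
    """Return (hyponym, "is_a", hypernym) for every "X such as ..." match."""
    triples = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for item in SEPARATOR.split(match.group(2)):
            if item:
                triples.append((item.lower(), "is_a", hypernym))
    return triples


# A sentence from the example text: the cue phrase never occurs.
print(such_as_triples("A global power city, New York exerts a significant impact upon commerce"))
# []

# A made-up sentence that does contain the cue phrase.
print(such_as_triples("The port handles cargo such as grain, coal and timber."))
# [('grain', 'is_a', 'cargo'), ('coal', 'is_a', 'cargo'), ('timber', 'is_a', 'cargo')]
```

The same added pattern yields nothing on one text and several triples on another. This is why extra patterns raise the chances of extraction without guaranteeing it.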

Using Dependencies

The dependency-based triple extraction method relies on the dependencies obtained from the text to extract triples. Unlike Hearst patterns, this process does not depend explicitly or directly on matching a particular sentence structure, so a generalized algorithm can be written for it. Using the text above, we run the app on two configurations:

  • Predefined Hearst Patterns, Spotlight
  • Predefined Hearst Patterns, Dependencies, Spotlight

The triples obtained from each configuration are shown below.

Predefined Hearst Patterns

@prefix ns1: <http://predicateProperty.org/> .
@prefix ns2: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/resource/United_and_uniting_churches> a ns2:Document ;
    ns1:typeOf <http://dbpedia.org...> .

ns1:attribute a rdf:Property ;
    ns2:name "attribute" .

ns1:typeOf a rdf:Property ;
    ns2:name "typeOf" .

<http://dbpedia.org/resource/List_...> a ns2:Document .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "center" ] ;
    ns2:name "headquarters" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "center" ] ;
    ns2:name "headquarters" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "part" ] ;
    ns2:name "new_york" .

[] a ns2:Document ;
    ns1:attribute [ a ns2:Document ; ns2:name "part" ] ;
    ns2:name "new_york" .

Predefined Hearst Patterns + Dependencies

@prefix ns1: <http://xmlns.com/foaf/0.1/> .
@prefix ns2: <http://predicateProperty.org/> .
@prefix ns3: <http://purl.org/linguistics/gold/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/resource/City> a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/United_States> ;
    ns3:hypernym <http://dbpedia.org/resource/New_York> .

<http://dbpedia.org/resource/Milecastle> a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/Immigration> .

<http://dbpedia.org/resource/New_York_City> a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/New_York> .

<http://dbpedia.org/resource/U.S._state> a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/New_York> .

<http://dbpedia.org/resource/United_and_uniting_churches> a ns1:.. ;
    ns2:typeOf <http://dbpedia.org/resource/List_of_metropolitan_.> .

<http://dbpedia.org/resource/Urban_area> a ns1:Document ;
    ns2:hypernym_low_confidence [ a ns1:Document ; ns1:name "world" ] .

ns2:attribute a rdf:Property ;
    ns1:name "attribute" .

ns2:describ_as a rdf:Property ;
    ns1:name "describ_as" .

ns2:hypernym_low_confidence a rdf:Property ;
    ns1:name "hypernym_low_confidence" .

ns2:typeOf a rdf:Property ;
    ns1:name "typeOf" .

ns3:hypernym a ns1:Property .

<http://dbpedia.org/resource/Area> a ns1:Document .

<http://dbpedia.org/resource/Capital_city> a ns1:Document ;
    ns2:hypernym_low_confidence [ a ns1:Document ; ns1:name "world" ] .

<http://dbpedia.org/resource/Diplomacy> a ns1:Document .

<http://dbpedia.org/resource/Home> a ns1:Document ;
    ns2:hypernym_low_confidence [ a ns1:Document ; ns1:name "headquarters" ] .

<http://dbpedia.org/resource/Immigration> a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/United_States> .

<http://dbpedia.org/resource/List_of_metropolitan_areas_in_Pakistan> a .. .

<http://dbpedia.org/resource/United_Nations> a ns1:Document .

<http://dbpedia.org/resource/United_States> a ns1:Document .

<http://dbpedia.org/resource/New_York> a ns1:Document .

[] a ns1:Document ;
    ns2:describ_as <http://dbpedia.org/resource/Capital_city> ;
    ns1:name "center" .

[] a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/Area> ;
    ns1:name "center" .

[] a ns1:Document ;
    ns2:attribute [ a ns1:Document ; ns1:name "center" ] ;
    ns1:name "headquarters" .

[] a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/United_Nations> ;
    ns1:name "headquarters" .

[] a ns1:Document ;
    ns2:attribute [ a ns1:Document ; ns1:name "part" ] ;
    ns1:name "new_york" .

[] a ns1:Document ;
    ns2:hypernym_low_confidence <http://dbpedia.org/resource/Diplomacy> ;
    ns1:name "center" .

[] a ns1:Document ;
    ns2:attribute [ a ns1:Document ; ns1:name "center" ] ;
    ns1:name "headquarters" .

[] a ns1:Document ;
    ns3:hypernym <http://dbpedia.org/resource/Home> ;
    ns1:name "center" .

[] a ns1:Document ;
    ns2:attribute [ a ns1:Document ; ns1:name "part" ] ;
    ns1:name "new_york" .

As shown above, enabling dependencies drastically increases the number of extracted triples, which makes this method highly useful.
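A dependency-based extractor walks the parse instead of matching fixed phrases. The following minimal Python sketch is hypothetical and is not the app's algorithm. The dependency arcs for a shortened sentence are hand-written in Universal Dependencies style rather than produced by a parser. `crude_stem` is a crude suffix-stripping stand-in for a real stemmer. The sketch only illustrates how a predicate such as `describ_as` in the output above could arise from a stemmed verb joined to the preposition of its oblique argument.

```python
from collections import defaultdict

# Hand-written (head, relation, dependent) arcs, Universal Dependencies style, for
# the shortened sentence "New York has been described as the capital of the world".
# A real parser would key tokens by position; plain words are enough for this sketch.
ARCS = [
    ("described", "nsubj:pass", "New York"),  # multi-word subject collapsed for brevity
    ("described", "aux", "has"),
    ("described", "aux:pass", "been"),
    ("described", "obl", "capital"),
    ("capital", "case", "as"),
    ("capital", "det", "the"),
    ("capital", "nmod", "world"),
    ("world", "case", "of"),
    ("world", "det", "the"),
]


def crude_stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strip a trailing "ed"."""
    return word[:-2] if word.endswith("ed") and len(word) > 4 else word


def extract_from_dependencies(arcs) -> list:
    """Return (subject, predicate, object) triples from subject-verb-object/oblique arcs."""
    children = defaultdict(list)  # head -> [(relation, dependent), ...]
    for head, rel, dep in arcs:
        children[head].append((rel, dep))

    triples = []
    for head, deps in children.items():
        subjects = [d for r, d in deps if r in ("nsubj", "nsubj:pass")]
        for subj in subjects:
            for rel, dep in deps:
                if rel == "obj":
                    triples.append((subj, crude_stem(head), dep))
                elif rel == "obl":
                    # Join the verb with the preposition ("case" marker) of its oblique.
                    # .get() avoids inserting into the dict while iterating over it.
                    preps = [d for r, d in children.get(dep, []) if r == "case"]
                    predicate = crude_stem(head) + ("_" + preps[0] if preps else "")
                    triples.append((subj, predicate, dep))
    return triples


print(extract_from_dependencies(ARCS))  # [('New York', 'describ_as', 'capital')]
```

The rule is generic over any sentence whose parse has a subject and an object or oblique. This is the sense in which the method does not depend on a specific surface pattern.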

Using Lexicalizations

Lexicalizations mainly determine the properties (predicates) of the generated RDF. Unlike the subject and object, which may be blank nodes, a predicate in RDF must be a valid resource (an IRI), so lexicalizations play a key role in producing it. For the text above, using the dependencies method, suppose we add lexicalizations for a few properties, such as:

  • describ_as -> http://xyz.org/ontology/Description

The relevant part of the RDF then becomes:

ns2:describ_as a rdf:Property ;
    ns1:name "describ_as" .

...

<http://xyz.org/ontology/Description> a rdf:Property .
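The role of lexicalizations can be sketched as a lookup that runs before serialisation. This minimal Python sketch is an assumption about how the mapping might work, not the app's code. A predicate name found in the user's lexicalizations is replaced by the given IRI. Any other predicate falls back to an IRI in the `http://predicateProperty.org/` namespace seen in the output above. Subjects and objects stay blank nodes carrying only a `foaf:name`, which RDF allows. Predicates cannot be blank nodes, which is why every predicate must resolve to an IRI. The helper names `predicate_iri`, `literal` and `to_turtle` are hypothetical.

```python
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
DEFAULT_NS = "http://predicateProperty.org/"  # namespace used for predicates in the output above


def predicate_iri(name: str, lexicalizations: dict) -> str:
    """Predicates must be IRIs: use the lexicalized IRI if present, else mint one in DEFAULT_NS."""
    return "<" + lexicalizations.get(name, DEFAULT_NS + name) + ">"


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(triples, lexicalizations: dict) -> str:
    """Serialise (subject, predicate, object) name triples as Turtle statements."""
    lines = []
    for subj, pred, obj in triples:
        # Subjects and objects may be blank nodes identified only by a foaf:name.
        lines.append(
            f"[ {FOAF_NAME} {literal(subj)} ] "
            f"{predicate_iri(pred, lexicalizations)} "
            f"[ {FOAF_NAME} {literal(obj)} ] ."
        )
    return "\n".join(lines)


triples = [("center", "describ_as", "capital")]
print(to_turtle(triples, {}))
print(to_turtle(triples, {"describ_as": "http://xyz.org/ontology/Description"}))
```

The first call emits the predicate as `<http://predicateProperty.org/describ_as>`. The second emits `<http://xyz.org/ontology/Description>` in its place, while the blank-node subject and object are unchanged.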