Extracting Semantic Concept Relations from Wikipedia

P. Arnold and E. Rahm (Leipzig University), WIMS 2014

Summary: A.Aziz Altowayan (Feb. 2015)

Talk structure

Work Overview [15mins]

Work Details [35mins]

Break [5-10mins]

Demo and Hands-on [30mins]

Problem

Given an arbitrary sentence, e.g.:

Ice skates are boots with blades attached to it.

How can we identify the semantic relations in such a sentence and extract them into a structured representation?

Something like:

Paper's solution

Given an arbitrary sentence, e.g.:

Ice skates are boots with blades attached to it.

Basic idea

is-a pattern:

Semantic Patterns

Semantic relation pattern

A specific word pattern that expresses a linguistic relation of a certain type.

They focused on 4 types of semantic relations:

Examples of the semantic relation types (desired results):

Example semantic patterns

Now, details

Background & Motivation

Consider this scenario:

Two ontologies for representing a vehicle ...

Another example: two ontologies representing a CS department (Doan et al., 2004)

Background knowledge is crucial for tasks such as:

Approach workflow (presented work)

Four sub-steps are performed automatically on each Wikipedia article:

- for each article
    1. Preprocessing (extracting the definition, tagging, simplifying)
- for each sentence
    2. Identify all semantic patterns (n = number found);
       if n >= 1, the sentence is split into n + 1 fragments
- for each fragment
    3. Concept (term) extraction
- with the extracted terms and patterns
    4. Build the respective relations and store them
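Step 2 can be illustrated with a minimal sketch. It assumes a tiny, hand-picked pattern list (`is a`/`is an`, `are`, `with`) and uses a regular expression over raw text in place of the paper's FSMs over POS tags; `PATTERNS` and `split_on_patterns` are hypothetical names. It only demonstrates the counting rule: n patterns split a sentence into n + 1 fragments.

```python
import re

# Hypothetical pattern list; a stand-in for the paper's FSM-based detection.
PATTERNS = [r"is an?", r"are", r"with"]
PATTERN_RE = re.compile(r"\b(?:" + "|".join(PATTERNS) + r")\b")

def split_on_patterns(sentence):
    """Return (patterns found, fragments); n patterns give n + 1 fragments."""
    sentence = sentence.rstrip(".")
    matches = list(PATTERN_RE.finditer(sentence))
    if not matches:
        return [], []  # no pattern found: the sentence yields no relations
    patterns = [m.group(0) for m in matches]
    fragments, start = [], 0
    for m in matches:
        # Text between the previous pattern (or sentence start) and this one.
        fragments.append(sentence[start:m.start()].strip(" ,"))
        start = m.end()
    fragments.append(sentence[start:].strip(" ,"))  # text after the last pattern
    return patterns, fragments

print(split_on_patterns("Automobile, autocar, motor car, or car is a vehicle with wheels."))
# (['is a', 'with'], ['Automobile, autocar, motor car, or car', 'vehicle', 'wheels'])
```

Because it ignores POS tags, this regex would also split the wardrobe sentence from the hands-on part, which the tag-based FSM of Fig. 1 rejects.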

FSM Review

Arbitrary example for a Finite State Machine (FSM)

Formally, FSM is:

A 5-tuple (Q, q0, A, Σ, δ)

where

Q: a finite set of states

q0: initial state (q0 ∈ Q)

A: accepting states (A ⊆ Q)

Σ: input alphabet

δ: transition function, a mapping of Q × Σ into Q
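The 5-tuple maps directly onto a small data structure. The sketch below is a deterministic FSM in Python; the class name `FSM` and the example machine (accepting bit strings with an even number of 1s) are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class FSM:
    states: set     # Q
    start: str      # q0
    accepting: set  # A, a subset of Q
    alphabet: set   # Σ
    delta: dict     # δ: maps (state, symbol) pairs to a state

    def accepts(self, symbols):
        """Run the machine over the input; accept iff it ends in a state of A."""
        state = self.start
        for symbol in symbols:
            if symbol not in self.alphabet:
                raise ValueError(f"{symbol!r} is not in the alphabet")
            state = self.delta.get((state, symbol))
            if state is None:  # no transition defined: reject
                return False
        return state in self.accepting

# Example machine: accepts bit strings containing an even number of 1s.
even_ones = FSM(
    states={"even", "odd"},
    start="even",
    accepting={"even"},
    alphabet={"0", "1"},
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
)

print(even_ones.accepts("1001"))  # True
print(even_ones.accepts("1011"))  # False
```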

Applying FSM for ..

Identifying Semantic Relations Patterns

Example: For the input sentence:

  Ice skates are boots with blades attached to it.

after the preprocessing step, we get the POS-tagged sentence:

  Ice_NN skates_NNS are_VBP boots_NNS with_IN blades_NNS attached_VBN to_TO it_PRP.

For example, a simplified FSM for the is-a pattern:
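Since the figure itself is not reproduced here, the following is a hypothetical is-a FSM over the tagged tokens; it is not the machine from the paper or from Fig. 1. It assumes Penn Treebank tags and four states (q0: scanning the subject, q1: read `is`/`are`, q2: read a determiner or adjective, q3: read the object noun, accepting). `parse_tagged` and `run_is_a_fsm` are illustrative names.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
BE_FORMS = {("is", "VBZ"), ("are", "VBP")}  # assumed trigger for the pattern

def parse_tagged(sentence):
    """Turn 'word_TAG word_TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag.rstrip(".")))  # drop a sentence-final period
    return pairs

def run_is_a_fsm(pairs):
    """Return (visited states, accepted?) for the hypothetical is-a FSM."""
    state, path = "q0", ["q0"]
    for word, tag in pairs:
        if state == "q0":  # scanning the subject until a form of 'be'
            state = "q1" if (word.lower(), tag) in BE_FORMS else "q0"
        elif tag in NOUN_TAGS:  # in q1 or q2: object noun found
            state = "q3"
        elif tag in {"DT", "JJ"}:  # in q1 or q2: determiner or adjective
            state = "q2"
        else:
            return path, False  # anything else after 'be' (e.g. an adverb): reject
        path.append(state)
        if state == "q3":
            return path, True
    return path, False

tagged = "Ice_NN skates_NNS are_VBP boots_NNS with_IN blades_NNS attached_VBN to_TO it_PRP."
print(run_is_a_fsm(parse_tagged(tagged)))  # (['q0', 'q0', 'q0', 'q1', 'q3'], True)
```

On the example sentence the machine stays in q0 over the subject, moves to q1 on `are_VBP`, and accepts on `boots_NNS`.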

Applying FSM for ..

Parsing Fragments to identify the relevant concepts (terms)

Usually, the relevant concepts are the nouns directly to the left/right of the relation pattern. However, this heuristic can be too simple, e.g.:

“A wardrobe, also known as an armoire from the French, is a standing closet.” ==> (French is a closet)

“Column or pillar in architecture and structural engineering is a structural element.” ==> (architecture and structural engineering are structural elements)
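The naive heuristic can be sketched in a few lines to show how it fails on the wardrobe sentence. The POS tags below are assigned by hand (an assumption, not tagger output), and `naive_terms` is a hypothetical helper that takes the token span of the already-detected `is a` pattern.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hand-tagged (assumed) tokens of the wardrobe sentence.
WARDROBE = [("A", "DT"), ("wardrobe", "NN"), (",", ","), ("also", "RB"),
            ("known", "VBN"), ("as", "IN"), ("an", "DT"), ("armoire", "NN"),
            ("from", "IN"), ("the", "DT"), ("French", "NNP"), (",", ","),
            ("is", "VBZ"), ("a", "DT"), ("standing", "JJ"), ("closet", "NN"),
            (".", ".")]

def naive_terms(pairs, pattern_start, pattern_end):
    """Subject = last noun before the pattern, object = first noun after it.

    pattern_start/pattern_end delimit the pattern tokens (end exclusive).
    """
    left = [w for w, t in pairs[:pattern_start] if t in NOUN_TAGS]
    right = [w for w, t in pairs[pattern_end:] if t in NOUN_TAGS]
    return (left[-1] if left else None, right[0] if right else None)

# "is a" occupies tokens 12 and 13.
print(naive_terms(WARDROBE, 12, 14))  # ('French', 'closet')
```

The output reproduces the wrong relation (French is a closet), which is why the approach parses the fragments with dedicated FSMs instead.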

Run in different configurations (with, replaced, removed)

With all that in mind, the FSM for parsing the Subject fragment alone contains:

Determine Semantic Relations

Build the semantic relationships; the output is a set of (1:1) relationships.

Let

|S|: number of subjects

|O1|: number of objects

|O2|: number of second-level objects

|F|: number of field references

then,
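The formula itself is not printed on this slide; the sketch below takes it from the hands-on exercise (Q3), R = C(|S|, 2) + |S|·|O1| + |S|·|O2| + |S|·|F|, and enumerates the corresponding (1:1) relationships. Relations are labelled only by the sets they connect (`S-S`, `S-O1`, `S-O2`, `S-F`), since the actual relation type depends on the matched pattern; `relation_count` and `build_relations` are hypothetical names.

```python
from itertools import combinations, product
from math import comb

def relation_count(s, o1, o2, f):
    """R = C(|S|, 2) + |S|*|O1| + |S|*|O2| + |S|*|F|."""
    return comb(s, 2) + s * o1 + s * o2 + s * f

def build_relations(subjects, objects1, objects2, fields):
    """Enumerate one (1:1) relationship per term of the formula."""
    rels = [("S-S", a, b) for a, b in combinations(subjects, 2)]  # subject pairs
    rels += [("S-O1", s, o) for s, o in product(subjects, objects1)]
    rels += [("S-O2", s, o) for s, o in product(subjects, objects2)]
    rels += [("S-F", s, f) for s, f in product(subjects, fields)]
    return rels

rels = build_relations(["Automobile", "autocar", "motor car", "car"],
                       ["vehicle"], ["wheels"], [])
print(len(rels), relation_count(4, 1, 1, 0))  # 14 14
```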

Evaluation

Using the classic recall and precision measures:

Recall: fraction of the relevant instances that are retrieved.

Precision: fraction of the retrieved instances that are relevant (correct).

Evaluated on 4 benchmark subsets.
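Both measures reduce to set arithmetic over the extracted and the gold-standard relations. A minimal sketch, assuming relations are represented as hashable triples; the toy data is hypothetical and not from the paper's benchmarks.

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|,
    recall = |retrieved ∩ relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy data (hypothetical): gold-standard vs. extracted relations.
gold = {("ice skate", "is-a", "boot"), ("ice skate", "has-a", "blade"),
        ("car", "is-a", "vehicle"), ("car", "has-a", "wheel")}
extracted = {("ice skate", "is-a", "boot"), ("car", "is-a", "vehicle"),
             ("French", "is-a", "closet")}
print(precision_recall(extracted, gold))  # (0.6666666666666666, 0.5)
```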

Evaluation

FSM Parsing effectiveness (pattern detection)

Term Extraction effectiveness

Final results of the approach

Conclusion

Observations

Correct parsing: 74%-94%

70% recall and 79% precision (on the 4 benchmark datasets).

In future

Build up a repository and use it for ontology matching

Expand to more sources, e.g. Wiktionary

Q/A

Questions


Demo and Hands-on

solution in pdf

Input:

  1. Pace is a University.

  2. Automobile, autocar, motor car, or car is a vehicle with wheels.

  3. A wardrobe, also known as an armoire from the French, is a standing closet.

Fig. 1: Finite State Machine (the tokens “or”, “and”, and “,” are ignored)


Applying the FSM in Fig.1 on each sentence of the input:

Q1) What is the corresponding FSM path?

  1. q0 --> q1 --> q1 --> q3

  2. q0 --> q0 --> q0 --> q0 --> q1 --> q1 --> q2 --> q2 --> q3

  3. REJECT (“also” is an adverb; thanks, Sandra!)

Q2) How many semantic patterns n and sentence fragments can be identified?

  1. 1 pattern (is a), thus 2 sentence fragments.

  2. 2 patterns (is a and with), thus 3 sentence fragments. Synonym patterns are not considered in this simple FSM.

  3. NONE

Q3) If $R = \binom{|S|}{2} + |S| \cdot |O_1| + |S| \cdot |O_2| + |S| \cdot |F|$, how many semantic relations R can be identified for each input sentence?

  1. $R = \binom{1}{2} + (1 \cdot 1) + (1 \cdot 0) + (1 \cdot 0) = 0 + 1 + 0 + 0 = 1$

  2. Applying the paper's approach, |S| = 4 (Automobile, autocar, motor car, car), so $R = \binom{4}{2} + (4 \cdot 1) + (4 \cdot 1) + (4 \cdot 0) = 6 + 4 + 4 + 0 = 14$

     However, since the FSM above does not consider the synonym pattern, |S| = 1, so $R = 0 + 1 + 1 + 0 = 2$

  3. R = 0 (no pattern identified)