Week 6: Executing code on a DBpedia subset

Hello folks,

This week I used the Freebase-mapped DBpedia dataset to train the models.

A few challenges came up along the way. As described in my Week 5 post, for each triple in the Freebase dataset I took the subject and extracted all DBpedia relationships for that subject, which produces a one-to-many expansion. For example, for a triple in the Freebase training set such as (/m/027rn, /location/country/form_of_government, /m/06cx9), I collected all DBpedia triples for the entity /m/027rn. One Freebase entity therefore yields many DBpedia triples, so the new DBpedia dataset was much larger than its Freebase counterpart (Freebase training size: 483,142 samples; DBpedia training size: 4,409,385 samples).
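To make the expansion step concrete, here is a minimal sketch of the one-to-many lookup. It assumes the Freebase-to-DBpedia entity mapping and the DBpedia triples have already been loaded; the variable and function names are illustrative, not the actual code I ran.

```python
from collections import defaultdict

def expand_subjects(freebase_triples, freebase_to_dbpedia, dbpedia_triples):
    """For every subject in a Freebase split, collect all DBpedia triples
    whose subject is the mapped DBpedia entity (one-to-many expansion).

    `freebase_to_dbpedia` (a sameAs-style mapping) and `dbpedia_triples`
    (a list of (s, p, o) URI strings) are assumed to be loaded elsewhere.
    """
    # Index DBpedia triples by subject once, so each lookup is O(1).
    by_subject = defaultdict(list)
    for s, p, o in dbpedia_triples:
        by_subject[s].append((s, p, o))

    expanded = []
    seen = set()
    for fb_subject, _, _ in freebase_triples:
        db_subject = freebase_to_dbpedia.get(fb_subject)
        if db_subject is None or db_subject in seen:
            continue
        seen.add(db_subject)
        expanded.extend(by_subject[db_subject])
    return expanded
```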

To shrink the dataset to a manageable size for checking the models, I removed the triples containing the following most frequent predicates:

i) http://www.w3.org/1999/02/22-rdf-syntax-ns#type
ii) http://dbpedia.org/ontology/wikiPageWikiLink
iii) http://purl.org/dc/terms/subject
iv) any predicate with the prefix http://dbpedia.org/property/

This reduced the training set to a considerably smaller size (149,666 samples).

I applied the same filtering to the validation and test datasets.
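The filtering itself is straightforward. Below is a minimal sketch, assuming each split is a list of (subject, predicate, object) URI strings; file reading and writing are omitted and the names are illustrative.

```python
# High-frequency predicates to drop, as listed above.
EXCLUDED_PREDICATES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://dbpedia.org/ontology/wikiPageWikiLink",
    "http://purl.org/dc/terms/subject",
}
EXCLUDED_PREFIX = "http://dbpedia.org/property/"

def filter_triples(triples):
    """Drop triples whose predicate is one of the excluded predicates
    or starts with the dbpedia.org/property/ prefix."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p not in EXCLUDED_PREDICATES and not p.startswith(EXCLUDED_PREFIX)
    ]
```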

After that, I shuffled the records in each split before training and kept the top 15,000 records for the validation and test sets.
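A small sketch of that step, with a fixed seed for reproducibility; the function name and seed value are illustrative, not taken from my actual code.

```python
import random

def shuffle_and_truncate(triples, seed=42, limit=None):
    """Shuffle a list of triples (reproducibly via a fixed seed) and
    optionally keep only the first `limit` records."""
    rng = random.Random(seed)
    rng.shuffle(triples)
    return triples if limit is None else triples[:limit]

# Usage (names are illustrative): the training split keeps all records,
# while validation and test are cut down to 15,000 triples each.
# train = shuffle_and_truncate(train)
# valid = shuffle_and_truncate(valid, limit=15000)
# test  = shuffle_and_truncate(test, limit=15000)
```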

I then used these datasets to train TransE, DistMult, and ComplEx.
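I haven't included the training code in this post. As a rough illustration only, here is how the three models could be trained with the PyKEEN library on TSV exports of the splits; this is a hypothetical sketch rather than the pipeline I actually used, and the file names and hyperparameters are placeholders.

```python
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

# Hypothetical sketch using PyKEEN; file names and hyperparameters are placeholders.
training = TriplesFactory.from_path("dbpedia_train.tsv")
validation = TriplesFactory.from_path(
    "dbpedia_valid.tsv",
    entity_to_id=training.entity_to_id,
    relation_to_id=training.relation_to_id,
)
testing = TriplesFactory.from_path(
    "dbpedia_test.tsv",
    entity_to_id=training.entity_to_id,
    relation_to_id=training.relation_to_id,
)

# Train each model with the same splits and collect the pipeline results.
results = {}
for model in ("TransE", "DistMult", "ComplEx"):
    results[model] = pipeline(
        training=training,
        validation=validation,
        testing=testing,
        model=model,
        training_kwargs=dict(num_epochs=100),
    )
```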
