This week I used the Freebase-mapped DBpedia dataset to train the models.
A few challenges came up along the way. As described in my Week 5 post, for each triple in the Freebase dataset I took the subject entity and extracted every DBpedia triple with that entity as its subject, which gives a one-to-many mapping. For example, for the Freebase training triple (/m/027rn, /location/country/form_of_government, /m/06cx9), I collected all DBpedia triples whose subject is /m/027rn. Since one entity can have many DBpedia relationships, the resulting DBpedia dataset was much larger than its Freebase counterpart (Freebase train size=483142 samples, DBpedia train size=4409385 samples).
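The expansion step above can be sketched roughly as follows. This is an illustrative reconstruction, not my actual script: the function and variable names, and the toy DBpedia triples, are assumptions made for the example.

```python
from collections import defaultdict

def expand_to_dbpedia(freebase_triples, dbpedia_index):
    """For every subject entity in the Freebase triples, collect all
    DBpedia triples that have that entity as their subject."""
    expanded = []
    seen = set()
    for subj, _pred, _obj in freebase_triples:
        if subj in seen:  # each entity only needs to be expanded once
            continue
        seen.add(subj)
        expanded.extend(dbpedia_index.get(subj, []))
    return expanded

# Toy data: one Freebase triple, two DBpedia triples for its subject
# (the DBpedia triples here are made up for illustration).
fb = [("/m/027rn", "/location/country/form_of_government", "/m/06cx9")]
dbp = defaultdict(list)
dbp["/m/027rn"] = [
    ("/m/027rn", "dbo:capital", "dbr:Dublin"),
    ("/m/027rn", "dbo:currency", "dbr:Euro"),
]
print(len(expand_to_dbpedia(fb, dbp)))  # one entity -> two DBpedia triples
```

This one-to-many expansion is exactly why the DBpedia-side dataset blows up in size relative to the Freebase one.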
To reduce the dataset to a manageable size for checking the models, I removed the triples containing the following most frequent predicates:
This reduced the training set to a considerably smaller size (149666 samples).
I did the same for the validation and test datasets.
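The filtering step is simple in code. A minimal sketch, where `FREQUENT_PREDICATES` is only a placeholder standing in for the actual list of removed predicates:

```python
# Placeholder blacklist -- the real set is the list of most frequent
# predicates mentioned above, not these two illustrative entries.
FREQUENT_PREDICATES = {"dbo:wikiPageWikiLink", "rdf:type"}

def drop_frequent(triples, blacklist=FREQUENT_PREDICATES):
    """Keep only triples whose predicate is not in the blacklist."""
    return [t for t in triples if t[1] not in blacklist]

triples = [
    ("/m/027rn", "rdf:type", "dbo:Country"),
    ("/m/027rn", "dbo:capital", "dbr:Dublin"),
]
print(len(drop_frequent(triples)))  # only the non-blacklisted triple survives
```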
After that, I shuffled the records in all the sets before training, and kept the top 15000 records for the validation and test datasets.
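The shuffle-and-subsample step can be sketched like this; the 15000 figure comes from the post, while the seed and function name are assumptions for reproducibility of the example:

```python
import random

def shuffle_and_take(triples, n, seed=42):
    """Shuffle a copy of the triples and keep the first n records."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = triples[:]
    rng.shuffle(shuffled)
    return shuffled[:n]

# Toy stand-in for a validation/test set; real runs would use n=15000.
data = [("s%d" % i, "p", "o%d" % i) for i in range(100)]
sample = shuffle_and_take(data, 15)
print(len(sample))  # 15
```

Shuffling before truncating matters here: taking the top records of an unshuffled file would bias the validation/test sets toward whatever entities happen to sort first.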
I then used these datasets to train TransE, DistMult, and ComplEx.
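For reference, the three models differ only in how they score a triple (h, r, t). These are the standard textbook scoring functions, shown here on random toy vectors rather than trained embeddings:

```python
import numpy as np

def transe_score(h, r, t):
    # TransE: negative L2 distance of h + r from t
    return -float(np.linalg.norm(h + r - t))

def distmult_score(h, r, t):
    # DistMult: trilinear product sum_i h_i * r_i * t_i
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    # ComplEx: Re(<h, r, conj(t)>) with complex-valued embeddings
    return float(np.real(np.sum(h * r * np.conj(t))))

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=4) for _ in range(3))
print(transe_score(h, r, t) <= 0.0)  # True: it is a negated norm
```

Higher scores mean the model considers the triple more plausible; training pushes scores of observed triples above those of corrupted (negative-sampled) ones.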