This week is the final week before the First Evaluation, which runs from 26th June to 30th June. So I compiled all the experiment results and analysed them to decide on the next steps. Earlier I had noted only the MRR and Hits@10 metrics, but I decided to expand the table to include Hits@1 and Hits@3 as well for a better analysis. I also noted the training time for each model.
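For reference, all of these ranking metrics can be computed from the rank of the correct triple in each test query. A minimal sketch (the function name and input format are my own for illustration, not taken from any of the model codebases):

```python
def ranking_metrics(ranks):
    """Compute MRR and Hits@k from a list of ranks of the correct
    triple in each test query (rank 1 = correct triple scored best)."""
    n = len(ranks)
    return {
        "MRR": sum(1.0 / r for r in ranks) / n,      # mean reciprocal rank
        "Hits@1": sum(r <= 1 for r in ranks) / n,    # fraction ranked first
        "Hits@3": sum(r <= 3 for r in ranks) / n,    # fraction in top 3
        "Hits@10": sum(r <= 10 for r in ranks) / n,  # fraction in top 10
    }
```

For example, `ranking_metrics([1, 2, 10, 100])` gives an MRR of 0.4025 and Hits@10 of 0.75.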
There were some missing pieces from the previous weeks. As I had mentioned in the Week 1 post, the GitHub code for HolE uses a bin file as input. The WN18 bin file was provided by the authors, but for FB15K it wasn't there, so I had to prepare the bin (compressed) version of FB15K myself for running the code. I used a small Python script to unpickle and inspect the contents of the WN18.bin file. Luckily there was nothing complicated inside, and I wrote a simple Python script, data2bin.py, to convert the standard benchmark FB15K dataset (which can be found on our Wiki resource page) into the FB15K.bin file the code requires, and ran it.
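The conversion itself can be sketched roughly as follows. This is a simplified illustration, not the actual data2bin.py: the exact keys stored in the pickled dict (here `train_subs`, `entities`, etc.) are assumptions about what the HolE code expects, and the helper names are my own.

```python
import pickle

def load_triples(path, ent2id, rel2id):
    """Read tab-separated (head, relation, tail) lines from a benchmark
    split file and map each triple to integer ids, building the id maps
    as we go."""
    triples = []
    with open(path) as f:
        for line in f:
            h, r, t = line.strip().split("\t")
            for e in (h, t):
                if e not in ent2id:
                    ent2id[e] = len(ent2id)
            if r not in rel2id:
                rel2id[r] = len(rel2id)
            triples.append((ent2id[h], ent2id[t], rel2id[r]))
    return triples

def convert(dataset_dir, out_path):
    """Convert the standard train/valid/test.txt files into a single
    pickled .bin file (dict layout assumed from inspecting WN18.bin)."""
    ent2id, rel2id = {}, {}
    data = {
        "train_subs": load_triples(dataset_dir + "/train.txt", ent2id, rel2id),
        "valid_subs": load_triples(dataset_dir + "/valid.txt", ent2id, rel2id),
        "test_subs": load_triples(dataset_dir + "/test.txt", ent2id, rel2id),
    }
    # id -> label lists, ordered by the ids assigned above
    data["entities"] = sorted(ent2id, key=ent2id.get)
    data["relations"] = sorted(rel2id, key=rel2id.get)
    with open(out_path, "wb") as f:
        pickle.dump(data, f)
```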
The experiments revealed that the results are quite consistent with those reported in the research papers. However, the performance of HolE and ComplEx was very close, as can be seen from the table. In fact, HolE took significantly less training time than ComplEx, even though HolE has a higher order of time complexity. One reason could be that I ran ComplEx on a CPU; the authors suggest running it on a GPU, which they say speeds up training about 5x.
The orders of space and time complexity are taken from Table 1 of the research paper by the authors of the ComplEx model.
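To make the complexity remark concrete: HolE scores a triple with the circular correlation of the subject and object embeddings, which is computed via FFTs in O(d log d), while ComplEx scores with a component-wise trilinear product over complex embeddings in O(d). A small sketch of the two published scoring functions (minus the sigmoid/loss wrappers, which I omit here):

```python
import numpy as np

def hole_score(e_s, e_o, r):
    """HolE score r . (e_s * e_o), where * is circular correlation.
    The correlation is computed via the FFT identity
    corr(a, b) = ifft(conj(fft(a)) * fft(b)), hence O(d log d)."""
    corr = np.fft.ifft(np.conj(np.fft.fft(e_s)) * np.fft.fft(e_o)).real
    return float(np.dot(r, corr))

def complex_score(e_s, e_o, r):
    """ComplEx score Re(<r, e_s, conj(e_o)>) with complex-valued
    embeddings; a single component-wise pass, hence O(d)."""
    return float(np.real(np.sum(r * e_s * np.conj(e_o))))
```

So per scoring operation ComplEx is asymptotically cheaper, which is why the observed wall-clock gap in my runs likely comes down to the CPU-vs-GPU setup rather than the models themselves.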
Where do we stand right now?
The accomplishments before the first phase of evaluations are as follows:
1) Selection of models: After running and checking many codebases, I have picked 4 different working implementations for the evaluations. All four models are used for comparison in most recent research papers, so this is a significant milestone.
2) Input format working: The standard format of the benchmark datasets can be used directly with all four training codes. We can now prepare the DBpedia dataset in this same format and no longer need to worry about its compatibility with these 4 model codes.
3) Training time noted: The time required to train these models has been recorded. This will be crucial for deciding our next steps and for estimating training time and scalability on the DBpedia dataset.
Where are we headed?
1) Since the code and setup for these 4 models are complete, we can now shift our focus to evaluating these approaches on the DBpedia dataset and check whether these models work for DBpedia as well.
2) If we come up with a new model in the future, or someone from the DBpedia community develops a new KB embeddings approach for DBpedia, it will be absolutely necessary to compare it with the existing approaches to support any claims. So the work done in these weeks will certainly be helpful for the entire DBpedia community.