GSoC 2017: Knowledge base embeddings for DBpedia

Hello folks,

I am very excited to share that I have been accepted to participate in Google Summer of Code (GSoC) 2017.

GSoC is an annual global program hosted by Google in which students from many countries work remotely for open source organizations for three months during the summer break, under the guidance of expert mentors. GSoC 2017 has the largest participation to date: 1,318 students from 72 countries have been accepted into the program and are working with 201 open source organizations this summer.

I am going to work with DBpedia. DBpedia is a large crowd-sourced community project that extracts data from Wikipedia and stores entities and their relationships in the form of a knowledge base. In simple words, it stores “knowledge” in a large repository/database, hence the name ‘knowledge base’.

[Figure: Knowledge Base. A simple example from DBpedia.]

Consider the simple example from DBpedia in the above figure. The entity Barack Obama is connected to another entity, Michelle Obama, through the edge/relationship ‘spouse’. Likewise, it may be connected to many other entities. Those entities are in turn connected to many others and may even share some common neighbours, so the whole structure forms a graph.
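To make the graph idea concrete, here is a minimal Python sketch that stores facts as (subject, predicate, object) triples and looks up an entity's neighbours. Only the ‘spouse’ triple comes from the example above; the other triples, the identifier spellings, and the helper names `build_graph` and `neighbours` are assumptions I made up for illustration, not actual DBpedia output or tooling.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple.
# Only the first triple comes from the example in this post; the remaining
# ones are illustrative assumptions, not actual DBpedia query results.
triples = [
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Barack_Obama", "nationality", "United_States"),
    ("Michelle_Obama", "nationality", "United_States"),
]


def build_graph(facts):
    """Group outgoing edges by subject: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph


def neighbours(graph, entity, relation):
    """Return every object linked to `entity` through `relation`."""
    return [obj for predicate, obj in graph.get(entity, []) if predicate == relation]


graph = build_graph(triples)
print(neighbours(graph, "Barack_Obama", "spouse"))  # ['Michelle_Obama']
```

In this toy graph both people point to the same `United_States` node, which is the kind of shared entity that turns a set of separate facts into a connected graph.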

My project is titled “Knowledge base embeddings for DBpedia”. In it, I will research different algorithms for finding embeddings of DBpedia's entities and relationships. These embeddings are simply vector representations of the entities and relationships.

In recent years, a lot of research has gone into finding good semantic representations of words in a continuous vector space, such that semantically similar words lie close to each other. One such popular method is Word2Vec by Tomas Mikolov et al. A famous example of vector arithmetic with these embeddings is the analogy: king - man + woman = queen (more precisely, the resulting vector lies closest to the vector for queen).
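The analogy arithmetic can be sketched in a few lines of plain Python. Please note that the three-dimensional vectors below are hand-made toy values that I chose so the example works; they are not real Word2Vec embeddings, and the helper names `cosine` and `analogy` are my own. Real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

# Hand-made toy vectors, NOT learned Word2Vec embeddings.
# The dimensions loosely stand for (royalty, maleness, femaleness).
vectors = {
    "king": (0.9, 0.9, 0.1),
    "queen": (0.9, 0.1, 0.9),
    "man": (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.1, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    # The query words themselves are excluded from the candidates.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(analogy("king", "man", "woman"))  # queen
```

The answer is picked by cosine similarity rather than exact equality, because with learned embeddings the arithmetic only lands near the expected word, never exactly on it.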

In my project, I will focus on DBpedia data instead of raw text. More details about the project can be found in my project proposal here. My primary mentors for the project are Tommaso Soru, Sandro Athaide Coelho, Peng Xu and Pablo Mendes.

Coding officially began on 30th May 2017. As part of the program, I will share details about the project's progress and discussions every week. Please follow my blog for the weekly updates.

Stay tuned!
