AiThority Interview with Jaewon Yang, Principal Staff Software Engineer at LinkedIn

By Sudipto Ghosh On Nov 22, 2022


Hi Jaewon, please tell us about your current role and how you work with AI and machine learning algorithms at LinkedIn.

I’m a Principal Staff Software Engineer on the LinkedIn Knowledge Graph team, where we build a knowledge graph by extracting relevant entities from member profiles, job postings, and company descriptions. This lets our team improve LinkedIn’s search and recommendation products by understanding which titles and skills members have, which skills job postings require, which members work for which companies, and so on.

Could you tell us a little bit about PASS for LinkedIn? How do you leverage GNNs to improve search and recommendation?

Graphs are a way to represent large amounts of relational information. At LinkedIn, our Economic Graph comprises over a billion data points, including skills, jobs, and companies. Knowledge graphs like this one are made up of many nodes, connected by edges that represent the relationships among the data points. Graph Neural Networks (GNNs) are one technique for building models that analyze the data contained in a graph. To make predictions, a GNN takes as input both a node’s own data and the data of its neighboring nodes. The challenge with this approach is that it doesn’t scale well when a node has hundreds or thousands of neighbors, as is often the case in our Economic Graph.

Performance-Adaptive Sampling Strategy (PASS) is an approach that samples the neighbors most relevant to a specific AI task. This avoids the need to consider thousands of neighbor nodes as inputs, while also avoiding the drop in accuracy you may see if you simply choose which neighbors to include at random. What’s unique about PASS is that it automatically learns which types of neighbors are most relevant, so this curation doesn’t need to be done by hand (which would be prohibitively time-consuming).
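A minimal sketch, assuming a toy graph stored as Python dicts, of the two ideas above: a node combining its own data with its neighbors’ data, and uniform random neighbor sampling as the baseline that PASS improves on. The function names and the unweighted averaging are hypothetical simplifications; a real GNN layer would also apply learned weight matrices and a nonlinearity.

```python
import random

def sample_neighbors(neighbors, k, rng):
    """Uniformly sample up to k neighbors without replacement (the random baseline)."""
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)

def aggregate(node, features, adjacency, k, rng):
    """Average a node's own feature vector with the mean of a sampled
    subset of its neighbors' feature vectors (no learned weights)."""
    own = features[node]
    sampled = sample_neighbors(adjacency.get(node, []), k, rng)
    if not sampled:
        return list(own)
    dim = len(own)
    neigh_mean = [sum(features[n][i] for n in sampled) / len(sampled) for i in range(dim)]
    return [(own[i] + neigh_mean[i]) / 2 for i in range(dim)]

# Toy graph: one member node with 1,000 connections, each with a 2-d feature.
adjacency = {"m": [f"c{i}" for i in range(1000)]}
features = {"m": [0.0, 0.0]}
features.update({f"c{i}": [1.0, 0.0] for i in range(1000)})
print(aggregate("m", features, adjacency, k=10, rng=random.Random(0)))
```

Only 10 of the 1,000 neighbors are read, which is what keeps the computation cheap. Because `sample_neighbors` picks uniformly, which neighbors end up in the sample is left to chance; that is the weakness PASS addresses by learning which neighbors to keep.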

In the future, we’re excited to integrate PASS at LinkedIn. There are already many applications at LinkedIn where we apply GNNs, such as inferring the full set of skills a member has: we look at their skills and connections, and from those we infer skills the member is likely to have. We use this approach in other recommendations as well, so any time we match members with job postings, advertisements, or news articles in the LinkedIn ecosystem, we try to apply GNNs to make better recommendations.
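The intuition behind inferring skills from connections can be sketched with a simple frequency heuristic, assuming skills and connections are stored in Python dicts. This is not a GNN and not LinkedIn’s method: `infer_skills` and its `min_share` threshold are hypothetical, and a GNN would instead learn how much weight to give each neighbor and each signal.

```python
from collections import Counter

def infer_skills(member, skills, connections, min_share=0.5):
    """Suggest skills listed by at least `min_share` of a member's connections
    that the member does not already list (a frequency heuristic, not a GNN)."""
    own = set(skills.get(member, ()))
    neighbors = connections.get(member, [])
    if not neighbors:
        return []
    # Count each skill at most once per connection.
    counts = Counter(s for n in neighbors for s in set(skills.get(n, ())))
    return sorted(s for s, c in counts.items()
                  if s not in own and c / len(neighbors) >= min_share)

skills = {
    "ana": ["python"],
    "ben": ["python", "spark"],
    "cho": ["spark", "sql"],
    "dev": ["spark"],
}
connections = {"ana": ["ben", "cho", "dev"]}
print(infer_skills("ana", skills, connections))
```

Here the member lists only Python, but all three of her connections list Spark, so Spark is suggested; SQL, held by one connection in three, falls below the 0.5 threshold.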

Why should AI leaders focus more on GNNs? How do they improve the performance of machine learning software?

GNNs are deep neural networks specialized for understanding graphs. In other words, they are the neural networks best suited to understanding relationships among entities. Historically, AI has made breakthroughs when the community found the right model architecture for a specific type of data: convolutional neural networks improved computer vision, and transformers improved NLP. For graph-structured data, GNNs are the right model architecture, one that will unlock AI’s ability to reason about and understand graphs.
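To make “specialized for understanding graphs” concrete, the update below is a generic message-passing layer in the style of GraphSAGE with mean aggregation. It is an illustrative assumption, not a description of LinkedIn’s models. Here $h_v^{(l)}$ is node $v$’s representation at layer $l$, $\mathcal{S}(v)$ is the (possibly sampled) set of $v$’s neighbors, $W_{\text{self}}^{(l)}$ and $W_{\text{neigh}}^{(l)}$ are learned weight matrices, and $\sigma$ is a nonlinearity:

```latex
h_v^{(l+1)} = \sigma\left( W_{\text{self}}^{(l)} h_v^{(l)}
  + W_{\text{neigh}}^{(l)} \frac{1}{|\mathcal{S}(v)|} \sum_{u \in \mathcal{S}(v)} h_u^{(l)} \right)
```

Stacking $L$ such layers lets each node draw on information up to $L$ hops away, which is why the number of neighbors involved, and hence the cost, grows so quickly with depth.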

You mentioned that PASS can achieve higher prediction accuracy while using fewer neighbors than existing GNN methods. Could you please explain this and how it is employed in your new approach – PASS for LinkedIn graphs?

One of the challenges with scaling GNNs in real-world applications is that a given node of interest can have hundreds or thousands of direct (one-hop) or indirect (two-hop) neighbors. Taking these neighbors into account can make the GNN’s recommendation or prediction more accurate, but doing so becomes prohibitively slow on large-scale graphs. One existing solution has been to use only a subset, or sample, of neighbor nodes for a given GNN task; the drawback is that the sampled neighbors may not be the most relevant for the task at hand.

For example, if we want to recommend a job to a member, looking at their connections can help us infer relevant jobs. However, if we sample only a handful of their connections (to keep latency manageable), we may end up with connections who say little about the member’s likely job interests, such as family members. Using PASS, we’re able to automatically select the neighbors most relevant to the task, which yields a more accurate recommendation for the member. In experiments on seven public benchmark graphs and two LinkedIn graphs, PASS outperformed state-of-the-art GNN methods by 1.3% to 10.4%.
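The job-recommendation example can be made concrete with a minimal Python sketch of relevance-weighted sampling. It is not the PASS algorithm: PASS learns which neighbors are relevant, whereas here the relevance weights (`weights`) are simply given, standing in for parameters a learned sampler would produce. The names `relevance` and `adaptive_sample`, the two-feature encoding (family-like vs. colleague-like), and the exponential scoring are all hypothetical.

```python
import math
import random

def relevance(weights, feature):
    """Hypothetical relevance score: dot product of weights and neighbor features."""
    return sum(w * x for w, x in zip(weights, feature))

def adaptive_sample(neighbors, features, weights, k, rng):
    """Pick k neighbors without replacement, favoring high-relevance ones.

    Each neighbor gets weight exp(relevance); the key u ** (1 / weight) with
    u ~ Uniform[0, 1) is the Efraimidis-Spirakis scheme for weighted sampling
    without replacement, and the k largest keys are kept.
    """
    if len(neighbors) <= k:
        return list(neighbors)
    keyed = []
    for n in neighbors:
        weight = math.exp(relevance(weights, features[n]))
        keyed.append((rng.random() ** (1.0 / weight), n))
    keyed.sort(reverse=True)
    return [n for _, n in keyed[:k]]

# Toy member with 900 family-like and 100 colleague-like connections.
# The feature layout [family, colleague] is an assumption for illustration.
neighbors = [f"fam{i}" for i in range(900)] + [f"col{i}" for i in range(100)]
features = {n: ([1.0, 0.0] if n.startswith("fam") else [0.0, 1.0]) for n in neighbors}
weights = [0.0, 5.0]  # stands in for parameters a learned sampler would produce

rng = random.Random(42)
picked = adaptive_sample(neighbors, features, weights, 20, rng)
print(sum(n.startswith("col") for n in picked), "of", len(picked), "sampled are colleagues")
```

Uniform sampling would draw about 2 colleagues out of 20 on average, since only 10% of the connections are colleagues; weighting by relevance makes colleagues the large majority of the sample while still reading only 20 neighbors.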

We’re looking forward to continuing to innovate in this area and to finding ways to deploy PASS in production at LinkedIn.

Other than GNNs, which other ML techniques are you focusing on and why?

Aside from GNNs, our team focuses on NLP models because our inputs (member profiles, job postings, etc.) are text. We also focus on understanding structured text, such as ontologies.

What is the future of augmented intelligence, and how do you see ongoing work on ANNs delivering results for a better future?

While augmented intelligence is not the Knowledge Graph team’s focus, I believe that GNNs can be used in augmented intelligence because they can capture relationships between the objects in the augmented world.

One area within GNNs where I think we’ll see continued automation is the specification of parameters. Right now, an engineer still needs to specify many details of the GNN, such as how many “hops” of neighbors within the graph to analyze when running the model, that is, whether we look only at direct neighbors, at neighbors of neighbors, and so on. In the future, it would be great to tune this parameter automatically when running a GNN and sampling neighbors. Further automating GNNs was one of our main goals with PASS; in our case, we focused on selecting relevant neighbors once the number of hops had been specified.
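The hop-count parameter can be illustrated with a minimal breadth-first search sketch, assuming an adjacency-list graph in a Python dict; `k_hop_neighborhood` and `max_sampled_nodes` are hypothetical helpers, not part of PASS or any LinkedIn system. The second function shows why the hop count matters for cost: sampling a fixed fan-out of k neighbors per node at each of L hops still yields up to k + k^2 + ... + k^L inputs.

```python
from collections import deque

def k_hop_neighborhood(adjacency, start, hops):
    """Breadth-first search returning {node: distance} for nodes within `hops` edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]  # report neighbors only, not the start node itself
    return dist

def max_sampled_nodes(fanout, hops):
    """Upper bound on inputs when sampling `fanout` neighbors per node at each hop."""
    return sum(fanout ** h for h in range(1, hops + 1))

# Toy path graph a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(k_hop_neighborhood(adjacency, "a", 1))
print(k_hop_neighborhood(adjacency, "a", 2))
print(max_sampled_nodes(10, 2), max_sampled_nodes(10, 3))
```

With a fan-out of 10, going from two hops to three grows the bound from 110 to 1,110 nodes, which is why the number of hops is a consequential setting to choose by hand.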

Thank you, Jaewon! That was fun and we hope to see you back on AiThority.com soon.



About Jaewon

Jaewon Yang is a Principal Staff Software Engineer at LinkedIn. He holds an M.S. in Statistics and a Ph.D. from Stanford University.

About LinkedIn

Founded in 2003, LinkedIn connects the world’s professionals to make them more productive and successful. With more than 830 million members worldwide, including executives from every Fortune 500 company, LinkedIn is the world’s largest professional network. The company has a diversified business model with revenue coming from Talent Solutions, Marketing Solutions, Sales Solutions, and Premium Subscriptions products. Headquartered in Silicon Valley, LinkedIn has offices across the globe.