A Future or a New Reality?

Key Takeaways

  • Instead of competing with humans, AI developers may use algorithms to augment programmers’ work and make them more productive. In software development, we’re already seeing AI both performing human tasks and augmenting programmers’ work.
  • Based on our research, programmers spend 35% of their time understanding code, 5% writing code, 10% on other coding-related activities, and 50% on non-coding activities. Because writing code is such a small share of the job, we don’t expect AI tools to redefine the profession of a programmer any time soon.
  • Helping programmers perform small tasks more efficiently is a vast area for AI usage: AI can complete code, teach users to utilize new features, and power search in the code and beyond.
  • Obstacles on the way to flawless AI include training data unavailability, resource requirements, and the interface between the AI and the user.
  • The companies working on software development tools are quickly developing the ability to productize AI-powered solutions for small tasks, so we expect more such solutions to emerge in the near future.


People are increasingly exposed to AI in both their everyday and professional lives. We at JetBrains create tools for programmers, and we feel the software development industry is no exception to this trend.

People use AI in two ways:

  1. Replace humans, fully automating some of their jobs.
  2. Augment humans, keeping them as the key figure in the process.

Algorithms already write code, but human developers don’t need to fear being immediately replaced.

Surprisingly, this is not because it’s impossible to teach computers the skills needed to be a programmer, but because it’s impractical.

There are three big factors limiting AI progress:

  • The limited availability of training data.
  • Limited computational resources.
  • The complexity of the interface between algorithms and people.

When it comes to augmenting the work of human programmers, many mundane tasks such as code completion, code search, and bug detection are now powered by machine learning.

Figure 1. Ways to apply AI and the difficulties along each path.

How Do People Envision AI?

When people hear the term “AI,” they often imagine a computer replacing a human, performing the same task but doing it better in some way: faster, cheaper, with higher quality, or all of these combined. Examples of such tasks include playing chess or Go, writing poetry, and driving a car.

Some people embrace the possibility of computers freeing them from their mundane work, while others are skeptical. The latter may claim that machines are far from matching what humans can do. 

Questions like “How will you teach a computer to do this?” often carry the implication that you can’t. Here are a few examples of this sort of question that were raised in the past:

  • The number of reasonable moves in Go exceeds the available computational resources. How will you replace human intuition? The experts in this article from 1997 estimated that it would take a hundred years.
  • How do you teach a self-driving car to see a wet patch of road and slow down?

Computers already play Go and drive cars, so these questions are now outdated. This gives us reason to believe that questions of this nature that are still outstanding will also be answered eventually. Whatever professional area we take, computers are closer to matching human skills than most of us think. 

However, replacing a human is not always expedient. Instead of competing with humans, the developers of AI-based technologies may choose a different product strategy and attempt to use algorithms to augment programmers’ work and make them more productive.

In the software development context, we’re clearly seeing AI both performing human tasks and augmenting programmers’ work.

Replacing the human programmer 

The announcement of GitHub Copilot powered by OpenAI gave new life to discussions about when or whether computers will replace human programmers. The skeptics who thought replacing humans was impossible always asked:

  • How do you explain to the machine what your program should do? 

The answer is pretty simple: you describe what you want in natural language, provide a name for the function, and, optionally, write a few lines of code to get it started. Copilot then fills in the rest, much like a real programmer would.

Some people are delighted at how clever Copilot is. Others duly note the glitches in its work and find them significant enough to suggest that human programmers will still be needed for the foreseeable future. Then there is another group of reviewers who notice the same glitches but conclude that Copilot is a terrible and dangerous tool that shouldn’t be touched with a barge pole.

What is the main flaw they point out?

Programs created by Copilot are often verbose and hard to read. 

Clarity is critical because reading code is more important for a developer than writing it. In most cases, programmers add functionality on top of code written earlier, often by other people. In this respect, writing code is like adding a new room to an old house. You have to determine whether your addition will change the balance and topple the whole building. If it’s safe, you still have to understand and modify the existing structure to build on top of it. Programmers spend most of their working time adding “new rooms” to “existing houses.”

As R. Minelli, A. Mocci, and M. Lanza found, programmers spend about 70% of their coding-related time understanding the code, while the writing effort only accounts for about 5% of it. Based on this and other studies, we see the distribution of the developer’s time at work as shown in Figure 2 below.

Figure 2. Time spent by programmers at work, broken down by activity.

Verbose and unclear machine-generated programs could make the already difficult “understanding” part even more difficult.

The cognitive load on the human part of the tandem persists: the programmer still needs to understand what the algorithm writes. How long can humans maintain the tempo set by the computer? Having AI write code may speed up small tasks, but not necessarily large projects.

Compare that to revision control, which was introduced in the 1970s. The ability to track and revert changes vastly increased the limits of what people could comprehend. It enabled the cooperation of large groups of programmers, and as a result, allowed the creation of more complex systems. That was transformational for the whole industry.

Copilot is an excellent research result demonstrating the potential of AI. It does what many people thought impossible. And yet, we don’t expect such tools to redefine the profession of a programmer any time soon.

Helping human programmers

Copilot, though a breakthrough in programming-related AI, is still neither a revolution in the industry nor a replacement for human work. Keeping in mind that such a revolution may happen at some point, we still have to continue improving existing software development processes. Helping programmers perform small tasks more efficiently is a vast area for AI usage.

Tools for software developers usually begin with strict rules (“heuristics”) and no AI under the hood. The rules grow more complex as new functionality is built into each tool. Eventually, it becomes nearly impossible for a human to comprehend everything and understand how to modify the tools. This is where AI can help.

Code completion 

When you begin typing a search query in Google, it takes the characters you are entering and starts suggesting full query options. Source code editors provide programmers with very similar functionality.  

The first versions of code completion appeared long ago, in the twentieth century. They calculated the frequencies of the words in the project and displayed the most frequent words starting with the characters the user had typed (a minimal sketch of this approach follows below). Such a frequency-based approach worked well enough to provide a productivity boost. Over the years, people improved the algorithm with several heuristics on top of the frequency idea, but the desire to suggest precisely the word the user wants drove us to use machine learning to sort the suggestions.
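Here is that minimal sketch of the frequency-based approach, in Java. The class and method names are invented for illustration, and a real implementation would use the editor’s lexer rather than naive tokenization.

```java
import java.util.*;
import java.util.stream.*;

// A minimal sketch of frequency-based completion: count word occurrences
// in the project, then suggest the most frequent words matching the prefix.
class FrequencyCompletion {
    private final Map<String, Integer> frequencies = new HashMap<>();

    // Index a source file (naive tokenization on non-word characters).
    void index(String sourceText) {
        for (String token : sourceText.split("\\W+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
    }

    // Return the top-k most frequent words starting with the given prefix.
    List<String> suggest(String prefix, int k) {
        return frequencies.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FrequencyCompletion completion = new FrequencyCompletion();
        completion.index("Color color = new Color(0); Collection<Color> colors;");
        System.out.println(completion.suggest("Co", 2)); // [Color, Collection]
    }
}
```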

The information we can use to determine the best suggestion is so abundant that it is impossible to create a deterministic algorithm taking it all into account. We would have to handle too many exceptional cases. 

For example, here are a few general rules. The closer to the current editing position a token is defined, the more likely it is to be the one the programmer wants. Also, the standard language libraries can be ranked by popularity, and tokens from the least popular libraries can be deprioritized. All that being said, imagine that you develop a source code editor in Java (that’s exactly what we do at JetBrains) and start typing “Co”. Which of the two suggestions below would you prefer?

On the one hand, we do use red-black trees in the editor. On the other hand, the java.awt package is very seldom used in the industry. But still, by “Color” we most likely mean java.awt.Color in our project.

We have well over a hundred factors that influence the ordering of suggestions. Is the suggestion a symbol defined in the user’s project, in a standard language library, or in an imported third-party library? Will the suggestion be inserted at the beginning or in the middle of a line? Is there a dot in the line before this location? How much does the user work per day, on average? Do they have the suggestion’s definition open right now in a different editor tab?

Machine learning allows us to extract the patterns in a semi-automated way and take all these and many other factors into account where it is not feasible to spell all the dependencies out explicitly. 
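As a rough illustration of what taking such factors into account might look like, here is a hedged sketch in Java. The features and weights are invented; in a real system, the weights would come from a model trained on usage data, and there would be far more factors.

```java
import java.util.*;

// A hedged sketch of machine-learned suggestion ranking: each candidate is
// described by a few features, scored by a (here: hardcoded) learned model,
// and the candidates are shown in descending score order.
class RankedCompletion {
    record Suggestion(String text, double definitionDistance,
                      boolean fromUserProject, boolean afterDot) {}

    // Placeholder weights standing in for parameters fitted offline.
    static double score(Suggestion s) {
        double score = 0.0;
        if (s.fromUserProject()) score += 1.2;  // project symbols rank higher
        if (s.afterDot()) score += 0.4;         // member-access context matters
        score -= 0.01 * s.definitionDistance(); // nearby definitions rank higher
        return score;
    }

    static List<Suggestion> rank(List<Suggestion> candidates) {
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.<Suggestion>comparingDouble(RankedCompletion::score).reversed());
        return sorted;
    }
}
```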

Taking this further, it is also possible to use AI to generate minor code fragments. If the editor completes the current line instead of just one word, there’s a good tradeoff between development speed and cognitive load:

Figure 3. An example of code completion suggesting more than a single word. 

Some companies make this feature the core of their business. For example, TabNine and Kite distribute their software as editor add-ons that help programmers complete lines “the AI way.”

Teaching the user to utilize new features

A source code editor is a complex piece of software. There are hundreds of productivity-boosting operations that can be invoked. Unfortunately, it is impossible for programmers to know them all.

We can promote certain functionality by showing tips on startup, but recalling these tips when the time comes to use them might be difficult. Programmers typically have a set of around fifty favorite commands. With intelligent tips, we aim to present a user with the two or three actions that will be especially helpful to them based on their work patterns and habits.

AI can be used to create these personalized recommendations. For example, we may want to tell the user about the code move operation if, and only if, they frequently perform cut/paste operations within the same screen:

Figure 4. The “Code move” operation editor tip.

The most straightforward way to achieve this is known as “collaborative filtering.” Modern recommendation systems for music, video, books, and goods all use it. There are two basic steps:

  1. Find the users “similar” to the given one.
  2. Find what these users do that the given user doesn’t do yet and base our recommendation on that difference.

For content recommendations, finding similar users is fairly simple: if our target person likes the same ten movies as a group of other people but hasn’t seen one more movie that everybody in this group likes, recommending it is a pretty safe bet. The only caveat is to avoid the ultra-popular movies that almost everyone rates positively. Liking “The Godfather” or “Forrest Gump” doesn’t really say much about the user’s preferences.

For source editor features, it’s a bit more difficult. There are no features of the same genre or with the same cast, so we have to analyze smaller behavior patterns. How much time does the user spend debugging? How often do they edit existing code? How fast can they type? Do they write tests before or after writing the code, if at all? Taking factors like these into account lets us determine the similarity between users and recommend tools that will be useful given their known behavior patterns.
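Here is a minimal sketch of the two collaborative-filtering steps in Java. The behavior vectors, feature names, and the choice of cosine similarity are all illustrative assumptions, not a description of our production system.

```java
import java.util.*;

// A minimal collaborative-filtering sketch: describe each user by a vector
// of behavior measurements (share of time debugging, edits per hour, ...),
// find the most similar users, and recommend the features they use that
// the target user doesn't use yet.
class FeatureRecommender {
    record User(String name, double[] behavior, Set<String> featuresUsed) {}

    // Cosine similarity between two behavior vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<String> recommend(User target, List<User> others, int neighbors) {
        // Step 1: find the users most similar to the target.
        List<User> nearest = others.stream()
                .sorted(Comparator.comparingDouble(
                        (User u) -> -cosine(u.behavior(), target.behavior())))
                .limit(neighbors)
                .toList();
        // Step 2: recommend features the neighbors use and the target doesn't.
        Set<String> recommendations = new LinkedHashSet<>();
        for (User neighbor : nearest) {
            for (String feature : neighbor.featuresUsed()) {
                if (!target.featuresUsed().contains(feature)) {
                    recommendations.add(feature);
                }
            }
        }
        return new ArrayList<>(recommendations);
    }
}
```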

Searching in the code and beyond

Search is a kind of functionality present in many software products, from web search engines to online stores. Source code editors have this functionality, too: developers regularly need to find something in their code, documentation, and tool configuration options. These are very different types of information, and usually, software development tools have separate locations to look for them.

We want to provide a single search function within the source code editor that can find items in any of the above domains, taking synonyms and typos into account. Since so many people work on search algorithms, one could expect a standard reusable solution to exist, but alas, each domain has specific details that require the search functionality to be developed separately.

The complications begin when there are different item types with similar names available in the project. If the user types “format” in the search box while there is a file named Formatter.java in their project, are they looking for that file, for standard formatting library functions, or for the IDE functionality to reformat the code of their project? 

Machine learning serves as a means of blending the search results from different sources and weighing them against each other. The factors influencing the decision include the text match, the user’s search history and previous preferences (for example, do they ever click on file search results?), the content of the user’s project, and what the user was editing immediately before issuing the search query. Writing a deterministic algorithm that takes all of this into account doesn’t look feasible, while machine learning methods extract the patterns automatically.
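To illustrate the blending idea, here is a hedged sketch in Java. The provider names, prior values, and scoring formula are invented; in practice, the weights would be learned from click data and per-user history.

```java
import java.util.*;

// A hedged sketch of blending search results from several providers
// (files, library symbols, IDE actions) into a single ranked list.
class BlendedSearch {
    record Hit(String title, String provider, double textMatch) {}

    // Stand-in for a learned, per-user model: how likely this user is to
    // want results from each provider (e.g. they rarely click file results).
    static final Map<String, Double> PROVIDER_PRIOR =
            Map.of("files", 0.2, "symbols", 0.5, "actions", 0.3);

    static List<Hit> blend(List<Hit> hits) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) ->
                        -h.textMatch() * PROVIDER_PRIOR.getOrDefault(h.provider(), 0.1)))
                .toList();
    }

    public static void main(String[] args) {
        List<Hit> ranked = blend(List.of(
                new Hit("Formatter.java", "files", 0.9),
                new Hit("String.format", "symbols", 0.8),
                new Hit("Reformat Code", "actions", 0.7)));
        ranked.forEach(h -> System.out.println(h.title()));
    }
}
```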

The tradeoff of introducing AI 

The sum of all the minor AI-powered improvements to user productivity can result in an impressive overall boost. However, it does come at a cost.

AI-based systems work well in most cases, but in some situations they can produce weird results, and showing those results to users costs us some of their trust. So each time we replace strict rules with an AI-powered decision-making system, we have to decide whether the tradeoff is worth it: we improve our average decision quality, but we may lose some of the users’ trust.

It would be nice to create flawless systems where trust will not be lost due to poor suggestions, but there are several obstacles to this.

Obstacles on the way to flawless AI 

Training data unavailability

Many machine learning algorithms need example data for the training phase, and the dataset quality is critical. We often already know exactly what data we need, but obtaining it is either costly or illegal.

Code generation tasks, be it code completion in an IDE or the generation of a whole function in Copilot, need source code for training, and it seems natural to use the open-source repositories available on GitHub. However, these repositories come encumbered with licenses that may impose additional requirements on derivative works.

This leaves us with two big questions: Is our AI-powered algorithm a derivative work of the code we used to train it? And is the text that this algorithm writes a derivative work?

On the one hand, AI authors don’t copy anything into the algorithm. On the other hand, the neural network is incapable of independent thinking. All the code it produces is a combination of fragments it has seen during the learning phase. It may even create pieces of code that look like exact copies from the training dataset. The point is that even pieces that look independent are no more independent than the copies.

The problem is pretty new, and we haven’t seen any court decisions yet. This uncertainty slows down the progress of product developers: people don’t want to make significant investments into something that might become illegal tomorrow.

We faced the same issue when creating our code completion system. In addition to the potential legal limitations, there were technical difficulties as well. The code we can find in an open-source repository is in some sense “complete”: it usually compiles, passes simple tests, has clear formatting, and doesn’t contain duplicate blocks or temporary debug sections. However, the code we have to work with in the editor is not “complete” most of the time. Therefore, training data obtained from open-source repositories will not match the conditions in which the model is actually used.

We worked around these limitations by utilizing the completion usage statistics from our products. Making this data fully anonymous took a lot of effort, but in the end it all worked out well. You can find more technical details in this JetBrains blog post.
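As one illustration of what anonymization can involve, here is a minimal sketch in Java. It shows a single common technique, salted hashing of identifiers so that ranking statistics survive while names do not; this is an assumption for illustration, not a description of the actual JetBrains pipeline.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// A hedged sketch of one anonymization step for completion logs: replace
// project-specific identifiers with salted hashes. Statistics about how
// often a token is suggested and picked remain usable, but the token
// itself cannot be read back out of the log.
class LogAnonymizer {
    private final byte[] salt;

    LogAnonymizer(byte[] salt) {
        this.salt = salt;
    }

    String anonymizeToken(String token) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            byte[] hash = digest.digest(token.getBytes(StandardCharsets.UTF_8));
            // Keep a short prefix: enough to tell tokens apart in statistics,
            // not enough to make dictionary recovery easy.
            StringBuilder sb = new StringBuilder("id_");
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", hash[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by the JDK spec", e);
        }
    }
}
```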

These are just a couple of examples of the problems people face when trying to collect training data for AI-powered algorithms.

Resource requirements

AI algorithms are hungry for resources, and this hunger has different implications for the training and production phases. During training, resource limitations impose extra costs, and the algorithm developers may decide whether or not to invest in hardware. In production, however, it is not the algorithm developers’ resources but their users’ resources that are the limiting factor.

A standard solution to the problem of resource demand in production is to move the resource-intensive computation to a remote cluster. However, our clients usually want to keep their source code inside their protected network, so using a remote server is often impossible. This means our algorithms have to work inside the source editor on the user’s laptop. The editor has a lot of features and already consumes a considerable amount of resources, so every newly allocated byte counts.

There is a chasm between a typical software developer’s view of resources and a machine learning researcher’s view. The following story illustrates how deep and wide this chasm is.

When we first replaced rule-based code completion with machine learning, the team responsible for the task reduced the additional memory requirement to 1.5 MB. The team thought such a reduction bordered on the impossible and was quite proud of it. It’s funny how differently people reacted when they learned about it:

The academic AI researchers we were collaborating with said: “1.5 MB? You must be joking! We would have started with about 1 GB!”

On the other hand, the traditional tool developers who reviewed the change said: “1.5 MB? Why do you need so much?” 

Researchers often assume infinite resources, whereas production has to run on the available hardware. That’s one of the reasons why breakthrough research results in AI are hard to productize.

The interface between the AI and the user

Even when remote execution is possible, running the AI on a remote server instead of the client’s machine may present a significant usability issue.

Any delay can be intolerable to users, especially during something as small and frequent as typing. Therefore, a round trip to the server is out of the question for code completion, feature recommendations, and other AI-powered improvements.

Even with sufficient speed, AI-powered features must blend almost seamlessly into the user’s workflow and not be a distraction.

For example, we can find some bugs in the user’s code. At first sight, this seems like the kind of functionality users would be likely to embrace. The challenge, though, is to find the right time to report a bug. Alerting users when they are knee-deep in their coding will only distract them and probably cause them to disable the feature. We must catch the moment the programmer has completed a chunk of work and is in a state of mind to check it for problems before moving on.
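A simple version of this timing logic can be sketched without any AI at all. The quiet-period threshold and the background hook below are assumptions for illustration; a smarter system would learn when each user is receptive.

```java
import java.time.Duration;
import java.time.Instant;

// A minimal sketch of the timing heuristic: buffer detected issues and
// surface them only after the user has stopped editing for a while,
// i.e. has plausibly finished a chunk of work.
class FindingScheduler {
    private static final Duration QUIET_PERIOD = Duration.ofSeconds(30); // assumed threshold

    private Instant lastEdit = Instant.now();
    private int pendingFindings = 0;

    void onEdit() {
        lastEdit = Instant.now();
    }

    void onFindingDetected() {
        pendingFindings++;
    }

    // Assumed to be called periodically by the editor's background loop.
    void tick() {
        boolean userIsIdle =
                Duration.between(lastEdit, Instant.now()).compareTo(QUIET_PERIOD) >= 0;
        if (userIsIdle && pendingFindings > 0) {
            System.out.println("Heads up: " + pendingFindings + " potential issues found.");
            pendingFindings = 0;
        }
    }
}
```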

Perhaps here we could do with an AI algorithm that would decide when to present the results of another AI algorithm?

Looking forward

AI is coming to software development, just as it is coming to other domains, attempting to both mimic humans and augment their work.

Computers can now write code, but we don’t see this as an industry transformation. The ability to write code is not sufficient to replace a human programmer. It turns out that the ability to consider complex interactions of software components is critical, and AI is not there yet.

In the meantime, there are numerous opportunities to use machine learning to deliver minor improvements, and the accumulated sum of these improvements can make software developers much more productive. The companies working on software development tools are quickly developing the ability to productize AI-powered solutions for small tasks, so we expect more such solutions to emerge in the near future.