When a candidate completes a programming task, a report is generated. It may seem daunting and cluttered at first, so to help you understand it, we will go through each part of the report and explain where the results come from.

This is what a report will look like.

Automatic Evaluation

This consists of two parts: the Functionality score and the Code quality score.  

The Functionality score (90%) is based on the verification tests. It shows how well the code works: the higher the score, the better the functionality.

The Code quality score makes up 10% of the overall score for any Devskiller task (it is set individually if you create your own task). The code is scored before the candidate starts and again after the candidate has finished, using an external source code analyzer. The analyzer finds common programming flaws such as unused variables, empty catch blocks, unnecessary object creation, and so forth.
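
As a rough illustration of how the two parts might combine, here is a minimal Python sketch. The function name `overall_score`, the 0 to 100 scale, and the plain weighted-average formula are assumptions made for this example. Only the 90/10 split comes from the description above, and the exact calculation behind the report may differ.

```python
def overall_score(functionality: float, code_quality: float,
                  functionality_weight_pct: int = 90) -> float:
    """Combine two 0-100 scores into one weighted 0-100 score.

    Assumption: a plain weighted average. The 90/10 default mirrors the
    split described above; a custom task could pass a different weight.
    """
    if not 0 <= functionality_weight_pct <= 100:
        raise ValueError("functionality_weight_pct must be between 0 and 100")
    quality_weight_pct = 100 - functionality_weight_pct
    weighted_sum = (functionality * functionality_weight_pct
                    + code_quality * quality_weight_pct)
    return weighted_sum / 100


# A candidate who passes most verification tests but writes middling code.
print(overall_score(functionality=80, code_quality=50))  # 77.0
```

Because functionality carries nine times the weight of code quality, a candidate with working but untidy code will still score well above one with tidy code that fails the tests.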

The best way to think of these two scores is to compare them to learning a language. The functionality score reflects whether a person can communicate what they want, that is, whether a sentence gets the desired result. The code quality score is the grammar: a sentence might get the desired result even though it is not grammatically correct.

"I want red ice cream" - it is functional (though not very) but the quality is bad.
"Hello, I would like the strawberry ice cream, please." - it is functional and the quality is good.

Build Status

This section tells you whether the candidate has managed to build a working piece of code. In this example, we can see that they haven't. If desired, you can go further and view the build log to see the exact reason why the build failed. The button is located in the bottom right-hand corner of the box.

Verification tests

At Devskiller we have two types of tests: candidate tests, which are visible to the candidate during the test, and verification tests, which are hidden from the candidate and visible only to the recruiter. Verification tests check the edge cases and help reveal whether a candidate is a problem solver. A basic example is our calculator task. Here is the task:

Build a calculator that can divide, multiply, subtract and add. The candidate tests are as follows:

  • Can divide
  • Can add
  • Can subtract
  • Can multiply

The verification test that we have is:

  • Candidate has written a rule to say you cannot divide by zero.

This will help you find a candidate who is focused not just on the output but on the code as a whole.
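
To make the difference between the two kinds of test concrete, here is a minimal Python sketch of the calculator example. The `Calculator` class, the two test functions, and the choice of `ValueError` for the divide-by-zero rule are all hypothetical; they are not taken from the actual Devskiller task.

```python
class Calculator:
    """Hypothetical candidate solution for the calculator task."""

    def add(self, a: float, b: float) -> float:
        return a + b

    def subtract(self, a: float, b: float) -> float:
        return a - b

    def multiply(self, a: float, b: float) -> float:
        return a * b

    def divide(self, a: float, b: float) -> float:
        # The explicit rule that the hidden verification test looks for.
        if b == 0:
            raise ValueError("Cannot divide by zero")
        return a / b


def candidate_tests(calc) -> bool:
    """Visible tests: the four basic operations give the right answers."""
    return (calc.add(2, 3) == 5
            and calc.subtract(5, 3) == 2
            and calc.multiply(4, 3) == 12
            and calc.divide(8, 2) == 4)


def verification_test(calc) -> bool:
    """Hidden test: dividing by zero must be rejected by an explicit rule."""
    try:
        calc.divide(1, 0)
    except ValueError:
        return True   # the candidate wrote the rule
    except ZeroDivisionError:
        return False  # no rule; the language's default error leaked through
    return False      # no error at all


calc = Calculator()
print(candidate_tests(calc), verification_test(calc))  # True True
```

A solution whose `divide` simply returns `a / b` would still pass every candidate test but fail the verification test, which is exactly the gap the hidden tests are meant to expose.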

Technical debt

Technical debt is an important metric that measures how long it would take a professional developer to fix the violations in the code and make it 'perfect'.

The more violations the code contains, the longer it will take to fix them. In this section, we would expect the time to go down once the candidate has finished. If it has gone up, then the candidate has made the code worse than it originally was.


Violations encompass a wide range of mistakes or errors that can be made during the programming task. The report gives the number of violations in the original code and in the code after the candidate has finished. There are many kinds of violations, and each is weighted differently depending on its severity. Here are some examples:

  • Not naming variables correctly
  • White spaces after a piece of code
  • The method is too long
  • The class has too many fields 
  • Spelling mistakes 
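
The idea of technical debt as "time to fix" can be sketched in a few lines of Python. The violation names and the minutes assigned to each are invented for illustration; a real analyzer uses its own rule set and its own estimates.

```python
# Hypothetical fix-time estimates, in minutes, for a few violation types.
REMEDIATION_MINUTES = {
    "badly_named_variable": 5,
    "trailing_whitespace": 1,
    "method_too_long": 30,
    "too_many_fields": 20,
    "spelling_mistake": 2,
}


def technical_debt(violations: list[str]) -> int:
    """Total estimated minutes needed to fix every listed violation."""
    return sum(REMEDIATION_MINUTES[name] for name in violations)


# Violations found before the candidate started and after they finished.
original_code = ["method_too_long", "badly_named_variable", "spelling_mistake"]
finished_code = ["badly_named_variable"]

debt_before = technical_debt(original_code)
debt_after = technical_debt(finished_code)
print(debt_before, debt_after, debt_after - debt_before)  # 37 5 -32
```

A negative change means the candidate reduced the debt; a positive change would mean the code ended up with more to fix than it started with.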

Rules Compliance

Rules compliance is an infographic produced from the violations and their severity weights. If a candidate were to fix all the violations, they would get a score of 100%. The infographic also shows whether the code has been made worse.
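
One plausible way to turn weighted violations into a percentage is a density-style index, sketched below in Python. This is an assumption, not the documented Devskiller formula: the severity names, their weights, and the use of lines of code as the denominator are all chosen for illustration.

```python
# Assumed severity weights; real analyzers usually let you configure these.
SEVERITY_WEIGHTS = {"info": 0, "minor": 1, "major": 3, "critical": 5, "blocker": 10}


def rules_compliance(severities: list[str], lines_of_code: int) -> float:
    """Return a 0-100 score; 100 means no weighted violations remain."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    weighted = sum(SEVERITY_WEIGHTS[s] for s in severities)
    # More (or more severe) violations per line pull the score down.
    return max(0.0, 100 - 100 * weighted / lines_of_code)


print(rules_compliance(["major", "minor", "minor"], lines_of_code=200))  # 97.5
print(rules_compliance([], lines_of_code=200))  # 100.0
```

Under this sketch, fixing every violation yields 100%, and adding new violations lowers the score, which matches the behaviour described above.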
