Batch checker

The batch checker gets the contestant's output, the input, and the correct output. It should determine whether the contestant's output is correct.

There are various types of checkers you can use:

tokens – fast, versatile file equality checker
shuffle – similar to tokens, but allows permutations of tokens
diff – file equality checker based on the diff command line tool (avoid this option, as it can have quadratic time complexity)
judge – custom checker

If there is only a single correct output (e.g. the minimum of an array), tokens is strongly recommended. Otherwise, when there are multiple correct outputs (e.g. the shortest path in a graph), writing a judge is necessary. Set out_check accordingly.

Tokens checker

A fast and versatile equality checker. It ignores differences in whitespace within a line, but not newlines; only newlines at the end of a file are ignored.

Tokens are separated by (possibly multiple) whitespace characters. For the output to be correct, its tokens need to be the same as those in the correct output file.

You can customize the tokens checker with tokens_ignore_newlines or tokens_ignore_case. For comparing floats, set tokens_float_rel_error and tokens_float_abs_error.
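
The comparison described above can be pictured with a small sketch. It is an illustration only, not the checker's actual implementation: the function names are hypothetical, empty lines in the middle of the file are assumed to matter, and the float rule (via math.isclose, which accepts a pair if either the relative or the absolute tolerance is met) is an assumption about how the two error settings combine.

```python
import math


def tokenize(text: str) -> list[list[str]]:
    """Split text into lines, each a list of whitespace-separated tokens."""
    lines = [line.split() for line in text.splitlines()]
    # Newlines at the end of the file are ignored: drop trailing empty lines.
    while lines and not lines[-1]:
        lines.pop()
    return lines


def tokens_equal(a: str, b: str, rel_error: float, abs_error: float) -> bool:
    """Compare two tokens; use a tolerance only if one was requested."""
    if a == b:
        return True
    if rel_error <= 0 and abs_error <= 0:
        return False  # exact mode: tokens must match character by character
    try:
        x, y = float(a), float(b)
    except ValueError:
        return False  # at least one token is not a number
    return math.isclose(x, y, rel_tol=rel_error, abs_tol=abs_error)


def outputs_match(contestant: str, correct: str,
                  rel_error: float = 0.0, abs_error: float = 0.0) -> bool:
    """Return True if both outputs have the same tokens on the same lines."""
    got, expected = tokenize(contestant), tokenize(correct)
    if len(got) != len(expected):
        return False
    for got_line, expected_line in zip(got, expected):
        if len(got_line) != len(expected_line):
            return False
        if not all(tokens_equal(a, b, rel_error, abs_error)
                   for a, b in zip(got_line, expected_line)):
            return False
    return True
```

With this sketch, "1  2\n3\n\n" matches "1 2\n3\n", while "1 2 3\n" does not, because the line break is significant.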

Shuffle checker

Similarly to the tokens checker, the shuffle checker compares the output with the correct output token-by-token. Allows permutations of tokens (configure with shuffle_mode). Use shuffle_ignore_case for case insensitivity.
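
As a rough illustration of what "allowing permutations of tokens" means, the sketch below compares the two outputs as multisets of tokens over the whole file. This is only one possible interpretation: the function name is hypothetical, and the values accepted by shuffle_mode (for example whether tokens may move between lines) are not modelled here.

```python
from collections import Counter


def same_tokens_any_order(contestant: str, correct: str,
                          ignore_case: bool = False) -> bool:
    """Return True if both texts contain the same tokens, in any order."""
    if ignore_case:
        contestant, correct = contestant.lower(), correct.lower()
    # Counter keeps multiplicities, so "1 1 2" differs from "1 2 2".
    return Counter(contestant.split()) == Counter(correct.split())
```

Under this sketch "3 1 2" is accepted against "1 2 3", while "1 1 2" is rejected against "1 2 2" because the token counts differ.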

Diff checker

An equality checker based on the diff tool. Runs diff -Bbq under the hood. Ignores whitespace and empty lines.

This out_check is not recommended

In some cases, diff has quadratic time complexity, leading to unexpectedly slow checking of outputs.

Custom judge

If there can be multiple correct solutions, it is necessary to write a custom judge. Set out_judge to the path to the source code of your judge, judge_type to the judge type (see below), and judge_needs_in and judge_needs_out to 0 or 1, depending on whether the judge needs the input and the correct output, respectively.

When writing a custom judge, you can choose from multiple judge types:

cms-batch
opendata-v2
opendata-v1

CMS-batch judge

The CMS batch judge format as described in the CMS documentation.

It is run as follows (having filenames given as arguments):

./judge <input> <correct output> <contestant output>

The judge should print the relative number of points (a float between 0.0 and 1.0) as a single line to its stdout, and a single-line message for the contestant to its stderr. Unlike what the CMS documentation specifies, both stdout and stderr must contain a single line only; otherwise a warning is issued.

Example cms-batch judge

For a task of printing N positive integers that sum up to K, the judge may look like this:

#!/usr/bin/env python3
import sys
from typing import NoReturn


def award(points: float, msg: str) -> NoReturn:
    print(points)
    print(msg, file=sys.stderr)
    sys.exit(0)


def reject() -> NoReturn:
    award(0, "translate:wrong")


input_, correct_output, contestant_output = sys.argv[1:]

with open(input_) as f:
    n, k = map(int, f.readline().split())

with open(contestant_output) as f:
    try:
        numbers = list(map(int, f.readline().split()))
    except ValueError:
        reject()  # The contestant did not print integers

    if f.read().strip():
        reject()  # Contestant output doesn't end when it should


# Be careful to check **ALL** constraints
if len(numbers) != n:
    reject()  # Wrong count of numbers (also covers an empty output)

if any(x <= 0 for x in numbers):
    reject()  # Only positive integers are allowed

if sum(numbers) == k:
    award(1.0, "translate:success")
elif sum(numbers) >= k / 2:
    award(0.5, "translate:partial")
else:
    reject()  # Not enough sand used

Opendata-v2 judge

The opendata-v2 judge is run in this way:

./judge <test> <seed> < contestant-output

Here, test is the testcase's test number and seed is the testcase's generating seed. (The arguments are the same as those given to the opendata-v1 generator that the input has (probably) been generated with.) If the input was not generated with a seed (static or unseeded), seed will be -.

If judge_needs_in is set, the judge will get the input filename in the TEST_INPUT environment variable. Similarly, if judge_needs_out is set, the correct output filename will be in the TEST_OUTPUT environment variable.

If the output is correct, the judge should exit with return code 42. Otherwise, the judge should exit with return code 43.

Optionally, the judge can write a one-line message for the contestant to stderr (at most 255 bytes), followed by a sequence of lines with KEY=value pairs. The following keys are allowed:

POINTS – Number of points awarded for this test case (used only if the exit code says "OK").
LOG – A message that should be logged.
NOTE – An internal note recorded in the database, but not visible to contestants.

Values are again limited to 255 bytes.
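
To stay within the 255-byte limits, a judge can clip its message and values before printing them. The helpers below are a minimal sketch under stated assumptions: the names clip and verdict_lines are hypothetical, the limit is assumed to be counted in UTF-8 bytes, and collapsing whitespace to keep the text on one line is a design choice of this sketch rather than a requirement.

```python
MAX_BYTES = 255  # limit stated above for the message and for each value


def clip(text: str, limit: int = MAX_BYTES) -> str:
    """Make text single-line and cut it to at most `limit` UTF-8 bytes."""
    single_line = " ".join(text.split())
    # errors="ignore" drops a multi-byte character cut in half at the end.
    return single_line.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def verdict_lines(message: str, **pairs: object) -> list[str]:
    """Build the stderr lines: the message first, then KEY=value pairs."""
    lines = [clip(message)]
    for key, value in pairs.items():
        lines.append(f"{key}={clip(str(value))}")
    return lines
```

A judge could then write "\n".join(verdict_lines("All of the sand used.", POINTS=20)) to stderr before exiting with the appropriate return code.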

Example opendata-v2 judge

For a task of printing N positive integers that sum up to K, the judge may look like this:

#!/usr/bin/env python3
import os
import sys
from typing import NoReturn

SUBTASK_POINTS = [0, 20, 30, 50]


def award(points: float, msg: str) -> NoReturn:
    assert "\n" not in msg
    print(msg, file=sys.stderr)
    print(f"POINTS={points}", file=sys.stderr)
    sys.exit(42)


def reject(msg: str) -> NoReturn:
    assert "\n" not in msg
    print(msg, file=sys.stderr)
    sys.exit(43)


def main(subtask: int, seed: str) -> NoReturn:
    # Read the test input
    with open(os.environ["TEST_INPUT"]) as f:
        n, k = map(int, f.readline().split())

    # Read the contestant output
    try:
        numbers = list(map(int, input().split()))
    except ValueError:
        reject("The output should contain integers.")
    except EOFError:
        reject("The output is empty.")

    try:
        input()
        reject("The output should contain only one line.")
    except EOFError:
        pass

    # Be careful to check **ALL** constraints
    if len(numbers) != n:
        reject(f"The output should contain exactly {n} integers.")

    if any(x <= 0 for x in numbers):
        reject("The output contains non-positive integers.")

    if sum(numbers) == k:
        award(SUBTASK_POINTS[subtask], "All of the sand used.")
    elif sum(numbers) >= k / 2:
        award(SUBTASK_POINTS[subtask] // 2, "At least half of the sand used.")
    else:
        reject("Not enough sand used.")


if __name__ == "__main__":
    main(subtask=int(sys.argv[1]), seed=sys.argv[2])

Opendata-v1 judge

The opendata-v1 judge is the same as opendata-v2, except that it uses different return codes: 0 for a correct output and 1 for a wrong output.

This judge_type is not recommended

Exiting with code 1 is very common; for example, it is triggered by any uncaught exception in Python. As a result, internal bugs in the judge can disguise themselves as wrong answers.
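
The ambiguity can be demonstrated with a short experiment. The sketch below runs two hypothetical one-line "judges" in a fresh Python interpreter: one crashes with an exception, the other deliberately rejects the output. The helper name exit_code_of and both snippets are illustrative only.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stderr=subprocess.DEVNULL,  # hide the traceback of the crashing snippet
    )
    return result.returncode


# A judge with an internal bug: the uncaught exception ends the process.
crashing_judge = "raise RuntimeError('bug in the judge')"
# A judge that deliberately reports a wrong answer, opendata-v1 style.
rejecting_judge = "import sys; sys.exit(1)"

if __name__ == "__main__":
    print(exit_code_of(crashing_judge), exit_code_of(rejecting_judge))
```

Both snippets finish with the same exit code, so under opendata-v1 a crash cannot be told apart from a rejection. With opendata-v2 the rejection would use 43 instead, and an exit code of 1 would match neither verdict.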