CompareExpressions¶
This function utilises the SymPy library
to provide a maths-aware evaluation of a learner's response.
Architecture overview¶
The execution of the evaluation function follows this pattern:
- Determine context
- Parse response and answer data
- Parse criteria
- Store input parameters and parsed responses in a key-value store that allows adding new fields but not editing existing ones
- Execute the feedback generation procedure provided by the context to generate written feedback and tags
- Serialise the generated feedback and tags into a suitably formatted dictionary
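In code, this pattern looks roughly like the sketch below; every helper is a simplified stand-in for illustration, not the actual implementation:

def determine_context(params: dict) -> dict:
    """Stand-in for step 1: choosing e.g. the symbolic or physical_quantity context."""
    return {"expression_parse": lambda expr: expr.replace(" ", "")}

def evaluation_flow(response: str, answer: str, params: dict) -> dict:
    context = determine_context(params)                          # 1. determine context
    store = {}                                                   # 4. write-once store in the real code
    store["response"] = context["expression_parse"](response)    # 2. parse response and answer data
    store["answer"] = context["expression_parse"](answer)        # (3. criteria parsing omitted here)
    is_correct = store["response"] == store["answer"]            # 5. feedback generation, simplified
    return {"is_correct": is_correct, "feedback": "", "tags": []}  # 6. serialise the result

print(evaluation_flow("1 + x", "1+x", {}))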
Evaluation function¶
The main evaluation function is found in evaluation.py
and has the following signature:
evaluation_function(response: str, answer: str, params: dict, include_test_data: bool = False) -> dict
Input¶
This is the function that should be called to evaluate a response expression.
- response
is the response expression submitted by the learner
- answer
is a reference expression provided by the task author
- params
is a dictionary of optional parameters; for available parameters and their intended use, see the user documentation
- include_test_data
is a boolean that controls whether some extra data useful for testing or debugging is returned
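A minimal call might look as follows; the response and answer shown are illustrative only, and the empty params dictionary means that default parameters are used:

from evaluation import evaluation_function

result = evaluation_function(
    response="2*x + 1",   # the learner's submission
    answer="1 + 2*x",     # the task author's reference expression
    params={},            # optional parameters, see the user documentation
    include_test_data=False,
)
print(result["is_correct"], result["feedback"], result["tags"])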
Output¶
The function returns a result dictionary with the following fields:
- is_correct
is a boolean value that indicates whether the response is considered correct or not
- feedback
is a string, intended to be shown to the learner, that describes what the evaluation function found when evaluating the response
- tags
is a list of strings that encodes what the evaluation function has found out about the response; these tags are more consistent across similar tasks than the free-text string in feedback
The returned dictionary will be referred to as the result
in this documentation.
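For illustration, a result for a correct response might look like the following; the feedback text and tag are invented examples, not actual output of the function:

{
    "is_correct": True,
    "feedback": "The response is equivalent to the expected answer.",
    "tags": ["response=answer_TRUE"],
}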
Overview¶
The overall flow of the evaluation procedure can be described as follows:
- The function uses the parameters given in params to determine the context of the evaluation. What context means will be discussed in more detail in section TODO: Add section name here.
- After the context is determined, the response, answer and criteria (either supplied via params or taken from the context) are analysed, and the necessary information is stored for future use in a dictionary with frozen values, i.e. a dictionary where new items can be added but existing items cannot be changed (see the sketch after this list).
- The feedback generating procedure supplied by the context is used to generate feedback based on the contents of the frozen value dictionary.
- If all criteria are found to be satisfied, the response is considered correct, i.e. the is_correct field in the result is set to true, and the feedback string and list of tags generated by the feedback generation procedure are added to their respective fields.
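A minimal sketch of such a write-once dictionary, assuming a plain dict subclass (the class name FrozenValueDict is hypothetical):

class FrozenValueDict(dict):
    """Dictionary where new keys can be added but existing keys cannot be changed."""
    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(f"Key '{key}' is frozen and cannot be modified")
        super().__setitem__(key, value)

store = FrozenValueDict()
store["response"] = "2*x + 1"  # adding a new key is allowed
try:
    store["response"] = "x"    # modifying an existing key is not
except KeyError as exc:
    print(exc)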
TODO Describe what further information is supplied when include_test_data
is set to true.
Context¶
The context is a data structure that contains at least the following seven pieces of information:
- default_parameters
A dictionary where the keys are parameter names and the values are the default values that the evaluation function will use unless another value is provided together with the response. The required fields are context-dependent; currently all contexts use the default parameters found in utility\expression_utilities.py, and the physical_quantity context adds a few extra fields, see the default parameters defined in context\physical_quantity.py.
- expression_parse
is a function that parses expressions (i.e. the response and answer inputs) into the form used by the feedback generation procedure.
- expression_preprocess
is a function that performs string manipulations to ensure that correctly written input expressions follow the conventions expected by expression_parse.
- expression_preview
is a function that generates a string that can be turned into a human-readable representation of how the evaluation function interpreted the response.
- feedback_procedure_generator
is a function that generates, for each criterion, a function that can be used to evaluate whether the criterion is satisfied. The output of each generated function should be a list of tags that the feedback string generator can use to produce human-readable feedback.
- feedback_string_generator
is a function that takes tags and outputs human-readable feedback strings.
- generate_criteria_parser
is a function that generates a parser that turns the criteria (given in string form) into a form that the feedback generation procedure can use to determine whether they are satisfied.
The context can also contain other fields if necessary.
Remark: The current implementation uses a dictionary rather than a dedicated class for ease of iteration during the initial development phase.
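To make the shape of this data structure concrete, a context might be assembled roughly as follows; every value below is a placeholder stand-in, not the real implementation:

symbolic_context = {
    "default_parameters": {"strict_syntax": False},  # illustrative parameter only
    "expression_preprocess": lambda expr, params: expr.strip(),
    "expression_parse": lambda expr, params: expr.replace("^", "**"),
    "expression_preview": lambda expr, params: f"\\({expr}\\)",
    "feedback_procedure_generator": lambda criteria: (lambda store: ["TAG"]),
    "feedback_string_generator": lambda tags: "Feedback text for the learner.",
    "generate_criteria_parser": lambda: (lambda criteria_string: criteria_string.split(";")),
}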
There are currently two different contexts:
- symbolic
: Handles comparisons of various symbolic expressions. Defined in context\symbolic.py
.
- physical_quantity
: Handles comparisons of expressions involving units. Defined in context\physical_quantity.py
.
Remark: Handwritten expressions are sent as LaTeX, which requires extra preprocessing before the right context can be determined in some cases. It should be considered whether a new context, perhaps called handwritten
, should be created for this purpose.
TODO Describe currently available contexts in detail
symbolic
- Comparison of symbolic expressions¶
Remark: The symbolic
context should probably be split into several smaller contexts; the following subdivision is suggested:
- numerical
: Comparison of expressions that can be evaluated to numerical values (e.g. expressions that are already numerical values or expressions only containing constants). Focuses on identifying if numerical values are greater than, less than, proportional to the expected answer or similar.
- symbolic
: Comparison of symbolic expressions that cannot be reduced to numerical values.
- equality
: Comparison of mathematical equalities (with the extra complexities that come with equivalence of equalities compared to equality of expressions).
- inequality
: Same as equality
except for mathematical inequalities (which will require different choices when it comes to what can be considered equivalence). It might be appropriate to combine equality
and inequality
into one context (called statements
or similar).
- collection
: Comparison of collections (e.g. sets, lists or intervals of the number line). Likely to consist mostly of code for handling comparison of individual elements using the other contexts, and configuring what counts as equivalence between different collections.
symbolic
Criteria commands and grammar¶
The criteria commands use the following productions:
START -> BOOL
BOOL -> EQUAL
BOOL -> ORDER
BOOL -> EQUAL where EQUAL
BOOL -> EQUAL where EQUAL_LIST
BOOL -> RESERVED written as OTHER
BOOL -> RESERVED written as RESERVED
BOOL -> RESERVED contains OTHER
BOOL -> RESERVED contains RESERVED
EQUAL_LIST -> EQUAL;EQUAL
EQUAL_LIST -> EQUAL_LIST;EQUAL
EQUAL -> OTHER = OTHER
EQUAL -> RESERVED = OTHER
EQUAL -> OTHER = RESERVED
EQUAL -> RESERVED = RESERVED
EQUAL -> OTHER ORDER OTHER
EQUAL -> RESERVED ORDER OTHER
EQUAL -> OTHER ORDER RESERVED
EQUAL -> RESERVED ORDER RESERVED
OTHER -> RESERVED OTHER
OTHER -> OTHER RESERVED
OTHER -> OTHER OTHER
- START: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single START is a parseable criterion).
- END: Formal token that indicates the end of a tokenized string.
- NULL: Formal token that denotes a token without meaning; it should not appear when an expression is tokenized.
- BOOL: Expression that can be reduced to either True or False.
- EQUAL: Token that denotes symbolic equality between mathematical expressions.
- EQUALITY: Token that denotes the equality operator =.
- EQUAL_LIST: Token that denotes a list of equalities.
- RESERVED: Token that denotes a reserved name for an expression. Reserved names include response and answer.
- ORDER: Token that denotes an order operator. Order operators include >, <, >= and <=.
- WHERE: Token that denotes the separation between a criterion and a list of equalities describing substitutions that should be done before the criterion is checked.
- WRITTEN_AS: Token that denotes that a syntactical comparison should be done.
- CONTAINS: Token that denotes that a mathematical expression depends on a symbol or subexpression.
- SEPARATOR: Token that denotes which symbol is used to separate the list of equalities used by WHERE.
- OTHER: Token that denotes any substring that will be passed on for more context-specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
Examples of commonly used criteria¶
TODO Add examples
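Until the TODO above is resolved, the following constructed examples are consistent with the grammar (illustrative only, not verified parser output):
- response = answer : the response is symbolically equal to the answer.
- response written as answer : the response is written in the same syntactic form as the answer.
- response contains x : the response depends on the symbol x.
- response = answer where x = 1; y = 2 : equality is checked after the substitutions x = 1 and y = 2 are applied.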
physical_quantity
- Comparison of expressions that involve units¶
physical_quantity
Criteria commands and grammar¶
The criteria commands use the following productions:
START -> BOOL
BOOL -> EQUAL
BOOL -> ORDER
BOOL -> EQUAL where EQUAL
BOOL -> EQUAL where EQUAL_LIST
BOOL -> RESERVED written as OTHER
BOOL -> RESERVED written as RESERVED
BOOL -> RESERVED contains OTHER
BOOL -> RESERVED contains RESERVED
EQUAL_LIST -> EQUAL;EQUAL
EQUAL_LIST -> EQUAL_LIST;EQUAL
EQUAL -> OTHER = OTHER
EQUAL -> RESERVED = OTHER
EQUAL -> OTHER = RESERVED
EQUAL -> RESERVED = RESERVED
EQUAL -> OTHER ORDER OTHER
EQUAL -> RESERVED ORDER OTHER
EQUAL -> OTHER ORDER RESERVED
EQUAL -> RESERVED ORDER RESERVED
OTHER -> RESERVED OTHER
OTHER -> OTHER RESERVED
OTHER -> OTHER OTHER
- START: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single START is a parseable criterion).
- END: Formal token that indicates the end of a tokenized string.
- NULL: Formal token that denotes a token without meaning; it should not appear when an expression is tokenized.
- BOOL: Expression that can be reduced to either True or False.
- QUANTITY: Token that denotes a physical quantity, which can be given as both a value and units, only a value (i.e. a dimensionless quantity) or only units.
- DIMENSION: Token that denotes an expression only containing physical dimensions.
- START_DELIMITER: Token that denotes a list of equalities.
- INPUT: Token that denotes any substring that will be passed on for more context-specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
- matches: Token for an operator that checks whether two quantities match, i.e. if they are rewritten using the same units, are their values equal (up to the chosen tolerance).
- dimension: Token for an expression only involving dimensions (i.e. no values or units).
- =: Token for an operator that checks equality (i.e. compares whether value and units are identical separately).
- <=: Token for an operator that checks if a quantity's value is less than or equal to another quantity's value (after both quantities are rewritten in the same units).
- >=: Token for an operator that checks if a quantity's value is greater than or equal to another quantity's value (after both quantities are rewritten in the same units).
- <: Token for an operator that checks if a quantity's value is less than another quantity's value (after both quantities are rewritten in the same units).
- >: Token for an operator that checks if a quantity's value is greater than another quantity's value (after both quantities are rewritten in the same units).
Examples of commonly used criteria¶
TODO Add examples
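As above, until the TODO is resolved, the following constructed examples are consistent with the grammar (illustrative only, not verified parser output):
- response = answer : value and units must both be identical.
- response matches answer : the response and answer describe the same quantity, e.g. a response of 1000 m would match an answer of 1 km.
- response < answer : the response's value is less than the answer's value after both are rewritten in the same units.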
Code shared between different contexts¶
Expression parsing¶
TODO Describe shared code for expression preprocessing and parsing
TODO Describe shared code for expression parsing parameters
Other shared code¶
TODO Describe shared default parameters
Feedback and tag generation¶
- Generate feedback procedures from the criteria; each procedure returns a boolean that indicates whether the corresponding criterion is satisfied, a string intended to be shown to the student, and a list of tags indicating what was found when checking the criterion
- For each criterion, run the corresponding procedure and store the result, the feedback string and the list of tags
- If all criteria are found to be true, then the response is considered correct (a sketch of this loop follows below)
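A minimal sketch of this loop, under the assumption that each procedure returns a (satisfied, feedback, tags) triple; the criterion and tag names are illustrative:

def equality_criterion(store: dict) -> tuple:
    """Hypothetical procedure for a criterion named response=answer."""
    satisfied = store["response"] == store["answer"]
    tag = "response=answer_" + ("TRUE" if satisfied else "FALSE")
    return satisfied, "Compared the response against the answer.", [tag]

procedures = {"response=answer": equality_criterion}

def run_criteria(procedures: dict, store: dict) -> dict:
    results, feedback_strings, all_tags = [], [], []
    for procedure in procedures.values():
        satisfied, feedback, tags = procedure(store)
        results.append(satisfied)
        feedback_strings.append(feedback)
        all_tags.extend(tags)
    return {
        "is_correct": all(results),               # correct only if every criterion holds
        "feedback": "\n".join(feedback_strings),  # joined with line breaks
        "tags": list(dict.fromkeys(all_tags)),    # duplicates removed, order preserved
    }

print(run_criteria(procedures, {"response": "x+1", "answer": "x+1"}))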
Tag conventions¶
Each feedback procedure consists of a series of function calls (the specifics are determined by the particular criterion) that each return a list of strings (called tags). Each tag then indicates what further function calls must be performed to continue the evaluation, as well as what feedback string (if any) should be generated. When there are no remaining function calls the feedback procedure is complete. The tags are formatted as the name of the criterion followed by an underscore and the name of the function call outcome. For tags that are not connected to a specific criterion (e.g. tags that indicate an issue with expression parsing), the criterion name and underscore are omitted.
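For example, under this convention a criterion named response=answer whose equality check succeeds might yield a tag such as response=answer_TRUE, whereas a general parsing failure would yield a bare tag such as PARSE_ERROR (both tag names are hypothetical illustrations of the convention, not actual output).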
Returning final results¶
The function returns a result dictionary with the following fields:
- is_correct
is a boolean value that is set to True
if all criteria are satisfied
- feedback
is a string that is created by joining all strings generated by the feedback procedures with a line break between each string.
- tags
is a list of strings that is generated by joining all lists of tags generated by feedback procedures and removing duplicates.
Preview function¶
When the evaluation function preview is called, the code in preview.py is executed. Since different contexts interpret responses in different ways, they also have their own preview functions. The context-specific preview functions can be found in preview_implementations.
Remark: Since it is likely that there will be significant overlap between the response preview and the response evaluation (e.g. code for parsing and interpreting the response), it is good practice if they can share as much code as possible to ensure consistency. For this reason it might be better to move the preview functions fully inside the context (either by making a preview
subfolder in the context
folder, or by moving the implementation of the preview function inside the context files themselves). In this case preview.py
and evaluation.py
could also share the same code for determining the right context to use.
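If the preview functions were moved inside the contexts as suggested, the dispatch could look roughly like the sketch below; all names are hypothetical stand-ins following the context structure sketched earlier, not the actual codebase:

def determine_context(params: dict) -> dict:
    """Stand-in for the shared context-selection logic used by evaluation.py."""
    return {
        "expression_preprocess": lambda expr, params: expr.strip(),
        "expression_preview": lambda expr, params: f"\\({expr}\\)",
    }

def preview_function(response: str, params: dict) -> dict:
    # Reusing the same context selection as the evaluation keeps the
    # preview consistent with how the response is actually interpreted.
    context = determine_context(params)
    preprocessed = context["expression_preprocess"](response, params)
    return {"preview": context["expression_preview"](preprocessed, params)}

print(preview_function(" x^2 ", {}))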
Tests¶
There are two main groups of tests: evaluation tests and preview tests. The evaluation tests can be run by calling evaluation_tests.py.
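For example, assuming a standard Python setup, the evaluation tests could be invoked with python evaluation_tests.py, or via a runner such as pytest if the tests follow its conventions (an assumption; the exact invocation depends on the project's test setup).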