"

3 CheckIt OER Assessment Generator

Steven Clontz; Drew Lewis; and Sharona Krinsky

Abstract

The CheckIt framework is an open educational technology for authoring randomized exercises, banks, and assessments for mathematics. The framework facilitates the easy sharing and modification of open-source templates and generators, which can be used both in classrooms and in support of educational research; case studies in this paper illustrate its use in both contexts and describe its use in both small and large classrooms.

Keywords: OER, technology, mathematics, assessment

Introduction

Prior to 2020, there was growing interest in a loose collection of grading practices now most often referred to as “alternative grading”. In the authors’ field of mathematics, for example, the practitioner-focused journal PRIMUS was already at work on a special issue on “Mastery Grading” (Campbell et al., 2020), while the first Grading Conference was in the planning stages. After the onset of the COVID-19 pandemic, this interest exploded, as evidenced by the Grading Conference drawing over 500 attendees after shifting to a virtual format (Owens et al., 2020). While “alternative grading” is not an entirely well-defined term, Clark and Talbert (2023) provided the following four pillars to describe this broad swath of practices:

  1. Students’ work is evaluated using clearly defined standards.
  2. Students’ work is marked with an indication of progress, rather than an arbitrary number.
  3. Students are given helpful feedback.
  4. Students can have their work reassessed without penalty.

While we (and others) might quibble at the margins of this definition, it succinctly captures a large majority of practices that fall under this umbrella of alternative grading. Some specific examples likely to be familiar to readers include specifications grading (Nilson, 2014), standards-based grading (Elsinger & Lewis, 2020), and ungrading (Blum, 2020).

Clark and Talbert’s (2023) fourth pillar often turns out to be a significant barrier for instructors desiring to implement alternative grading; allowing students to have their work reassessed requires that instructors create additional assessment opportunities for them. Some specific implementations (e.g., specifications grading) allow for this to be done by revising existing work; however, other implementations (e.g., standards-based grading) typically require instructors to create new assessment tasks, which can prove onerous for instructors as class sizes increase.

In mathematics and other computational disciplines, it is often possible to algorithmically describe and procedurally generate tasks suitable for assessing students’ knowledge of a particular learning objective. Several tools allow instructors to generate assessment tasks with randomized elements; however, most of these are proprietary tools owned by publishers that require students to pay a fee, with the notable exception of WeBWorK (Gage et al., 2002). Further, all existing tools that we are aware of are designed around the feature of automated answer checking: students are prompted to enter an answer, and the software records whether it is correct or not. Automated answer checking has its pitfalls, as it focuses students’ attention on obtaining a correct “answer” in a format that will convince the software of its correctness, rather than centering students’ full attention on the learning goals at hand. Any numerical score produced by these systems thus confounds student learning with students’ ability to convince the software to accept their answers. Moreover, these systems often have very limited choice in output formats, typically displaying questions only in HTML (i.e., in a web browser). This output display reflects the origin of many of these software tools as online homework systems, rather than more general assessment generators supporting other methods of assessment delivery (e.g., print).

To address these problems, the first author developed the open-source software CheckIt (Clontz, 2023). This software allows instructors to algorithmically generate an unlimited number of assessments for a given learning outcome alongside outputs in various formats (e.g., PDF or the open QTI format), which can be imported into many common learning management systems (LMS). In contrast to existing software solutions that are sometimes repurposed for use with alternative grading, CheckIt was designed from the outset to aid in the implementation of alternative grading practices. Randomized problems can be generated automatically, but they must be aligned with learning outcomes and marked by an instructor rather than the computer. In the next sections we describe the CheckIt authoring process, the infrastructure that supports CheckIt authors, and case studies highlighting the use of CheckIt at a mid-sized southern regional university and a large urban west-coast Hispanic-serving institution. Finally, we conclude by noting a few limitations of the framework in its current manifestation.

Considerations for Designing Outcomes

Content written for CheckIt is organized into banks. Following principles of backwards design (Wiggins & McTighe, 2005), each bank first defines a list of learning outcomes. There is no required organization for these outcomes; however, in practice most authors choose to organize their outcomes into related groups (e.g. modules, chapters). Each outcome has a short title (e.g. “Comparing Fractions”), and a description defining the evidence of learning that should be demonstrated by the student’s response (e.g. students who produce a satisfactory response to the generated question have shown they can “Use common denominators to compare two fractions”).

As noted in Clark and Talbert’s (2023) first pillar above, careful design of this description is essential. For this example, a well-designed outcome must always produce an exercise that prompts a student to demonstrate their ability to use common denominators to compare two fractions. It is important that an outcome never produce an exercise that could reasonably be solved without demonstrating a technique that can be used in general to find common denominators. For example, the prompt “show how to determine which of the fractions 9/15 or 4/5 is larger” would be an oversight, as the student could simplify the first fraction to 3/5 and then directly compare numerators. While there is value in a student being able to use this problem-solving technique, it would not demonstrate their understanding of the learning outcome; simplification of fractions is not a generally viable strategy for finding common denominators. To alleviate this, mathematical techniques may be used within the code (e.g., ensuring that all the numerators and denominators in the given problem have no factors in common). Another common approach is to create multiple-task questions, where one task might always be solvable using the slick technique of simplifying one of the fractions, but the other must still be solved with the generally applicable technique for finding common denominators. This dual approach provides an opportunity for students to demonstrate flexible problem-solving techniques while ensuring that a satisfactory response still demonstrates the desired generally applicable skill.
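
As a concrete illustration of the first approach, the following sketch (written in the Python/SageMath style used by CheckIt generators; the helper name and numeric ranges are hypothetical rather than taken from a published bank) rejects candidate pairs until each fraction is already in lowest terms and the two values differ:

from math import gcd
from random import randrange

def generate_fraction_pair():
    # Hypothetical helper: return two fractions n1/d1 and n2/d2 that
    # cannot be compared simply by simplifying one of them first.
    while True:
        n1, d1 = randrange(1, 10), randrange(2, 10)
        n2, d2 = randrange(1, 10), randrange(2, 10)
        # Reject pairs where either fraction simplifies, the denominators
        # coincide, or the two fractions are equal in value.
        if gcd(n1, d1) == 1 and gcd(n2, d2) == 1 and d1 != d2 and n1 * d2 != n2 * d1:
            return (n1, d1), (n2, d2)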

Another important consideration is that each produced exercise should be of comparable difficulty. For example, an outcome that occasionally asks one student to compare 17/31 and 12/19 but asks another student to compare 1/2 and 3/5 would likely not be appropriate. The goal for a well-designed outcome is to allow students to engage with generated exercises while also being able to demonstrate their learning.

Finally, one must consider edge cases. If the context of our running example is a class where students are only expected to be able to handle two-digit whole numbers, then care must be taken to ensure the common denominator does not grow larger than 99. Limiting denominators to less than 10 could guarantee the product of two denominators remains below 100; however, this would also eliminate potentially appropriate examples such as 17/24 and 23/36, which have a common denominator of 72.
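
Edge cases like this can also be handled within the generator; the sketch below (again hypothetical, assuming the two-digit constraint described above) accepts a pair of denominators only when their least common denominator stays within the bound, which keeps pairs such as 24 and 36 while rejecting pairs whose common denominator grows too large:

from math import lcm  # available in Python 3.9 and later

def common_denominator_ok(d1, d2, limit=99):
    # Accept the pair only if the least common denominator is small
    # enough for the intended audience.
    return lcm(d1, d2) <= limit

# 24 and 36 are kept (least common denominator 72) even though their
# product is 864, while 24 and 35 are rejected (least common denominator 840).
assert common_denominator_ok(24, 36)
assert not common_denominator_ok(24, 35)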

In our experience, these issues are often found after the questions are used in the classroom, either brought to the author’s attention by their students or found during the evaluation of student responses. However, through an iterative process of revision and reimplementation, an instructor can incorporate this feedback and improve their bank over time until it becomes quite stable from course to course.

CheckIt’s Socio-technical Infrastructure

To produce such exercises, each outcome requires authors to write two files; the first is a template. Templates are written in a markup language called SpaTeXt, an adaptation of the popular PreTeXt language used to author interactive and accessible textbooks supporting multiple output formats (Beezer, 2018). In particular, templates support the production of both HTML (to support both standard webpages and learning management systems) and PDF documents (via the LaTeX markup language commonly used in mathematics publishing). SpaTeXt purposely provides little flexibility in terms of typesetting and formatting; in addition to ensuring that content written in SpaTeXt will appear as intended in multiple output contexts across the web and print, a central goal of the language is accessibility. By giving instructors a limited number of markup tags to describe the semantics of their content, the SpaTeXt software can reliably convert written content into output that is either screen reader friendly or ready to be embossed as tactile braille code.

To produce randomized exercises, the popular Mustache templating system (Wanstrath, 2009) is used; the notation {{variable_name}} serves as a placeholder for a randomized variable to be inserted into the exercise. Likewise, block tags {{#variable_name}} {{/variable_name}} are available either for looping through iterable objects or for showing/hiding certain content based upon Boolean values. This templating system balances simplicity with flexibility for a wide range of randomized exercises, as we will see in the case studies below.
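
For instance, a template fragment along the following lines (a hypothetical sketch using only the markup tags shown in the appendix; the variable names a, b, and show_hint are illustrative and not drawn from a published bank) could insert two randomized fractions and conditionally display a hint when the generator sets a Boolean flag:

<content>
  <p>Explain how to determine which fraction is larger:
  <m>{{a}}</m> or <m>{{b}}</m>.</p>
  {{#show_hint}}
  <p>Hint: rewrite both fractions with a common denominator.</p>
  {{/show_hint}}
</content>

The corresponding generator would then return values for a and b along with a Boolean value for show_hint.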

The values for these variables are then produced by programming a generator, a short script that describes how to randomly generate the variable content for each version of the exercise. Given CheckIt’s primary (but not exclusive) application of creating question banks in computational disciplines, most generators are written in SageMath, a variation of the popular Python programming language designed for mathematics. A key design consideration for CheckIt is that, to disseminate a bank, generators are pre-processed on the author’s computer. As a result, the randomized elements are distributed as a static file that can be read by any web browser, whether or not the end user has the author’s chosen programming language installed on their computer. In contrast with other frameworks for writing code to generate randomized exercises, which are built around a specific programming language, this creates the potential for authors to write their generators using the programming language most appropriate for their discipline.

There is no centralized host for content written for CheckIt; the homepage https://CheckIt.clontz.org links to the software, a demo bank, and a few featured examples. The software provides maximum freedom for authors to create and deploy banks for their individual classrooms, without requiring any moderation or approval, which allows CheckIt to be adopted for use in novel and unanticipated applications (see Ekenseair, 2022, for an example in chemical engineering). Due to the minimal requirements for deploying a static website to the web, authors can freely distribute their banks on the web host of their choice; many choose to use GitHub’s built-in Pages service (Visconti, 2016). Once a bank is deployed to a web host, any instructor with its web address can use the Assessment Builder feature to generate customizable PDF quizzes using only a web browser. CheckIt also exports to standard learning management systems (e.g., Canvas, D2L Brightspace, and Moodle) via the QTI protocol, eliminating the need for instructors to add extra software or infrastructure to their digital learning environments for online assessments.

Figure 1. An example product rule outcome included in the CheckIt demo bank, and its appearance in the bank website, a printable quiz, and the Canvas LMS. See the appendix for listings of relevant code snippets.

As a free and open-source technology maintained by a single developer, there is no formal support mechanism for bank authors or instructors. Instead, a self-supporting community of users and contributors has developed as a sub-channel of the Alternative Grading Slack community. Additionally, the development of CheckIt is done in public via its GitHub repository (Clontz, 2023), which includes an issue tracker and publicly editable Wiki where community members can make requests and collaboratively document its use. The open-source MIT license for CheckIt’s software allows any instructor to freely create and use CheckIt banks. This socio-technical infrastructure connecting CheckIt’s humans and technologies is essential for maximizing the software’s impact within classrooms and education research.

Case Studies

Case Study 1 – Regional Public University (In this section, “I” refers to the first author.)

CheckIt was originally developed for use in rapidly generating outcomes-aligned exercises in my sophomore-level linear algebra and differential equations courses at a regional public university in the southern United States. During my initial implementation of outcomes-based assessment, I invited students to attend weekly office hours and request re-assessment of individual learning outcomes. To give students maximum flexibility for demonstrating their learning, I did not put any limits on when students could request reassessment of learning outcomes. This led to several office hour sessions with almost a dozen students filling the waiting area, waiting for their opportunity for me to requiz them on topics they had been unable to demonstrate learning of during regularly scheduled assessments.

For many of these outcomes, I found it troublesome to find examples that did not already appear in students’ textbooks, were sufficiently aligned with the desired learning outcomes, and were reasonable to create on demand for each student individually. Furthermore, while an erroneously posed question can be fixed on the fly during office hours, such mistakes were problematic when they appeared on regularly scheduled quizzes. Such mistakes included a question whose answer could not reasonably be found in the context of a timed quiz, or a question that could be solved without demonstrating the desired learning outcome. In a points-based grading system, such questions might be skipped, or bonus points might be awarded to compensate for the mistake. But in my outcomes-based grading system, the result was a waste of everyone’s time, as I still needed to collect evidence of student learning that could not be obtained from the invalid prompt.

To address this, I wrote an ad hoc script (the first iteration of what would become CheckIt) for each learning outcome that generated LaTeX markup for each exercise when executed on the command line. These exercises were saved to a file and then imported into a custom gradebook web application I developed to track student progress towards each learning outcome. This provided me the ability to generate individual LaTeX quizzes for each student based upon their progress; the custom app contained a feature where I would choose the learning outcomes I intended to assess, check whether each student had already demonstrated sufficient progress for each outcome, and, if they had not, include an appropriate question on a personalized quiz. These LaTeX files were then processed into PDFs and printed with each student’s name automatically listed at the top of their page.

When colleagues at my institution requested the ability to similarly generate their assessments, the limitations of such an approach became quickly apparent. For one, these colleagues did not share my software engineering background, requiring me to improve the user interface and handle certain tasks for them. Furthermore, while I could responsibly handle my colleagues’ FERPA-protected student data as a fellow faculty member, I did not have the infrastructure to securely support student data external to my institution, limiting the impact of my efforts. To that end, CheckIt was eventually packaged into the form it exists in today. This form includes a web-based graphical user interface that does not require knowledge of the command line or LaTeX to generate printable outcomes-aligned assessments from existing banks, and which is functional without needing to store or transmit any FERPA-protected student data.

In 2020, the COVID-19 pandemic quickly revealed the necessity for these randomized assessments to be deliverable in forms other than a printed page. While several solutions for delivering randomized mathematics exercises exist, these solutions rarely take an outcomes-focused approach and generally are not part of the basic learning management system offered to mathematics faculty. To address this need, I added QTI exercise bank support to CheckIt, which allowed instructors to deliver randomized exercises through learning management systems (e.g., Canvas, D2L Brightspace, or Moodle quizzes) without needing to integrate additional services or plugins. These quizzes can be made available to students on demand by logging into their LMS, with a new random version of the exercise delivered with each quiz attempt.

CheckIt also supports research studies on active learning, such as Lewis and Estis (2020). CheckIt banks were authored and provided for use in participant assessments, allowing participating instructors across the country to generate unique yet strongly aligned exercises for each of their students, whose results could be compared to measure student learning across different institutional contexts. Student responses to these questions were collected as part of these studies. While each exercise is unique to the assessment it was included on, the response data collected remain aligned to the given outcome, which allows for a reasonable analysis of student responses across exercises.

Case Study 2 – Large Public Hispanic-Serving Institution (In this section, “I” refers to the second author.)

A large public university in the western United States uses the CheckIt system to deliver assessments for a Quantitative Reasoning with Statistics course. This coordinated course, with over 1,000 students per year, is broken into individual sections of up to 25 students. The course is taught by an instructional team of 10–15 instructors, led by a coordination team of 2–3 instructors (including the second author). As an urban Hispanic-serving institution with a high proportion of first-generation and Pell-grant-eligible students, the university has a deep institutional commitment to equity and to meeting students where they are upon entering college. Most students take this course during the first semester of their freshman year. The instructional team designed the course from the ground up around Clark and Talbert’s (2023) four pillars of alternative grading: clearly defined learning outcomes, helpful feedback, marks indicating progress, and reattempts without penalty. As an instructional team, we were also concerned with balancing the needs of student equity and academic honesty. The adoption of CheckIt to deliver assessments was a critical component that allowed us to accomplish all of these goals.

To utilize CheckIt, my coordination team begins by writing an assessment problem that is timely and relevant in its content, reflecting current events in areas such as politics, world events, and human health. The assessment problem is then coded into CheckIt with variations not only in the numerical components of the generator, but also in the real-world language of the template. For example, a question about a recent poll on American attitudes about guns could have versions that include “support for extended background checks” or “do not support raising the minimum legal age to own a gun to 21”.

This course is designed with the goal of helping students learn to critically consume the data they are given, particularly data presented in the media. Therefore, we strive to avoid introducing misinformation in the assessments. Blending CheckIt’s ability to use SageMath to generate randomized mathematical values with its language-variation capabilities allows us to use simple code to match the random mathematical values to the language options. For example, we often use current polling information from major polling organizations such as Reuters or Quinnipiac University. If the current president has an approval rating of around 45%, we will not generate random mathematical values that vary too much from the actual approval rating. CheckIt first selects from the given language options, such as “% approve of the president’s handling of a certain topic” or “% disapprove of the president’s handling of a certain topic,” and then randomly generates a mathematical value within a given range, based on which of the language options was randomly chosen. If the actual value is 45% approve (and therefore 55% disapprove), then the range available might be 42%–48% for approve and 52%–58% for disapprove. These options are built directly into the generator and require very little coding skill.
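
A sketch of this pattern, following the Generator structure shown in Listing 2 of the appendix (the variable names, wording, and numeric ranges here are illustrative rather than copied from our course bank), might look like the following:

class Generator(BaseGenerator):
    def data(self):
        # Choose the wording first, then a percentage that stays near the
        # actual polling value so the exercise does not misrepresent it.
        stance = choice(["approve of", "disapprove of"])
        if stance == "approve of":
            percent = randrange(42, 49)  # near the actual 45% approval
        else:
            percent = randrange(52, 59)  # near the actual 55% disapproval
        return {
            "stance": stance,
            "percent": percent,
        }

The template can then interpolate both values, for example as “{{percent}}% of respondents {{stance}} the president’s handling of a certain topic.”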

Given that the underlying information about current events changes as time passes, instructors must spend a few minutes each semester adjusting the assessment problem in the CheckIt generator. These adjustments allow us to maintain the relevance of the problem and avoid the issues that often plague problems whose data and information rapidly become outdated.

Once the generator and template are created, the course coordinator has the CheckIt system generate many versions of the problem (we typically use about 1,000) and exports the entire group of versions to a file compatible with LMS uploads. In our case, the coordination team uploads the bank into Canvas to create an assessment. This workflow supports the following needs of the course:

  1. The system integrates with the Canvas learning management system (and several other LMSs) to provide asynchronous options for taking the assessments, accommodating students with complex needs such as work schedules, family responsibilities, long commutes, and heavy course loads.
  2. The system provides instant access to versioned answer keys by marrying the specific answer keys to each version of the question in Canvas’s grading tool (“SpeedGrader”), allowing instructors to see the answer key for each version in the context of grading a single student’s submission.
  3. The “classic” quizzes in Canvas allow proficiency scales to be used instead of points-based feedback, as well as providing room for written feedback from the instructor.

Each year the instructional team updates existing assessment questions to ensure they remain timely and relevant and adds new problems to the system. As assessment questions age and fall out of relevance, they are converted into “practice problems” that students can use to become accustomed to the style and content of the assessments. The high number of versions of the practice problems encourages students to work to understand their own version, rather than simply copying from the instructor or another student. Students also learn the need to read assessment problems very closely, as versions often differ in subtle ways.

Since the implementation of CheckIt in the Fall semester of 2022, over 3,000 students have taken over 75,000 quizzes generated from CheckIt problem banks. This has been a tremendous help in minimizing the grading load for the entire instructional team, providing additional meaningful feedback to students in an automated fashion, and limiting academic misconduct in an asynchronous online quizzing environment. We have found that using CheckIt provides consistency in the difficulty of problems across multiple assessments and is a cohesive tool that is accessible and usable by all the instructors on the team.

Conclusion

In summary, the use of the flexible CheckIt framework has enabled the implementation of the assessment and reassessment workflows recommended for outcomes-based grading models, without requiring any particular LMS or paid software service. However, it does not come without its limitations.

While generating exercises from existing banks hosted online is straightforward, one shortcoming of the platform is the non-trivial comfort with technology required of authors to set up the framework and to code generators and templates. This will be addressed in future versions of CheckIt by developing infrastructure that allows authors to generate and deploy their banks on GitHub.com using only their web browser.

Additionally, current technology does not allow for rich randomization of the narrative prompts associated with each generated exercise (particularly word problems) without an amount of author labor on par with manually authoring such exercises. It is possible that a Generative Pre-trained Transformer (akin to ChatGPT) could generate variations on these prompts and minimize the formulaic presentation of generated exercises, but this has legal and ethical implications (O’Brien, 2024); as such, it has not yet been explored.

A final point of friction is the general lack of infrastructure for developing small open-source educational technologies. While there are open-source software products in the education space (e.g., Sakai and Canvas), they are not amenable in practice to the development of small “widgets” by independent developers. Even among instructors who have sufficient knowledge to write scripts that achieve an educational goal, there is minimal training or support for developing such scripts into full-fledged applications. Without such a conversion, these scripts are not appropriate for use outside of a particular instructional context, and instructors have limited ability to discover these solutions and implement them in their own classrooms. This situation is exacerbated by the essential but non-trivial requirements for securely handling FERPA-protected student data. To this end, Runestone Academy and its PROSE Consortium initiative (Runestone Academy, 2023) are working towards building an ecosystem that enables the participation of a distributed community of educational technology contributors to build up free and open-source solutions for education. CheckIt is not part of this consortium, but CheckIt’s maintainer is on the PROSE Advisory Council. Future work of the consortium will explore how new technologies like CheckIt can and should be officially incorporated into this ecosystem.

References

Beezer, R. A. (2018). PreTeXt. Balisage: The Markup Conference.

Blum, S. (Ed.). (2020). Ungrading: Why Rating Students Undermines Learning (and What to do Instead). West Virginia University Press.

Campbell, R., Clark, D., & O’Shaughnessy, J. (2020). Introduction to the Special Issue on Implementing Mastery Grading in the Undergraduate Mathematics Classroom. PRIMUS, 30(8–10), 837–848. https://doi.org/10.1080/10511970.2020.1778824

Clark, D., & Talbert, R. (2023). Grading for Growth: A Guide to Alternative Grading Practices that Promote Authentic Learning and Student Engagement in Higher Education (1st ed.). Routledge. https://doi.org/10.4324/9781003445043

Clontz, S. (2023). CheckIt [Computer software]. https://github.com/StevenClontz/checkit

Ekenseair, A. (2022). Randomly Generated Assessments with Achievement-Based Grading in Chemical Engineering. 2022 AIChE Annual Meeting. https://www.aiche.org/academy/conferences/aiche-annual-meeting/2022/proceeding/paper/107b-implementation-achievement-based-grading-chemical-engineering-core-classes

Elsinger, J., & Lewis, D. (2020). Applying a Standards-Based Grading Framework Across Lower Level Mathematics Courses. PRIMUS, 30(8–10), 885–907.
https://doi.org/10.1080/10511970.2019.1674430

Gage, M., Pizer, A., & Roth, V. (2002). WeBWorK: Generating, delivering, and checking math homework via the Internet. Proceedings of the 2nd International Conference on the Teaching of Mathematics, Hersonissos, Greece.

Lewis, D., & Estis, J. (2020). Improving Mathematics Content Mastery and Enhancing Flexible Problem Solving through Team-Based Inquiry Learning. Teaching & Learning Inquiry, 8(2). https://doi.org/10.20343/teachlearninqu.8.2.11

Nilson, L. B. (2014). Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time (1st ed.). Routledge. https://doi.org/10.4324/9781003447061

O’Brien, M. (2024, January 10). ChatGPT-maker braces for fight with New York Times and authors on ‘fair use’ of copyrighted works. AP News. https://apnews.com/article/openai-new-york-times-chatgpt-lawsuit-grisham-nyt-69f78c404ace42c0070fdfb9dd4caeb7

Owens, K., Krinsky, S., & Clark, D. (2020). How We Moved A Conference Online. MAA Focus, 40(4), 14–17.

Runestone Academy. (2023). PROSE Consortium. https://prose.runestone.academy

Visconti, A. (2016). Building a static website with Jekyll and GitHub Pages. Programming Historian, 5. https://doi.org/10.46430/phen0048

Wanstrath, C. (2009). Mustache: Logic-less templates [Computer software]. https://mustache.github.io/

Wiggins, G., & McTighe, J. (2005). Understanding by design. ASCD.

 

Appendix

<?xml version='1.0' encoding='UTF-8'?>
<knowl mode="exercise" xmlns="https://spatext.clontz.org" version="0.2">
  <content>
    <p>Explain how to find the {{d_synonym}} <m>f'(x)</m>.</p>
    <p><me>f(x)={{f}}</me></p>
  </content>
  <outtro>
    <p><me>f'(x)={{dfdx}}</me></p>
  </outtro>
</knowl>
Listing 1. Template file for a product rule learning outcome.
class Generator(BaseGenerator):
    def data(self):
        # Symbolic variable used to build the function f(x).
        x = var("x")
        # Candidate factors for a product rule exercise.
        factors = [
            x^randrange(2, 10),
            e^x,
            cos(x),
            sin(x),
            log(x),
        ]
        # Pick two distinct factors at random and scale the product by a
        # random nonzero coefficient.
        shuffle(factors)
        f = choice([-1, 1]) * randrange(2, 5) * factors[0] * factors[1]
        # Randomly vary the wording used by the template.
        variant = choice(["derivative", "rate of change"])
        return {
            "f": f,
            "dfdx": f.diff(),  # derivative computed symbolically by SageMath
            "d_synonym": variant,
        }
Listing 2. Generator file for a product rule learning outcome written in SageMath (Python).

License


Open Educational Resources for and as Assessment Copyright © 2025 by Utah State University is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.