Final project

Due by 11:55 PM on Friday, December 15, 2023

Background

Evaluation research is tricky and costly. If you begin an intervention or launch a study prematurely, you can waste time and money—and potentially come to conclusions that negatively affect students’ lives.

Even if you have a well-designed program with an impeccable Theory of Change, you might discover (too late!) that you forgot to collect some critical variables or realize that your identification strategy will not work.

From a more cynical perspective, you might (unethically) engage in the practice of p-hacking: running all sorts of different model specifications until you find the results you want, and then claiming in your report that you had intended to run that model all along.

One increasingly popular method for (1) ensuring that your data and methods work before launching a study or intervention, and (2) declaring and committing to your hypotheses, methods, and models before analyzing your data is to pre-register your research or evaluation. A pre-registered study contains all the background work (an introduction, literature review, theory, hypotheses, and proposed analysis) but without the actual data. Authors post their expectations and hypotheses publicly so they can be held accountable for any deviations from their proposed design.¹

Once you finalize your plan and know all the data you need to collect, and once you’ve written out the different analyses you’ll conduct, all you have to do is collect the real data and run the analysis to generate the results. In the results section of your report, you get to either say “As predicted, we found…”, or “Contrary to expectations, we found that…”.

Your assignment

For your final project in this class, you will write a plan for a quasi-experimental evaluation of an educational program that you’re interested in. Depending on your program, you may choose one of the following quasi-experimental methods you learned about: instrumental variables, regression discontinuity, difference-in-differences, or a strategy that exploits panel data.

You will work in groups of two to three students (I will assign you to your group). At the same time, do not hesitate to reach out to other classmates via Slack. Also, absolutely do not hesitate to ask me questions. I’m here to help!

Make sure everyone on your team is convinced the program warrants an evaluation. The program you would like to evaluate must be educational in nature, but you may define “educational” broadly. It must have a start date after September 2021. Also, the program should target at least 1,000 beneficiaries per year (e.g., 1,000 children), and you should be able to describe rules that determine how beneficiaries become eligible for the program. Finally, you must not copy a program’s existing evaluation plan or research grant proposal.

Throughout the quarter, you will submit three milestones via Canvas. Each milestone has its own due date. Then, for your final assignment, you will submit two things via Canvas:

A PDF of your evaluation plan (see the outline below for details of what this should contain). You may compile this with R Markdown if you like, but you can use any system you choose. You might want to write the prose-heavy sections in a word processor like Word or Google Docs and copy/paste the text into your R Markdown document since RStudio doesn’t have a nice spell checker or grammar checker. Either way, your document should have no visible R code, warnings, or messages in it (set echo = FALSE, warning = FALSE, and message = FALSE at the beginning of your document before you knit).
If you conduct any data analysis, such as statistical power calculations, a copy of your R project as a .zip file. You can either run the analysis in RStudio locally on your computer or use an RStudio.cloud project.
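If you compile with R Markdown, one way to hide code, warnings, and messages across the whole document is to set knitr’s chunk options once, in a setup chunk at the top of the file. This is a minimal sketch rather than a required setup; where you put it and how you label the chunk are up to you.

```r
# Put this in the first ("setup") chunk of your R Markdown file.
# These are standard knitr chunk options; an individual chunk can still override them.
knitr::opts_chunk$set(
  echo = FALSE,     # do not show R code in the knitted document
  warning = FALSE,  # do not show warnings
  message = FALSE   # do not show messages (e.g., package startup notes)
)
```

If you also give the setup chunk itself the option include = FALSE, nothing from that chunk appears in the PDF.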

Please do not pre-register your evaluation (e.g., on OSF) unless you have the opportunity, funding, and ethical clearance to actually conduct the study.

Your project is due by 11:55 PM on Friday, December 15, 2023. No late work will be accepted.

Resources

You might find the following registered report helpful. This is an example of an experimental evaluation, and it does not fully respect the outline I recommend below. Still, I hope you find this document helpful.

registered-report-2023

Suggested outline

Here’s an outline of what you’ll need to do. All word counts are suggestions only. By the time you work on your final document, you will have already completed three related milestones. Still, don’t just copy/paste these assignments as is; you’ll want to polish them up for this final project. Also, consider addressing the following points that your milestones did not cover:

How your evaluation relates to concerns of generalizability
Information on program costs
Multiple hypothesis testing
An introduction and a conclusion

Introduction

Describe the motivation for this evaluation, briefly describe the program to be evaluated, and explain why it matters for society. Go back to Chapter 1 of Impact Evaluation in Practice (from our first class) and justify why the program warrants an evaluation.

(≈150 words)

Context and intervention

Context

Provide information on the program context. Focus on (a) why this is a good context to run the evaluation you are proposing and (b) the main needs the program seeks to address.

As you review what you wrote for Milestone 1, take another look at our reading about the “Generalizability Puzzle”. If I’m someone who will read the results of your study, what “local conditions” do I need to be aware of to judge whether its findings may (or may not) generalize to a different context?

(≈300 words)

Intervention

Provide in-depth background about the program. Include details about (1) when it was started, (2) why it was started, and (3) what it was designed to address in society. If the program has yet to start, explain why it’s under consideration.

Also, make sure to cover how program beneficiaries are “assigned” or “invited” to participate in the program. For example, is program eligibility assigned to groups (e.g., entire schools, or all the students taught by a given teacher) or to individuals?

As you review your work from Milestone 1, try to also add a cost estimate per program beneficiary.

(≈300 words)

Program theory

Explain and explore the program’s underlying theory. Sometimes programs will explain why they exist in a mission statement, but often they don’t, and you have to infer the theory from what the program looks like when implemented. What did the program designers plan on occurring? Was this theory based on existing research? If so, cite it.

(≈300 words)

Theory of Change

Describe the program’s inputs, activities, outputs, and outcomes. Pay careful attention to how they are linked—remember that every input needs to flow into an activity and every output must flow out of an activity.

(≈150 words)

Use flowchart software to connect the inputs, activities, outputs, and outcomes and create a complete Theory of Change. Include this as a figure.

Hypotheses

Make predictions of your program’s effects. Declare what you think will happen. (≈100 words)

Provide a list and a hierarchy of hypotheses you want to test. Usually, you will prioritize overall program effects over potentially heterogeneous effects. Also, you will prioritize main outcomes over secondary outcomes. A table of hypotheses may be handy (but if you include one, make sure you discuss and explain the table in your text).

Measurement and data

Main and secondary outcomes

Select one of the program’s outcomes to evaluate. Explain why you’ve chosen it. (Is it the most important? The easiest to measure? The one with the greatest impact on society?) (≈100 words)

Which secondary outcomes along the program’s Theory of Change will you consider? These will be activities, outputs, and intermediate outcomes. (≈100 words)

Measurement

Make a list of all the possible attributes of the main outcome. Narrow this list down to 3-4 key attributes. Discuss how you decided to narrow the concepts and justify why you think these attributes capture the outcome. Then, for each of these attributes, answer these questions:

Measurable definition: How would you specifically define this attribute? (e.g., if the attribute is “reduced crime”, define it as “The percent change in crime in a specific neighborhood during a certain time frame” or something similar)
Ideal measurement: How would you measure this attribute in an ideal world?
Feasible measurement: How would you measure this given reality and given limitations in budget, time, etc.?

(≈150 words)

Data

Given your measurement approach, limits on feasibility, and identification strategy, describe the data you will use. Will you rely on administrative data collected by a government agency or nonprofit? Will you collect your own data? If so, what variables will you measure, and how? Will you conduct a survey or rely on outside observers or do something else? What does this data look like? What variables does it (or should it) include?

(≈100 words)

Sample

Sampling and sample

What will be the study sample? How large will this sample be and how will you select it?

(≈100 words)

Power calculations

Ideally, conduct power calculations and report your evaluation’s minimum detectable effect (MDE) given the sample size. Comment on whether it is reasonable to expect an effect of this size. State any additional assumptions you used for your calculations.

Regardless of the analytical strategy you describe in the next section, you can simplify your analysis here and conduct power calculations as if for a randomized experiment.

(≈100 words)
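As a starting point, here is a sketch of a common MDE formula for a simple randomized experiment. It assumes individual-level assignment, a two-sided test, no clustering, and no covariates. The symbols are my own notation for this illustration: N is the total sample size, P is the share of the sample assigned to treatment, σ² is the variance of the outcome, α is the significance level, and 1 − β is the desired power.

```latex
\[
\mathrm{MDE} = \left( z_{1-\alpha/2} + z_{1-\beta} \right)
               \sqrt{\frac{\sigma^{2}}{P(1-P)\,N}}
\]
% With alpha = 0.05 and power = 0.80:
%   z_{0.975} + z_{0.80} is approximately 1.96 + 0.84 = 2.80
```

For example, with purely illustrative values N = 1,000, P = 0.5, and σ = 1, the MDE is roughly 2.80 × √(1/250) ≈ 0.18 standard deviations. If assignment happens in groups (e.g., entire schools), the MDE will be larger than this formula suggests, so state how you handle clustering.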

Analytical strategy

Identification strategy

How will you measure the actual program effect? Will you rely on difference-in-differences? Regression discontinuity? Instrumental variables? How does your approach account for selection bias and endogeneity? How does your approach isolate the causal effect of the program on the outcome?

Describe how you will analyze the data using your identification strategy. For instance:

If you’re doing diff-in-diff, specify a regression model with an interaction term to show the diff-in-diff.
If you’re doing regression discontinuity, specify a model to check for a jump in the outcome variable at the cutoff in the running variable.
If you’re using instrumental variables, specify how you will check the validity of your instrument and run a 2SLS model.
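To make these three options concrete, here is a sketch of what a basic specification might look like in each case. All variable names and symbols are placeholders I chose for illustration. Your own models will likely need additional terms, such as covariates, fixed effects, or clustered standard errors.

```latex
% Difference-in-differences: beta_3 on the interaction is the diff-in-diff estimate
\[
Y_{it} = \beta_0 + \beta_1 \mathrm{Treat}_i + \beta_2 \mathrm{Post}_t
       + \beta_3 \left( \mathrm{Treat}_i \times \mathrm{Post}_t \right) + \varepsilon_{it}
\]

% Regression discontinuity: tau is the jump at cutoff c in running variable X,
% estimated within a bandwidth h around the cutoff, with D_i = 1 if X_i >= c
\[
Y_i = \alpha + \tau D_i + \gamma_1 (X_i - c) + \gamma_2 D_i (X_i - c) + \varepsilon_i,
\qquad |X_i - c| \le h
\]

% Instrumental variables (2SLS): Z is the instrument, D is program participation
\begin{align*}
\text{First stage:}  \quad D_i &= \pi_0 + \pi_1 Z_i + \nu_i \\
\text{Second stage:} \quad Y_i &= \beta_0 + \beta_1 \widehat{D}_i + \varepsilon_i
\end{align*}
```

For instrumental variables, you can check relevance with the first-stage relationship between Z and D (e.g., the first-stage F-statistic). The exclusion restriction cannot be tested directly, so you will need to argue for it.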

Remember: Your plan should propose a quasi-experimental evaluation. You must not propose a (simple) regression-adjusted model, matching, or a purely qualitative evaluation strategy. Also, you must not propose a randomized controlled trial (unless you have received my written approval).

Provide at least one additional specification to plan for a (secondary) analysis of effects among a subgroup of program beneficiaries (e.g., effects among girls, if the program targets both girls and boys).

(≈300 words)

Threats to internal validity

Briefly describe what kinds of threats to internal validity you face in your study. How will your evaluation mitigate these threats?

(≈150 words)

Ethical considerations

How will your evaluation satisfy ethical considerations? Assume you are an academic researcher who needs to comply with university regulations (as you would have to if you conducted the evaluation as a UCI student).

(≈150 words)

Conclusion

What would the findings from this analysis mean for your selected program? What would it mean if you found an effect? What would it mean if you didn’t find an effect? Why does any of this matter?

(≈75 words)

Footnotes

See the Center for Open Science’s directory of preregistrations, AsPredicted list, and the AEA RCT Registry for examples of this in real life. Here’s one by me!