Thursday, June 25, 2015

Problems with Common Core and EngageNY -- statistics edition

Last week we came back to the topic of Common Core standards thanks to this extraordinary post by Gary Rubinstein which uncovered some appalling quality control issues.

The problems involved EngageNY (arguably the gold standard in CC-based lesson plans). Rubinstein focused on algebra; I decided to check out the sections on statistics. What I found was uniformly bad. I'm going to focus on one section [Lesson 30: Evaluating Reports Based on Data from an Experiment], but the general concerns apply to all of the sections I looked at.

When explaining a highly technical subject to younger students, we sometimes go too far in an effort to smooth off the edges. We lose precision trying to stick with everyday language and we leave out important details because they greatly complicate the picture. When we try to communicate scientific concepts, there will always be a trade-off between being accurate and being understandable.

This is invariably a judgment call. What's more, it is a judgment call that varies from subject to subject and from audience to audience. We can argue about where exactly to make the cut, but we can’t really say one position is right and the other is wrong.

That’s not what we’re talking about with EngageNY. The authors like to throw in impressive-sounding scientific language and wordy constructions but not in a way that makes the writing more precise.

For example:
Students should look to see if the article explicitly states that the subjects were randomly assigned to each treatment group.  This is important because random assignment negates the effects of extraneous variables that may have an effect on the response by evenly distributing these variables into both treatment groups.
“[N]egates the effects of extraneous variables that may have an effect” is not a phrase that the typical high school student will find particularly informative, but this paragraph also manages to be not quite right. That “evenly” seems to suggest that the distributions (rather than the expected distributions) of non-treatment variables will be identical, while the part about “distributing” variables just seems odd.
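The actual-versus-expected distinction is easy to see in a quick simulation. This is a hypothetical sketch (the use of "age" as the extraneous variable and all the numbers are mine, not EngageNY's): any single random assignment leaves some gap between the two groups; it is only the expected difference, over many re-randomizations, that is zero.

```python
import random
import statistics

random.seed(1)

# Hypothetical illustration: "age" stands in for an extraneous variable
# that may have an effect on the response.
ages = [random.gauss(50.0, 10.0) for _ in range(200)]

def group_difference(values):
    """Randomly assign subjects to two groups of 100; return the
    difference in group means for this one assignment."""
    shuffled = random.sample(values, len(values))
    return statistics.mean(shuffled[:100]) - statistics.mean(shuffled[100:])

# A single random assignment: the two groups are NOT identically
# distributed -- there is almost always a nonzero gap between them.
single_gap = group_difference(ages)

# Over many re-randomizations the signed gaps average out to roughly
# zero, even though each individual gap is not zero. That is what
# "balanced in expectation" means.
gaps = [group_difference(ages) for _ in range(2000)]

print(f"gap from one assignment:      {single_gap:+.2f}")
print(f"average gap over 2000 runs:   {statistics.mean(gaps):+.3f}")
print(f"typical size of a single gap: {statistics.mean(abs(g) for g in gaps):.2f}")
```

The "average gap" line lands near zero while the "typical size" line does not, which is exactly the distinction the lesson's "evenly distributing" wording blurs.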

At best, these lessons are sloppy; at worst, they’re wrong. Take this for example:

Suppose newspaper reporters brainstormed some headlines for an article on this experiment.  These are their suggested headlines:

A. “New Treatment Helps Pericarditis Patients”
B. “Colchicine Tends to Improve Treatment for Pericarditis”
C. “Pericarditis Patients May Get Help”

7. Which of the headlines above would be best to use for the article?  Explain why.

Headline A would be the best because this is a well-designed experiment.  Therefore, a cause and effect relationship has been established.  Headlines B and C talk about a tendency relationship, not a cause and effect relationship.

“Tends to improve” implies a causal relationship, as does “help” in this context. The authors appear to have confused “causal” with “deterministic.”

The quality issues we see associated with the implementation of Common Core bear a striking resemblance to the problems noted by Richard Feynman when critiquing the New Math reforms of the Sixties.
The reason was that the books were so lousy. They were false. They were hurried. They would try to be rigorous, but they would use examples (like automobiles in the street for "sets") which were almost OK, but in which there were always some subtleties. The definitions weren't accurate. Everything was a little bit ambiguous -- they weren't smart enough to understand what was meant by "rigor." They were faking it. They were teaching something they didn't understand, and which was, in fact, useless, at that time, for the child.


  1. (cross-posted at West Coast Epidemiology blog)

    One perspective that I haven't seen yet in this debate (and the Feynman quote is a good example of what I'm criticizing) is whether these are fixable errors (which they are), and whether they will be fixed (which we can't know yet).

    For some reason in technology we're prepared to accept that initial products are often very flawed and only get usable/interesting after a few revisions.

    In the case of curricula and educational innovations, the conclusion seems to be: these are wrong and so the people who made them are stupid and so we need to throw them out and start over.

    Looking at the arc of educational reform in the last 50 years, I can't say that this approach has worked well.

    So I'd like to encourage these posts to be slightly more constructive and consider: What would it take to fix this?

    1. Kevin,

      I am trying to work in more constructive posts (check this space for an upcoming MOOCs/MOO?s thread).

      That said, it is important to realize that we are not talking about betas here. EngageNY and CC in general are the product of a top-down, rapidly implemented system with no effective feedback mechanisms. This is why mistakes like "negative times a negative is a negative" can go uncorrected for so long.

      In other words, we need to get to the acknowledge-the-problems phase before we can get to the fix-the-problems phase.

    2. You seem to miss the fact that there are real children being harmed by this - those kids in school right now. Kids get turned off education if they slave away to learn stuff that gets them a "fail" when they take the test, and they lose trust in educators if the educators don't know what they are talking about. Education and the contest to get into the "right" college just become even more of a process based on luck, of being on the right side of measurement error.

      So we forget about the kids who got the raw end of the stick by being born at a time when they got these poor materials because ... it might be able to be fixed, sometime, maybe, and for those schools that are able to afford the updated materials.

      (I seem to be failing at negotiating the "publish" process. Excuse the multiple similar posts, if they appear.)

  2. The analysis of randomised trials tends to be based on averages - the average in one group compared to the average in another. It means that a treatment found to be better in a randomised trial could be bad for some people but really, really good for others (e.g. men/women, different progression of the same disease).

    So my headline would be “Some Pericarditis Patients May Get Help”.

    (And I think I am saying something different to your causal/deterministic argument.)
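The commenter's point about averages can be sketched with made-up numbers (entirely hypothetical; "responders" here could be men versus women, or different stages of the same disease): a treatment can beat the control on average while leaving one subgroup worse off.

```python
import statistics

# Entirely made-up improvement scores (higher = better) to illustrate
# how a trial result based on group averages can hide subgroup harm.
responders_treated = [8, 9, 10, 11, 12]      # respond very well
nonresponders_treated = [-2, -1, -1, 0, -1]  # end up worse than control
control = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]

treated = responders_treated + nonresponders_treated

print(f"treated mean:      {statistics.mean(treated):.1f}")  # beats control...
print(f"control mean:      {statistics.mean(control):.1f}")
print(f"nonresponder mean: {statistics.mean(nonresponders_treated):.1f}")  # ...but this subgroup lost
```

The trial-level comparison favors the treatment, which is why "Some Pericarditis Patients May Get Help" is the more honest headline.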