What is the purpose of Cert?

Can I ask a broad question?

What is Cert for? When we zoom all the way out to ask not, “how can we improve Cert?” but “why do we have Cert?” what is our answer?

I know this is the issue at least partially addressed by the survey that went out last year, but I want to raise the question in light of recent conversations.

In thinking about it, I can come up with three reasons, which overlap somewhat.

  1. Cert exists to assist staffing of games, by allowing organizers to have a calibrated way to compare officials.
  2. Cert is a motivational tool that gives officials graduated goals to work for.
  3. Cert is a way for WFTDA to reward dedicated officials with recognition of their efforts.

That’s what I’ve got. There are other ways to arrange these, like “improved standardization via education requirements,” but that’s essentially just a combination of 1 and 3.

But at its core, I think Cert exists to assist staffing, provide goals for motivated officials, and recognize performance.

In theory, the more levels Cert has, the better it is at those tasks - people doing staffing have more information, officials have new Pokémon to collect, and WFTDA can recognize champs-level officials differently than folks just climbing on the ladder.

BUT… all of those benefits of a graduated system only hold if the ratings are accurate. If the ratings aren’t accurate, then whatever sources of inaccuracy exist will introduce systemic bias. If you’re LUCKY, that systemic bias isn’t also correlated to gender identification, income, geographic location, etc… but of course it is.

Cert is clearly aware of, and attempting to address, this issue. Every announcement, including the latest one, is an attempt to feed the beast with ever more information, in ever more granular detail, so that the goal of perfectly accurate, unbiased, and consistent evaluations can be reached in all cases.

However… I don’t think it can work. I think even THREE levels demands so much data that the organization and its members simply don’t have the ability to keep up.

And fundamentally - it doesn’t matter HOW much data you feed into Cert. It’s still data provided by, and evaluated by, people. It’s going to be massively subjective, it’s going to be biased, and each new level exponentially increases the effect of that bias.

So what’s the answer?

Well - here’s ONE suggestion, which is definitely different from how I felt four or eight years ago:

Dump cert levels entirely. Officials are certified, or they aren’t.

Cert should be simple and not terribly onerous to obtain - officials would still have to pass the LMS courses, and they’d have to officiate a certain number of regulation games.

That’s it. Cert would mean “basic competence” and that’s all.

What about losing the benefits of being able to tell a level 2 official from a level 3?

I would argue that we don’t have those benefits NOW.

The system is not perceived as accurate. And we likely won’t be able to MAKE it accurate (or perceived as such) in the future either, for the reasons outlined above: it would take too much data, and it’s inherently too subjective.

To some extent, wasn’t this the motivation for dropping different colored patches in the first place? If the rationale behind that decision holds for patches (and I am increasingly convinced that it does), then it holds for the certification itself even MORE so.

Dropping to only one level reduces the amount of information available to those doing staffing, but increases the ACCURACY of that information. It doesn’t help at all with ongoing motivation or recognition, sadly, but I’m not sure there’s any way to fairly do so.

I want to close by saying that I genuinely appreciate ALL the work that has been put into cert over the decades. Old cert, new cert, cert oversight, the committee that designed new cert, the panelists, the feedback editors, the clerks, the hundreds of officials, skaters, and coaches who have spent tens of thousands of hours writing feedback. That was, and is, a LOT of work. Thank you. But I think it may be time to say, “enough.”

21 Likes

I love this idea. As someone who spent multiple tens of hours writing evals and OOS’s in 2024 (not to mention all the upfront references for Champs and World Cup), and received a single eval in return, the current reliance on volunteer labour is beyond frustrating. I also have better things to do with my limited free time than hassle and chase up friends and acquaintances after events, who have their own lives and responsibilities to contend with, and with whom I need to maintain relationships for future events.

I recognise all the work that has gone into improving cert already, and as someone stuck in the antipodes, the current system is already more fair and accessible than the previous system. However, if it’s not delivering the results we want, especially given the advertised expectations, which will result in a dramatically increased workload for an already limited volunteer group (both Oversight and officials in general), then I agree further consideration of purpose is needed.

3 Likes

I also agree this is an excellent idea. I’m not certain that levels themselves are particularly useful pieces of information. As mentioned above, the ratings are not accurate. People often get Level 1 due to there not being enough evidence, rather than it being a true reflection of the level they are operating at. I’ve been a Level 1 NSO for over five years now, and in all that time I’ve received just one evaluation. Between that and the financial barrier to access the advanced learning courses (which you must pay for knowing you may not achieve higher levels due to lack of evaluations), I’m unlikely to even apply for higher levels any time soon. The recent clarification does nothing to help the situation, as I rarely work with those to whom that clarification applies.

Finding out someone is level 2/3 is of limited value anyway. Level 2 requires “advanced understanding and consistent execution of at least one role”, while Level 3 requires them to be “exemplary in at least one position”. The certification, however, gives no indication of what that focus is. I’d suggest that if Levels 1-3 were combined into a single Certified status, new Specialism Awards could be added instead, with one for each role. This would be much more useful information, and it would streamline the amount of evals and other feedback needed per Official, as it would be focused.

2 Likes

Just to add some current and historical context about where Cert has been and where it’s going. These are very important questions, and Oversight routinely discusses these issues, where we stand, and where to go with it. The BoD put out the survey recently to help get opinions, and we invite more opinions in this thread.

A couple historical notes, which aren’t really responses to anything, but might provide some context:

  • We did discuss going to “just one level” when we wrote version 3 of cert back in 2016. It WAS the motivation for dropping different colored patches, and it was the motivation for reducing the number of levels from 5 to 3. We stuck with 3 because we felt it was important to differentiate officials in order to support “staffing officials you haven’t met.” Then, as now, we see the two biggest determinants of “do you get staffed” as “how much have you already done” and “who do you know.” So without cert, most staffing was (is?) either rich-get-richer or old-boys-club. But is Cert adding anything? Only if staffers trust it (see below).

  • We are currently discussing being more explicit about roles, to say something more specific about whether an official is adequate, excellent, or exemplary in a role. This used to be called an “endorsement” and it could be a way to make certification more useful. And probably help staffers trust it. However…

  • “Accuracy” has two components. To be precise with language, “accuracy” is different from “precision.” A level is accurate if the person meets the qualifications for that level, and is inaccurate if they don’t. So Cert is “inaccurate” if we over-certify people. A level is precise if the person meets the qualifications for that level but not the next level. So Cert is “imprecise” if we under-certify people. My view is that Cert is highly accurate but woefully imprecise. I also think that’s the truth when I see someone on the track and look at their level. “They don’t deserve their level” is almost never a thought I have. “They should get a higher level” is a thought I often have.

    But as for the strong accusation of “the system is not perceived as accurate”: do you mean the system is perceived as inaccurate, or imprecise, or both?

I’m not sure that I agree with your definition of accurate vs precise (edit: I agree with Smasher’s definitions below), but yes, I agree with this assessment of the current situation. I think it’s rare for someone to receive a higher level than they can actually perform at, but quite common that they receive a lower one.

There is, of course, also a huge number of completely uncertified officials acting at all levels. Given the number of favours you need to ask in terms of OOSs and Evals, even getting cert in the first place can still be a bit of a “rich get richer or old boys club”. It’s the same with upgrading levels once you’re in. I feel that improving the accessibility of cert overall should be a priority, and things like levels and the amount of feedback needed do not help at all.

2 Likes

But as for the strong accusation of “the system is not perceived as accurate”: do you mean the system is perceived as inaccurate, or imprecise, or both?

I understand what you are asking, but I will disagree with your definitions of “accuracy” and “precision.”

Your definitions as I understand them:
Accurate: An official is accurately certified if they are not over-certified
Precise: An official is precisely certified if they are not under-certified.

That is not how I understand these words, either colloquially or in a scientific context. PRECISION is the degree of specificity given for an evaluation. If we had ten levels, a level 2 would be a much more precise evaluation than if we had 2 levels. “80% Fresh” is more precise than “4/5 stars.” The more levels you have, the more precise a given evaluation is.

ACCURACY is how closely an evaluation measures the person’s real skill. If we granted level 3 cert to a referee who could barely fasten their skates and couldn’t explain how points are scored, that would be inaccurate. If we refused to award cert at all to, for example, Ump, that would ALSO be inaccurate.

Neither inaccuracy nor imprecision is necessarily correlated to rating people too high or too low. You can have inaccurate high ratings.

“Precision” is a fixed characteristic of the rating system - it doesn’t apply to individual ratings. Every official in a three level system is rated with the same precision.

So to answer your question - the system is perceived as inaccurate for certified officials. There is little confidence that officials who are in the system are rated at the correct level.

Another problem is that since the system is (by definition) useless for uncertified officials, the more officials who refuse to participate in the system, the less useful it is for derby overall. If a critical mass of officials look at the system and think “This system is too much work, so I’m not going to bother” or “This system is inaccurate, so I’m not going to bother”, then its utility degrades even further.

So to restate my original post using this language: In order for a system defined to the current level of precision to be accurate, it would require far more time, effort, and buy-in than the members of the organization have available. And an inaccurate system is arguably worse than no system.

1 Like

Just realized I didn’t answer the OTHER question you asked, which is “Do I think the inaccuracies in the current cert system are generally skewed to under- or over- certification?” I agree with both you and Twixxi - I think inaccurate certifications are far more likely to be too low than too high.

And every uncertified champs level official is an example of this. (Of which I can name quite a few.)

Assuming your definitions, I agree with this. But a system with the level of precision that we can reasonably expect Cert to reach will always be of quite limited use for staffing decisions:

So the situation where Cert can help people get staffed is when someone’s actual level is higher than what their CV would suggest and no references are available to add that information. But officials in this position are also the least likely to have the correct Cert level, for exactly the same reasons that hinder their being staffed - the games and people that could get them evals and OOSs are the same games and people that get onto their CV and write them references.
And in the other direction, it is so much more likely that an official’s level being below what their CV suggests is an inaccuracy of the level rather than an indication of the official’s actual performance, that it will simply be discarded.

In general: The concept of Cert is doomed to fail for the purpose of helping staffing decisions on a large scale. Cert collects a lot of detailed information about an official and then condenses this into 2 bits of information. But when making staffing decisions we will need some subset of the detailed information. Which subset varies based on the staffing decision. It is mathematically impossible to get the information needed for all the various staffing decisions encoded into 2 bits. (And if endorsements turn this into 16 or so bits that doesn’t change much.)

What I believe would be more useful and less work for all parties involved would be maintaining a (moderated) repository of evals/references. (Moderated meaning that entries get removed once they are no longer useful and that access is controlled.)
This can be combined with a baseline Cert that is issued when someone passes the test and has a first eval/reference that doesn’t amount to “do not staff”.

2 Likes

Cert collects a lot of detailed information about an official and then condenses this into 2 bits of information (@speedy)

  1. Cert exists to assist staffing of games, by allowing organizers to have a calibrated way to compare officials.
  2. Cert is a motivational tool that gives officials graduated goals to work for.
  3. Cert is a way for WFTDA to reward dedicated officials with recognition of their efforts. (@adamsmasher)

Something huge is missing here – Cert also provides the only channel through which an official can seek and receive feedback from other officials via confidential channels. It’s not “two bits of information,” it’s two bits plus two or three pages of summarized feedback.

The level granted is an “outcome” of the feedback received and passed back to the officials. The feedback is the bulk of our panelists’ work, since the level is usually pretty obvious once the feedback is collected and synthesized. (When we hire new panelists, this is the first thing we tell them – granting levels isn’t the job, the job is to write summaries and then explain the level based on the summary.) We routinely get positive feedback from officials about their feedback, even when it doesn’t result in the level they applied for, so I think this aspect of Cert is pretty successful.

I’ll also note that the WFTDA is trying to bootstrap a new committee to manage that aspect, or to sort of move the “feedback” part out of Certification, so that feedback is devoid of “judgment” (levels). I truly hope they succeed, and that will change Certification pretty fundamentally, and we’re ready for that. Eager, even! But in the meantime, this is probably the most important service we provide.

an attempt to feed the beast with ever more information

I can’t really tell if this is meant to be an insult, but it’s a pretty gross misrepresentation of the history of the committee. It has been eight years since we did anything to change the “amount of information” we receive, and the reason we did it is that the amount of feedback coming in has decreased post-covid. (We also tried to improve the quality without changing the quantity, by releasing our best-practices webpage and by releasing Google Docs for OOSes.)

We have operated pretty effectively as designed, but as derby recovers, many people have @spanners’ experience in that they aren’t getting the evals they expect. We think it’s because the expectations have become murky, and it does reduce the feedback we are able to provide.

The “beast” is hungry because it’s grown thin from being fed slimmer rations than it was designed for. It wants a normal dinner. If that’s not possible then yes we will need to figure out other solutions, but it is not clear to us right now that it is “not possible.”

But like… I hear everyone else too – people do not enjoy providing written feedback for each other. Evals are not “fun” to fill out. Feedback is not “fun” to give. Especially when it’s not positive. It’s literal work to do this, and it’s literal work to synthesize three days of behavior by twenty-seven humans. And that makes it even harder to ask for, or to repeatedly remind people about. This is a huge concern and we are absolutely aware of it (even though it is a core expectation for leadership).

The feedback provided here, about the disutility of awarding the “minimum” level we’re certain of, and the fact that Certification isn’t available for officials that don’t apply, is heard – Oversight will discuss this in our next meeting as well.

In general: The concept of Cert is doomed to fail for the purpose of helping staffing decisions on a large scale.

I think I am misunderstanding this concept, since Cert has been effectively used this way for nearly fifteen years. TOSP’s 2024 P&P states that they used Cert. Uncertified officials at Champs do not conflict with this, either, because Cert is only meant to be one part of the equation. So… if it was doomed to fail, surely it would have failed by now? Or failed to have succeeded many years ago?

We are quite short of perfection, but imperfect is not failure. In many cases we have fallen short of our goals, but that is also not failure. If you see us as being on a downward trajectory, I’m not sure I understand why from your comments.

more useful and less work for all parties involved would be maintaining a (moderated) repository of evals/references

I personally love this idea – it is in line with what the Feedback Committee is being created to generate. Even sooner, we could start making Cert’s two-to-three-page summaries public, or public if an official requests it. Of course, there are serious concerns with this as well, but, in addition to the disutility of minimum-cert levels, we also hear that Cert would be more helpful to the world if we produced more public information than we currently do.

(Thanks all and please continue to comment here with your thoughts!)

Cert was also promised access to at least part of the detailed information, which is a very different situation from the 2 bits that everybody else has.

I fail to see scenarios where knowing someone’s Cert level would help a staffing decision, beyond some edge cases. But I am open to being shown that I am just oblivious here.

Maybe a low-effort way of doing this is to put each summary into a Google Doc owned and exclusively writable by a Cert account (so authenticity can be verified) and readable by everyone with the link, with the official in control of distributing the link to parties they think would be helped by the info (so they control access to the data relating to them). (I may be missing other possible problems due to never having seen one of those summaries.)