Summary statistics and the Involvement Evaluation Cube

Andy Gibson has developed an evaluation framework as a way of thinking about and evaluating public involvement. The tool is designed to be an easy, participative, and enjoyable way to evaluate the quality of public involvement.

Participants are asked to evaluate PPI (in a project) by positioning four sliders to answer questions relating to four axes. The questions and the four axes (for which participants need to choose a slider position) are:

Q1) Is there just one way to be involved (e.g. face-to-face meetings), or can you contribute in different ways?
One way to be involved → Many ways to be involved

Q2) A strong voice participates in discussions and influences decision-making. A weak voice participates in discussions but has little chance of influencing decisions. (Andy, could you clarify whether this is the recommended wording for this?)
Weak voice → Strong voice

Q3) Who sets the agenda? To what extent are public concerns valued relative to organisational concerns?
Organisational concerns → Public concerns

Q4) To what extent has the organisation decided to change, and actually changed, as a result of involvement?
Organisation resistant to change → Organisation has changed

In addition to positioning a slider for each of these four continua, participants have the option of adding notes relating to each question.

The position (and colour) of the point in the cube relating to each slider and the notes can be used to enrich the conversation about the PPI project in question.

PPI and research project leads are likely to report on the evaluation data from PPI participants.

Reporting on notes and verbal comments provided by participants is straightforward; however, particular consideration is needed when reporting on the individual and aggregate slider positions chosen by participants on each of the four axes.

The slider position creates a numerical value that is available to researchers. Given that the data collected are ordinal, on relatively abstract and subjective scales, these numbers could be more misleading than useful, particularly if aggregated.

It would be technically possible to create an aggregate score on each axis and plot a group aggregate / average point in the cube. This discussion space is for consideration about how, if at all, to aggregate / produce an average from the data; how to express this, if it is possible (without losing the integrity of the model); and any other thoughts about how to create summary statistics relating to the 4 dimensions.
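As a sketch of what such an aggregate point could look like (purely illustrative: the axis names, the 0–10 integer scale, and the sample responses below are all assumptions, and, as discussed later in this thread, a median per axis is a more defensible "group point" for ordinal data than a mean):

```python
from statistics import median

# Hypothetical slider readings from five participants on the four axes
# (0 = left-hand pole, 10 = right-hand pole; the scale is an assumption).
responses = {
    "ways_to_be_involved": [3, 5, 6, 6, 8],
    "voice":               [4, 4, 5, 7, 7],
    "agenda":              [2, 3, 3, 5, 6],
    "change":              [5, 6, 6, 6, 9],
}

# One possible group-level point for the cube: the median on each axis,
# which avoids treating ordinal positions as interval data (unlike a mean).
group_point = {axis: median(scores) for axis, scores in responses.items()}
print(group_point)
```

Whether such a point should be plotted at all is exactly the question this channel is for.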

Thoughts welcomed…


Just to give some background on the creation of this channel: ARC West have generously provided some funding to support the development of this digital tool, and as we went through the review process for that work package, Zoe suggested that it could be of significant value to present some kind of summary statistics around the numeric data collected in a cube, and perhaps across sequences of cubes applying to the same domain.

This space has been created to facilitate an async conversation over three months or so to determine what kinds of summary statistics may be of use.

Hi John,

Thanks for the initial post. Just a couple of points from me. As John says, the data are ordinal, i.e. there is a kind of ordering (e.g. from strong to weak, or few to many), but there are no measurement units and the ordering is subjective. Therefore I think it would be incorrect to calculate means, but calculating the range of scores, the midpoint, and the most popular scores would all be OK. Seeing whether there are any correlations between scores on different dimensions would also be interesting. Even so, I'd be careful about using things like percentages with small numbers, as this can be misleading. The cube can work with large numbers, and in those cases such functions would be more valid. I'd be interested to hear what others think.
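The "safe" descriptive statistics suggested here (range, midpoint, most popular scores) could be sketched like this; a minimal illustration only, where the single-axis focus, the 0–10 scale, and the sample scores are assumptions:

```python
from statistics import multimode

# Hypothetical slider positions from eight participants on one axis
# (0-10 scale is an assumption).
scores = [2, 4, 4, 5, 7, 7, 7, 9]

low, high = min(scores), max(scores)
score_range = (low, high)            # the range of scores
midpoint = (low + high) / 2          # midpoint of the observed range
most_popular = multimode(scores)     # modal score(s); there may be several

print(score_range, midpoint, most_popular)
```

`multimode` is used rather than `mode` so that ties (two equally popular scores) are reported as a list instead of raising an error or being hidden.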

The other point to raise is that, in some ways, what matters most is the discussion that the recorded comments and scores give rise to: the exchange of perspectives and the reaching of agreement on ways to move forward.

Interested to hear what others think…

Hello everyone

Interesting to read these other posts. So it sounds like summary stats wouldn’t be useful for smaller evaluations - like PPI evaluation which I guess was the original intention for the system. But I can see the framework being useful for other types of evaluation - for example, events feedback or basically anything where you’d want to measure three things on a sliding scale. In this context, summary stats would be super useful… Or maybe I’m trying to make the framework do something it wasn’t intended to in the first place?


Hi Zoe,

I think you are right. Currently we are moving towards a lot more community involvement, as opposed to involvement from groups of individual patients. The Cube could be used to evaluate community events which involve larger numbers of people, and where summary statistics would be useful.

Oh well that’s good to know! I think there’s so much potential for the Cube to be an evaluation tool beyond public involvement. So it makes sense to build in functionality that supports those other uses now I guess.

What would Chris and John need to know from us to make it a reality?

I would definitely agree with you Andy, that as this is ordinal data it's not possible to calculate means or measure the scale of change or difference, as you can't say categorically that a score of 8 is twice as high as a score of 4, it's just higher. But yes, mode and median can be given.

But actually, at the moment, I would say it's more qualitative as there are no numbers included, are there? (I'm not that familiar with it, so perhaps I'm wrong.) So if you wanted to calculate any kind of data about popular scores etc. you would need to include a numeric scale on each axis of the cube, and that might change the nature of the whole thing for users (or not! – is it worth getting some feedback on that?).

I can see there might be benefit in using it to generate summary statistics for evaluation, but I do wonder if this would change the way people interact with it, and I'm not sure that's what you intended when you designed it, especially within PPI. And there's a danger that once you put a scale on it, people will make assertions about things like a score having doubled, even if you warn them that this can't be claimed. Without numbers, people will stick to more descriptive claims like 'at the second meeting the voice of the group was felt to be stronger' – or whatever – which might be sufficient in this context.

All this doesn't stop different cubes being designed for other types of evaluation that have a numeric scale on them, if that suits the purpose.

Thank you.

Creating a ‘mean’ value in the spreadsheet is straightforward, but there are quite a few design questions about how to display an overall mean point in the Cube visualisation, with or without the range (the diameter of the point and colour saturation could show the range; the actual points could also be visible under or alongside the mean point).

Anyway, while this is interesting and doable, for now, and with this Involvement Evaluation framework, it seems unimportant, verging on a dangerous way to look at the value of the framework.

It seems worth waiting for other 3- or 4-dimensional frameworks the Cube could be used for before taking time to design, pilot, and agree aggregation approaches.

I think the key near-future challenge with reporting on the data from the tool is to be clear on what should not be done with the ordinal data, and how to make the best use of it without compromising the integrity of reporting.

@AbbyS provides a good example: "descriptive claims like ‘at the second meeting the voice of the group was felt to be stronger’ – or whatever – which might be sufficient in this context."
Recommending something like this, and perhaps encouraging researchers to weave in qualitative comments from the notes part of the tool and from conversation, seems advisable.