Three measures are commonly used to evaluate intervention methods: effectiveness, usefulness and the quality of the scientific evidence (see e.g. Flay et al. 2005; Kellam & Langevin 2003). In the Finnish classification model, these three aspects are broken down into smaller sub-categories, each with its own scale. The overall evaluation or classification is a combination of all of these. Of course, some of the featured interventions may not have been subjected to previous scientific scrutiny. They may still be user-friendly, effective and high in quality, at least to some extent (Marklund et al. 2012). As a result, we consider it important to include interventions and working methods that have not yet been scientifically assessed in the evaluation and development process for this resource. However, these interventions must meet the minimum criteria for the Early Intervention resource.
Fig. 2 Early Intervention classification and inclusion criteria
Under the minimum requirements set for this resource, all interventions must be ethically sound, have a specified target audience, an evidence-based theoretical basis, and a clearly defined training programme. These four aspects will be used as the basis for our initial assessment. For the purposes of the evaluation, a target audience is defined as families with children under the age of 18. An evidence-based theoretical basis is defined as a theoretical framework based on current theoretical and scientific data on child and adolescent development. The training requirement means that clearly defined and high-quality training provision suitable for the Finnish context must be available for the intervention. In addition to evidence and effectiveness, it is also important that the intervention is based on values that are closely aligned with broader ethical considerations essential to all care and service provision. Our evaluation of the values and ethical framework will be based on a number of considerations, including how the intervention is delivered and what measures are in place to ensure that the voice of the target audience is heard. We will also look for evidence that clients are treated with respect and as equal participants in the process. In other words, the evaluation process is concerned with identifying to what extent the intervention can be considered customer-centred, dialogue-oriented, participatory and empowering. The evaluation of the scientific evidence will focus on a number of areas, including study design and research methodologies. Effectiveness is measured by the extent to which the intervention has been successful in generating the desired outcomes in the target audience. 
As we assess the usefulness of the intervention, our focus will be on a series of practical considerations, such as the method by which information is disseminated, how training is made available to practitioners and to what extent the intervention is suitable for the Finnish context. The evaluation process is set out in more detail in Chapter 4.
Study design is an important focus for the evaluation process, while other evidence-related criteria are used to complement the wider picture. The Early Intervention classification system combines the four-part Current Care Guideline scale (Strong, Moderate, Weak, No evidence) with criteria that have been found to work well in other countries, including the Nordic countries and the Netherlands. To be awarded the highest grading (Strong), the intervention must be based on a randomised controlled trial (RCT), which is considered the scientific gold standard (see Flay et al. 2005; Jané-Llopis et al. 2010). This is considered the most reliable form of study design when the objective is to discover whether the desired outcome can be directly attributed to the intervention itself. In an RCT, one group will have the intervention administered to them, while another group, which acts as the control, will receive “treatment as usual”, with known and unknown confounding factors balanced between the groups. It is also important to ensure that the chosen study design and the research methodologies employed match the topic under investigation and are capable of addressing the hypothesis. There will be instances where it is not possible or even sensible to conduct research using a randomised controlled study. Where an RCT is not appropriate, cohort studies and case-control studies are a useful alternative, although it is important to note that they do not allow researchers to determine with the same degree of confidence whether the desired change within the cohort is due to the intervention itself. Studies can be either prospective or retrospective, meaning that they look either forward or backward in time. Evidence derived from prospective studies is rated more highly, as retrospective studies are deemed to be more susceptible to bias. The study design will never be the only factor used to evaluate the quality of the scientific evidence.
The evaluation is always based on a series of other significant factors, such as the ethical safeguards in place, the quality of the research itself (validity and sample size), the number of studies carried out, the consistency of the results as well as their significance and relevance (applicability and external validity). These criteria also form the framework for the Finnish model.
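As an illustration only (the function name and group sizes are hypothetical, not part of the classification model), the core allocation step of an RCT can be sketched as follows. Random assignment is what balances known and unknown confounders between the arms on average:

```python
import random

def randomise(participants, seed=0):
    """Randomly allocate participants to intervention and control arms.

    Because assignment is random, both known and unknown confounding
    factors are balanced between the arms on average, so a difference
    in outcomes can be attributed to the intervention itself.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    shuffled = participants[:]
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    # First half receives the intervention; second half "treatment as usual".
    return shuffled[:midpoint], shuffled[midpoint:]

intervention, control = randomise(list(range(100)))
```

In practice the control arm receives “treatment as usual” rather than nothing, and allocation is often stratified, but the principle is the same.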
Fig. 3 Evidence quality and study design
Of the many other evidence databases already in existence, the Norwegian Ungsinn model is of particular interest here because, as in Finland, Norway’s third sector organisations play a particularly strong role in delivering promotion and prevention services. The Ungsinn model (Mørch et al. 2008) is based on a Dutch model (Veerman & van Yperen 2007) where evidence and effectiveness are assessed on a four-point scale. The Dutch model features four levels of scientific evidence: descriptive, theoretical, indicative and causal. Effectiveness is split into effective, functionally effective, likely to be effective and potentially effective. In the Norwegian model, the grading is also linked to effectiveness and is split into potentially effective, probably effective, functionally effective and evidence of effectiveness. Although evidence and effectiveness are closely linked, the Finnish model treats them as separate categories, with a separate scale for each. This was done for a number of reasons. Strong scientific evidence alone is not a guarantee of effectiveness; the effect of a given intervention may range from weak to strong and, in some cases, despite the strong scientific evidence backing it, an intervention may have no effect or even a negative effect. It is also important to make null, inconclusive and negative results available in the public domain to prevent the use of ineffective or harmful interventions.
For an intervention to be deemed effective, it must be shown to achieve the desired outcomes in the target groups. Additionally, the target groups must be considered sufficiently representative to allow for the outcomes to be considered replicable in larger target populations. It is also important to ensure that appropriate indicators have been used to study the effects and that these indicators are relevant to the target population in terms of their well-being (see e.g. Jané-Llopis et al. 2010; Finnish Current Care Guidelines; Silverman & Hinshaw 2008). Key measures for evaluating effectiveness include effect size and trend (see e.g. Finnish Current Care Guidelines). The Finnish four-point scale is set out below.
Fig. 4 Assessing effectiveness and the criteria used
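One widely used effect-size measure is Cohen’s d, the standardised difference between the intervention and control group means. A minimal sketch (the function name is our own; the formula itself is the standard pooled-standard-deviation version):

```python
import statistics

def cohens_d(treatment, control):
    """Cohen's d: standardised difference between two group means.

    Divides the mean difference by the pooled standard deviation.
    By a common rule of thumb, d ~ 0.2 is a small effect,
    0.5 a medium effect and 0.8 a large effect.
    """
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd
```

Unlike a raw mean difference, d is unit-free, which is what makes effects comparable across interventions measured with different indicators.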
In addition to evidence and effectiveness, it is also important to ensure that the intervention is useful and suited to the local context and requirements. These external criteria include training-related considerations, cost-effectiveness and the potential offered by the intervention to meet local needs. Implementation is also a key factor when assessing the usefulness and suitability of an intervention. For the time being, all considerations concerning implementation and related quality evaluation methods have been excluded from the process. The criteria for assessing usefulness have been chosen on the basis of the relevant literature (e.g. Laajasalo & Pirkola 2012; Jané-Llopis et al. 2010; Marklund et al. 2010; Weisz et al. 2005) and other evidence-based Nordic models. Aspects of the three-point scale are based on a similar Swedish scale (http://www.socialstyrelsen.se/evidensbaseradpraktik/metodguide).
Fig. 5 Usefulness and evaluation scales
When we assess the quality of the training offered, we focus on the following criteria relating to content, delivery and availability:
- Does the training comprise a range of different methods, including theoretical and practical approaches, group meetings, supervision and placements?
- Are the content and materials clearly set out?
- Do the training providers possess the necessary qualifications and experience?
- Is an evaluation and feedback process included?
- Is follow-up training available?
Assessing the training description: 0 = It has not been possible to assess the training provision as the information is incomplete or unavailable. 1 = There are clear shortcomings with regard to content, delivery and purpose. These areas will need to be revised and clarified. 2 = There are some issues regarding content, delivery and purpose. These areas will need some revision. 3 = High-quality training content and delivery. Availability is also good.
When we assess the availability of the relevant training provision, we scrutinise the following criteria related to delivery and availability particularly closely:
- Is training readily available and easy to arrange?
- Is the cost of training reasonable?
Assessing training availability: 0 = It has not been possible to assess the availability of training provision as information is incomplete or unavailable. 1 = There are clear issues regarding availability. These will need to be revised and clarified. 2 = There are some issues with regard to training availability. 3 = Training is readily available at a reasonable cost.
Cost-effectiveness is assessed as a separate category. We make use of existing Finnish, Nordic and international calculations when assessing cost-effectiveness. Assessing cost-effectiveness: 0 = Not cost-effective, or it has not been possible to assess cost-effectiveness as information is incomplete or unavailable. 1 = Potentially/likely cost-effective (in Finland, other Nordic countries or elsewhere). 2 = Somewhat cost-effective (in Finland and other Nordic countries). 3 = Cost-effective (in Finland and other Nordic countries).
d) Assessing suitability for Finnish context
In addition to being based on solid scientific evidence, the intervention must also be suitable for Finnish conditions. In broad terms, we assess whether there is demand for this type of intervention in Finland, how practitioners are likely to receive it and/or how the method complements existing practice. The intervention must also be acceptable in sociopolitical terms. In addition, we may investigate whether the intervention has previously been used in Finland or the other Nordic countries. In some countries, such as Norway, this forms part of the evaluation designed to assess the quality of the relevant scientific evidence. Here in Finland, the decision was made to assess the intervention’s suitability for the Finnish context as a separate, stand-alone feature unconnected to the evaluation of the relevant evidence. Assessing suitability: 0 = Not possible to assess as the intervention has not been used/studied in Finland or other Nordic countries. 1 = Poor suitability as the intervention has not been used/studied in Finland but it has been used and/or studied in other Nordic countries. 2 = Reasonably suitable for the Finnish context (some previous use/studies carried out in Finland and other Nordic countries). 3 = Well-suited to the Finnish context (extensively used/studied in Finland or other Nordic countries).
e) Measurability and assessing effectiveness in practice
The evaluation considers whether the intervention incorporates an assessment of effectiveness as part of its training provision, practical implementation and dissemination. The evaluation focuses on how trainees, practitioners and target groups perceive the intervention method and what arrangements are in place to ensure that the method will remain under continuous evaluation, development and other quality control measures. Assessing the evaluation of intervention effectiveness: 0 = It has not been possible to carry out an evaluation as a description has not been provided or the assessment methods are incomplete. 1 = The assessment methods will need to be revised and clarified. 2 = The assessment methods require some amendment. 3 = The assessment methods are relevant and thoroughly described.
At the end of the evaluation process, the contributors may wish to collate the intervention’s strengths and weaknesses and to issue recommendations for improvement and further development.
2.5. Classification and evaluation summary
Once the process is complete, the evaluations carried out on the available scientific evidence, effectiveness and usefulness will be collated and the intervention will be assigned to a category on a scale of 1 to 3 as per the table below. The classification is based on a) an overall research evaluation incorporating an assessment of the evidence quality and effectiveness and b) usefulness, which is assessed on a scale of 0–15. In the event of a discrepancy between the total number of points awarded as part of the overall research evaluation and the usefulness assessment, the lower number of points will apply. For example, if an intervention is awarded 6 points in the overall research evaluation but goes on to score 8 on usefulness, it will be awarded a Level 2 (**) classification. The classification system can also be represented by stars. High-quality interventions receive three stars (***). Moderately high-quality interventions receive two stars (**). Interventions below this level will receive one star (*). Interventions that have no basis in evidence and/or cannot demonstrate effectiveness yet, as well as interventions that have not been able to demonstrate usefulness or suitability for the Finnish setting, will be awarded no stars. If a study is currently ongoing, the intervention can also be classed as “not for evaluation, research ongoing”.
Fig. 6 Evaluation and classification
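The combination rule described above can be sketched in code. Note that the numeric thresholds mapping points to levels are illustrative assumptions only (chosen so that a decisive score of 6 points lands on Level 2, consistent with the worked example), not the official cut-offs:

```python
def classify(research_points, usefulness_points, thresholds=(3, 6, 9)):
    """Combine the overall research evaluation and usefulness scores.

    The lower of the two scores is decisive, per the classification
    rule. The thresholds for Levels 1-3 are illustrative assumptions.
    Returns the level (0-3) and its star representation.
    """
    decisive = min(research_points, usefulness_points)
    level = sum(decisive >= t for t in thresholds)  # counts thresholds met
    return level, "*" * level

level, stars = classify(6, 8)  # decisive score is min(6, 8) = 6
```

With these illustrative thresholds, `classify(6, 8)` yields Level 2 (**), and a score below the lowest threshold in either category yields no stars.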