Methodological standards, quality of reporting and regulatory compliance in animal research on amyotrophic lateral sclerosis: a systematic review
  1. Joana G Fernandes1,2,
  2. Nuno H Franco1,2,
  3. Andrew J Grierson3,4,
  4. Jan Hultgren5,
  5. Andrew J W Furley4,6,
  6. I Anna S Olsson1,2
  1. 1 Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
  2. 2 IBMC-Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
  3. 3 Department of Neuroscience, Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
  4. 4 Bateson Centre, University of Sheffield, Sheffield, UK
  5. 5 Department of Animal Environment and Health, Swedish University of Agricultural Sciences, Skara, Sweden
  6. 6 Department of Biomedical Science, University of Sheffield, Western Bank, Sheffield, UK
  1. Correspondence to I Anna S Olsson; olsson{at}


Objectives The amyotrophic lateral sclerosis (ALS) research community was one of the first to adopt methodology guidelines to improve preclinical research reproducibility. We here present the results of a systematic review to investigate how the standards in this field changed over the 10-year period during which the guidelines were first published (2007) and updated (2010).

Methods We searched for papers reporting ALS research on SOD1 (superoxide dismutase 1) mice published between 2005 and 2015 on the ISI Web of Science database, resulting in a sample of 569 papers to review, after triage. Two scores—one for methodological quality, one for regulatory compliance—were built from weighted sums of separate sets of items, and subjected to multivariable regression analysis, to assess how these related to publication year, type of study, country of origin and journal.

Results Reporting standards improved over time. Of papers published after the first ALS guidelines were made public, fewer than 9% referred specifically to these. Of key research parameters, only three (genetic background, number of transgenes and group size) were reported in >50% of the papers. Information on housing conditions, randomisation and blinding was absent in over two-thirds of the papers. Group size was among the best reported parameters, but the majority reported using fewer than the recommended sample size and only two studies clearly justified group size.

Conclusions General methodological standards improved gradually over a period of 8–10 years, but remained generally comparable with related fields with no specific guidelines, except with regard to severity. Only 11% of ALS studies were classified in the highest severity level (animals allowed to reach death or moribund stages), substantially below the proportion in studies of comparable neurodegenerative diseases such as Huntington’s. The existence of field-specific guidelines, although a welcome indication of concern, seems insufficient to ensure adherence to high methodological standards. Other mechanisms may be required to improve methodological and welfare standards.

  • amyotrophic lateral sclerosis
  • ALS
  • guidelines
  • methodology
  • reporting
  • quality
  • compliance
  • animal welfare
  • reproducibility

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:

Strengths and limitations of this study

  • The approach for this systematic review is unique in covering methodological quality, regulatory compliance and severity or animal welfare.

  • We built two comprehensive scores (for methodological standards and for regulatory compliance) which were subjected to multivariable regression analysis to investigate how these scores were related to publication year, type of study, country of origin and journal, simultaneously accounting for all these factors.

  • Our large sample (N=569 papers) included half the total population of published papers between 2005 and 2015.

  • While more models of amyotrophic lateral sclerosis are now available, only studies using the SOD1 (superoxide dismutase 1) mouse were included.

  • The protocol was defined prior to data collection but was not registered prior to the study.

  • Information retrieval and assessment were not blinded.


Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease typically resulting in death 2–5 years after the onset of symptoms. There is no known cure, and the most widely used treatment—riluzole—extends survival by just 2 months.1 ALS research using animal models focuses primarily on two main interconnected goals: understanding the underlying mechanisms involved in motor neuron death in the brain and spinal cord, and development and testing of potential drug therapies.2 This research relies substantially on genetically modified animals, particularly transgenic mice expressing mutant forms of the human superoxide dismutase 1 (SOD1) gene, which manifest several important characteristics of the human disease.3 4

While the use of animal models is relevant for advancing knowledge and considered essential for testing putative treatments, it also presents ethical challenges and thus may be a reason for public concern. As a result, a common legal requirement in many countries is that animal research projects undergo an evaluation process intended to ensure that protocols are designed and carried out in compliance with the 3Rs principle: replacement of animal use by non-animal methods, reduction of animal numbers needed to achieve the scientific objectives, and refinement of procedures to reduce or prevent harm to animals and improve their well-being. Systematic reviews of animal use in both neuroscience5 and infection6 research indicate that self-reported regulatory compliance, including of ethical approval of protocols, has steadily increased over the last decade, but that significant progress could still be made to minimise and prevent avoidable suffering of laboratory animals. One key measure for accomplishing this is the termination of experiments during less severe stages of disease development where it is scientifically valid to do so. Endpoints based on early obtainable and scientifically sound indicators of phenotype progression can improve the ethical acceptability of animal studies and prevent the confounding influence of secondary factors; in the case of animal models of neurodegenerative diseases, starvation and dehydration arising from difficulties in eating and drinking due to progressive motor impairment can affect the phenotype and the read-out of survival studies.7–9 Simple refinements, such as adding mashed food and longer bottle spouts, can however help reduce the influence of such factors.10–12

Of related concern are reports that a number of published animal studies fail to uphold basic standards regarding experimental design—for example, random assignment of animals to treatment groups, blinding of observers—or use too few animals, often leading to irreproducible results of limited translational value.13–18 This also holds true for neuroscience,19–22 with concerns over the overall quality and reproducibility of published results being raised for several neuroscience subfields, including multiple sclerosis,23 stroke,24 spinal cord injury,25 Alzheimer’s,26 Parkinson’s,27 Huntington’s12 and ALS28 research. This has led major science funders, including the National Institutes of Health29 and Research Councils UK,30 to demand that future grant proposals attest to the likelihood of providing reliable results, by including details of experimental design and adequate justification of sample sizes. Reproducibility is further hindered by insufficient provision of information on methodology in published research,31 including failure to account for key variables such as sex, genotype, age and weight of animals, anaesthetics used, or methods of euthanasia. Omitting information also makes it impossible to evaluate the study quality, and there is evidence that papers that do not report randomisation or blinding exaggerate biological effects.32–34

Broadly, the public conditionally approves of animal studies on the assumption that the harm caused is offset by the benefits achieved and that scientists strive to minimise the former and optimise the latter.35 36 Doing so requires scientists to critically revise their methods to maximise translational relevance.18 37 Scientists are rightly concerned and, within the self-correcting process of science, must rely on themselves to both identify the main obstacles hindering its progress and find adequate solutions. To address the issue of methodological standards and quality of reporting of basic and applied ALS studies, the ALS research community held two meetings in 2006 and 2009, resulting in the publication of guidelines for animal studies in this field.2 38 These guidelines aim to improve and standardise research methodology, and encourage authors and journals to publish negative results in order to avoid publication bias. The actual impact of such guidelines on how the ALS community carries out and reports research has however not been assessed.

The present systematic review of animal studies of ALS uniquely aimed to assess, over an extended period, the attention given to relevant methodological parameters (as a proxy for the likely reliability of the study) and to examine how the principles of refinement and reduction (measures to minimise animal harm) were considered. Both proof-of-concept and preclinical studies were included in order to assess the influence of the type of study.


Database search

An advanced search was conducted on the ISI Web of Science database with the query TS = ((mice OR mouse) SAME (ALS OR ‘amyotrophic lateral sclerosis’)). The database choice followed the protocol established for our previous reviews,5 6 based on considerations of access, search function and wide coverage of life sciences research. Results were refined to include only original research articles written in English and published in 2005, 2007, 2009, 2011, 2013 and 2015. Years of publication were selected to include papers reporting research planned and carried out prior to and after the publication of guidelines for ALS research in 200738 and 2010,2 resulting from two international meetings held in 2006 and 2009, respectively (figure 1).

Figure 1

Timeline of relevant events. The bottom arrows signal the years for which papers in our sample were retrieved, and the top arrows indicate the years when workshops on best practice in ALS animal research were held, as well as when guidelines stemming from these were published. The grey bar illustrates the period of 1–4 years over which ALS animal studies reported in 2005 were likely to have been designed and carried out, an estimation that can also be applied for the other years reviewed (2007, 2009, 2011, 2013 and 2015). ALS, amyotrophic lateral sclerosis.

The choice to focus on SOD1 mice was based on the predominant role of this model in animal-based research into ALS (see figure 2).

Figure 2

Trends in animal model chosen in ALS research, based on the number of hits from a Clarivate Analytics Web of Science advanced search applying the search queries: (1) TS = ((‘ALS’ OR ‘amyotrophic lateral sclerosis’) AND ‘SOD1’ AND (‘mouse’ OR ‘mice’)); (2) TS = ((‘ALS’ OR ‘amyotrophic lateral sclerosis’) AND ‘TDP-43’ AND (‘mouse’ OR ‘mice’)); and (3) TS = ((‘ALS’ OR ‘amyotrophic lateral sclerosis’) AND ‘FUS’ AND (‘mouse’ OR ‘mice’)). ALS, amyotrophic lateral sclerosis; SOD1, superoxide dismutase 1; TDP-43, Transactive response DNA binding protein; FUS, Fused in Sarcoma.

The search was performed in February 2013 for scientific articles from 2009 and 2011 (after the first and second conferences, respectively), in August 2013 for scientific articles from 2005 (before the two conferences), in September 2014 for scientific articles from 2013, in November 2016 for scientific articles from 2015, and in February 2017 for scientific articles from 2007. After the triage process, illustrated in figure 3, 569 full-text articles remained for analysis: 77 from 2005, 81 from 2007, 84 from 2009, 106 from 2011, 115 from 2013, and 106 from 2015 (figure 4).

Figure 3

Triage process. The first triage step involved reading each of the 1993 abstracts and excluding all papers that were not related to ALS. The second triage step excluded all papers that did not report original research with SOD1 models of the disease. ALS, amyotrophic lateral sclerosis; SOD1, superoxide dismutase 1; TDP-43, Transactive response DNA binding protein.

Figure 4

Flow diagram. From Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(6): e1000097. doi:10.1371/journal.pmed.1000097.

Data collection

Each published study was categorised as either a ‘preclinical’ (ie, carried out ‘to evaluate a drug for use in humans’) or ‘proof-of-concept’ (ie, aiming ‘to elucidate the mechanism of the disease’), according to the suggested classification for animal studies on ALS.2 38 Thus, papers reporting outcomes of drug tests in animal models to inform of their therapeutic value for humans were classified as ‘preclinical’, whereas those reporting studies with the primary goal of deciphering a mechanism of the disease without an immediate application to therapeutic approaches in humans—regardless of using a drug as an investigational tool—were classified as ‘proof-of-concept’. Table 1 describes the information retrieved regarding regulatory compliance, animal models, experimental design and animal welfare. This information was retrieved through careful reading of the full papers and logged into a spreadsheet.

Table 1

Data retrieved

Table 2

Severity scale for ALS studies on transgenic mice with a mutant SOD1 gene

The review protocol was defined prior to data collection. No modifications to data collection methods were made during the research, but the period to be covered was extended to include publication year 2015. Data extraction was carried out by JGF, with support from NHF, AJG and IASO for disambiguation. Blinding was not possible as access to the full paper was required in order to retrieve information.

For severity assessment, a scale was devised based on the specific characteristics of the ALS models and their progressive disease phenotype (table 2). The ALS models used in the reviewed studies express diverse mutant forms of the SOD1 gene. The onset of disease for these models is generally characterised by weakness and tremors of the hind limbs, together with a mild loss of body weight. Disease progression leads to paralysis of the hind limbs, followed by complete paralysis (eg, figure 3 in ref 39), accompanied by increased difficulty in eating, drinking and swallowing.40 41 Mice die of respiratory failure due to paralysis of the diaphragm.8 Age of onset and death, as well as the interval between them, vary depending on the specific mutation (amino acid and codon; eg, ref 42), the number of transgene copies (eg, ref 43) and the genetic background.4 For instance, the overexpressing SOD1G93A Line Gur 1H (B6SJL hybrid) presents with an early onset of overt motor symptoms (3–4 months) and moderate rate of progression (3 weeks from onset to death),44 whereas the highly expressing SOD1G85R Line 148 presents with later onset (7.5 months) and faster disease progression (2 weeks from onset to death).45 The animal supplier (eg, refs 46 47), in-house breeding48 and crosses with other non-SOD1 models (eg, SOD1 mice crossed with gene-specific knockout mice49) are further sources of variability.

Maximum estimated severity was classified according to a five-level scale (table 2). Scoring was based on the estimated clinical state of animals at the most advanced stage of disease progression they were allowed to reach. Studies in which information was insufficient to draw conclusions about the level of severity were classified as ‘undetermined’. This severity scale builds on a scale previously developed by members of this team (NHF, IASO) for classifying studies on mouse models of Huntington’s disease (table 2 in ref 5), together with our own (AJG) experience with mutant SOD1 mouse models and the literature. For purposes of statistical analysis, the severity scale was reduced to a binary scale (‘low’=severity up to level 4; ‘high’=level 5 severity). Severity above level 4 was chosen as the cut-off point because level 4 has the status of a ‘standard endpoint’ in published ALS guidelines,2 38 whereas full paralysis or spontaneous death exceeds this standard endpoint, as well as the legally recommended endpoints in many countries, including the European Union Member States.

Methodological Standards Reporting and Regulatory Compliance Reporting scores

For each reviewed publication, data were collected on a number of items, each of which contributed information about the reporting quality of the paper. For the analysis, we combined these items into two scores, generating for each paper two comprehensive measures of reporting quality, one on methodological standards and one on regulatory compliance. We then used regression analysis to investigate how the two scores (dependent variables) were related to publication year, type of study, country of origin and journal (explanatory or predictor variables), as detailed below. Based on the regression models, it is possible to predict how the dependent variables would change with changes in the explanatory variables. In contrast to, for example, correlation, regression analysis takes into account all the explanatory variables included in the models; that is, the estimated association between a score and one of the explanatory variables is independent of the values of the other explanatory variables considered. In that way, spurious associations caused by relationships between the explanatory variables in the data can be avoided.

The two scores were formed as weighted sums of separate sets of items. The Methodological Standards Reporting (MSR) score was constructed as the weighted sum of the items sampsize, climate, cagesize, nmice, sex, copies and genetic (which refer to important research parameters in animal experimentation and in ALS research in particular) and the items random, blinded, control, sibsplit and exclus (associated with general good practices in the design of animal experiments and published recommendations for ALS studies). Greater weight (1.5 vs 1) was attributed to items which are also part of the ALS guidelines. Table 3 describes these items, their attributed weight in the MSR score and the absolute number and percentage of papers reporting this information, divided by the type of study.
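The weighting scheme can be illustrated with a minimal Python sketch. The item names are those listed above; which items carry the higher weight is specified in table 3, which is not reproduced here, so the `GUIDELINE_ITEMS` set below is a hypothetical placeholder. The sketch also assumes that each item is coded 1 (reported) or 0 (not reported).

```python
# Item names as listed in the text for the MSR score.
MSR_ITEMS = [
    "sampsize", "climate", "cagesize", "nmice", "sex", "copies", "genetic",
    "random", "blinded", "control", "sibsplit", "exclus",
]

# HYPOTHETICAL placeholder: items assumed to be part of the ALS guidelines.
# The actual assignment of weights is given in table 3 of the paper.
GUIDELINE_ITEMS = {
    "sampsize", "sex", "copies", "genetic",
    "random", "blinded", "control", "sibsplit", "exclus",
}


def msr_score(paper, guideline_items=GUIDELINE_ITEMS):
    """Weighted sum of items: weight 1.5 for guideline items, 1 otherwise.

    `paper` maps item name -> 1 (reported) or 0 (not reported); items that
    are missing from the mapping count as not reported. The binary coding
    is an assumption of this sketch.
    """
    total = 0.0
    for item in MSR_ITEMS:
        weight = 1.5 if item in guideline_items else 1.0
        total += weight * paper.get(item, 0)
    return total


# Illustrative paper reporting four of the twelve items.
example = {"sampsize": 1, "sex": 1, "genetic": 1, "cagesize": 1}
print(msr_score(example))
```

Passing the set of guideline items as a parameter makes it straightforward to recompute an unweighted score (by passing an empty set) for a sensitivity check.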

Table 3

List of items integrated in the MSR and RCR scores for preclinical (n=108) and proof-of-concept (n=461) animal studies on ALS reporting this information

The Regulatory Compliance Reporting (RCR) score was originally constructed from the items comply, protocol, severity (turned into a binary classification) and refine. For purposes of statistical modelling, the final version of this score (RCRb) included comply, protocol and refine and was coded as 1 when the sum of these was 2–3, and as 0 when the sum was 0–1.
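The binary coding of RCRb described above can be sketched as follows. This is an illustration only, assuming each of the three items is coded 1 (reported) or 0 (not reported); the function name is ours.

```python
def rcrb(comply, protocol, refine):
    """Binary RCRb: 1 when the sum of the three items is 2-3, 0 when it is 0-1.

    Each argument is assumed to be 0 (not reported) or 1 (reported).
    """
    total = comply + protocol + refine
    return 1 if total >= 2 else 0


# A paper reporting compliance and protocol approval but no refinement.
print(rcrb(comply=1, protocol=1, refine=0))
```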

MSR and RCRb were modelled statistically to estimate the effects of publication year (2005, 2007, 2009, 2011, 2013 or 2015), study type (preclinical or proof-of-concept), country of origin (15 categories), journal (17 categories) and severity (low or high), simultaneously accounting for all the explanatory variables in the models. Countries contributing fewer than 12 papers and journals contributing fewer than 6 papers were combined into separate categories, denoted ‘Other’. MSR was modelled using linear regression and RCRb using logistic regression. Logistic regression is appropriate for binary dependent variables (assuming a linear relationship of the log-odds of the dependent variable with the explanatory variables). The results of a logistic regression can be expressed as the odds of a positive value of the dependent variable at one level of a categorical explanatory variable relative to the odds at another level (the ORs), or the probability of a positive dependent variable at any given level of the explanatory variables. All first-order interaction effects (combined effects of two explanatory variables at a time) were tested and included if significant.
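In standard notation, the logistic model and the two ways of expressing its results can be written as below. The symbols are introduced here for illustration and are not taken from the original analysis: $p_i$ is the probability that RCRb equals 1 for paper $i$, $x_{ij}$ is an indicator (0 or 1) for level $j$ of a categorical explanatory variable, and $\beta_j$ is the corresponding coefficient.

```latex
\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j} \beta_j x_{ij},
\qquad
\mathrm{OR}_j = \frac{\mathrm{odds}(x_j = 1)}{\mathrm{odds}(x_j = 0)} = e^{\beta_j},
\qquad
p_i = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j} \beta_j x_{ij}\right)}}
```

The first expression is the linearity assumption on the log-odds, the second gives the OR for one level relative to the reference level, and the third converts the linear predictor back to a probability.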

Predictive marginal means were calculated, showing predicted values of MSR and probabilities of RCR being above 1 for different publication years, study types and countries of origin. In each case, the marginal means assumed remaining variables in the models to have their observed values. Both models were checked using the Pregibon link test50 and by examining standardised residuals, looking for model misspecification and extreme values. The MSR model was also checked with the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity51 (variability differing between parts of the data) and the Ramsey regression specification error test for omitted variables,52 and the RCRb model was also checked by examining delta-betas to identify particularly influential observations. The proportion of the total variation in MSR and RCRb that could be explained by differences between countries or journals was determined by running empty mixed models with country and journal, respectively, as a random effect, and calculating the intraclass correlation coefficients. The justification for weighting the items composing MSR was checked by modelling an alternative score formed without weighting. The differences between years and countries remained virtually unchanged, although the unweighted score values were generally lower.
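For reference, the intraclass correlation coefficient from an empty (intercept-only) mixed model is conventionally defined as below. The symbols are ours: $\sigma^2_u$ is the variance between groups (countries or journals) and $\sigma^2_e$ is the residual variance between papers within a group.

```latex
\mathrm{ICC} = \frac{\sigma^2_{u}}{\sigma^2_{u} + \sigma^2_{e}}
```

For a logistic mixed model, one common convention is to set $\sigma^2_e = \pi^2/3$ on the latent scale; whether that convention was used for RCRb is not stated in the text, so it is mentioned here only as an assumption.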

The association between MSR and RCR scores was estimated using Spearman rank correlation, which is suitable for non-normally distributed data. A total of 490 observations could be used. Overall MSR mean±SD was 5.69±2.39. RCR assumed the values of 0 (n=48), 1 (n=103), 2 (n=309) or 3 (n=30), resulting in 69% of the observations having values above 1. The number of observations per level of year, study type, country, journal and severity is shown in table 4.
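As an illustration of the rank-based approach, the following is a minimal pure-Python sketch of the Spearman correlation, computed as the Pearson correlation of the ranks with tied values given their average rank. The function names and example values are hypothetical; the original analysis was run in statistical software, not with this code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# HYPOTHETICAL scores for five papers, for illustration only.
msr_values = [3.0, 5.5, 7.0, 8.5, 4.0]
rcr_values = [0, 2, 2, 3, 1]
print(spearman(msr_values, rcr_values))
```

Because RCR takes only four distinct values, ties are frequent, which is why the sketch assigns average ranks rather than using the shortcut formula based on squared rank differences.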

Table 4

Distribution of observations across levels of independent variables included in models of Methodological Standards Reporting and Regulatory Compliance Reporting indices in 490 amyotrophic lateral sclerosis studies

The data were analysed in Stata/IC V.13.1 and IBM SPSS V.23.0. Each article was regarded as the experimental unit and the level of significance for all tests was 0.05.


Quality of research and reporting

The quality of methodological standards and of reporting is crucial to avoid bias and achieve reliable, repeatable and translatable research results. We measured this through the MSR score and also looked at specific research parameters individually.

MSR score

The 12 items that comprise the MSR score represent 7 relevant experimental variables and 5 measures for reducing bias in animal experiments. Higher scores mean better reporting and implementation of good practices in the design of ALS animal studies.

MSR was significantly affected by year and study type (joint F-test p=0.0015 and p<0.0001, respectively). Compared with 2005, the linear regression model predicted a lower MSR for 2007. However, the subsequent years (2009, 2011, 2013 and 2015) were all predicted to be higher than 2007, with a consistent and unbroken increasing trend until 2013 (figure 5). In 2013, MSR was predicted to be 1.5 units higher than in 2007 (p<0.0001). The model also predicted a higher MSR for preclinical studies than for proof-of-concept studies (marginal mean of 7.28 and 5.26, respectively). Model diagnostics showed that linear regression was justified and the model fit was excellent. Table 5 shows the complete MSR model results.

Figure 5

Predictive marginal means (predicted score values) ±95% CI of publication year (A) and country (B) based on a model of an MSR score in 487 ALS studies. According to the linear regression model, MSR could be expected to be lower in 2007 than in 2005, but higher in 2009, 2011, 2013 and 2015 than in 2007. No significant interactions were found (eg, between country and year). According to the R-square statistics, the model explained 25% of the total variation in MSR. ALS, amyotrophic lateral sclerosis; MSR, Methodological Standards Reporting.

Table 5

Model estimates of an MSR index, from the 487 ALS studies that could be used

Reporting of relevant research parameters

Some research parameters were very seldom reported, for example, numbers of animals per cage (7.2%, 41/569), cage size (0.5%, 3/569) and exclusion of animals (1.4%, 8/569). Measures in guideline recommendations to reduce bias in ALS research were mostly not reported, including splitting littermates to treatment groups (10.4%, 59/569), use of non-transgenic littermates as controls (33.2%, 189/569), as well as measures of broader application, such as random assignment of animals to treatments (13.2%, 75/569) or blinding of observers (25.7%, 146/569). By contrast, numbers of transgene copies and genetic backgrounds of animals were reported in the majority of papers.

Of papers reporting sex (n=297), 54.2% (161/297) described studies using mice of both sexes, while 29.0% (86/297) used only males and 16.8% (50/297) used only females. Reporting of sex rose steadily from 2005 (39.0%, 30/77) to 2015 (69.8%, 74/106).

Regarding the chosen genetic background of animals used for preclinical studies (n=108), 76% (70/92) of those reporting this parameter generated experimental animals using a cross between mice hemizygous for the SOD1 mutant gene and C57/SJL outbred strains.

Only 10 studies (6 proof-of-concept studies and 4 preclinical studies) from 2007, 2009, 2011, 2013 and 2015 justified the number of animals used per group. However, of these, only six gave clear justifications (five justified the group size by a power analysis and the other by the size of groups proposed in ALS guidelines).2 38 On the other hand, group size was reported in 83.3% (474/569) of ALS papers, and more so in the preclinical studies subsample (figure 6).

Figure 6

Group size. Histogram of mean group size in 105 preclinical studies reporting this parameter (A) and for each of the years analysed (yearly mean±1 SD) (B).

Of the 569 papers reviewed, 38% (214/569) did not report the method for killing animals, even though terminal procedures requiring anaesthesia for ethical and practical reasons (eg, transcardial perfusion fixation) were identified in 91% (195/214) of these. Among the papers reporting this information, the most commonly used euthanasia method was anaesthetic overdose or the use of another method under anaesthesia (86%, 317/367), while carbon dioxide asphyxiation (7%, 26/367) and other methods such as decapitation or cervical dislocation (7%, 24/367) were seldom used. Very few studies (15/569) were identified as not performing euthanasia of any kind. The remaining 21 articles were deemed ‘inconclusive’, as they reported neither euthanising animals at any point nor deaths.

Regulatory compliance and estimated severity

For public confidence in research, it is important that research with animals is carried out according to standards set by legislation and in line with the principles of the 3Rs. We measured such compliance through the RCR score and also looked at specific research parameters individually.

RCR score

The RCR score, which measures to what extent compliance with legislation and approval of animal experiments are reported in published papers, shows an overall improvement in reporting over the time period under study (joint χ2 p<0.001; figure 7). The estimated odds of RCR >1 were 7.1 times higher in 2015 than in 2005 (p<0.0001). RCR did not differ between journals or between proof-of-concept and preclinical studies, but was affected by country (figure 7). Studies with high severity seemed to have higher odds of high RCR values (p=0.027). Model diagnostics showed that logistic regression was justified. Table 6 shows the RCR model results.

Figure 7

Predictive marginal means (predicted probabilities of values >1) ±95% CI of publication year (A) and country (B) based on a model of an RCR score in 490 ALS studies. The probability of an RCR score above 1 was higher in 2013 and 2015 than in 2005. China, France, Italy and South Korea appeared to have comparatively low probabilities, while for example Spain, Belgium and Canada had somewhat high probabilities. No significant interactions were found. The pseudo R-square statistics indicated that the model explained 16% of the total variation in the data. ALS, amyotrophic lateral sclerosis; RCR, Regulatory Compliance Reporting.

Table 6

Model estimates of an RCR index in 490 ALS studies

Over the entire period, most papers (67.0%, 381/569) reported that studies had been appraised and approved by a third party (eg, ethics committee, competent authority), with only 10.9% (62/569) not reporting any kind of regulatory compliance. By 2015, all papers were found to have some type of statement on regulatory compliance, most of which (83%) referred to prior ethical approval of research protocols.

The correlation between MSR and RCR was weak but highly significant (Spearman r=0.21, p<0.0001), indicating that papers with high scores for methodological standards were somewhat more likely to also score highly for regulatory standards.

Severity and refinement measures

We have found in previous systematic reviews5 6 53 that self-reported compliance with regulations may not necessarily affect the severity of the experiments being conducted. To test whether actual experimental practice has changed over the study period, we classified the severity of each study according to the criteria in table 2. The majority of publications (60.7%, 346/569) included experiments at level 4 severity (figure 8A). Of the 64 studies classified as level 5 (allowing animals to die of disease progression or to reach complete paralysis), 89% reported regulatory compliance (70% ethical approval from a national authority or institutional ethics committee and 19% compliance with relevant legislation or animal use guidelines). However, between those studies that reported regulatory compliance and those that did not, there was no difference in the proportion that were level 5 (χ2 (5 df)=2.855, p=0.722) (figure 8B).

Figure 8

Severity classification of studies (n=569). (A) Percentage of studies, by year, classified into each of the five levels of our severity scale, as well as those of ‘undetermined’ severity due to insufficient information (n=77 in 2005; n=81 in 2007; n=84 in 2009; n=106 in 2011; n=115 in 2013; n=106 in 2015). (B and C) Percentage of studies classified into each of the five levels, according to, respectively, reported regulatory compliance status (n=62, not reported; n=126, guidelines followed; n=381, protocol approval) and type of study (n=461, proof-of-concept studies; n=108, preclinical studies).

On the other hand, we did observe a difference between preclinical and proof-of-concept studies: preclinical studies included a higher proportion of studies within the highest severity categories (77.9% (81/104) classified as level 4 and 19.2% (20/104) as level 5) than did proof-of-concept studies (68.7% (265/386) classified as level 4 and 11.4% (44/386) as level 5). Moreover, no preclinical studies were given a level 1 or level 2 severity (χ2 (5 df)=19.593, p=0.001) (figure 8C).
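To make the comparison concrete, the sketch below computes a Pearson χ2 statistic on a simplified 2×2 table built from the counts given above (level 5 versus all other classified studies, by study type). This is not the authors' test, which used all severity categories (5 df), so the resulting statistic is not expected to match the reported value; the function name is ours and no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts taken from the text: level 5 studies out of all classified studies.
preclinical_level5, preclinical_total = 20, 104
poc_level5, poc_total = 44, 386

statistic = chi_square_2x2(
    preclinical_level5, preclinical_total - preclinical_level5,
    poc_level5, poc_total - poc_level5,
)
# With 1 df, values above 3.84 correspond to p < 0.05.
print(round(statistic, 3))
```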

Of studies classified between level 3 and level 5 severity (ie, from which it could be ascertained animals presented overt locomotor impairments), only 9.1% (42/456) described any refinement measures to alleviate suffering (eg, provision of mashed food and adaptation of bedding in later stages of disease progression), which occurred almost exclusively (39/42) in level 4 studies.

Differences in the regulatory landscape between countries imply that how animals are treated in biomedical research may depend on where these experiments are carried out. The proportion of high severity (level 5) studies differed significantly (χ2 (13 df)=35.561, p=0.001) between the 14 most represented countries in our sample, ranging from 40% (8/20) and 41% (7/17) in South Korea and Israel, respectively, to 4% in Canada and China, and even none in Belgium (0/14) and the UK (0/23).
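As a reading aid, the p values quoted alongside the χ2 statistics in this section can be checked from the statistic and its degrees of freedom alone. The sketch below is not the authors' analysis code: the function name `chi2_sf` is hypothetical, it uses only the Python standard library, and it assumes the country comparison statistic reads 35.561. It relies on the standard closed-form expressions for the upper tail of the χ2 distribution with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Hypothetical helper for illustration; uses the closed-form series that
    exist for integer degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{k=0}^{(df-3)/2} x^k / (1*3*...*(2k+1))
    n_terms = (df - 1) // 2
    term, total = 1.0, 0.0
    for k in range(n_terms):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Statistics as quoted in the text (the third assumes the value reads 35.561).
reported = [
    ("regulatory compliance vs severity", 2.855, 5),
    ("study type vs severity", 19.593, 5),
    ("country vs level 5 severity", 35.561, 13),
]
for label, stat, df in reported:
    print(f"{label}: chi2({df} df)={stat}, p={chi2_sf(stat, df):.4f}")
```

Under these assumptions, the first statistic gives a p value close to the reported 0.722, and the other two give values that round to 0.001 at three decimal places.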


Our analysis, the first of its kind to use specially devised scores encompassing both methodological standards and regulatory compliance reporting (MSR and RCR, respectively) over a 10-year period, suggests three main findings: The first is an overall improvement in both regulatory compliance and methodological and reporting quality across the period assessed. Also, and somewhat as expected, studies classified as ‘preclinical’ scored higher for methodological and reporting quality as compared with more ‘proof-of-concept’ studies. The third finding is that these scores varied widely according to the country in which the first author was based, but not according to the journal publishing the paper.

The improved reporting of regulatory compliance, as expressed in the increase in RCR score across time, is an indicator of widespread increase in reported adherence to animal welfare regulatory requirements. However, this was not reflected in any significant change in the proportion of highly severe (level 5 in our classification scheme) studies or the reporting of refinement measures (in studies where animals showed overt clinical signs). This is in agreement with results from previous systematic reviews of animal research on Huntington’s disease (papers published in 1997–2009)5 and tuberculosis (1997–2011).6 Also, while ‘preclinical’ studies were more likely to be classified in the higher severity categories, there was no relation between the level of severity and whether papers reported approval of protocols or compliance with regulations, the latter also reflecting previous findings.5 53

Only 11.2% of ALS studies were classified at the highest severity level (level 5, ie, including experiments with spontaneous death or euthanasia at a near-death stage such as complete paralysis), which is much lower than that found in research using mouse models of Huntington’s disease (38%)5 and tuberculosis (66%).6 Moreover, most endpoints applied in ALS studies adhered to the same basic criterion for euthanising animals, namely the point at which animals are unable to resume their position within 10–30 s of being laid recumbent. This is the primary endpoint proposed in existing guidelines for preclinical ALS research2 38 and the ALS Treatment Development Institute’s recommendations28 (level 4 severity on our scale), suggesting that researchers to a great extent act in accordance with published guidance in this respect. However, this endpoint was already broadly used before the publication of the guidelines, suggesting that the guidelines reflect common practice at the time of publication.

Applying predefined endpoints is important to prevent the loss of biological samples from animals found dead and for which time of death therefore cannot be defined,5 hence maintaining numbers of animals and avoiding loss of statistical power and subsequent inconclusive results. However, from an animal welfare perspective, the current standard endpoint for ALS studies corresponds to an end stage where euthanasia may prevent deaths from respiratory failure, but since they seldom anticipate death by more than a day, or even just a few hours, late-stage endpoints only curtail a small part of animal suffering.7 Very late endpoints increase the likelihood that at least some animals will die unsupervised (eg, overnight), while the confounding effect of starvation and dehydration in survival data increases as animals become progressively less able to reach the bottle spout or the food hopper.54 At advanced clinical stages, refinements such as providing mashed food on the cage floor, long-spouted water bottles or fluid administration are therefore crucial to avoid unnecessary animal suffering and to improve validity by bringing the model closer to the clinical setting, where late-stage human patients are provided palliative care.55 Defining endpoints also needs to take the research purpose into account. In ALS, the mechanisms operating at different stages of the disease are known to be different, principally affecting distal axons at the onset of symptoms, but developing an immune/inflammatory phenotype during the end stages.56 Therefore, endpoints relevant to the treatment strategies must be used, particularly when targeting neuroinflammation.

MSR improved over the time period under study. Studies classified as ‘preclinical’ reported methodology in more detail than those deemed ‘proof-of-concept’, consistent with the view that a more rigorous design and execution should be demanded for preclinical studies.57 Nevertheless, the checklist provided in the 2010 edition of the guidelines for ALS research sets high methodological standards for both types of studies.2 Throughout the period under study, the MSR scores remain below 50% of the maximum score, showing that the overall level of reporting of methodological detail remains substantially below the recommendations in the guidelines.

Only three parameters (genetic background, number of transgene copies and group size) were reported in more than half of the sample, whereas other relevant information, such as housing conditions, randomisation of animals into treatment groups or blinding of researchers, was absent in well over two-thirds of the papers analysed, in line with previous reviews of animal research in the neurosciences.5 54 58 Other biological and methodological parameters such as sex (only reported in the majority of papers in the ‘preclinical studies’ subsample) and method of choice for euthanising animals were also largely under-reported. The method used for euthanising animals has both animal welfare implications and scientific relevance, as the method affects biological and histological parameters differently, which can impact the postmortem data collected.59 60 The increase in the proportion of articles in our sample reporting sex of the animals is positive, as sex differences4 61–63 in the phenotype or response to therapeutic drugs may influence results and be of clinical relevance. However, although ALS guidelines propose the use of both male and female mice, little over half of the studies providing this information reported doing so. Overall, making these and other details on animals and protocol available is central to allowing an adequate interpretation of results and a critical evaluation of their validity, as well as allowing study replication and proper integration of results in systematic reviews and meta-analyses.31 64

Sample size was generally well reported, but of those reporting this parameter only a small minority used the 24 per group recommended in the 2010 guidelines.2 Furthermore, only three studies clearly justified group size, in agreement with previous reports that this is frequently overlooked, for example refs 31 65. Adequate sample size is paramount to ensure that animals, time and resources are not wasted as a result of underpowering experiments by using too few animals.66 67 Noise reduction by genetic standardisation could also help reduce the number of animals needed per study, as the reduced interindividual variability of isogenic strains allows increasing power without requiring more animals68 and is indeed mentioned in the 2007 guidelines as a way of reducing variability in drug testing.38 Mead and colleagues,69 for instance, have shown great consistency of results by using SOD1G93A transgenic mice on an inbred C57BL/6 genetic background, with the added advantage of presenting early indicators of disease progress, allowing for faster and more humane drug screening. Only 11% of the preclinical studies reviewed, however, used a fully inbred background. The use of a single well-characterised model for initial studies can be supported further by independent replication studies in a different disease model.
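To illustrate what a justification of group size can look like, the following sketch applies the textbook normal-approximation formula for comparing two group means, n = 2(z₁₋α/₂ + z₁₋β)²/d², where d is the standardised effect size. It is offered only as an example under stated assumptions: the function name `n_per_group` and the effect sizes used are hypothetical, and this is not the derivation behind the group size recommended in the guidelines.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (difference in means / SD).
    Hypothetical helper; a t-based calculation gives slightly larger n.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Illustrative effect sizes only; real values should come from pilot or published data.
for d in (0.5, 0.8, 1.0):
    print(f"d={d}: {n_per_group(d)} animals per group")
```

The required group size rises steeply as the expected effect shrinks, which is why stating the assumed effect size and variability matters as much as stating the number of animals used.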

Most articles did not report random assignment of animals to groups or blinded outcome assessment. This reflects similar data from reviews on the methodological quality of preclinical research on ALS28 58 70 and other fields.31 33 71–73 This lack of attention to measures to avoid noise and biases in animal experiments is cause for concern, given their role in improving the reliability of results, as well as the translational value of preclinical research.16 24 33 67 71 While it cannot be excluded that in some cases blinding and randomisation were applied but not reported, one might expect that researchers carrying out well thought out and planned experiments would state such measures, since this strengthens their results and conclusions. There is ample evidence for many areas32 33 73–75 that published studies which do not report measures to minimise bias (ie, blinding, randomisation and allocation concealment) tend to present an exaggerated estimate of the therapeutic effect of experimental drugs. This is particularly relevant in the light of the ongoing discussion of why promising preclinical results of candidate drugs for ALS have not translated into the clinic. Although the disappointing outcomes of clinical trials apparently contradict the promising preclinical results that elicited them, they may actually mirror the results obtained from adequately designed animal studies carried out to high methodological standards.28 70

MSR and RCR scores were not influenced by the journal in which the results were published. Other researchers who have investigated the effect of journal on methodological standards and reporting quality have found a statistically significant but very small effect of whether or not the journal had endorsed the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines.76 77

In contrast to previous research, this study indicated a gradual improvement in the methodological standards and regulatory compliance reporting scores over time. However, it is difficult to say to what extent this is the result of field-specific guidelines, as there is an overall increasing trend in these scores. Our study, of course, is limited to the period and model under study, and some improvements may have occurred as a result of the informal discussion leading up to the formal workshops and guidelines (and more recently, the appearance of other transgenic models means that the study does not cover the entire field of ALS research for later years). Also, a surprisingly low number of papers (1/84 in 2009, 10/106 in 2011, 10/115 in 2013 and 14/106 in 2015) referred to the Ludolph et al guidelines.2 38 Given the slow adoption of the ARRIVE guidelines,78 it seems likely it may also take some time for the ALS guidelines to have a detectable effect.

While reporting of relevant parameters such as blinding and randomisation was higher in our ‘preclinical’ subsample than what has been reported in other systematic reviews,16 31 76 78–81 the results for the overall sample were generally comparable. Also, and similarly to what was found in these systematic reviews, justification for sample size was rarely reported.

One way of addressing the problems with study quality could be for preclinical researchers to adopt the standards of randomised controlled trials in humans,82–85 including trial preregistration.86 87 Compliance with existing guidelines would seem a more readily achievable goal; however, other self-regulatory mechanisms may be warranted to improve compliance, such as changes to the publishing requirements of biomedical journals88–90 or more demanding requirements by science funders, both of which are clearly on the horizon.30 91


The ALS research community pioneered the development of field-specific guidelines, setting science community-based standards for animal research methodology and reporting.2 38 Whereas we found significant improvement over time, it is less clear to what extent this is linked to the guidelines, which are rarely referred to. Animal research in the field of ALS does however differ from comparable research in other reviewed fields in one aspect: the implementation of predefined endpoints in studies of advanced disease stages. This practice is important both for research quality and animal welfare and is indeed coherent with the field-specific guidelines. We propose that future guidelines should address measures to raise standards in the design, conduct and reporting of experiments, as well as to reduce the impact on animal welfare, as part of a concerted effort to make biomedical research using animals more ethically and socially acceptable and effective.


We thank Gilly Griffin for her input on current practice regarding humane endpoints in Canada.



  • Prepublication and Review History is available online at

  • Contributors Original idea for this study: NHF, IASO. Conception and design of the work: NHF, IASO, AJWF, AJG. Data collection: JGF, NHF. Data analysis and interpretation: NHF, JGF, JH, AJWF, AJG, IASO. Drafting the article: NHF, JGF. Critical revision of the article: AJG, AJWF, JH, IASO. Final approval of the version to be published: NHF, JGF, AJG, AJWF, JH, IASO.

  • Funding NHF was a recipient of a Postdoctoral Research Fellowship from the Portuguese Foundation for Science and Technology (FCT), grant reference SFRH/BPD/85978/2012. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-HEALTH-2013-INNOVATION-1) under grant agreement no 602616 (Project ANIMPACT). Analysis and revision were supported by the project Norte-01-0145-FEDER-000008 - Porto Neurosciences and Neurologic Disease Research Initiative at i3S, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER) and FEDER - Fundo Europeu de Desenvolvimento Regional funds through the COMPETE 2020 - Operational Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT - Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Ensino Superior in the framework of the project ’Institute for Research and Innovation in Health Sciences' (POCI-01-0145-FEDER-007274).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned, externally peer reviewed.

  • Open data Data available in a public, open access repository (
