Critical review of the Admiralty Code

In this fourth (and long overdue) blog post on information grading (scoring) systems, I will dive into some of the criticism that the Admiralty Code – or in fact the different NATO and national systems as the practical application of the Admiralty Code – has received. I will do so alongside the arguments put forward in three articles: Baker, McKendry and Mace (1968), Besombes and Revault d’Alonnes (2008) and, most importantly, Irwin and Mandel (2019). I realise that other authors have critically discussed the system as well; however, for the sake of brevity I will concentrate in this post on these key articles, as they each represent a different type of criticism.

I assume for this post that readers are already familiar with the Admiralty Code and will not further introduce the subject. If you’re not, please first read part 1, part 2 and part 3 of this series.

Lack of independence between scores and multiple dimensions in a single score

Early criticism of the NATO standard (as applied by the US Army) was formulated by Baker et al. (1968). They analysed over 1400 Army field intelligence reports and found a strong correspondence between the source reliability rating and the information credibility rating. In fact, 87 percent of the ratings fell along the diagonal A1, B2, C3 etc., which according to Baker et al. implies that the two scales are not independent (1968: 13). Those applying the grading may (inadvertently) let their judgement of the accuracy of the information be influenced by the reliability of the source. That kind of use of the grading would indeed defy the idea behind the Admiralty Code. While it is not clear to what extent the study material (i.e. army field intelligence reports) may have contributed to these findings, the potential influence of the source reliability score on the information credibility score is certainly something to take into account.
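To make the independence check concrete, here is a minimal Python sketch of the diagonal measure Baker et al. used. The sample ratings are invented for illustration (they are not Baker et al.’s data); on two genuinely independent scales, the diagonal share should be far below the 87 percent they observed.

```python
# Illustrative sketch: how often do the two Admiralty scores coincide
# on the "diagonal" (A1, B2, C3, ...)? Sample ratings are made up.

RELIABILITY = "ABCDEF"   # source reliability axis
CREDIBILITY = "123456"   # information credibility axis

def on_diagonal(rating: str) -> bool:
    """True if a rating like 'B2' lies on the diagonal A1, B2, C3, ..."""
    letter, digit = rating[0], rating[1]
    return RELIABILITY.index(letter) == CREDIBILITY.index(digit)

sample = ["A1", "B2", "B2", "C3", "A2", "B3", "C3", "D4", "B2", "C2"]
share = sum(on_diagonal(r) for r in sample) / len(sample)
print(f"{share:.0%} of ratings fall on the diagonal")  # 70% for this sample
```

A high diagonal share in real report data is exactly the symptom Baker et al. flagged: one judgement bleeding into the other.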

The second point Baker et al. make relates to the fact that the source reliability scale is unidimensional while the information probability scale is multidimensional, using the metrics of a) consistency with other information and b) plausibility (1968: 15). They argue that this multidimensionality could certainly be responsible for the misuse of the scale. I also discussed the use of different dimensions (variables) in one grade in part 2, when describing the old Dutch law enforcement information grading system. In that system the probability of the information was determined by two variables, ‘distance to the origin of the information’ and ‘corroboration’, combined in one grade. Such use of multiple elements in one scale is indeed likely to complicate matters, so another point to keep in mind.

One small point: I tend to disagree with the observation by Baker et al. that the scale by which the reliability of a source is measured is unidimensional. Going back to the origin of the Admiralty Code (see part 1), we can see that the elements of trustworthiness and competence together should make up the reliability (credibility) of a source.

Readability and limited criteria

Another line of criticism is voiced by Besombes and Revault d’Alonnes (2008), who proceed from STANAG 2022 on intelligence reports (which, however, contains the same tables as the current AJP-2.1 standard) and point out two faults they see in the NATO information evaluation system.

The first point they make is that using two axes actually decreases the readability of the score, making the communicated value more obscure. As an example, they ask which piece of information should be regarded as the more probable: one rated B3 or one rated C2? (2008: 1635). I’m not convinced, though, that this point really represents a big issue, as the rating system was never devised for exact comparison between pieces of information.

In this respect I would like to quote McLachlan, who warns against taking the notion of a hierarchy of information too seriously: “… no piece of information is normally of great value on its own. When first received it is like a sentence without its context. Signal intelligence for example, in its raw state is seldom intelligible on its own to anyone but the expert who extracts it or deciphers it… it cannot be read and understood, even when translated, in isolation” (McLachlan 1968: 25).

Besombes and Revault d’Alonnes proceed from the premise that “The reliability characterises the source, independently of the considered information. Therefore, every information delivered by a source is credited with the same reliability.” That is, or at least should be, incorrect. Under the Admiralty Code, as we already saw in the discussion of its origin, the proficiency (or competence) of the source should certainly be taken into account when assessing the source’s reliability, as was discussed in detail by McLachlan. Your aunt Betsy may be a very credible source on gardening, but much less credible when it comes to COVID-19 vaccinations, unless of course she holds a PhD in medicine.
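The aunt Betsy point can be sketched in a few lines of Python. The names and grades below are invented for illustration; the point is simply that, under the original Admiralty Code idea, reliability is a property of a source *and* a topic, not of the source alone.

```python
# Illustrative sketch (invented names and grades): reliability modelled as
# a function of (source, topic) rather than of the source alone.

reliability_by_topic = {
    ("aunt_betsy", "gardening"): "B",  # usually reliable in her own field
    ("aunt_betsy", "vaccines"):  "E",  # unreliable outside her competence
}

def source_reliability(source: str, topic: str) -> str:
    # 'F' = reliability cannot be judged, the fallback for unknown pairs
    return reliability_by_topic.get((source, topic), "F")

print(source_reliability("aunt_betsy", "gardening"))  # B
print(source_reliability("aunt_betsy", "vaccines"))   # E
```

A system that stores one reliability grade per source, as the quoted premise assumes, cannot represent this distinction at all.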

The second fault in the system, as Besombes and Revault d’Alonnes argue, is that the criterion used to determine the plausibility of the information – being reduced to the confirmation or denial of the information – offers too limited a perspective, and that additional criteria could be useful to express the confidence that a piece of information deserves. They propose (2008: 1636) proficiency and likelihood as additional criteria for evaluating information.

I do agree that additional criteria could perhaps be useful when evaluating information; however, I’m not convinced that the criteria Besombes and Revault d’Alonnes propose are the best suited. As already noted, proficiency is reflected in the reliability score of the source. I doubt whether separating this element from reliability into a distinct score adds any value. It certainly does not make matters easier. As for the additional element likelihood, they define it as a criterion that qualifies information based on our global take on the state of the world. A specific focus on that element, however, I believe increases the risk of confirmation bias, and it should not lightly be introduced into an evaluation system.

After their analysis, Besombes and Revault d’Alonnes propose to use a scoring chain to arrive at a confidence indicator which expresses, in a single digit, a combination of all four criteria. I’m not convinced that this approach will lead to any better information evaluations. First, I believe that the proposed scoring chain is relatively complex. That could work for sensor data, where the scoring could be largely automated. However, I doubt that such a scoring chain could work for HUMINT and OSINT.

Secondly, condensing the final score into a single digit significantly reduces the communication value of the rating. Information from a very reliable (and proficient) source which does not, however, fit in the ‘global take on the state of the world’, and which would otherwise be scored as ‘A4’, would in their system end up with a score of just ‘4’. With that single score, the chance that weak signals are recognised decreases dramatically.
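The information loss is easy to demonstrate. The sketch below uses a deliberately simplified collapse rule (my own stand-in, not the actual scoring chain from the paper) to show why any single-digit summary erases the weak-signal case:

```python
# Illustrative sketch: collapsing a two-axis rating into one digit
# discards the source axis. The collapse rule is a simplified stand-in
# for a Besombes-style confidence indicator, not their actual chain.

def collapse(rating: str) -> str:
    """Keep only the credibility digit, dropping source reliability."""
    return rating[1]

# A4: highly reliable source, information at odds with current views
# (a potential weak signal). E4: unreliable source, equally implausible
# information (probably noise). The single digit cannot tell them apart.
print(collapse("A4"), collapse("E4"))  # prints: 4 4
```

With the full two-character rating, an analyst can spot the A4 case and give it a second look; with a bare ‘4’, that cue is gone.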

Communication and criteria

The last critique of the application of the Admiralty Code that I would like to discuss here comprises the points summarised in the very comprehensive article by Irwin and Mandel (2019), who argue that information evaluation methods mask, rather than effectively guide, subjectivity in intelligence assessment. Their critique of the NATO evaluation system consists of three parts: (a) communicative, which relates to how ratings are communicated, (b) criterial, which relates to the rating determinants used, and (c) structural, which relates to the position of information evaluation within the intelligence process (Ibid. 504). I will discuss the first two points here, as the third falls a bit outside my intended scope for this blog post series.


In relation to the communicative value of the ratings, Irwin and Mandel argue that subjective interpretations of the boundaries between the ratings are likely to vary among users, as are interpretations of the relevant rating criteria. They argue that the distinction between a reliable (‘A’) source, which is said to have a history of complete reliability, and a usually reliable (‘B’) source, which is said to have a history of valid information most of the time, will be interpreted differently by different analysts. The key problems they see are that these descriptions do not come with numeric values (i.e., ‘batting averages’) and that there are (slightly) different applications across NATO countries. As a result, these terminological variations may contribute to miscommunication (Ibid. 506).

Further opportunities for miscommunication arise, according to Irwin and Mandel, because ‘information accuracy’ is used as a synonym for information credibility. And although information credibility often includes considerations of accuracy, it is usually conceptualised as a multidimensional construct. Similarly, they argue that liberal use of terms conveying certainty (e.g., ‘confirmed’) can also result in a communicative issue. Particularly in intelligence contexts, where the information is seldom complete and often ambiguous or vague, these expressions could lead to overconfidence on the part of consumers (Ibid. 506).


Secondly, Irwin and Mandel point to a set of criterial issues which, according to them, stem from the rating determinants incorporated by current evaluation methods. As a particularly problematic feature of the Admiralty Code they argue that it lacks situational considerations and implicitly treats source reliability as constant across different contexts (2019: 507). On this point they reference Besombes and Revault d’Alonnes (2008); however, as I previously discussed, I believe that this argument is flawed. Of course, in practice source reliability may be treated as constant across different contexts, but that would then be a flaw in the execution, because the original idea behind the Admiralty Code as described by McLachlan (1968) absolutely takes different contexts into account.

Irwin and Mandel thereafter argue that most of the methods they examined highlight reliability determinants such as ‘authenticity’, ‘competency’ and ‘trustworthiness’. However, they point out that these methods fail to formally define or operationalise those concepts; their inclusion is therefore likely to increase subjectivity and may thus undermine the internal consistency of source reliability evaluations (2019: 507). Additionally, Irwin and Mandel point to the failure to distinguish between subjective and objective sources (for example, human sources versus sensors) or between primary sources and secondary/relaying sources. Of course, the Admiralty Code was originally devised to be applied in the context of HUMINT, so here they make a valid point. In particular, the distinction between primary sources and secondary/relaying sources seems to be a recurring theme.

Another valid point made by Irwin and Mandel is that information credibility generally incorporates confirmation ‘by other independent sources’ as a key determinant. However, how many independent sources must provide confirmation for information to be judged credible? And should relationships of affinity, hostility or independence between the sources be considered? That seems absolutely relevant. Lastly, as I noted before, an emphasis on consistency with existing evidence may encourage confirmation bias, so there may be a delicate balance between confirmation and independence. Altogether, Irwin and Mandel raise a number of valid points in relation to the criteria used to arrive at the reliability and credibility ratings, something that really needs further thought.


The criticism discussed above shows that the Admiralty Code and the derived information evaluation systems currently in use in Western military and intelligence agencies are likely not perfect. In particular, the system was devised 80 years ago with a view to bringing order to the chaos of the then information anarchy, and was primarily aimed at evaluating information from HUMINT sources, so it may no longer be fully fit for purpose in the 21st century.

That said, the strength of the Admiralty Code is certainly that it forces analysts to explicitly think about the different dimensions of the information they work with. Moreover, I have not come across an alternative that solves all potential points of criticism. More about potential (practical) solutions will be covered in one of the following posts in this series. In the next post, however, I will first map how other organisations such as NGOs and the media aim to organise the evaluation of the credibility of information from (open) sources.


Baker, J., J. McKendry and D. Mace (1968) Certitude Judgements in an Operational Environment. Technical Research Note 200. US Army Behavioral Science Research Laboratory.

Besombes, J., and A. Revault d’Alonnes (2008) ‘An Extension of STANAG2022 for Information Scoring’. In Proceedings of the 11th International Conference on Information Fusion, pp. 1635-1641.

Irwin, D. and D. Mandel (2019) ‘Improving information evaluation for intelligence production’, Intelligence and National Security, Vol. 34(4): pp. 503-525.

(photo credit: @m23 via