Thierry Moreau

Document Number C005131

2011/03/01

Copyright (C) 2011 CONNOTECH Experts-conseils inc.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

PUDEC and the PUDEC logo are trademarks of CONNOTECH Experts-conseils inc.

Specifications subject to change without notice.

Table of contents

2.1 A Proposed Arrangement in the Field of Applied Cryptography

2.2 About Standards and Innovation

4. Two Side Lessons from Information Theory

4.1 Differentiation from NIST Publication SP-800-90

4.2 "Asymptotic Equipartition Property", Reconciling Intuition and Probabilities

5.1 The BBS Component in the PRNG Design

5.1.1 Entropy Preservation Upon BBS PRNG Seeding

5.2 The MTGFSR Component in the PRNG Design

6. Software Engineering Issues

6.1 Options: Abstract View and Taxonomy

6.1.2 Origin of BBS Modulus Parameter

6.1.4 PRNG State at Termination

6.1.5 Source of True Randomness

6.2 Options: Envisioned Profiles

6.2.1 Supplying a Named Pipe with Random bytes

6.2.2 Server Application with Small Random Number Usage

6.2.3 Server Application with Large Random Number Usage

C-Number |
Date |
Explanation |
---|---|---|

C005131 |
Current version | |

C005131 |
2011/03/01 |
Initial release |

This document addresses the software and system design of a secret random data source for use by cryptography-based information security applications. The intended field of use is high assurance systems, but with a bias towards innovation where standard or prevailing practice appear to have limitations.

The proposed arrangement is implementaion oriented and tied to a prevailing software building methodology in the Linux (or POSIX compliant systems) developer community. It is described with design options allowing a secre random source function for server systems and client applications.

Basically, the arrangement relies on a true random source (not strictly part of the arrangement provisions) to seed a PRNG (Pseudo Random Number Generator) of a hybrid design made of the BBS (Blum Blum and Shub) cryptographically strong generator and an LFSR-based combined generator. Each instance of this hybrid PRNG can be tailored with parameters to balance a very long period length, relative efficiency, and theoretical security. This parameterization occurs at the software build time.

The seed size determined by the PRNG parameters is correlated to the PRNG period length. The relationships between the parameters, seed size, and period are investigated in the perspective of entropy preservation from the true random source up to the PRNG output sequence delivered to cryptographic applications. The proposed arrangement allows a true random source with a non-compact encoding of its entropy contents, provided sufficiently large PRNG parameters. Encoding compactness and distribution uniformness are independent characterizations of a true random source. In cases where only a non-uniform distribution true random source is available, the proposed arrangement entropy preservation assurance may be relied upon for making good use of a not so good true random source.

The document discusses various aspects of theoretical or academic work that can support the proposed arrangement. This extends to a more abstract analysis of randomness in the information theory, where we were unable to see how the more abstract work can guide implementation oriented design decisions. The document remains focused on addressing practice oriented details that may impact the overall system or software security.

A secret random source is very often a security-critical portion of cryptographic systems, cryptographic software utilities running on ordinary computers, and cryptographic software functions which may be part of larger software packages such as a web browser. A suitable secret random source can not be entirely based on software: some unpredictable phenomenon must somehow provide unpredictable input to the software. From a sensor mechanism for the unpredictable phenomenon, a first software component makes a true random source that provides random data in digital format. The true random source interface is an important concept for the present document. A second software component takes the output from the true random source and delivers random data to cryptographic applications with the proper format, throughput, statistical quality, and assurance that an independent guess is unlikely.

The reader is assumed to be familiar with the terminology of this field; little emphasis is put on explaining generally accepted cryptographic system design principles.

The present proposal is intended for fail safe operations. Generally, a fail safe system is designed to react to a component failure by degrading its operating mode in a way that maximizes remaining safety margin. It might mean alternate modes of operations (e.g. in a flying aircraft) or the opposite, a system halt (e.g. the same aircraft during the procedure prior to take off). The failsafe assumption in the present context is in favor of a system halt. Computer server applications may be checked and restarted by operations personnel, a failed client application would trigger a call to a system administrator; these incidents are deemed preferable to an IT security breach caused by a failure of the secret random source. This observation pertains more to the dedication with which a secret random source failure is to be avoided than to the need for explicit failure detection in the high level design. A half good secret random source is no good, by the standard derived from the application of cryptographic system design principles. Note that the fail safe criteria applies to the secret random source as a whole; it is actually a design goal to provide a good secret random source from a not so good (by some criteria linked to entropy estimates) true random source.

This document presents the design of an arrangement for a secret random source for cryptographic systems. There are variants in the arrangement, but some characteristics remain throughout these variants, notably the use of a large period pseudo-random number generator (PRNG) and, correpondingly a lower emphasis on data throughput achievable from the unpredictable phenomenon in any variant. This idea of seeding a PRNG component from a true random source is nothing new. The design presented here includes the details required for avoiding the pitfalls of applied cryptography: while the academic literature may be satisfied with the general idea of seeding a cryptographically strong PRNG with a true random source, the actual implementation of such a thing might be vulnerable in many diverse ways.

The proposed arrangement contrasts with many hardware-based true random generators that claim to provide sufficient secret random throughput for every concievable cryptographic operations workload, thus making a PRNG component useless. It is not a purpose of the present document to argue the relative merits of the two approaches. Our sole purpose is to provide assurance that the proposed arrangement is adequate. Actually, a potential quality target might be precisely a high assurance high throughput hardware-based true random generator, as if such a thing was commercially available for any computing environment.

We present a design for applied cryptography implementations, including many design elements in the field of software engineering and refinements of scientific contributions in the field of cryptgraphy. In a few aspects however, the proposed arrangement may itself make a small contribution to the science (or rely on a scientific contribution already made by the present author). However, no review of the field is included that would put any such scientific contribution in perspective.

Some of the technical contents below is non-compliant with published standards achieving similar goals, or may sometimes appear departing from established practice. Accordingly, some background information may be useful about what might otherwise appear the result of a not invented here syndrome.

Generally speaking and according to the very semantic of these terms, innovation and standardization are antagonistic. While innovation calls for new ways of doing things, standards compliance means adherence to established principles. In the field of the present document, the most advanced standard document is a NIST special publication [1] that standardizes only some of the issues. While progressing towards a complete arrangement, it was found that the issues that are standardized in [1] are too constrained in the context of the proposed arrangement (as will be explained below, the PRNG period in the proposed arrangement requires more flexibililty than what is offered by the techniques standardized by NIST).

This bias towards innovation when an existing standard includes apparent restrictions is somehow inherited from the larger project that motivated the random source arrangement proposal. In this larger project, innovation is motivated by the challenges of authentication and identity management (at the portion of the problem space dealing with initial enrollment). If the existing IT security solutions fail to match the higher level of effectiveness and efficiency of IT soutions in general, maybe the existing standards and established practice need to be revisited.

Since the proposed arrangement extends to actual implementations, the established practices extend to cryptographic software components. In this category, a foremost example is openssl. While the proposed arrangement may turn into independent software implementations, the one produced by the author's organization is in whole a copyrighted work of this organization. While this situation is a matter of fact, some explanations are given here on the legal aspects, technical issues, and historical perspective.

The complete copyright ownership allows source code distribution with the GPL licensing terms, if need be modified to account for other intellectual property rights (notably invention patent rights over some uses of the cryptographic software when such uses may be not reprentative of the broader range of possible uses). The legal details are out of scope for the present document (for instance, modified is a misnomer for a GPL license to be made conditional to the acknowledgment of an interpretation of some facts). The general idea is that contributions to the community intellectual property should be possible without extending the legal framework to the anti-patents ideology. We are referring to the circumstances where an operating division A of a legal entity releases a software tool under the terms of the GPL and an operating division B uses that tool in a software implementation for a patented process (even if the patented process is implemented in software).

Furthermore, the complete copyright ownership allows inclusion of the software implementation in proprietary systems and software packages, through commercial license arrangements.

At the level of a technical analysis, the proposed arrangement implementation includes a software implementation of multiple precision integer arithmetic for public key cryptography (RSA acceleration) which relies on the Montgomery multiplication technique ([2] and [3]) for achieving decent speed with the current sizes of key material. While searching some of the open source solutions, some were found that failed to apply the Montgomery technique effectively, and others were found to include more multiplication techniques that would require extra software support with questionable benefits.

In the historical perspective, the proposed random source arrangement is implemented with a source code base dating from the early works of the present author in the field of applied public key cryptography. Throughout a long-term commitment to this research area, cryptographic primitives based on the integer factorization problem using the public exponent value two is a recurrent approach (the BBS pseudo-random generator [4], the PEKE key establishment protocol [5], the Rabin-Williams signature scheme [6], [7], [8], and an application of the MASH number theoretic hash function [9], [10]). Note that the cryptosystems based on the discrete logarithm problem, or its elliptic curve equivalent, also form a rich family, including schemes that made it through the adoption-standardization race and schemes that didn't despite sound theoretical support. The recurrent recourse to integer-factorization cryptosystems is mainly a matter of efficiency in project development, both at the implementation level and as a matter of re-use of theoretical foundations.

A secret random source is a system component or building block in a secure system assembly. The present specification is indeed developed as part of a larger development effort targeted at improved initial registration procedures for identity, authentication, and authorization management system. In this larger scope project, some simplifying assumptions are needed in order to keep the scope manageable.

The selection of a computer language sets limits in the deployment potential for an implementation of a given software logic solution. In the present case, the C++ language is used, and no special attention is paid to remain compatible with older C++ compilers which may not support language features present in the 2003 version of the C++ standard.

The typical Linux development environment is relied upon, including the autoconf and automake software building process conventions. This development methodology is intended to make it simple to distribute software to Linux users in source form and have it compiled and installed by the users with minimal support services. Actually, this methodology should support Linux and other systems as well, notably other POSIX-compliant systems on which a modern C++ compiler is available.

A true random source is needed for any instantiation of the proposed arrangement. Its selection is left to a later phase in the system design refinement process due to the diversity of solutions to the problem of integrating an unpredictable phenomenon in computer systems where determinism is generally desirable. Nonetheless, two specific alternatives are covered to a greater extent, each for its own cause:

- the operating system provided
`/dev/random`

special device, because it is readily available to the software (but not necessarily with high assurance) and - the PUDEC scheme developed by the author organisation ([11], [12]), because it provides self-evident entropy characteristics (but with an unusal dependency on trusted personnel).

The operating system provided true random source would be replaced by the `CryptGenRandom`

function in the Microsoft Windows environment.

The present document is in an ackward position between very precise software engineering issues and some claims of benefits originating from theoretical results, most notably a PRNG component *proven* to be cryptographically strong and the information theoretic definition of entropy. We want to avoid the pitfall of forgetting how mathematics shapes the commonly used technologies, a pitfall which in the case of cryptographic techniques may lead to dangerous overconfidence (reliance on theoretical results while assumptions are not verified) or unnecessary flaws (non-use of avalable theoretical results in favor of established practice).

The cryptographically strong qualification for a PRNG may refer to different things to different poeple:

- to the software engineer, it means some code reuse from cryptographic primitives such as block cipher, hash function, or public key cryptography operations,
- to the IT security specialist, it means some recognition, e.g. through standardization or academic publications from well-known cryptographers, that a PRNG does qualify,
- to a PhD candidate in computer science, the concept refers to theories at various levels of abstraction.

In this proposal, we basically attempt to make good use of the theoretical results, as a skeptical IT security specialist, without claiming to master them as a PhD candidate. As a general discussion of the relevancy of this approach, see reference [13]. Some proofs appear more useful than others. The ones relevant to our specific choice of cryptographically strong PRNG are those supporting the integer-factorization family of public key cryptosystems, more specifically with the public exponent two (foremostly the Rabin-Williams cryptosystem).

Some theoretical results are just too extreme for finding practical applications; they typically prove that a given problem can have a strong solution, but only with a very inefficient and/or impractical example. In a sense, the very definition of entropy is a problem with solutions that lack practical implementation in most instances of a truly random source. Indeed, the entropy definitions refer to probability distributions of independent samples of an unpredictable phenomenon, something that is usually hard to establish in practice and not needed for any other purpose in a digital system. The simple cases where an entropy estimate is almost trivial, such as PUDEC, do not need the most advanced theories. In more complex cases, the theories may require a more precise determination of a probability distribution than what is possible. In the proposed arrangement, the focus is placed on entropy *preservation* irrespective of the entropy definition (there is more than the classical Shanon information theory definition of entropy).

It is unfortunate that an important standardization document in the field of random secret generation for cryptography ([1]) uses the term entropy for the very data obtained from a true random source, and also amalgamates *secrecy protection effectiveness* with the information contents *measurement* associated with a probability distribution: "*Entropy is relative to an adversary and his *ability to observe *or predict a value.*" (appendix C, section C.1, emphasis added). The analysis of entropy as a measurement of information contents should be separate from a security review of secrecy protection mechanisms. In the present document, entropy refers to an information contents measurement from a probability distribution associated with an unpredictable event, and we prefer true random source to entropy source.

Among the challenges faced by theoreticians is the emergence of unexpected attack strategies, for instance the fault injection attack that allow an attacker to infer some information about secret variables used in computations from intentional interference with a cryptosystem operations, e.g. in a smartcard form factor. The field of use for the proposed secret random source arrangement is not immune from unexpected attack strategies. But is should be kept in mind that the two foremost practical problems, namely the preservation of long-term secrets held within a system boundary and the reliability assessment for a true random source, find very little assistance from theoretical work.

The diagram below shows where a PRNG may fit the overall picture of a secret random source. The three upper components refer to the analytical framework from [1] separating digitalization (e.g. an instrument's firmware) from conditioning (algorithmic processing for turning noise measurements into fair binary data with an entropy estimate). The proposed arrangement applies only to the last of the three options after the conditioning algorithm. The one-time-pad encryption is a "perfect" encryption scheme, yet totally unmanageable from the perspective of modern cryptographic key management; it is shown on the diagram for its tutorial value.

unpredictable phenomenon -\ | | V | digitalization +-- true random source | | V | conditioning -/ | +---> one-time-pad encryption | +-------------> application deterministic processing | +---> PRNG ---> application deterministic processing

The proposed arrangement includes three main aspects:

- a PRNG design,
- system integration provisions addressing the data protections essential to PRNG state data like other cryptographic key material deemed to be present where the random source arrangement is needed, and
- included in these system integration provisions, arrangement options for the true random source on which secret random generation ultimately (and inescapably) rests.

The PRNG design is characterized by software building or installation parameters for adjustable period length, such that a low entropy rate provided by the true random source may fill a very large PRNG state, thus meeting a minumum entropy requirement with no other entropy pool mixing method. In addition, obviously, the PRNG design has features that make it cryptographically strong.

The true random source options are meant to fulfill low throughput requirements derived from the overall arrangement. *No periodic or usage-based re-seeding of the PRNG from the true random source is included in the proposed arrangement.* This admitedly goes against generally accepted dsign principles for a secret random source, but it is a deliberate design choice. It is believed that better overall security can be achieved if other design aspects are given attention that would otherwise be allocated to make the periodic or usage-based re-seeding. The principle is that someone in charge of deployment of an implementation of the arrangement should pay close attention to the validity of the true random source, but only in the context of PRNG *initial seeding*.

Other system integration provisions are protection requirements derived from the PRNG-related data that must be handled as critical cryptographic key material for the overall system or software to be secure. In the above diagram, equivalent protection requirements are almost certainly derived from the (cryptographic portions of the) application deterministic processing.

As indicated in the introduction, the proposed arrangement focuses on entropy *preservation* more than optimisation or minimum assurance for a given entropy measurement definition. In the present section, we attempt to explain why entropy measurements can hardly be the primary focus of the proposed arrangament, mainly by reference to theoretical work in the field.

An in-depth review of the notion of entropy from a theoretical perspective, can be found in a PhD Thesis by Christian Cachin [14]. This reference reviews the existing entropy definitions and makes a contribution called smooth entropy, which appears seldom practical but is justified by a question that reveals limitations of the existing entropy definitions: "The main question is: Given an arbitrary random source, how many uniformly random bits can be extracted? We allow for an arbitrarily small deviation of the output bits from perfectly uniform random bits that may include a small correlation with the random bits used for smoothing." (Introduction to chapter 4 in the reference [14].) This citation indicates some dissatisfaction with existing entropy definitions. The presence of "random bits used for smoothing" positions the smooth entropy concept as a foundational one rather than a practical one. The presence of these random bits in the work of Cachin are inherited from the universal hash function theory in which they also make things more foundational and impractical.

The limitation of entropy measurements for secret random generation can be illustrated with an example of seeding a PRNG from an extremely simple true random source with a non-uniform distribution: a six-face die where the number 6 is replaced by a 1 (the drawing of a 1 has a 1/3 probability instead of 1/6 as for the outcomes 2 to 5), the die being rolled exactly once, and no algorithmic processing qualifying as conditioning in the above diagram. There are thus just 5 possible PRNG seeds, hence 5 possible PRNG output sequences, with a non-uniform distribution. It should be noted that statistical testing for any of the 5 possible output sequences would normally pass since it only tests the PRNG deterministic properties (the PRNG capability to output something looking random). More central to our purpose is that the entropy measure does not readily catch the impact of a non-uniform distribution. In practical true random source arrangements, the solution usually lies in a conditioning algorithm that reduces the true random source throughput but ensures a uniform distribution.

We are now ready to mention the various definitions of entropy. In some sense, the raw entropy measurement would be the Shanon information theoretic entropy, and some other measurement would be needed for estimating the useful randomness for cryptographic assurance purposes. These other measurements would be needed when the conditioning algorithm does not provide the desired iniform distribution. The reference [14] heavily relies on the Rényi entropy as a generalization of information contents measurement over a range of indices reflectig the varying measurement weight of events with lower probabilities. The minimal entropy is positioned at one extreme of the indices range (as the asymptotic value of the Rényi entropy indice grows towards infinity).

The proposed arrangement is intended for a diversified set of deployment contexts. It is when no good true random source (with a uniform distribution and acceptable throughput) is available that the *entropy preservation* focus is relevant: with its unusually large PRNG state seeded from a less than ideal true random source, the proposed arrangement is intended to perform well overall.

This section does not contribute to the arrangement description. Instead, it provides additional rationales as background information.

The reference [1] (clearly positionned on the practical side of things) correctly refers to minimal entropy as the conservative entropy definition but mandates either an optional conditioning function (appendix C) or algorithmic post-processing (appendix D) that produce full entropy, in which case the minimum entropy estimate is the same as the Shanon entropy estimate. No specific algorithms are provided in either case, which creates an unusual occurrence of an IT security design certification checklist with a large blind area. It is our understanding that NIST, and other governmental or industrial overseeing bodies in the field of applied cryptographic techniques, are satisfied with vendor solutions and certification experts making design reviews with their best knowledge as the certification basis. The present proposed arrangement does not mandate a uniform distribution true random source; the issue is deferred to the design of an arrangement profile or instantiation.

Also, note that the definition of full entropy in [1] is needlessly restrictive in mandating coding compactness in addition to uniform distribution: uniform distribution of a seven digit BCD (Binary Coded Decimal) value in the range `[1..1048576]`

is just as good as 20 random fair independent bits (`1048576=2`

), but the BCD encoding occupies 28 bits to deliver 20 bits of entropy (Shanon or minimal) and is thus formally non-compliant. But actually a cryptographically strong PRNG seeded with either value will have exactly the same diversity in the output sequences, i.e. the same entropy (the PRNG is assumed to have more than 28 bits of state). With a large and configurable PRNG state, the present proposed arrangement is intended to be tolerant to true random sources not providing coding compactness for entropy.
^{20}

A very strong intuition about randomness is that a long sequence of zeroes is not a valid output of a random source. However, the probability of such a sequence (assuming a uniform distribution of each independent digit) is exactly the same as any other sequences of the same length. We first give a narrative explanation about this apparent paradox, and then see how the information theory handles it.

If an observer has a-priori assurance that a random source is fair, a long sequence of zeroes shouldn't raise a question: the sequence is, indeed, possible. However, such a-priori assurance is never absolute and the monitoring of a random output sequence for redundant patterns is (in principle) a sound strategy for detecting a random source fault. However, one has to be ready to handle false fault alarms from a redundant pattern occurring with a very small probability, and this false alarm handling is operationally next to impossible (try to explain to a manager with operational accountability that the alarm raised by a very high assurance device is to be expected from time to time and how he should investigate to set the true alarms apart from the false ones). This is one important reason why statistical testing embedded in cryptographic systems is of limited effectiveness. For the system designer, the foremost lesson should be a strong reminder that the exact probability distribution of a true random source is usually very hard to ascertain, as it relates to the process by which an output sequence was generated.

In statistics, an output sequence made of random digits is a series of observations from an i.i.d. variable, that is an independent and identically distributed random variable. This definitional model for the basic distribution in a true random source is used in the development of information theory refinements.

Before we look at how the information theory models this explanation, we should envision statistical testing as a generalization of monitoring an output sequence for redundant patterns. Statistical testing might cut the sequence in chunks of equal length and apply a test for patterns that would be representative of a non-random generation process. The test is likely based on a classification of chunks into a defined types with known probability distribution (the probability that a chunk is classified into a given type if the chunk is output by a process following the basic output probability distribution). Then, deviations from the expected classification distribution after the classification of a number of chunks would make the test fail.

So, here would be the output sequence statistical test process steps (introducing AEP terminology):

- a long enough portion of the output sequence is recorded;
- it is cut in chunks of equal length (hereafter just sequence);
- some classification of the sequences group them by type classes;
- the type classes are further grouped in sets (the AEP defines the typical set and the non-typical set); and
- the partitioning of sequence observations into sets is compared with the expected distribution.

The Asymptotic Equipartition Property (AEP) is a nice distribution property for a classification of sequences of a given length. It basically partitions the sequences into a typical set and a non-typical set (indeed the sequence made of all zeroes lies in the non-typical set), with the probability of a sequence falling into the typical set growing to 1 as the sequence length grows to infinity. The AEP applies to any basic probability distribution; someone reviewing the mathematical development in [14] should pay attention to the impact of a uniform distribution on the various propositions.

The reader may think of a die shuffling event repeated 100 times as a concrete example of a sequence with a uniform basic distribution, and the same sequence with the outcome 6 replaced by a 1 for the non-uniform distribution.

The classification used for the AEP theory is based on the count of occurrences of the basic alphabet (1 to 6 and 1 to 5 in our two examples) in the sequence. This choice at once matches an intuitive understanding of rare sequences (including those with a unusually large number of occurrences for one outcome), and the mathematical formalism for a probability distribution (the relative number of occurrences of symbols in a sequence adds up to 1 in all cases). In the reference [14], this coincidence is labeled empirical probability distribution, but we stress that the similar mathematical formalism does not match any meaningful random variable in practice. The similarity of mathematical formalism is a mere convenience for the development of the information theory, and certainly a useful one. Notably the properties of relative entropy may be used in the mathematical developments (hint: this is a specific area where a uniform basic distribution allows simplification).

Note that the classification used for information theory does not catch other sequences, such as regularly cyclic ones, that would intuitively be identified as non-random.

Given the classification described in the preceding paragraph, the AEP theory defines the typical set as including sequences where the maximal deviation of relative count of occurrences in relation to the basic probability distribution is less than a parametric margin. With the uniform distribution die example, we would expect roughly 1/6 of outcomes for each possible digits and if any count deviates above this 1/6 plus the margin (or below this 1/6 minus the margin), the sequence is considered non-typical. That's it for the AEP assumptions; the rest is theoretical results or propositions, and those may either find applications or remain in the large universe of theoretical knowledge. In this respect, the main result is that the typical set has probability almost 1. Note that the definition of the typical set is dependent on the basic probability distribution, and the uniform definition simplifies the theoretical development by reducing it to a counting problem.

In some sense, the AEP, which is considered fundamental for information theory, does not seem to help the practitioner (except as a theory validating some statistical tests which are impractical for other reasons). For the uniform distribution, the theory appears little more enlightening than simple probability counting. The elements of AEP non-typical set are rare events due to the way the AEP assumptions have been made. That is a somehow useful conclusion: we reviewed the theory and found no clear directions about how our arrangement might be improved. For the non-uniform distribution, no theoretical developments past the reference [14] has been identified by the present author.

We now go one step extra with this discussion of the AEP with the uniform basic distribution, looking at how it adapts to transformations of the sequences. These transformations are taken from the tool set of applied cryptography. Let us assume a moment that the AEP non-typical set would be a valid tool in practice, because it would adequately model non-randomness. If we transform the output sequence with a stream cipher arrangement (for an alphabet of 0 to 1), or a polyalphabetic cipher with long non-repeating key (for an alphabet from 1 to 6), we shuffle the sequences. The sequences in the non-typical set are mapped anywhere (i.e. in the typical set with a high probability). Since the basic probability and independence of events are not changed and a new non-typical set appears, the original assumption can not be true: the original non-typical set has no fundemental non-randomness property. For the non uniform basic probability distribution, the same transformation invalidates the "identically distributed" aspect of the i.i.d. assumption.

This last avenue may be extended further with an hypothetical fixed arbitrary permutation applied to complete sequences considered by the AEP theory. In this case, the overall entropy of such random sequences is preserved, but the theoretical development may be invalidated for the non-uniform basic distribution because the individual events in the sequence are no longer independent.

The proposed arrangement is something along this line of thought: a true random sequence is fed into a PRNG seeding software component (designed to preserve the entropy in some very ad hoc practical sense) and then is reflected in the PRNG output sequence. The AEP theory is of little assistance for the validity of this arrangement. Instead, this arrangement validity should rest on a case by case determination of probability distribution for output sequences from each true random source.

One last observation: maybe the fact that the AEP handles the non-uniform distribution seamlessly along the uniform one is a sign that a non-uniform distribution for a true random source is not really detrimental to the overall effectiveness of the proposed secret random source arrangement.

The PRNG design is made of a Blum-Blum-Shub (BBS) generator and a composite LFSR-based generator by the acronym MTGFSR (Multiplexed and Twisted Generalized Feedback Shift register), run in parallel and combined with the bitwise exclusive-or operation. While the BBS component may be run in configurations where it qualifies as *cryptographic strength*, the MTGFSR it totally vulnerable to trivial analysis if small output sequence is observed outside of the system (an output sequence roughly as short as twice the generator internal state is sufficient).

The BBS and the MTGFSR components are each described in a subsection below. For each of them, the formal specification of the deterministic algorithm is found in referenced documents, but design parameters and other implementation issues are presented below. A less deterministic issue is the seeding from a true random source which requires special attention to entropy preservation as the foremost goal of the proposed random source arrangement, and as a little studied topic.

The PRNG design includes a state store and restore capability, and a separate seeding capability from a true random source. In other words, to seed the PRNG is not exactly to restore the PRNG state from truly random data.

The mathematical theory supporting the BBS generator is fully explained in the original publication, reference [4], including one of the earliest mathematical proofs of reduction to the integer factorization problem (breaking the PRNG is at least as hard as factoring the composite modulus used in a BBS instance). Since then, finer mathematical proofs about the tightness of such reduction has been applied to a closely related algorithm, namely the Rabin-Williams digital signature scheme ([15], [16]).

In order to benefit from the BBS theoretical foundations, an instance must meet some requirements:

- the modulus must be of adequate size (although the threat model is different, modulus size guidelines for RSA digital signatures are relevant to the BBS implementation as conservative choices),
- the number of bits extracted at each BBS generator step should not be set above the recommendation from the latest theoretical work (starting from [17] but also [15] or [16]),
- the factorization of the modulus must be unknown to the adversary (this factorization should preferably be completely forgotten, and may be so forgotten unless the efficient direct access to portions of the BBS output sequence is a needed capability, which is
*not*the case for the proposed arrangement), and - to some extent, the prime number generation process for the composite modulus must adhere to recommendations applicable to the RSA prime number generation (see the general conclusion in [18] and [19], which argue against using prime number of special forms).

About this last point, the original BBS publication investigates the relationship between strong primes (before the term was used for the RSA prime number generation) and the BBS generator period. But to the extent that the RSA iterated encryption attack does not bother the RSA implementators following mathematician advice, short period BBS moduli should not be a concern.

The BBS software implementation may make an inconsequential modification to the BBS algorithm: the state of the generator is kept in the Montgomery representation ([2] and [3]) from which the output bits are extracted. Strict implementation of the original algorithm would require a conversion from the Montgomery representation to the natural integer representation at each generator step, which would not provide any security benefit (a proof to the contrary would challenge the whole theoretical foundation). Also, the software adjusts the number of bits extracted to be a multiple of the size of the elementary storage unit for the associated MTGFSR (normally a multiple of 8 as in 8 bits for a byte).

At the system integration level, the proposed arrangement recommends to handle the BBS modulus (and its factorization) as a critical cryptographic parameter:

- no default BBS modulus should be trusted in arrangement variants where either portions of the random source outcome might become visible outside the system, or the PRNG state lasts during periods where the system is not under direct control by trusted personnel,
- an organization relying on the proposed arrangement should deploy the software implemantation with a BBS modulus selected by trusted personnel with assurance that the prime factors are irrevocably deleted soon after the composite modulus is computed, and
- an arrangement variant
*may*handle the composite modulus like a long term secret, thus providing another layer of protection.

It should be noted that the BBS prime number generation itself requires a secret random source, leading to a chicken vs egg situation. The recommended solution is an arrangement variant meeting the conditions where a default BBS modulus may be trusted as explained in section "6.2.5 Bootstrapping a Non-Default BBS Modulus Parameter". An alternative method is the use of an off-the-shelf RSA key pair generation utility, but this ultimately relies on a secret random source arrangement and the reader might wonder on which basis this other arrangement should be trusted.

In its original specification, the initial seeding of the BBS state is a value assignment to a large integer variable in the range `[2..N-1]`

where `N`

is the large BBS modulus parameter (several hundred bits long and an odd value), followed by one BBS generator step, as in `x`

. With respect to entropy characteristics, two questions arise: the preservation of entropy from the true random source to the BBS state (i.e. _{1}=x_{0}^{2} mod N`x`

for _{i}`i=1,2,3,...`

as the BBS steps are repeated), and the handling of low entropy source from which statistical flaws must not be carried to the BBS generator pseudo-random output. The answers to both questions draw a lot from the BBS theoretical foundation: the seeding operation looses at most exactly 2 bits of entropy in the transition from `x`

to _{0}`x`

(same as the 4:1 ambiguity in the Rabin-Williams cryptosystem), and the BBS pseudo-random output has good statistical properties irrespective of the seed _{1}`x`

.
_{0}

In converting true random data into an initial value for `x`

, it is not necessary that the full range of initial values have an equal probability: since probability in the context of a true random source actually refers to entropy, the full distribution requirement would imply that a perfect true random source is available in the first place. The entropy preservation requirement is a different one: for any probability distribution of data obtained from a true random source, the conversion into _{0}`x`

should preserve the entropy. The only way to achieve this is that each true random data value is converted into a distinct _{0}`x`

value.
_{0}

Moreover, it is useful to ensure that `x`

so that the generator seeding operation actually shuffles (diffuses) the bits of _{0}>square_root(N)`x`

. One reason is that the data redundancy that might occur in a low entropy source is replaced by a random-looking number. This is not a contradiction with the BBS proven statistical properties as it might look: small values of _{0}`x`

that are not obscured by the BBS step function _{i}`x`

are just a statistical oddity with an extremely low rate of occurrence. To force a modular reduction to occur in the BBS seeding operation avoids such rare events, and guarantees entropy preservation for the first output values obtained from the generator.
_{i+1}=x_{i}^{2} mod N

As an incidental observation, the BBS seeding step `x`

is deemed to fool any scheme for automated detection of entropy failures in actual systems. This property is indeed present in any PRNG with claims of cryptographic strength. Embedded statistical testing should be applied to the true random source directly, and only if operational arrangements can be made to handle the false alarms that statistical testing is deemed to trigger from time to time. There are no explicit provisions for statistical testing in the present proposal.
_{1}=x_{0}^{2} mod N

While the above considerations may seem complex and/or very subtle, they may turn into a simple BBS seeding operation. In the BBS software implementation, the BBS initial seeding proceeds as follows. Starting with the binary representation of `N`

, from 2 to 9 of the most significant bits are retained such that the remaining bits are byte-aligned; one is subtracted from the retained bits at their least significant position, and the remaining bits are replaced by truly random data.

As a final note about the BBS, here is an explanation about the period of the generated sequence. Empirical observations shows that the longest period occurs with a very high probability irrespective of the number-theoretic conditions where its occurrence is proved (in [4]). The longest period is almost `N/8`

, or more exactly `((P-1)*(Q-1))/8`

where `N=P*Q`

(there are almost `N*3/4`

initial values for `x`

lost in the 4:1 ambiguity resolution and _{0}`P+Q-1`

other values leading to shorter periods, which represents an extremely low probability of occurrence for a modulus `N`

large enough for its factorization to be impractical). There are two output cycles for a given N with the large period size. Accordingly, the BBS seeding operation can be seen as the loss of two bits, one bit used to select one of the two output cycles, and the remaining bits setting the initial position or index within this cycle. A similar analysis is made below for the MTGFSR.

The MTGFSR (defined in [20]) builds on academic publications ([21] and [22]) in which an improvement to LFSR-based pseudo-random generators is suggested, along with theorems for the selection of proper polynomials for assurance about the period of the resulting generator. In this class of PRNG algorithms, it is easy to validate the implementation with the Berlekamp-Massey algorithm [23] that an attacker would use if the PRNG was used for stream cipher encryption.

Further work in the field of random number generation for computer simulations led to the Mersenne twister, a huge period PRNG, and the comprehensive study found in the PhD thesis of François Panneton ([24], and academic publications including [25], [26], and [27]). The Mersenne twister is somehow popular but does not appear really differentiated from the MTFSR in view of the Panneton work.

The MTGFSR component in the proposed arrangement is kept as originally designed. The rationale for the selection of the MTGFSR (or the excuse for not revisiting this selection) is its relative flexibility for period length in the range of lengths of interest (typically from 2^{100} to 2^{10000}). It is deemed preferable to have a PRNG design from a class that is well studied, even if from a perspective different from its contribution potential in a hybryd crypto-non-crypto PRNG design, than to rely on algorithms that are lesser understood.

Basically, the MTGFSR implementation in the proposed arrangement is made of 8 TGFSR multiplexed on the 8 bit positions of a computer byte, taking a single TGFSR output bit at each step (instead of `w`

bits, typically a computer word size or the mantissa size of a floating point representation, as in the oringinal TGFSR). With the notation used in the reference [20], the interleave factor is `F=8`

and the number of bits retained at each PRNG iteration is `t=8`

(two such examples are given in the reference). In this document, we implicitly refer to the MTGFSR with the limitation `F=t`

. As with every MTGFSR parameter set selection, the multiplexed TGFSR must share the parameters `N`

and `M`

(the symbol `N`

is unrelated to the same symbol in the preceeding section about the BBS generator). What makes each of the multiplexed TGFSR different is the polynomial `A`

of degree _{j}`w`

(for _{j}`j`

from `0`

to `F-1`

inclusive).

In order to achieve a large period, the multiplexed TGFSR components should each have a distinct parameter `w`

, such that their respective periods _{j}`2`

have relatively few common factors. The MTGFSR overall period is the least common multiple of the periods of its multiplexed TGFSR. A mechanical analogy for the MTGFSR is eight gear wheels with different teeth counts, all on the same axis or rotation and advancing synchronously one tooth at a time. The MGTFSR state is made of the eight tooth index in front of a marker slot (parallel to the axis of rotation). The MTGFSR seeding operation is the independent random positioning of every wheels. The MTGFSR period is the number of steps required for the state to return to an initial position.
^{(Nwj)}-1

The MTGFSR software implementation uses the C++ template mechanism for software abstraction without run-time performance penalty. The benefits of this approach may be more apparent in the software engineering discipline than to mathematical programming per se, so the MTGFSR software implementation is not offered as a general purpose reference implementation. For instance, the C++ template specialization programming construct is used to allow future modifications where storage units larger than a byte are used (e.g. `F=32`

), while the C preprocessor conditional compilation construct would be more familiar to mathematicians.

The parameter search for the (M)TGFSR requires a special-purpose software utility that is not easy to integrate in an automated procedure. The required utility searches candidate polynomials `A`

for given parameters `w`

, `N`

, and `M`

. This candidate search is exhaustive; it is the user's responsibility to provide parameters that that do not trigger long runs (limits may be specified for the candidate polynomials `A`

). When a candidate is found, the utility reports it along with the larger polynomial for the equivalent LFSR (a TGFSR from which a single bit is extracted is merely an optimization trick for a larger LFSR) and the number of non-zero terms in this larger polynomial. These data elements allow a final verification of the generator with the Berlekamp-Massey algorithm (if everything is fine, the larger polynomial is recovered by the algorithm), and the manual selection of multiplexed TGFSR for the MTGFSR design (more non-zero terms is deemed preferable, up to half of the terms). For fulfilling its role, the software utility needs a-priori knowledge of `2`

factorizations for the various ^{Nw}-1`N`

and `w`

values it might encounter. In this respect, the knowledge shared by the mankind is found in the Cunningham factorization project [28], and is being advanced with the current limit around `2`

. It is somehow anecdotal that the large MTGFSR periods depend on the large number factorization achievements while the BBS theoretical foundation rests on the difficulty of large number factorization. The present author is not aware of any logical connection between these two opposite dependencies.
^{1000}-1

Once the rules for multiplexing TGFSR components into an MTGFSR design are understood, the use of the utility to assemble an MTGFSR design from individual TGFSR components should not be too difficult. The utility provides a special mode or command that reports the overall MTGFSR period length from a common `N`

and the list of individual `w`

.
_{j}

Contrary to the BBS, the MTGFSR benefits from very limited theoretical support for statistical quality, and lacks the cryptographic property that prevents going backwards in the history of PRNG states. We thus present the MTGFSR seeding operation details first, and then discuss the limited assurance that might be enjoyed.

The MTGFSR state is made of a fixed number of bits (based on the MTGFSR parameters), and can take almost any value: the only restriction is that every bits allocated to any of the eight TGFSR components must not be all zeroes. This special condition should be extremely rare with a large period MTGFSR but is handled in the MTGFSR seeding operation (a `1`

bit is set where it triggers an exclusive or at the next MTGFSR step).

The MTGFSR seeding operation is the collection of bits from the truly random source in sufficient number, an exclusive or with an arbitrary constant bit string, allocation of resulting bits to MTGFSR state bits, the above-mentioned special case processing, followed by a sufficient number of MTGFSR steps so that the internal state bits has been cycled at least once. No information is lost in the process. It remains to be seen to which extent the MTGFSR seeding operation matches the ideal situation where *all* initialized state bits would contribute to the index setting towards a *single* non-repeating pseudo-random cycle.

The MTGFSR output sequence period is always exactly `lcm(2`

where ^{(Nw0)}-1,2^{(Nw1)}-1,2^{(Nw2)}-1,...,2^{(Nw7)}-1)`lcm()`

is the least common multiple function. Within one such MTGFSR output cycle, the non-repeating property is certain, and the statistical properties may be assumed to be no much better nor much worse than what is reported in the relevant academic literature (e.g. discussion and references cited in [25], [26], or [27]). Thus a large portion of the true random data used to seed the MTGFSR selects an index into a long pseudo-random cycle and the remaining part selects one such cycle among a set of possibilities. With the two example MTGFSR found in the reference [20], the period (cycle length) is about `10`

(respectively ^{313}`10`

), the cycle count is about ^{798}`10`

(respectively ^{78}`10`

), and correspondingly the MTGFSR state diversity is about ^{47}`10`

(respectively ^{391}`10`

).
^{845}

Here are a few observations about the MTGFSR usage in the context of the present overall secret random source arrangements. The purpose of the MTGFSR is to preserve the true random source entropy when the combined PRNG is seeded. With a larger MTGFSR state, it becomes possible to use a longer output sequence from a low entropy source (e.g. random English text coded in 8 bits ASCII) and achieve the same overall entropy as from a shorter sequence from a high entropy source (e.g. the PUDEC scheme having a self-evident full entropy output, [11], [12]).

In practice, a larger MTGFSR is not necessarily better, since it would slow down the initialization operation from a good entropy true random source with a limited data rate. In general, a characterization of the true random source (and a minimum entropy target for the system) should be used to select the MTGFSR design parameters.

The purpose of the repeated MTGFSR steps in the seeding operation should be clear from the principle of operation of LFSR-based PRNG in general: the entropy contents of the initial state is spread among the whole state in an unknown pattern (it might be all at the end for the purpose of analysis), but the initial state influences the PRNG output gradually. The exclusive or with an arbitrary constant is meant to avoid the zeroland statistical oddity ([26] presents generators with built-in resilience to seeding with almost all zero bits).

The MTGFSR usage in the overall arrangement is intended as an entropy preserving strategy. Accordingly, a main claim of the present document with respect to algorithmic aspects of IT security is that *the PRNG output sequence indexing function achieved by the MTGFSR state seeding operation preserve the entropy up to the output sequence*. This is a conjecture. It might be invalidated by a counter-example, e.g. if it was shown that two independent MTGFSR output cycles have a non-negligible probability of having strong statistical correlation that would increase the probability of identical output sub-sequences. A conservative approach to the entropy preservation potential might thus restrict the quantitative analysys to the MTGFSR period instead of the full MTGFSR state. Note that this claim is separate from security claims about the combination of a BBS and a MTGFSR PRNG.

The combined PRNG design is simply the bitwise exclusive or operation applied to the BBS output and the MTGFSR output run in parallel. The current state of the combined PRNG is made of the independent state of each component, plus some buffering data (the elementary output size for the BBS is larger than for the MTGFSR). No effort is made to keep the two independent states synchronized through a state store followed by a state restore, the rationale being that the BBS may require one extra modular squaring operation for preserving backwards secrecy. It also makes the generator startup simpler by always resetting the buffering data instead of recovering it from stored state.

The BBS modulus size should be selected "with caution", yet keeping in mind that a successful BBS modulus factorization by an adversary does not lead to an immediate security flaw but nonetheless reduces the theoretical support for security assurance. Not every aspects of the BBS theoretical support would be affected by a modulus not large enough for preventing a factorization attempt from succeeding in a reasonable time. In this respect, there are three consequences of the BBS theoretical foundation used in the combined PRNG design.

- Knowledge of the current BBS state does not allow to recover the past output sequence. This property is totally lost if the modulus factorization is known.
- There is no way to recover the BBS state from a direct observation of a BBS output sequence. This property lacks theoretical support when the modulus factorization is known, but in practice an algorithm to exploit the theoretical weakness is not readily available.
- The BBS output sequence statistical quality preserves the entropy present when the BBS is seeded. This property remains if the modulus factorization is known.

For the foregoing analysis of the combined generator security, unless otherwise noted, it is assumed that all three properties are supported by a large enough modulus.

The combined generator security relies on the secrecy *and entropy estimate* of its state data upon seeding. We are now about to analyze how the combined generator fails when not enough entropy is present in the state data for the BBS component alone. For this purpose, we use the notation `S1`

for the BBS state and `S2`

for the MTGFSR state, both at the same point in the combined generator operation (the formal analysis is not limited to the generator operation immediately after seeding).

The purpose of the overall arrangement is a practical solution to the concrete problem of a secret random source for cryptographic applications, including where the available true random source (or sources) and/or secrecy mechanisms are more or less trustworthy. The formal security analysis exposes weaknesses of the combined generator that can easily be overcome with a BBS generator alone, provided a large enough modulus is used. But such an easy solution is in the formal analysis only because a very large BBS modulus may just become unworkable. In other words, we expose how the combined generator becomes vulnerable in a formal analysis so that when the designer faces the practical limitations, the critical elements are more readily identified for the balancing art of practical solution engineering.

The formal analysis is somehow related to the meet-in-the-middle attack on the combination of two symmetric encryption algorithms. In this formal attack model, the attacker knows a plaintext `p`

and a ciphertext `c`

for the combined encipher operation using cipher functions `E"`

and _{K"}()`E'`

for unknown keys _{K'}()`K"`

and `K'`

, with the combined encipher function

`c=E"`

_{K"}(E'_{K'}(p))

In the meet-in-the-middle, the attacker builds a table of partly enciphered `c'=E'`

for every possible _{K'}(p)`K'`

, and a table of partly deciphered `p"=E"`

for every possible _{K"}^{-1}(c)`K"`

. A much smaller effort is needed for to build these tables and find the matches where `p"=c'`

that a brute force attack on a monolitic cipher with a secret key as large as `K"`

concatenated with `K'`

.

The transposition to the BBS+MTGFSR combination is apparent when the formula for stream cipher encipherment is written:

`c=MTGFSR`

_{S2} XOR BBS_{S1} XOR p

The secrets targeted by the attacker are now `S2`

and `S1`

instead of `K"`

and `K'`

. It makes no difference to focus on the direct observation of a combined PRNG output sequence by the attacker, where the relevant formulation is

`OutputStream=MTGFSR`

_{S2} XOR BBS_{S1}

The present threat model addresses limited entropy in `S1`

rather than limited key size of both `K"`

and `K'`

. The attack scenario is based on a table of `BBS`

for the set of _{S1}`S1`

values that cumulates a good probability of occurrence from the true random source probability distribution. These are called significant `S1`

values. With respect to `S2`

, two cases are to be distinguished: entropy large enough to prevent a brute force search strategy (a case not included in the original meet-in-the-middle attack model), or smaller entropy.

In cases where large entropy in `S2`

prevents the attacker from a brute force attack on the MTGFSR, observation of an output sequence about twice as long as the `S2`

size is sufficient to apply the efficient Berlekamp-Massey algorithm on `OutputStream XOR BBS`

for significant _{S1}`S1`

values until the algorithm returns the characteristic polynomials in the MTGFSR and the state `S2`

.

The academic contributions to the science of encryption algorithms seem to be devoid of variable size ciphers that could replace the MTGFSR in the proposed PRNG combination. The likely cause is that the practice-oriented design motivation (i.e. a very large key space for entropy preserving from low entropy sources) attracts little attention in the academic community. In face of this void, the proposed arrangement relies on larger BBS which is positioned at the high assurance end of the assurance versus efficiency tradeoff scale (the MTGFSR component is at the opposite end of this scale). Specifying a variabe size cipher could be relatively easy; establishing firm grounds for its level of assurance is much more demanding.

In cases where the entropy in `S2`

is small enough, the original meet-in-the-middle attack strategy may be used as well, and any PRNG algorithm replacing the MTGFSR component would be equally vulnerable. The size of the output stream required by the attacker is in relation to the desired level of confidence about the result of a successful attack.

Various lessons may be noted from the account of theoretical limitations of the combined PRNG.

- An understanding is provided for the vulnerabilities occurring when either a low entropy true random source is known to provide little variability in the initial state for the BBS component, or in cases where a true random source degrades unexpectedly (e.g. white noise input is replaced by a defined waveform).
- Whenever possible, the BBS component should be seeded with sufficient entropy.
- When uncertainty surrounds the entropy used to seed the BBS component, the application use of the combined generator should be such that the output stream is never exposed. One way to achieve this goal is to apply a one-way transformation to the combined generator output sequence. Or to seed one or more independent PRNG such that a leak in one application area does not interfere with other application areas. For this last strategy, the PRNG may be a standard one, e.g. standardized by NIST in [1], and presumably the entropy preserving property of the proposed arrangement is carried through the PRNG fixed size state to the application use of secret random data.

Overall, in a system design that is meant to be scrutinized using the lessons from academic theories, the impact of insufficient entropy at the true random source has to be understood at every step in the data processing up to the end-result delivered to system users. The generation of multiple RSA signature key pairs by a system can serve as an interesting example. In this application, random numbers are used mainly for two purposes: for to select candidate integers that are passed to the Miller-Rabin probabilistic primality test, and for this test as such. Random numbers for the first purpose need to be secret. No such requirement applies to the random numbers used by the Miller-Rabin test. Irrespective of the detailed data flow from the true random source to the final deliverables by the system, it should be kept in mind that *the user of one RSA key pair has access to an output sequence portion from the secret random source used for generating other key pairs* (through the private prime numbers).

It should be stressed that even with a good true random source, the same type of attention should be paid at every step in the data processing up to the end-result delivered to system users, for the risk of inadvertent entropy reduction.

The mathematics and logical processing required for a secret random source turns into a software source code implementation, but more than just software source code. The unpredictable phenomenon and its sensing machanism is not software. The source code integration into a given system execution environment (which ultimately supports the applications) requires software building processes that can become intricate and should be integrity protected for system assurance. Notably the software building process depends on parameter values which may themselves require software components (e.g. the LFSR design parameters). The actual operation of the secret random source is likely dependent on some secrecy and/or authorization mechanisms (e.g. in a high assurance server where the PRNG state should be stored in a system memory subject to zeroization upon tamper detection).

These system security aspects induce a number of potential variations in the software source code and the software building process. The dedication to implementation practicalities of the present document is assisted by the selection of the Linux development environment and the autoconf and automake software building process conventions. The Linux development environment encompasses cross-compilation capabilities targeting from the low end to the high end of the system sophistication scale, that is from embedded systems (but seldom below 32 bits CPU) to large servers. While the autoconf/automake framework is intended to facilitate support of platforms with limited compliance with C and Posix standards, we give up some of this flexibility by requesting a C++ compiler with good support of the 2003 version of the language.

The Linux runtime environment flexibility comes in part from the kernel configuration and kernel module mechanisms. These are the entry points for the Linux device drivers for a wide variety of device drivers. Note that the kernel development methodology does not use the autoconf/automake framework and is based on the C language. The Linux kernel flexibility offers a good potential for support of true random sources of various kinds, including the built-in `/dev/random`

special device (more on that later). However, the Linux kernel flexibility is not directly applied to deliver secret random bytes to application, e.g. as a *replacement* for `/dev/random`

; while this is technically possible, we see no need to make the proposed arrangement fully transparent.

The secret random source function is considered a system resource with a single instance servicing an application context. This principle needs not create restrictions; it is present for to relax C++ object orientation requirements associated with multiple object instances in a given program run. It is not a design goal to offer multiple independent PRNG output squences to diverse portions (or threads) of a single application. The principle is perhaps more influential for the software building process as a simplifying assumption: a signle profile of arrangement variants is applicable to an instance of the software building process.

The single instance principle also reflects the usual scarcity of sampling mechanisms that can feed the true random source. A single instance per application context may mean a system resource implemented outside of the application software image (e.g. opened as a binary file handle), or the PRNG implemented within the software image and accessing an outside true random source, or an application software image having exclusive access to an unpredictable phenomenon sampling mechanism (and most or all of the deterministic software support within the application image).

The abstract view of the proposed arrangement options is thus applicable to the selection of concrete options for the autoconf `configure`

script. The taxonomy is defined in a syntax compatible with these concrete options. This syntax is simply a parameter name for an option variable, and small integer as an indication of each allowed selection. Note that the autoconf/automake methodology is in no way exemplary of a structured software development approach; it is merely an adhoc solution to the problem of distributing complex software in source code format for execution in (more or less) POSIX-compliant operating system environments.

The built software may have the deterministic PRNG software either (variable `--with-superprng-a1`

):

- within its built software image (
`--with-superprng-a1=1`

), or - outside of its built software image (
`--with-superprng-a1=2`

).

In the case of outside deterministic PRNG software, the access to the secret random source may be either (variable `--with-superprng-a2`

):

- in a separate software component on the same system, accessible through a named pipe (
`--with-superprng-a2=1`

), - in a software component on a different processor system, accessible through a communications port with a logical connection in cleartext (
`--with-superprng-a2=2`

), or - also in a software component on a different processor system, and also accessible through a communications port, but with a logical connection protected by a security protocol (
`--with-superprng-a2=3`

).

The last two cases are distinguished by a security analysis of the overall system operating environment: if the device connection is considered immune from eavesdroping threats, the logical connection may be in cleartext. The security protocol needed is an eavesdroping threat is present is a subject area outside the scope of the present document.

The BBS component of the PRNG requires the composite modulus parameter, the product of two prime numbers that must be unknown (preferably to everyone) for the BBS theoretical properties to apply in full. The BBS modulus parameter may be chosen either (variable `--with-superprng-n1`

):

- by means outside of the software build instance with a result known to the built software at runtime (
`--with-superprng-n1=1`

), - also by means outside of the software build instance but with a result known to the software built process (
`--with-superprng-n1=2`

), or - through the running of some software components within the software build instance (
`--with-superprng-n1=3`

).

Note that neither options provide assurance about the unknown factorization condition, which is a matter of trust in the complete circumstances in which a BBS modulus is generated.

The BBS modulus has local significance only, and can thus be kept secret to add another layer of protection in the overall arrangement security. With respect to the abstract views of software build options, this brings the three following choices (variable `--with-superprng-n2`

):

- no special secrecy protection is applied to the BBS modulus (
`--with-superprng-n2=1`

), - the BBS modulus secrecy protection mechanism is the same as for the PRNG state data (
`--with-superprng-n2=2`

), or - the BBS modulus secrecy protection mechanism is the same as for other cryptographic key material in the runtime environment but not the same as for the PRNG state data (
`--with-superprng-n2=3`

).

In any event, the incremental security offered by BBS modulus secrecy seems limited. The case where it would be more important is if the PRNG state leaks but the BBS modulus remains secret, which is unlikely to occur unless the third option above is used and the runtime environment has separate secrecy mechanisms.

This subsection pertains to the PRNG state when the software PRNG implementation is started; it is not applicable when the PRNG is outside of the software build image.

The deterministic PRNG initial state may be set either (variable `--with-superprng-b`

):

- by the PRNG seeding from a true random source available in the built software environment at runtime (
`--with-superprng-b=1`

), - by the PRNG state restoration from a previously saved state data set (
`--with-superprng-b=2`

), or - fisrt by an attempt at the PRNG state restoration as indicated above, and falling back to the PRNG seeding from a true random source (
`--with-superprng-b=3`

).

Note that the two PRNG state restoration options imply some secrecy protection mechanism, including the requirement to erase the stored data set if a leak is possible after the restore operation (we conservatively assume that this threat is always present). This requirement is not not an option since it seems to apply in every use case for the proposed arrangement.

This subsection pertains to the PRNG state when the software PRNG implementation is stopped or shutdown; it is not applicable when the PRNG is outside of the software build image.

When the determinstic PRNG operation is no longer needed, the final PRNG state may either (variable `--with-superprng-e`

):

- be simply dropped (
`--with-superprng-e=1`

), - be stored somewhere for later restoration (
`--with-superprng-e=2`

), or - be stored somewhere for later restoration with some serialization mechanism preventing a conflicting use of this storage write operation (
`--with-superprng-e=3`

).

Note that the two PRNG state storage options require some secrecy protection mechanism. This requirement is not not an option since it seems to apply in every use case for the proposed arrangement.

When the deterministic PRNG has to be seeded from a true random source, it is possible to either (variable `--with-superprng-t1`

):

- recursively shift the issue to a remote system component (
`--with-superprng-t1=1`

), or - directly read data from a true random source (
`--with-superprng-t1=2`

).

The recursivity option is made explicit in the present description of arrangement options so that its fundamantal limitation (it does not actually address the difficult issue of a true random source with security assurance) is not inadvertently ignored or avoided. It has the same options as the outside deterministic PRNG software (variable `--with-superprng-t2`

):

- in a separate software component on the same system, accessible through a named pipe (
`--with-superprng-t2=1`

), - in a software component on a different processor system, accessible through a communications port with a logical connection in cleartext (
`--with-superprng-t2=2`

), or - also in a software component on a different processor system, and also accessible through a communications port, but with a logical connection protected by a security protocol (
`--with-superprng-t2=3`

).

For the direct read from a true random source option, the option list could be almost endless. Thus, the abstraction used in the description of options is replaced by a short list of realistic alternatives. At the time of this writing, the envisioned options are (variable `--with-superprng-t3`

):

- the PUDEC scheme ([11], [12]) (
`--with-superprng-t3=1`

), or - the
`/dev/random`

special device (`--with-superprng-t3=2`

).

This subsection contains short narrative descriptions of envisioned profiles or instantiations of the proposed arrangement. The list below is not exhaustive and a cryptographic system designer may well face requirements and/or local resources justifying variations or other approaches. Notably the set of true random sources available may be expanded from the limited set of two choices occurring below.

The profiles listed below may be implemented independently from the option setting variables; those are defined in the present document only in case they facilitate the source code re-use among diverse profiles.

The purpose of this profile is a functional substitute for the Linux `/dev/random`

special device, seeded with the PUDEC true random source. The high security server systems are candidate systems for this arrangement. Upon server shutdown, the software should attempt to store the PRNG state for later system restart. The application of the fail-safe principle to this environment implies that a system startup failure at the PRNG state restore operation must prevent the normal application use of secret random number data until the software PRNG is re-seeded with a PUDEC random outcome sensing operation.

`--with-superprng-a1=1`

`--with-superprng-n1=3`

`--with-superprng-n2=1`

`--with-superprng-b=3`

`--with-superprng-e=2`

`--with-superprng-t1=2`

`--with-superprng-t3=1`

This profile is a simple case of an application on a server system served by an instance of the preceeding profile. The access to secret random numbers is simply through a read handle to the named pipe supplied by this instance of the preceeding profile.

`--with-superprng-a1=2`

`--with-superprng-a2=1`

If efficiency or secret random number generation is a concern, it is possible for a server application to seed a local PRNG through a read handle to a named pipe supplied with random numbers, and thereafter use the local PRNG. The principles of entropy preservation allow the local PRNG to be of a smaller size than one seeded from a low entropy true random source.

`--with-superprng-a1=1`

`--with-superprng-n1=2`

`--with-superprng-n2=1`

`--with-superprng-b=1`

`--with-superprng-e=1`

`--with-superprng-t1=1`

`--with-superprng-t2=1`

The first client side application envisioned for the proposed arrangement is a command-line utility for authentication key establishment, a kind of ceremonial step in the user perspective and a critical cryptographic protocol exchange in the IT security manager perspective (this utility interacts with a key management server based on a-priori knowledge of a remote server public key). It is thus used infrequently and needs to be self-contained as far as possible. The true random source offered by the operating system, that is the `/dev/random`

special device, is the resource available with the minimal software installation hindrance.

`--with-superprng-a1=1`

`--with-superprng-n1=2`

`--with-superprng-n2=1`

`--with-superprng-b=1`

`--with-superprng-e=1`

`--with-superprng-t1=2`

`--with-superprng-t3=2`

A higher security option, for the most security aware users, use the PUDEC true random source and allows the PRNG state to be stored in a local secret storage area, so that a single PUDEC random outcome sensing operation may supply secret random data for more than a single application run. Since the typical application establishes an authentication key, it is deemed to have access to a secret storage area with acceptable secrecy protection mechanisms.

`--with-superprng-a1=1`

`--with-superprng-n1=2`

`--with-superprng-n2=1`

`--with-superprng-b=3`

`--with-superprng-e=3`

`--with-superprng-t1=2`

`--with-superprng-t3=1`

This profile is representative of an RSA signature key pair generation utility. In a self-served strategy, the BBS modulus parameter generation is used as a concrete requirement model (it differs from the RSA key generation by some minor number theoretic differences and the absence of a storage requirement for the secret prime factors).

The security model for this profile is to read true random data, to seed the PRNG, to use it to generate two secret random prime numbers, and to erase any intermediate data used in the process including the PRNG state. The reader may try to convince himself that the *PRNG need not be cryptographically strong* in this application profile, or else try to find logical arguments to the contrary (the present author couldn't find any). What is critical is the secrecy and entropy of the true random source relied upon.

`--with-superprng-a1=1`

`--with-superprng-n1=2`

`--with-superprng-n2=1`

`--with-superprng-b=1`

`--with-superprng-e=1`

`--with-superprng-t1=2`

`--with-superprng-t3=1`

or`--with-superprng-t3=2`

[1] Elaine Barker and John Kelsey, "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", NIST Special Publication 800-90, Revised March 2007, U.S. Department of Commerce, National Institute of Standards and Technology, Information Technology Laboratory, Computer Security Division, available at http://csrc.nist.gov/publications/nistpubs/800-90/SP800-90revised_March2007.pdf.

[2] Montgomery, Peter L., "Modular Multiplication Without Trial Division, Mathemetics of computations", Vol. 44, no. 170, April 1985, pp 519-522

[3] Dussé, Stephen R., and Kaliski, Burton S. Jr., "A Cryptographic Library for the Motorola DSP56000", Advances in Cryptology, Eurocrypt'90, Lecture Notes In Computer Science no. 473, pp 230-244, Springer-Verlag, 1990

[4] Blum, Leonore, Blum, Manuel, and Shub, M., "A Simple Unpredictable Pseudo-random Number Generator", SIAM Journal of Computing, vol. 15, no. 2, May 1986, pp 364-383

[5] Moreau, Thierry, "PEKE, Probabilistic Encryption Key Exchange, 10 Years Later, Including the PEKEv1.25 Specifications", CONNOTECH Experts-conseils inc., document number C003449, 2005/10/03, also Cryptology ePrint Archive: Report 2005/183, available at http://eprint.iacr.org/2005/183.pdf.

[6] Rabin, Michael, "Digitalized Signatures and Public-Key Functions as Intractable as Factorization", MIT Laboratory for Computer Science, TR-212, January 1979, available at http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-212.pdf.

[7] Hugh C. Williams, "A Modification of the RSA Public Key Encryption Procedure", IEEE Transactions on Information Theory, 26 (6): pp 726-729, November 1980.

[8] Thierry Moreau, "Scirpo, a Basic Rabin-Williams Digital Signature Specification", CONNOTECH Experts-conseils inc., Document Number C004883, 2009/09/02, available at http://www.connotech.com/doc_rw_sign_basic-03.html.

[9] Thierry Moreau, "A Note About Trust Anchor Key Distribution", CONNOTECH Experts-conseils inc., Document Number C003444, 2005/07/05, available at http://www.connotech.com/takrem.pdf.

[10] ISO/IEC 10118-4:1998, "Information technology -- Security techniques -- Hash-functions -- Part 4: Hash-functions using modular arithmetic".

[11] Thierry Moreau, "How to Seed Random Number Generation from Dice Shuffling", CONNOTECH Experts-conseils inc., Document Number C005033, 2010/09/16, available at http://www.connotech.com/doc_pudec_descr.html.

[12] Thierry Moreau, "The PUDEC Discrete Outcome Digitalization and Conditioning Algorithms", CONNOTECH Experts-conseils inc., Document Number C005069, 2010/11/12, available at http://www.connotech.com/doc_pudec_algo.html.

[13] Neal Koblitz, Alfred J. Menezes, "Another Look at 'Provable Security'", July 4, 2004; updated on July 16, 2004; October 25, 2004; March 31, 2005; and May 4, 2005, IACR eprint archive 2004/152, available at http://eprint.iacr.org/2004/152.pdf.

[14] Christian Cachin, "Entropy Measures and Unconditional Security in Cryptography", A dissertation submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY, ZURICH, Diss. ETH No. 12187, 1997, available at http://www.zurich.ibm.com/~cca/d.html.

[15] Dan Boneh, "Simplified OAEP for the RSA and Rabin Functions". CRYPTO 2001: 275-291

[16] D. J. Bernstein, "Proving tight security for Rabin-Williams signatures", Pages 70-87 in "Advances in cryptology---EUROCRYPT 2008, 27th annual international conference on the theory and applications of cryptographic techniques", Istanbul, Turkey, April 13--17, 2008, proceedings, edited by Nigel Smart, Lecture Notes in Computer Science 4965, Springer, 2008. ISBN 978-3-540-78966-6, also Document ID: c30057d690a8fb42af6a5172b5da9006, Date: 2008.02.01, available at http://cr.yp.to/papers.html#rwtight.

[17] Vazirani, Umesh V., and Vazirani, Vijay V., "Efficient and Secure Pseudo-random Number Generation", Proceedings of the 25th IEEE Symposium on the Foundations of Computer Science, 1984, pp 458-463

[18] Maurer, Ueli M., "Fast Generation of Secure RSA-Moduli with Almost Maximal Diversity", in Advances in Cryptology, Eurocrypt '92, Lecture Notes in Computer Science, no. 658, Springer-Verlag, 1992, pp 636-647

[19] Maurer, Ueli M., "Fast Generation of Prime Numbers and Secure Public-Key Cryptography Parameters", Journal of Cryptology (1995) Vol 8, pp 123-155

[20] Thierry Moreau "The 'Multiplexed and Twisted' GFSR, a Flexible Scheme for the Creation of Pseudo-random Generators", April 2000, Revised April 2005 and October 2005, CONNOTECH Experts-conseils Inc., available at http://www.connotech.com/mtgfsr_04.pdf.

[21] Matsumoto, Makoto, and Kurita, Yoshiharu, "Twisted GFSR Generators II", ACM Transactions on Modeling and Computer Simulations, Vol. 4, No. 3, July 1994, pp 254-266

[22] Matsumoto, Makoto, and Kurita, Yoshiharu, "Twisted GFSR Generators", ACM Transactions on Modeling and Computer Simulations, Vol. 2, No. 3, July 1992, pp 179-194

[23] Massey, James L., "Shift-Register Synthesis and BCH Decoding", IEEE Transactions on Information Theory, Vol. IT-15, no. 1, January 1969, pp 122-127

[24] François Panneton, "Construction d'ensembles de points basée sur une récurrence linéaire dans un corps fini de caractéristique 2 pour la simulation Monte Carlo et l'intégration quasi-Monte Carlo", PhD Thesis, Université de Montréal, Faculté des études supérieures, August 2004, available at http://www.iro.umontreal.ca/~panneton/These.pdf.

[25] P. L'Ecuyer and F. Panneton, "Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2", Monte Carlo and Quasi-Monte Carlo Methods 2000, K.-T. Fang, F. J. Hickernell, and H. Niederreiter eds., Springer-Verlag, 2002, 318-330, available at http://www.iro.umontreal.ca/~lecuyer/myftp/papers/mcqmc00.pdf.

[26] François Panneton, Pierre L'Ecuyer, Makoto Matsumoto, "Improved long-period generators based on linear recurrences modulo 2". ACM Trans. Math. Softw. 32(1): 1-16 (2006), available at http://www.iro.umontreal.ca/~panneton/WELLRNG.html.

[27] François Panneton, Pierre L'Ecuyer: "Resolution-stationary random number generators". Mathematics and Computers in Simulation 80(6): 1096-1103 (2010)

[28] Samuel Wagstaff, "The Cunningham Project", web site hosted by Center for Education and Research in Information Assurance and Security (CERIAS), available at http://homes.cerias.purdue.edu/~ssw/cun/index.html.