The PUDEC logo should appear here.

The PUDEC Discrete Outcome Digitalization and Conditioning Algorithms

Thierry Moreau

Document Number C005069

2010/11/12

Copyright (C) 2010 CONNOTECH Experts-conseils inc.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

PUDEC and the PUDEC logo are trademarks of CONNOTECH Experts-conseils inc.

Specifications subject to change without notice.

Table of contents

1. Introduction

1.1 Overview of the Deterministic Algorithm Contributions to the Nondeterministic Arrangement

1.2 Editorial Process for the Future Revisions

2. Real Configuration and Data Representations

2.1 The Concrete PUDEC Discrete Outcome

2.2 Bar Code Data Representation

2.2.1 Special Arrangements for Additional Secrecy Protection

2.3 Digitalized PUDEC Discrete Outcome Data Representation

2.4 Conditioned Random Bits

3. Operator Feedback from the Digitalization Algorithm

3.1 Overview

3.2 Specific Levels of Detailed Operator Feedback

3.2.1 Bare Minumum Feedback

3.2.2 Normal Progress Feedback

3.2.3 Basic Error Processing Feedback

3.2.4 Additional Error Recovery Feedback

3.2.5 Ignored Unexpected Reading Feedback

3.2.6 Usual Cases of Ignored Reading

3.2.7 Early Warnings in the First Sequence

4. The PUDEC Digitalization Algorithm

4.1 Overview of Algorithm Operation

4.2 Expected Bar Code Readings for Each Scan Sequence

4.3 Error Processing

4.3.1 Duplicate Processing Applicable to Every Scan Sequences

4.3.2 Duplicate Processing in Relation to Dice Permutation

4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence

4.3.4 Error Processing in the Fourth Sequence

4.3.5 Resolution of Corner Cases

5. The PUDEC Conditioning Algorithm

5.1 Buffering of Discrete Outcome Elements Queued by Prime Components of Mixed Radix Bases

5.2 Selection of Specific Mixed Radix Bases

5.2.1 Overview of Problem Statement

5.2.2 Recommended Solution

5.3 Final Computation of a Random Bit Stream

6. References

Document Revision History

C-Number

Date

Explanation

C005069

Current version

C005025

2010/09/03

Initial release in a draft version

C005061

2010/10/06

Updated a document reference with a new revision number.

Added the PUDEC logo at the top of the document.

Added the optional boolean parameter in the conditioning algorithm, text updaed in first paragraph of "5. The PUDEC Conditioning Algorithm" and second half of "5.1 Buffering of Discrete Outcome Elements Queued by Prime Components of Mixed Radix Bases".

C005065

2010/10/13

Text refinement for the more exact handling of sequences /C/ for the dice ordering data representation, in "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence", in preparation of software development after subversion revision 1870.

Minor clarification (changed "a new die/face" to "one new die/face"), in "4.3.4 Error Processing in the Fourth Sequence", already reflected in software at subversion revision 1869.

C005067

2010/10/15

Re-arrangement of the specification text for the handling of sequences /C/ for the dice ordering data representation, in "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence", based on the failed attempt to implement the previous text, the failed attempt being recorded in software at subversion revision 1872.

C005068

2010/10/17

Added a note about handling A-B-C-B-A-B-C-X-C-B-C-X-Y to avoid specifications ambiguity in section "4.1 Overview of Algorithm Operation", implemented as a bug fix in software at subversion revision 1880.

C005069

2010/11/12

Included the revised operator feedback section "3. Operator Feedback from the Digitalization Algorithm", aligned with source code revision 1898. Foremost addition is subsection "3.2 Specific Levels of Detailed Operator Feedback". The section "1.2 Editorial Process for the Future Revisions" is updated accordingly.

Added the section "4.3.5 Resolution of Corner Cases" to account for one corner case identified in the test suite and not conveniently reflected as an update or clarification of other provisions, used the section as a placeholder for tutorial text on corner cases already handled elsewhere.

Updated the second paragraph in section "4.3 Error Processing" as a matter of specifications explicitness.

Fixed an obvious inconsistency in the last portion of section "5.1 Buffering of Discrete Outcome Elements Queued by Prime Components of Mixed Radix Bases". The fixed text has been added in revision C005061.

Specification change for the bit order in the final random bit packing for the application layer (from least significant bit first to most significant bit first) in section "5.3 Final Computation of a Random Bit Stream". Change motivated by the left-to-right reading order of hexadecimal dump of random data, making auditable logs easier to work with. Implemented in software at subversion revision 1932.

1. Introduction

This document specifies two deterministic algorithms that are needed for nondeterministic digital entropy collection (or random data input) using the PUDEC set of bar coded dice. This specification uses a natural language formulation as an abstraction mechanism from specific software implementations.

Introductory background information about the PUDEC digital entropy collection scheme is found in the reference [1].

This document thus has very focused scope and purpose. It is a record of the detailed functions to be implemented in two software components that are useful only in the specific context of the PUDEC scheme for digital entropy collection. The motivations for the preparation of this document are also quite precise

If these motivations make no sense to the reader, further reading may not be very useful, unless you read this document in the context of an implementation effort for the described algorithms.

1.1 Overview of the Deterministic Algorithm Contributions to the Nondeterministic Arrangement

The typical strategy for providing a secret random source is a combination of digital sampling of an unpredictable phenomenon, similar to a measurement instrument with a digital readout, followed by some conditioning of the measurement values so that the final result is binary digits with a 50% probability of either values. The present document is limited to the required software components for the two steps, namely digitalization and conditioning, given that the PUDEC scheme deployment use of the bar code technology introduced in the reference [1], and follows the additional guidance from the reference [2].

A distinctive characteristic of the PUDEC scheme is that the measurement or digitalization portion is unequivocally devoid of measurement precision uncertainty: dubious bar code readings are ignored and cause the random event to be omitted altogether from the sampling. With many physical phenomenon measurements, the precision limit of the instrument may be blurred into the very unpredictability of the observation, which makes entropy estimates more difficult.

The PUDEC digital entropy collection scheme rests on a specification for the PUDEC discrete outcome (see section "2.1 The Concrete PUDEC Discrete Outcome") to be sampled in a series of four bar code reading sequences. There is some intrinsic redundancy in the bar code reading sequences, with a potential increase in the digitalization process confidence: the digitalization algorithm is in a position to detect input sequences which can not occur with a compliant discrete outcome. Furthermore, the algorithm may survive a limited number of typical bar code scanning errors. Thus, the PUDEC digitalization algorithm is a deterministic process that collects reliable observations of elements in the compound PUDEC discrete outcome (elements may be dropped from the discrete outcome by application of the error processing logic), from some signal sent by a bar code reader unit.

The randomness of the sampled data rests on the operator who shuffles the set of dice and align them prior to the series of four bar code reading sequences. The deterministic digitalization algorithm only validates the coherency with the discrete outcome structure and does not attempt statistical testing of randomness.

The secrecy of the sampled data partially rests on the operator who should systematically shuffle the set of dice after a discrete outcome instance has been scanned in order to prevent surreptitious duplication of discrete outcome sampling in a system controlled by an eavesdroper. The secrecy requirement prevents the deterministic digitalization algorithm from providing operator feedback that would leak information about the sampled data values.

The conditioning algorithm takes whatever the digitalization algorithm accepted as reliable discrete data elements, which may be a smaller set than the complete compound PUDEC discrete outcome.

1.2 Editorial Process for the Future Revisions

The first released version of this document is in a draft state. Moreover, no attempt is made to use a recognized formulation for specifications of algorithms besides an aim at covering the complete set of conditions that may be encountered by the algorithms, even if the resulting text is sometimes difficult to read.

The document is first released in a draft state since it records the author understanding of its subject matter prior to the programming of a second version of the PUDEC digitalization algorithm and a first version of the PUDEC conditioning algorithm. Some algorithmic experiments were made with ad-hoc source code for the conditioning algorithm. Ideally, the change control for the present document revision and the source code to be derived from it should be synchronized and coherent.

Two areas are identified as requiring further document advances. Some numerical values may require either a better justification, or an explicit definition as an algorithm parameter with a valid range statement (these are indicated by a typographical mark like [algorithm-parameter-3]). Also, the algorithms themselves may need refinements and/or fixes; an aspect requiring attention is the interaction between the bar code scanning errror recovery and the algorithm ability to keep track of the four scannning sequences that make a PUDEC discrete outcome sensing operation.

Another reason why this document is in draft form at its first release is the pending status for the companion document, reference [2]. Some material in the present document may be replaced by (or augmented with) pointers to the coverage of the same subject area in the reference [2].

2. Real Configuration and Data Representations

Each of the following document subsections covers a concrete occurrence or informational representation of the PUDEC discrete outcome as it proceeds in the overall process.

2.1 The Concrete PUDEC Discrete Outcome

The description for the concrete PUDEC discrete outcome benefits from the photos found in the reference [1], in which the figure 1 and 2 are photographic equivalent of for figure 1 below, while the reference [1] figure 7 is the photographic equivalent of figure 2 below.

Image not found.

Figure 1) A PUDEC die

Image not found.

Figure 2) The PUDEC dice set ready for discrete outcome sensing

The discrete information contained in the bar codes visible in the concrete discrete outcome is organized as

This figure 3 shows the semantic and orientation of bar code labels on the six faces of a PUDEC die, defining the strict PUDEC die layout. This assumes that one-dimensional bar codes are used, and the arrow watermark on each face indicates the bar code label orientation.

The figure 3 represents a convention or standard provision for interoperability between a specific PUDEC dice industrial design (selection of a bar code symbology, determination of size, color, and positioning of bar code label on a die face), bar code scanner technology capabilities (notably the selectivity of bar code readings according to the label orientation with the CCD bar code reader technology), and the software functions compliant to the present document.

Image not found.

Figure 3) The PUDEC die logical layout including bar code orientations

The table below is a tabular representation for the die layout shown in figure 3.

The Strict PUDEC Die Layout

East-west axis

North-south axis

Reference face

Westward face

Eastward face

Northward face

Southward face

1

5

2

3

4

2

4

3

1

6

3

6

1

2

5

4

1

6

2

5

5

3

4

1

6

6

2

5

3

4

The PUDEC die layout shown in figure 3 and reflected in the above table is actually more restrictive than the set of acceptable die layouts compatible with the PUDEC digitalization algorithm. The arrow directions in the may be reversed. Also, the opposite face numbers may be exchanged. This is caused by the bidirectional (instead of unidirectional) bar code reading capability of common CCD bar code reader units. The table below shows the relaxed die layout acceptable for the PUDEC digitalization algorithm.

The Relaxed Die Layout

Reference face

East-west axis

North-south axis

1 or 6

2 and 5

3 and 4

2 or 5

3 and 4

1 and 6

3 or 4

1 and 6

2 and 5

Using the strict die layout and separate die number indications underneath each die face pairs, the figure 4 shows the overall discrete outcome information available from a sample PUDEC concrete outcome. (Note that the complete discrete information might include the two die faces visible at each end of the dice aligement, but these are omitted from the figure and the PUDEC dicrete outcome scanning sequences -- these would be face 6 of the die 30 on the left end and face 4 of the die 15 on the right end with the sample outcome depicted on the figure.)

Image not found.

Figure 4) A sample PUDEC discrete outcome, shown logically after bar code decoding and die/face mapping

In a perfect world, each occurrence of a PUDEC concrete outcome would look like figure 4 and the PUDEC discrete outcome scanning sequences would yield the exact data representation without any error. The purpose of the deterministic PUDEC digitalization algorithm is to turn a stream of bar code readings obtained from an actual concrete outcome (as scanned by an operator) into the representation specified in section "2.3 Digitalized PUDEC Discrete Outcome Data Representation", but only to the extent that the bar code readings are logically coherent and reasonably error-free.

2.2 Bar Code Data Representation

For the scope of the present document, the bar code data representation is limited to 216 different die face identification codes. For sake of determinism, it is sufficient to refer to them as bar code value 1 to bar code value 216 inclusive. The correspondence between the bar code values (from 1 to 216 inclusive) and specifications of a bar code symbology is out of scope for the present document.

The 216 different bar code values are mapped into a die number, from 1 to 36, and a face number, from 1 to 6. A fixed arbitrary mapping is suggested, as indicated in the table below. The arbitrary mapping is intended to increase the difficulty of visual bar code decoding and memorization, hence increasing the PUDEC scheme secrecy protection.

Bar code data mapping to PUDEC die and face numbers

Bar Code Value

Die

Face

1

24

3

2

31

3

3

24

1

4

26

3

5

7

2

6

35

1

7

12

3

8

12

1

9

1

3

10

23

5

11

30

4

12

1

2

13

7

4

14

28

4

15

22

5

16

32

5

17

17

1

18

16

3

19

5

3

20

10

1

21

9

6

22

3

5

23

9

2

24

30

1

25

16

5

26

13

5

27

27

2

28

25

1

29

9

4

30

36

4

31

14

3

32

2

2

33

28

2

34

6

5

35

17

4

36

18

4

37

4

6

38

33

3

39

1

4

40

10

5

41

24

2

42

31

1

43

18

5

44

12

2

45

9

3

46

2

5

47

35

4

48

7

5

49

24

6

50

23

2

51

11

6

52

14

4

53

5

4

54

14

2

55

17

6

56

6

2

57

27

6

58

34

3

59

27

5

60

20

3

61

8

3

62

15

6

63

21

4

64

29

1

65

10

3

66

18

3

67

19

1

68

36

6

69

35

3

70

12

4

71

4

1

72

23

6

73

31

5

74

36

1

75

7

1

76

15

2

77

32

3

78

22

1

79

8

2

80

25

5

81

8

4

82

29

6

83

6

4

84

34

1

85

35

5

86

22

2

87

2

4

88

1

6

89

3

3

90

16

1

91

31

2

92

33

4

93

28

1

94

16

6

95

3

6

96

10

2

97

21

5

98

11

4

99

26

5

100

1

5

101

7

3

102

21

6

103

33

2

104

19

3

105

3

4

106

30

5

107

5

2

108

9

5

109

34

5

110

20

4

111

33

5

112

1

1

113

13

3

114

34

4

115

31

6

116

16

2

117

15

3

118

20

5

119

3

1

120

13

2

121

35

2

122

32

1

123

5

6

124

2

1

125

24

5

126

31

4

127

28

3

128

32

4

129

2

6

130

30

3

131

22

4

132

3

2

133

8

6

134

36

5

135

25

6

136

9

1

137

29

5

138

21

3

139

25

2

140

33

6

141

30

2

142

20

1

143

19

2

144

19

5

145

36

2

146

33

1

147

21

2

148

13

4

149

20

6

150

10

4

151

5

5

152

5

1

153

16

4

154

18

1

155

34

6

156

17

5

157

14

6

158

6

1

159

27

3

160

20

2

161

27

1

162

22

3

163

25

4

164

6

3

165

28

5

166

23

4

167

26

4

168

35

6

169

26

1

170

13

6

171

4

2

172

8

1

173

11

1

174

11

5

175

21

1

176

10

6

177

4

3

178

32

6

179

34

2

180

26

2

181

24

4

182

27

4

183

29

3

184

2

3

185

4

5

186

25

3

187

23

1

188

12

6

189

15

5

190

29

2

191

32

2

192

26

6

193

18

6

194

28

6

195

14

1

196

12

5

197

15

4

198

4

4

199

8

5

200

13

1

201

11

3

202

17

2

203

30

6

204

29

4

205

17

3

206

19

6

207

23

3

208

7

6

209

19

4

210

36

3

211

14

5

212

6

6

213

11

2

214

22

6

215

18

2

216

15

1

2.2.1 Special Arrangements for Additional Secrecy Protection

The above table together with the strict PUDEC die layout define the canonical set of 36 PUDEC dice that is compatible with the deterministic PUDEC digitalization algorithm. This document subsection introduces two ways of enhancing the intrinsic secrecy protection of the discrete outcome observation. These change the contents of the above table, but otherwise don't alter the deterministic algorithms specified in the present document.

Given the canonical set of dice, the above 216 entries permutation table is not the only one possible. It is conceivable to fill the entries in the table from orderly scanning of every die faces after shuffling them and aligning them from die 1 to die 36 (in the order to be observed and then stored in the table). If this was done in a given PUDEC usage environment and the permutation table kept secret, a given discrete outcome observation eavesdroping (in the raw mode where numbers in the range 1 to 216 would replace the die and face numbers found in figure 4) would not allow the duplication of PUDEC random process results in another environment. The operating principle is that nothing restricts a-priori the christening of dice from 1 to 36 and the christening of face pairs (as long as the face number assignment complies to the relaxed die layout). The orderly scanning of shuffled and aligned dice for the purpose of initializing a specific secret table is a convenient way to make a secret selection.

This operator-assisted secret definition of the bar code input translation table may be extended. So far, it applies to the canonical set of PUDEC dice and to sets that departs only because one or more dice comply to the relaxed die layout but not to the strict one (it was mentioned that the relaxed die layout is a sufficient condition for a die to be used in the PUDEC digitalization algorithm). The extension applies to arbitrary sets of 36 dice satisfying two conditions:

With such a set of dice, the above fixed table does not apply (the most basic data validation rules in the deterministic digitalization algorithm would be breached with a very high probability), and the operator-assisted table definition is mandatory (the table definition operation may be embedded in either the system assembly process or the system commissioning process). In addition, the set would be useful only with those systems configured with a matching table definition. This approach may enhance a sense of security (it prevents someone with a non-matching set of dice from surreptitiously re-seeding a victim's system), but the marginal security benefit is small (with just a secret table configured in the system for the canonical set of PUDEC dice, the attacker can not duplicate the seeding operation results anyway) in view of the operational hindrance (the specific arbitrary set of dice becomes a physical key for the system seeding operation).

Further consideration of these variations in the PUDEC scheme belongs to the companion document reference [2], notably for operational impacts.

2.3 Digitalized PUDEC Discrete Outcome Data Representation

The digitalized PUDEC discrete outcome is intended to provide a complete data representation of the validated bar code scan data, accounting for missing data caused by accepted bar code scanning errors. It is not unusual for a digitalization process to output a less than perfect data representation of a sampled instance, while the actual data is preferable to no data. In the case of the PUDEC discrete outcome, the digitalized data representation may have holes or missing data elements as long as the reduced data has a clean probability distribution. The main benefits of accepting such PUDEC reduced data representation are operator efficiency (average rate of output) and predictability (the rate of output degrades less abruptly in the presence of a few bar code scanning errors).

The digitalized PUDEC outcome data representation is made of

a permutation
or more specifically an ordering of the 36 dice (this ordering indicates a complete permutation in the absence of errors) and
36 independent die outcomes
or more specifically up to 36 independent die face pairs or die face singletons (exactly 36 independent die face pairs in the absence of errors).

The following notation is used for the dice ordering: a list of comma separated die numbers (each from 1 to 36), where each die number may appear at most once. These die numbers MUST appear in the exact sequence that was sensed by the PUDEC digitalization algorithm. Every die for which an exact position has been determined by the PUDEC digitalization algorithm MUST be present. Each such exact die number may have a trailer indication like +(2,13) and/or a trailer indication like -(13). Each die number for which no exact position has been determined by the PUDEC digitalization algorithm must be present exactly once in a trailer in the form +(...) and exactly once in a trailer of the form -(...), and the latter MUST NOT occur earlier than the former in the list. Within either trailer form, die numbers MUST appear in increasing numerical order. These two pair-wise trailer occurrences indicate respectively the earliest position in the sequence just before which the die number might have been present in the concrete PUDEC discrete outcome, and the latest position in the sequence just after which the die number might have been present. For instance, if only the die number 13 has been completely absent from an otherwise perfectly sensed discrete outcome, the following dice ordering string might be representative: 35+(13),32,20,14,30,36,6,29,21,3,19,1,8,15,18,33,2,23,12,5,9,26,10,7,31,25,17,27,16,22,24,11,4,28,34-(13) . Using the same data, if the die number 7 ordering has been sensed only in either the second or third PUDEC scan sequence between dice 9 and 31, the following dice ordering string would apply: 35+(13),32,20,14,30,36,6,29,21,3,19,1,8,15,18,33,2,23,12,5,9,26+(7),10-(7),31,25,17,27,16,22,24,11,4,28,34-(13) .

An independent die outcome is either

When present, the die face numbers MUST indicate outcomes sensed by the PUDEC digitalization algorithm. A pair of die face numbers MUST appear for a die that was reliably sensed on two adjacent faces by the PUDEC digitalization algorithm (thus the sum of the two die face numbers may not add to 7). A single face number MUST appear for a die that was reliably sensed on a single face by the PUDEC digitalization algorithm. A dash MUST appear for a die that was not reliably sensed by the PUDEC digitalization algorithm.

For a successful PUDEC discrete outcome sensing operation by the PUDEC digitalization algorithm, the outcome data representation MUST include a dice ordering string and the 36 independent die outcomes for the PUDEC dice from die number 1 to die number 36 inclusive, in that order, separated by white space characters.

2.4 Conditioned Random Bits

The result of the PUDEC conditioning algorithm is simply a stream of random bits, with a portion at the end of the stream delivered after the PUDEC conditioning algorithm receives an indication that no further input is to be expected (some internal buffering occurs within the PUDEC conditioning algorithm).

3. Operator Feedback from the Digitalization Algorithm

3.1 Overview

Generally, the PUDEC digitalization and conditioning algorithms need to provide little feedback to the operator if this person is trained to perform the four bar code scanning sequences. The bar code reading equipment provides audible and/or visible feedback when a bar code label is decoded and accepted by the reader (when a bar code is decoded multiple times in close succession, it is accepted only once by the reader). So, the foremost operator feedback is built-in the bar code reader technology.

Then, the PUDEC discrete outcome sensing sequence relies on the bi-directional property of unsophisticated bar code reader equipment, which prompts a dedicated operator to keep his visual attention to the die face orientations in order to follow the progress in the four scanning sequences. It is thus unrealistic to rely on usual computer display feedback as a means of requesting any change in the operator routine. In practice, the operator may have a look at a computer display between complete PUDEC discrete outcome sensing operation (after the fourth scanning sequence), and when an unrecoverable error is suspected. Even in the later case, the operator has the option to restart the sensing operation on his own initiative, notably if an unrecoverable error is almost certain in the operator's mind.

The scope of the present document is limited to the operator feedback by the two deterministic algorithms. This feedback is essentially informational and it can be adjusted as to the desired detail level. The digitalization algorithm has more information about the operator performance and efficiency than the conditioning algorithm which has little to report except perhaps its termination.

The normal progress outcome might be augmented with an overall mark for the operator performance and efficiency, based on points lost on each action taken by the error processing logic and timing of bar code readings when an otherwise perfect score is reached. Such outcome sensing operation rating should be offered only as an element of personal challenge like in a computer game. The value added by the operator workload is by no means correlated to the suggested rating.

3.2 Specific Levels of Detailed Operator Feedback

This document subsection is informational. In case it deviates from provisions of other sections, the latter must prevail. A PUDEC digitalization algorithm implementation may implement the following operator feedback messages as soon as the corresponding condition is detected in the bar code reading stream.

3.2.1 Bare Minumum Feedback

With the bare minimum feedback, the operator will be signaled when a fatal outcome sensing error occurs, and when the outcome sensing is complete and/or restarted. Since the digitalization algorithm is forgiving for a number of deviations from the expected scan sequences, many typical fatal errors would best be diagnosed with a generic message like "too much deviations from expected scan sequences" and no further explanation is expected from a PUDEC digitalization algorithm implementation.

bcd_feedb_1_1 Restarting the discrete outcome sensing operation following detection of a reset pattern.
The sequence of bar codes A-B-C-B-A-B-C-X has been recognized per section "4.1 Overview of Algorithm Operation".
bcd_feedb_1_2 Unrecoverable bar code scanning error.
This message applies to any bar code reading that breaches the logical rules of the PUDEC digitalization algorithm. These rules completely specify the error recovery logic, including error recovery that cause data loss in the discrete outcome data representation. There remains only fatal logical errors. The handling of fatal errors is covered in the introduction of section "4.3 Error Processing".
bcd_feedb_1_3 Too many bar code scan operations without completing outcome sensing or reset pattern detection.
This message could be included in the preceding one, but refers to an arbitrary limit for the bar code scan data (see the introduction of section "4.3 Error Processing"). Also, an implementation should repeat the message if the limit is reached more than once before detecting a reset sequence.
bcd_feedb_1_4 Discrete outcome data representation complete.
This message applies when the algorithm actually synthetize the digitized discrete outcome data representation, after the expiration of the inactivity timer described in section "4.1 Overview of Algorithm Operation".

3.2.2 Normal Progress Feedback

At this level, the completion of each scan sequence is signalled to the operator.

bcd_feedb_2_1 End of first sequence detected
This digitalization algorithm rules are such that the detection of an end of sequence occurs upon determination that a bar code reading belongs to the next sequence. An implementation if thus expected to defer this message by one bar code reading from the actual end of first sequence.
bcd_feedb_2_2 End of second sequence detected
This digitalization algorithm rules are such that the detection of an end of sequence occurs upon determination that a bar code reading belongs to the next sequence. An implementation if thus expected to defer this message by one bar code reading from the actual end of second sequence.
bcd_feedb_2_3 End of third sequence detected
This digitalization algorithm rules are such that the detection of an end of sequence occurs upon determination that a bar code reading belongs to the next sequence. An implementation if thus expected to defer this message by one bar code reading from the actual end of third sequence.
bcd_feedb_2_4 End of fourth sequence detected
The fourth sequence is considered complete when every dice observed in the second sequence have a corresponding reading in the fourth sequence. During the delay associated with the inactivity timer, further bar code readings may be received and processed, which might invalidate the outcome sensing operation altogether. The inactivity timer mechanism is described in section "4.1 Overview of Algorithm Operation".

3.2.3 Basic Error Processing Feedback

The category of error processing having the foremost impact on the discrete outcome result is the missing dice handling including recovery at a later point in the outcome sensing procedure. This is mainly specified in section "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence". At this level of detailed operator feedback, only a few related messages are included.

bcd_feedb_3_1 Die missing in both the second and third sequences.
The digitalization algorithm exploits data redundancy among the four conventional scan sequences. This message indicates that some data is missing but the logic can cope with it. If the missing die should have been scanned in the second sequence, it might be recovered in the fourth sequence.
bcd_feedb_3_2 A bar code symbol in the fourth sequence confirms a missing die in the second sequence
This message confirms that a bar code reading in the fourth sequence fixed a previous reported error. The specific circumstances are specified in section "4.3.4 Error Processing in the Fourth Sequence".
bcd_feedb_3_3 Die missing from the first, second, and third sequences.
The digitalization algorithm admits a number of missing dice from the first sequence, and this may include a die totally absent. This message indicates that such a missing die is detected at the end of the third scan sequence. Any data collected about this die during the fourth sequence will be checked for coherency but ignored in the discrete outcome data representation.
bcd_feedb_3_4 A bar code symbol in the second sequence confirms a missing die in the first sequence
The digitalization algorithm admits a number of missing dice from the first sequence. If scanned in the second sequence, a missing die is partially or totally recovered. This message applies to either case.
bcd_feedb_3_5 A bar code symbol in the third sequence gives position data for a missing die in the first sequence
The digitalization algorithm admits a number of missing dice from the first sequence. If scanned in the third sequence, a missing die is partially recovered and this message is issued.

3.2.4 Additional Error Recovery Feedback

Some other specific conditions in which the error processing logic applies trigger specific error messages. They may give operator confidence about the effectiveness of the outcome sensing operation in spite of some deviations from the expected sequence. Messages at this level of detailed operator feedback follow the logic of section "4.3.2 Duplicate Processing in Relation to Dice Permutation".

bcd_feedb_4_1 Resolved a previously noticed inversion while scanning the second sequence.
During the first scan sequence, the digitalization algorithm is forgiving to bar code reading duplication like A-C-B-C-D or A-B-C-B-D where A-B-C-D should have been read. This message occurs where the resulting ambiguity is resolved during the second sequence.
bcd_feedb_4_2 Resolved a previously noticed inversion while scanning the third sequence.
During the first scan sequence, the digitalization algorithm is forgiving to bar code reading duplication like A-C-B-C-D or A-B-C-B-D where A-B-C-D should have been read. This message occurs where the resulting ambiguity is resolved during the third sequence.
bcd_feedb_4_3 A previously noticed inversion remains unresolved after scanning the second and third sequences.
During the first scan sequence, the digitalization algorithm is forgiving to bar code reading duplication like A-C-B-C-D or A-B-C-B-D where A-B-C-D should have been read. This message occurs where the resulting ambiguity is not resolved despite the additional bar code data from the second and third sequences.

3.2.5 Ignored Unexpected Reading Feedback

The fourth sequence is more forgiving than the previous ones. The following message pertains to bar code readings in the fourth sequence that could be symptomatic of a lack of understanding from the part of the operator, but are nevertheless inocuous to the discrete outcome result allowed by the first three sequences.

bcd_feedb_5_1 Ignoring bar code symbol in the fourth sequence
This message signals that a bar code reading in the fourth sequence pertains to a missing die that went through superficial coherency checks. The fourth sequence is devoid of dice ordering rules enforcement that would be needed for full validation of this die reading, which is thus ignored in the preparation of die outcome data. This logic is covered in section "4.3.4 Error Processing in the Fourth Sequence".

3.2.6 Usual Cases of Ignored Reading

Ignoring duplicate bar code readings is a usual error processing strategy for bar code data entry applications. At this fine level of detailed operator feedback, ignored bar code readings are signaled to the operator.

bcd_feedb_6_1 Ignoring duplicate bar code symbol
Two identical bar code readings in succession trigger this message. The digitalization algorithm remembers a single one. This is described in section "4.3.1 Duplicate Processing Applicable to Every Scan Sequences".
bcd_feedb_6_2 Ignoring duplicate pair of bar code symbol
Two identical pairs of bar code readings in succession trigger this message (as in A-B-A-B). The digitalization algorithm remembers a single pair. This is described in section "4.3.1 Duplicate Processing Applicable to Every Scan Sequences".
bcd_feedb_6_3 Ignoring duplicate bar code symbol in the fourth sequence
The fourth scan sequence being more forgiving than the previous ones, it admits any bar code reading repeated from a previous one (not only in close succession). This message indicates the receipt of such a repeated bar code reading. See "4.3.4 Error Processing in the Fourth Sequence".

3.2.7 Early Warnings in the First Sequence

This level of detailed operator feedback provides early warnings in the first sequence for conditions that may be either recovered or reported once it is known that no recovery is going to happen. Although it may be seem benefitial to provide timely feedback, the operator actually receives little cues about what can be made to assist the error recovery.

bcd_feedb_7_1 An ambiguous inversion is noted in the first sequence.
This is an early warning of a bar code reading duplication like A-C-B-C-D detected while in the first sequence.
bcd_feedb_7_2 Missing die in the first sequence.
This is an early warning of a missing die in the first sequence.

4. The PUDEC Digitalization Algorithm

4.1 Overview of Algorithm Operation

The digitalization algorithm receives successive bar code readings and MUST output a digitized discrete outcome data representation when encountering the conditions set forth in the present document section. No other type of input is needed, except for a request for algorithm termination, in which case the digitalization algorithm MUST clean up its internal resources and terminate.

Whenever the digitalization algorithm processes a bar code reading according to its die and face indentification, it converts the bar code reading to a die and face indication either using the mapping found in section "2.2 Bar Code Data Representation", or an alternate mapping established using the guidelines found in section "2.2.1 Special Arrangements for Additional Secrecy Protection".

The internal state needed for digitized discrete outcome synthesis is implementation dependent. However, this internal state MUST be initialized before the very first bar code reading is received. Additionally, and irrespective of the current state of the algorithm, when a sequence of bar codes A-B-C-B-A-B-C-X is recognized, where A, B, C, and X are distinct bar code readings otherwise valid as die and face indications, the internal state MUST be initialized and the bar code reading X MUST be processed as if it was the very first bar code reading. Note that if A-B-C-B-A-B-C-X-C-B-C-X-Y is received, Y must supercede X as the assumed very first bar code reading.

Once the digitalization algorithm has received a bar code reading that puts it in a position to synthetize a digitized discrete outcome data representation according to its current internal state, it MUST start an inactivity timer and accept further bar code readings that has been accepted or would have been accepted if received at any earlier point in the sequence starting with the real or presumed very first bar code reading, until no bar code reading is received for the duration of the inactivity timer. When the inactivity timer so expires, the digitalization algorithm MUST synthetize the digitized discrete outcome data representation, output this data representation to the conditioning algorithm, initialize its internal state needed for digitized outcome synthesis, and process the next bar code reading as if it was the very first one. The inactivity timer duration is 3 seconds [algorithm-parameter-3]. Note that the internal state needed for the detection of the A-B-C-B-A-B-C-X sequence is preserved. Note also that bar code readings received while the inactivity timer is not yet expired may enhance the comprehensiveness of digitized discrete outcome data representation.

4.2 Expected Bar Code Readings for Each Scan Sequence

The expected first scan sequence is made of 36 die and face indications, each from a different die.

The expected second scan sequence is a made of at least 1 and at most 35 die and face indications repeated from the first sequence and occurring in the same order as their occurrence in the first sequence.

When considering only the die indications (ignoring the face indications), the expected third sequence is made of every dice not present in the second sequence, each occurring once and in the same order as their occurrence in the first sequence. For each die and face indication in the third sequence, the face indication is among the east-west axis in reference to the face indication for the same die in the first sequence.

When considering only the die indications (ignoring the face indications), the expected fourth sequence is made of every dice present in the second sequence (in any order and allowing multiple indications for the same die). For each die and face indication in the expected fourth sequence, the face indication is among the east-west axis in reference to the face indication for the same die in the first or second sequence.

Whenever the digitalization algorithm receives bar code readings that qualify as an expected first sequence starting with the real or presumed very first bar code reading, and then receives exactly bar code readings that qualify as corresponding expected second and third sequences, and finally receives bar code readings that qualify as a corresponding expected fourth sequence, it becomes in a position to synthetize a discrete outcome data representation.

When synthetizing a discrete outcome data representation after encountering expected bar code readings as indicated above in the present section, the digitalization algorithm MUST fill the permutation data with the dice ordering observed in the first sequence. For each die number, it MUST also fill the independent die outcome with a pair of face numbers having the first face number from either the second or third sequence (as the case may be from the observations), and the second face number from respectively the fourth or first sequence.

4.3 Error Processing

Upon encountering a bar code reading not compliant to the expected bar code readings, and unless otherwise specified in the following document subsections, the PUDEC digitalization algorithm MUST enter a state in which no output is produced and remain in that state until algorithm termination or detection of the A-B-C-B-A-B-C-X sequence. This same state MUST be entered if more than 300 bar code readings [algorithm-parameter-4] has been received (starting with a real or presumed very first bar code reading) without outputting the discrete outcome data representation.

The provisions of the following subsections apply independently to each sequence of bar code readings starting with a real or presumed very first bar code reading and ending with either entry in the above error state or successful outcome data representation synthesis at the expiration of the inactivity timer.

The error processing provisions are designed from a desire to survive the most common bar code scanning errors without sacrificing the uniform discrete probability distribution that characterizes the use of PUDEC dice shuffling. In general, a single error (of one of the common error patterns) for a given die and face reading may be tolerated and/or recovered. However, errors occurring at the start of a sequence past the first one are less likely to be tolerated. The error processing is intended to make a typical operator more efficient: without tolerance to typical errors, successful discrete outcome sensing may require constant operator attention and cause fatigue. Also, failure to recover from a single common error would cause operator frustration, notably near the end of the sensing procedure.

The error processing logic specifications below should be such that a bar code reading may be recognized as an element in either the first, second, third, or fourth bar code scanning sequence. This should be possible in the presence of tolerated and/or recovered errors according to the same specifications. [If a corner case is identified where it becomes impossible, then the present document will need a revision.]

4.3.1 Duplicate Processing Applicable to Every Scan Sequences

Upon encountering the same bar code reading twice in succession, the digitalization algorithm MUST behave as if it encountered a single one.

Upon encountering the same pair of bar code readings twice in succession, the digitalization algorithm MUST behave as if it encountered a single pair.

These algorithm provisions are to be applied iteratively and prior to the error processing provisions in other document subsections.

4.3.2 Duplicate Processing in Relation to Dice Permutation

Upon encountering the dice pattern A-B-A-C where A, B, and C are distinct die/face observations where at least A-B lies in the first sequence, and later on encountering A-B in either the second or third sequence (or A-/D/-B where /D/ is a sequence of one or more die observations missing from the first sequence), the digitalization algorithm MUST behave as if it encountered A-B (instead of A-B-A) in the first sequence.

Upon encountering the dice pattern A-B-A-C where A, B, and C are distinct die/face observations where at least A-B lies in the first sequence, and later on encountering B-A in either the second or third sequence (or B-/D/-A where /D/ is a sequence of one or more die observations missing from the first sequence), the digitalization algorithm MUST behave as if it encountered B-A (instead of A-B-A) in the first sequence.

Upon encountering the dice pattern A-B-A-C where A, B, and C are distinct die/face observations where at least A-B lies in the first sequence, and later on encountering neither A-B nor B-A (and neither A-/D/-B nor B-/D/-A where /D/ is a sequence of one or more die observations missing from the first sequence) in either the second or third sequence, the digitalization algorithm MUST synthesize the discrete outcome dice ordering with the uncertainty surrounding the A position in the dice ordering data representation as in B+(A)-(A).

4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence

Missing die readings matching one of the following patterns are the only ones that may be handled by the error processing provisions in the present document subsection. Note that the next document subsection contains an error recovery provision applicable to one of the cases below.

Three further conditions apply and further restrict the applicability of the present subsection. Neither the second nor the third sequence may be totally empty. No more than 3 [algorithm-parameter-1] dice may be missing from the first sequence. In the case where any die is missing from the first sequence, the first die observation in the second sequence must have a corresponding position in the first sequence before the last 3 [algorithm-parameter-2].

Whenever a die observation appears in the first sequence, the digitalization algorithm MUST synthesize the discrete outcome dice ordering either with an exact position for the die or according to the previous document subsection.

Whenever a die observation is missing from the first, second, and third sequences, the digitalization algorithm MUST synthesize the discrete outcome dice ordering with the uncertainty surrounding the die position in the dice ordering data representation using the +(...) notation at the first position and the -(...) notation at the last position.

Whenever a sequence, notation /C/, of at least one missing die observation from the first sequence appears in either the second or third sequence, we use the notation A-/C/-B to refer to the die number preceding /C/ as A and the die number following /C/ as B. Similarly, the notation A-/C/ could refer to the sequence /C/ occurring at the end of either the second or third sequence. In order to allocate the dice from the /C/ sequence to the dice ordering data representation, the position of A and B in the first sequence are first determined. This determination detailed below is not trivial due to the required handling of a corner case. The first time reader is invited to refer to the section "4.3.5 Resolution of Corner Cases" for an informal description of the simplified procedure if the related corner case were non-existent.

One of four cases may occur for A and the first one that applies must be noted:

Five cases may occur for B and the first one that applies must be noted:

Note that these two lists start with the unusual conditions and end with the expected ones. It is through the elimination of special cases that the expected ones are detected without glitches.

The bold face labels in parenthesis are used to further explain the algorithm. The label A_last applies to the same condition irrespective if it was detected from die A determination or die B determination. The labels A_B and B_A need independent determination (each of the two conditions A_permut and B_permut must be eliminated). When the condition A_last is detected or the two conditions A_B and B_A are detected, the digitalization algorithm MUST synthesize the discrete outcome dice ordering with an exact position for the dice in the sequence /C/ immediately after the position for A, in the same order as in /C/.

In all other combination of determinations for die A and die B, the /C/ sequence position in the dice permutation outcome is cast with uncertainty, and the algorithm determines some placeholder range within the first sequence (more or less bounded by A and B positions) for to encode the positional uncertainty for dice in /C/. For the die A determination A_permut (resp. die B determination B_permut) the placeholder range starts with either A or (resp. ends with either B or) the die number permuted with it, whichever occupies the main position in the outcome data representation (see preceding subsection). For the die A determination A_succ (resp. die B determination B_predec) the placeholder range starts with A' (resp. ends with B'). For the die B determination A_notlast the placeholder range ends with the last die in the first sequence. In case these rules leave one placeholder range limit undefined (respectively the start or end limit for the die A determination A_B or the die B determination B_A), the range consists of a single die at the other limit. Once this placeholder range is determined, the digitalization algorithm MUST synthesize the discrete outcome dice ordering with the dice in /C/ in the +(...) notation at the placeholder range start position and in the -(...) notation at the placeholder end position.

The independent die outcome in the discrete outcome data representation is just that, independent, except that if the digitalization algorithm is prevented from outputting the dice ordering due to errors, no independent dice outcome is output.

When a missing die observation from the first sequence is present in the second sequence, the digitalization algorithm MUST fill the independent die outcome as if the die/face observation from the second sequence occurred also in the first sequence.

When a missing die observation from the first sequence is present in the third sequence, the digitalization algorithm MUST fill the independent die outcome with a single face number from this observation in the third sequence.

When neither the digitalization algorithm provision for the expected reading sequences nor any of the two above provisions and nor the provisions from the next subsection apply for a die, the digitalization algorithm MUST fill the independent die outcome with a dash (-) for this die number.

4.3.4 Error Processing in the Fourth Sequence

No missing die observations are tolerated in the fourth sequence. Since the order of bar code reading is irrelevant and duplicates are tolerated, it is possible for the operator to attempt bar code scanning retries without putting the PUDEC digitalization algorithm result at risk.

Formally, the following die/face observations are allowed after a first die/face observation has been identified to belong to the fourth sequence:

Whenever a die observation present in the first sequence is missing from both the second and third sequences and a new die/face observation for the same die occurs after a first die/face observation has been identified to belong to the fourth sequence and the new die/face observation has a face indication among the east-west axis in reference to the face indication for the same die in the first sequence, the digitalization algorithm MUST fill the independent die outcome as if the die/face observation from the first sequence occurred also in the second sequence.

4.3.5 Resolution of Corner Cases

This subsection specifies the resolution of corner cases in the error recovery procedures in situations where the other provisions are ambiguous or subject to interpretation and the provisions of other sections might become too intricate if they were to integrate the contents of the present section. This subsection also identifies, merely from a tutorial perspective, some processing provisions in other sections that are basically intended to handle a corner case in an otherwise more generic or more intuitive processing logic that is not specified.

If a very short second sequence occurs at the end of the first one, a corner case may arise where the duplicate processing rules from section "4.3.1 Duplicate Processing Applicable to Every Scan Sequences" prevent the detection of the transition from first sequence to the second secuence. In this case, the dice duplication recovery will prevent an otherwise valid concrete discrete outcome from being sensed at all. This is considered acceptable since the probability of such concrete discrete outcome is very small.

The [algorithm-parameter-2] in section "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence" fulfills a similar disambiguation role between dice duplication handling and end of scan sequence detection.

The remaining of this subsection is an informal description of the simplified processing rules that would apply to section "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence" if a corner case could be ignored.

Whenever a sequence, notation /C/, of at least one missing die observation from the first sequence appears in either the second or third sequence, we use the notation A-/C/-B to refer to the die number preceding /C/ as A and the die number following /C/ as B. Similarly, the notation A-/C/ could refer to the sequence /C/ occurring at the end of either the second or third sequence. In order to allocate the dice from the /C/ sequence to the dice ordering data representation, the position of A and B in the first sequence are considered.

When the enclosing dice A and B are next to each other in the first sequence, or A is at the end of the first sequence, the position of the sequence /C/ is fully determined and the digitalization algorithm synthesizes the discrete outcome dice ordering with an exact position for the dice in the sequence /C/ immediately after the position for A, in the same order as in /C/. Otherwise, the earliest position for the sequence /C/ is the die immediately following A and the latest position is the die immediately preceding B or the last one in the first sequence. These earliest and latest positions may be the same.

The intricate rules of section "4.3.3 Missing Die Readings With or Without Occurrence in a Subsequent Sequence" are motivated by the desirable uniform handling of dice permutation (section "4.3.1 Duplicate Processing Applicable to Every Scan Sequences") occurring next to a missing die in the first sequence. In the absence of such uniform hanlding, the operator would see a fatal error while the normal processing for either a die permutation or a missing die is forgiveness, and would be mislead about how forgiving the scannng procedure actually is for both types of errors.

5. The PUDEC Conditioning Algorithm

The PUDEC conditioning algorithm takes one or more discrete outcome messages and outputs bits. Each discrete outcome message holds the data representation specified in section "2.3 Digitalized PUDEC Discrete Outcome Data Representation", or some equivalent data representation. Given evenly distributed discrete outcome messages (according to the probability distribution of fair dice shuffling), the output stream of bits MUST have the same statistical properties as fair coin flipping. The conditioning algorithm may implement a boolean parameter ([algorithm-parameter-7]) which indicates if the bar code orientations implied in the discrete outcome message should be considered evenly distributed and unbiased. When set to false, this optional parameter allows greater assurance of statistical properties even if the PUDEC operator introduced a bias in the bar code orientations, with a small loss in operator efficiency.

The mandatory provision in the above paragraph is about all what is needed for the PUDEC conditioning algorithm. The algorithm specifications should be unambiguous and deterministic, so that it can be subjected to testing using test vectors. A preferred conditioning algorithm is specified below, but alternatives might be developed. Firstly, some informal mathematical reasoning is presented because it represents the mental model for the conditioning algorithm.

The change of base (towards base two) is the class of algorithm from which the PUDEC conditioning algorithm is a special case. In the following explanation for the preferred conditioning algorithm, the starting base is a mixed radix base. As a reminder, 93784 seconds is 1 day 2 hours 3 minutes and 4 seconds in the everyday mixed radix number base <24,60,60> that we use to count time. Using this numerical example, the reference [3] appendix B, section B.5.2 might refer to 604800 (the number of seconds in a week) as a modulus parameter equals to one plus the largest value that can be represented in a mixed radix base. Conversely, starting from a modulus parameter, a mixed radix base is an enumeration of every factors present in a factorization of the modulus parameter (the enumerated factors order is significant). In the application of these mathematical concepts to the problem at hand, the modulus parameter is a count of distinct discrete outcomes each having an equal probability.

The PUDEC conditioning algorithm may use some internal buffering so that some of (preferably most of) the randomness of a discrete outcome message is readily output as fair binary random bits and some leftover randomness is merged with the next discrete outcome message. As will be detailed below, the preferred algorithm buffering works by mixed radix number base selection for separating what is output first from what is left in the buffer. The overall goal of such an arrangement is conversion efficiency. But the end of processing remains difficut to optimize since it is always possible, although with a probability inversely proportional to the mixed radix modulus, that the conditioning algorithm conversion (parameterized by the mixed radix modulus) provides little or no unbiased random bits.

Here is a detailed explanation for the difficulty with the last buffer processing. When the application making use of the PUDEC conditioning algorithm comes close to fulfilment of its random bit requirements, it can not query the conditioning algorithm about the number of bits present in the current buffer without invalidating the unbiased distribution of the bits to be obtained from the next discrete outcome message (if merged with the current buffer contents because the query reported an insufficient buffer size). Accordingly, there are three approaches for the end of processing when the application has some knowledge of remaining random bit requirements:

5.1 Buffering of Discrete Outcome Elements Queued by Prime Components of Mixed Radix Bases

When processed according to the provisions of this document subsection, the PUDEC discrete outcome message is broken into discrete outcome elements each associated with a small prime integer representing the count of possible outcomes. The notation used for such prime-outcomes is p-outcome n where p is a small prime and n is an unsigned number smaller than p. For instance, a die outcome of 5 (respectively 1, 2, 3, 4, and 6) turns into a 3-outcome 2 and a 2-outcome 0 (respectively 3-outcome 0 and a 2-outcome 0, 3-outcome 0 and a 2-outcome 1, 3-outcome 1 and a 2-outcome 0, 3-outcome 1 and a 2-outcome 1, and finally 3-outcome 2 and a 2-outcome 1). As illustrated by this numerical example, random event reporting using one-based counting with a composite number of possible outcomes turns into zero-based counting of prime-outcomes based on the composite number factorization, with a larger prime occupying a more significant position (in the original counting) than a smaller prime.

Using these conventions, the PUDEC discrete outcome message processing gives a sequence of p-outcomes with various primes p. With p=2, the p-outcome value is a random bit already conditioned, so it is passed directly to the final processing described in section "5.3 Final Computation of a Random Bit Stream". Each prime p greater than 2 has a buffer queue (first in first out) in which the respective p-outcome values are inserted as the outcome message is processed. When a random event reporting has a composite number of possibilities with the same prime factor occurring more than once, the prime-outcome value occupying the least significant position (in the original counting) is either passed (for prime equals 2) or queued (for prime greater than 2) first.

The outcome message processing proceeds in the following sequence. First, the dice permutation is processed. Starting from the first die number position in the dice ordering portion of the outcome message and processing each die number in any -(...) trailer as if it occupied the main die number position, and proceeding up to the before-last die number position, a permutation position outcome is computed. In this computation,

Then the independent die outcomes portion of the outcome message is processed. The face numbers present in the discrete outcome message are processed in their order of occurrence. A face number occurring alone or in the first position in a face number pair is a simple one-based random event reporting with 6 possible outcomes; it is thus processed as the above numerical example. Then, a face number occurring at the second position in a face number pair is processed. One of two processing variants is selected based on an optional boolean parameter [algorithm-parameter-7].

This queuing arrangement requires a limited number of p-outcome queues for the various primes p. Specifically, with 36 dice in the PUDEC set, the ten prime numbers from 3 to 31 are needed.

5.2 Selection of Specific Mixed Radix Bases

5.2.1 Overview of Problem Statement

The prime-outcome values queued during the initial processing of PUDEC discrete outcome messages need to be converted to fair binary digits. Two algorithm options are explained in the reference [3] appendix B, section B.5.2 (converting a random number into random bits). For the PUDEC application, only the no skew (variable length extraction) option seems appropriate (section B.5.2.1). As indicated in the reference, the drawback of this algorithm is the variable length extraction, and the difficulty is more or less severe depending on (the binary representation of) a modulus, which corresponds to the selected composite number for the mixed radix base in the present application. The desirable moduli are those with many consecutive zeroes after the most significant binary digit in their binary representation, hence as close as possible to an exact power of two, but still higher than it.

The idea is thus to select a composite number with a factorization possible with the prime numbers in the queued p-outcomes at a given time, and to compute the random value from the respective p-outcome values using the mixed radix base rules (somehow a reversal of the factorization-driven breakdown of random event reporting into prime-outcome values). The composite number is the modulus n and the random number is r in the reference [3] notation. The mixed radix base selection occurs upstream of the algorithm in reference [3] and MUST NOT be influenced by the prime outcome values present in the queues, as this would likely introduce statistical bias (only the counts of various primes in the queues are taken into account).

The present section is only concerned with the selection of good composite numbers for mixed radix bases. The random number computation and its final conversion into random bits are mechanical steps covered in section "5.3 Final Computation of a Random Bit Stream".

The triggering conditions and/or the pace at which the conversion into random bits is attempted is not subject to a-priori constraints. Nonetheless it seems natural to attempt one or more conversions after the processing of each discrete outcome messages, and allow the application to request more attempts if it wishes to use the random bits left in the queues at the end of the PUDEC discrete outcome sensing session.

5.2.2 Recommended Solution

In the recommended solution for the selection of a mixed radix base for extracting prime-outcomes from the conditioning algorithm queues, a large acceptable base is selected once after the initial processing of a discrete outcome message. A mixed radix base is defined by exponent values for each prime for which there are queued prime-outcomes, where each exponent range is from zero to the respective queue size. For instance, after a single error-free PUDEC discrete outcome initial processing, there are

With this PUDEC conditioning buffer state, the recommended solution selects the mixed radix base modulus 351*58*75*112*132*172*19*23*29*31, which is the maximum exponent for every prime except 3 and 11. The mixed radix base is then <3,3, ... 51 times in all ... ,3,5,5,5,5,5,5,5,5,5,11,11,13,13,17,17,19,23,29,31>. The next discrete outcome message will adapt to the increased prime-outcome queue sizes for primes 3 and 11 and thus may select a different mixed radix base.

The binary representation for the above large modulus has 14 consecutive zeroes after the most significant one. This sets to circa 2-14 the probability of less than maximum length extraction during the conversion to random bits.

Leaving aside the possible procedural optimizations, the formal procedure is as follows:

This procedure requires multiple precision integer arithmetic, but from the observation that only the last step requires examination of binary expansion beyond the first 20 to 30 bits (supported by empirical verification), it is possible to implement most of the procedure with double precision floating point variables. Logarithmic computations circumvent the possibility of floating point exponent overflow, and base-2 logarithms allows convenient separation of modulus length (the integer part of the base-2 logarithm) from the binary representation of highest significant bits (computed from the fractional part of the base-2 logarithm).

Conceptually, it is possible for the procedure to give no answer (empty set in the first step), but in practice this does not happen. If different parameters were selected for the procedure and the exception became possible, the queuing of additional prime-outcome values from the next PUDEC discrete outcome message procedure should handle the situation.

If the application requests the final random bits that can be extracted from the current buffers, the mixed radix base selection becomes trivial and returns the largest base possible given the current queue contents.

5.3 Final Computation of a Random Bit Stream

Random bits originating from 2-outcomes are processed first in the random bit stream.

For prime-outcomes with a prime greater than 2, the final processing sees the a mixed radix base and the prime-outcome queues holding an adequate number of random values for the following process to occur:

From the previous document subsection, a mixed radix base is defined by the exponents applied to the small prime of each prime-outcome queue. As hinted by the numerical example in the section "5.2.2 Recommended Solution", a smaller prime lies in a more significant position in the mixed radix base sequence. For the purpose of random value computation, the prime-outcome values extracted first from the queues occupy the most significant position (this applies to exponents greater than one).

The random bit extraction algorithm starts with the modulus n having a most significant bit of weight 2m and the random number r where r<n. The algorithm then proceeds as follows:

for i=m down to 0 begin
        if ( n has bit 2i set and r has bit 2i clear ) then begin
                output bits of weight 20 to 2i-1, if any, from r;
                terminate;
        end; /* if */
end; /* for */

Finally, this document subsection specifies output bit packing rules for the PUDEC conditioning algorithm. The output stream is presumed organized as a sequence of elementary computer storage units, e.g. an 8 bit byte. Some buffer state is maintained in order to account for unaligned supply of random bits. Random bits fill elementary storage units starting with the highest significant bit. An elementary storage unit is not released to the application until completely filled.

6. References

[1] Thierry Moreau, "How to Seed Random Number Generation from Dice Shuffling", CONNOTECH Experts-conseils inc., Document Number C005033, 2010/09/16, available at http://www.connotech.com/doc_pudec_descr.html.

[2] Thierry Moreau, "The PUDEC Digital Entropy Collection Scheme, System and Application Aspects", CONNOTECH Experts-conseils inc., <to appear>.

[3] Elaine Barker and John Kelsey, "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", NIST Special Publication 800-90, Revised March 2007, U.S. Department of Commerce, National Institute of Standards and Technology, Information Technology Laboratory, Computer Security Division, available at http://csrc.nist.gov/publications/nistpubs/800-90/SP800-90revised_March2007.pdf.