A Project Medley Design Notebook

Thierry Moreau

Document Number C004695

2008/10/30

Copyright (C) 2008 CONNOTECH Experts-conseils inc.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Specifications subject to change without notice.

Table of contents

1. Preface

1.1 This Document in Its Broader Innovation Context

1.2 Document Contents Purpose

1.3 Collaboration and Feedback

2. General Concepts

2.1 Some Pervasive Design Principles

2.1.1 If You Can't Explain It in Simple Words ...

2.1.2 Make It Features Lean

2.1.3 Open Source Software Is a Bargain

2.2 A Focus on IT Security

2.2.1 Be Paranoid in Software and System Engineering

2.2.2 Take Good Notice of Prior Art Inadequacies In Identity and Authentication

2.2.3 Theory is Ever Highly Relevant to Applied Cryptography

2.2.4 Key Management Is Key

2.2.5 Cryptographic Techniques Merely Shift IT Controls to Fewer Hands

2.2.6 Do Not Design From a Risk Analysis

3. The Dogbane Phantom Project

3.1 Adaptation of LAMP - Linux Apache MySQL PHP

3.1.1 Linux

3.1.2 Reduction of Apache Functionality

3.1.3 MySQL Usage

3.1.4 The PHP language

3.2 The Dogbane Application User Interface Model

3.2.1 Interactive Session Within a Browser Window

3.2.2 The "Workplace Notebook" Model of Interactive Session

3.2.3 Typical Sections of a "Workplace Notebook" Page

3.3 Security Constraints on the Dogbane Development Model

3.3.1 A Well Protected Linux Box

3.3.2 A Relational DBMS

3.3.3 A Customized Web Server

3.3.3.1 A TLS Profile

3.3.3.2 A Web Server Configuration Such That No Firewall Is Needed

3.3.3.3 A Paradoxical Client TLS Operation Mode

3.3.4 IAM, Identity and Access Management

3.3.4.1 Identity Authentication Management

3.3.4.2 Roles, Authorization, and Permission Management

3.3.4.3 Cryptographic Support for Identity Authentication Management

4. The HTML Object Oriented Authoring Utility

4.1 Overview

4.2 HTML Style Sheets, Separation of Form and Content

4.2.1 HTML Style Sheet and Visual Contents Integrity

5. The STOA Secure Application Server Development

5.1 Server and Software Development Environments

5.1.1 A Linux Installation

5.1.2 Software Architecture Design

5.1.3 Software Architecture Implementation Issues

6. [Suffering the Pain of Software Development]

6.1 Linux Kernel Installation Blues

6.2 The GNU Build System - Why?

6.3 The GNU Build System - High Level Technical Comments

7. References

7.1 Normative References

7.2 Informative References

A. HTTP Protocol Features Subset

B. Global Trust Policy Considerations

Document Revision History

C-Number

Date

Explanation

C004695

Current version

C004674

2008/09/24

Initial release

C004677

2008/10/22

Document editing changes:

o HTML anchors to source files have explicit "text/plain" MIME type.

o Added the bulk of section "5.1.2 Software Architecture Design" and the whole sections "5.1.3 Software Architecture Implementation Issues" and "6. [Suffering the Pain of Software Development]".

o Added the reference to the source code distribution following the GNU conventions for the HTML object oriented authoring utility.

C004695

2008/10/30

Minor document change in the list of informative references.

1. Preface

1.1 This Document in Its Broader Innovation Context

The publication of this document in the public Internet may be seen as a naive attempt to trigger interest in a desperate innovation process in the field of IT security. If indeed it triggers such interest, that's fine.

The document drafting effort serves a more immediate purpose: as an exercise in project documentation discipline, given that the "big picture" project has little chance of nearing completion in any foreseeable time frame. Despite this being a desperate, grandiose plan, some intermediate by-products might make sense, and thus deserve some form of documentation support. Still, the potentially valuable intermediate achievements are not structured as sub-projects, so the document drafting effort is an improvisation.

As of 2007Q3, it became clear that some of the applied cryptographic schemes invented by the author's organization could be deployed using widely fielded Internet protocols, without requiring the sacrifice of the comprehensiveness of the security solution. This technological opportunity is within reach only thanks to the broad scope and excellent quality of free software for software development, server and network services. The recourse to free software makes little sense without some contributions given back to the developer community. To the extent that some intermediate achievements mentioned in the preceding paragraph could be part of this contribution back to the free software body of knowledge, a purpose of the present document is the in-progress documentation of free software projects.

1.2 Document Contents Purpose

This document is intended to collect design ideas in a diversified set of software and web projects. The big picture is out of scope, keeping the focus on the guidance of individual smaller projects. As a notebook, this document contains "draft" material. Perhaps a common documentation style could emerge such that the value of each project documentation is enhanced by being part of the medley.

The projects that will be included in this document are in various phases of development. They all originate from the same author.

1.3 Collaboration and Feedback

This document could have been a web site using the web 2.0 paradigms, such as a blog, a wiki, or something similar in which contributions by those interested can be merged. Indeed, the public distribution of this document is intended to trigger feedback and suggestions. We hope to be able to reflect this feedback within the document as it evolves, obviously at our own discretion.

A more extensive adoption of the web 2.0 paradigms seems to imply editorial challenges (including the attribution of contributions in the Internet landscape where identity and authentication are very loosely defined concepts), a significant workload associated with web 2.0 site maintenance, and the likely loss of formalism in contents change tracking. A document that records design decisions should be subject to revision control, as a good engineering practice.

2. General Concepts

2.1 Some Pervasive Design Principles

2.1.1 If You Can't Explain It in Simple Words ...

A project is deemed not worth much without a concise and complete description of its goals, intended uses, technological base, and implementation plan. In this spirit, source code alone is not worth much.

2.1.2 Make It Features Lean

Trying to do everything is a sure way to postpone the completion of a project forever. The economics of application-specific software projects justify concentrating development efforts on functionality elements according to a well informed priority ordering.

For instance, this applies to the use of a generic package in the context of an application-specific project, where it is preferable to document the specific capabilities of the package that are relied upon. Benefits are listed below.

Admittedly, this emphasis on limited features to fit a requirements set is a strategy that sets us apart from IT giants, like Microsoft and Google, who grew from the widest possible application of the technology they supplied.

2.1.3 Open Source Software Is a Bargain

The open source software movement is a very significant trend in the evolution of the ITC (Information Technology and Communications) sector since around 2001-2005. In the context of the present document, open source software means a drastic change in both the cost structure of project development (e.g. no more software license fees for development tools) and the overall technological environment.

Every project, and nearly every project design element, in the present document respectively makes use of, and complies with, the basic open source software paradigm: tying revenues to software distribution in non-source form is nowadays a counterproductive business strategy.

2.2 A Focus on IT Security

A focus on IT security influences most of the projects described in the present document. This influence takes various forms as indicated in the following subsections.

2.2.1 Be Paranoid in Software and System Engineering

[...] The designer should be on the watch for potential vulnerabilities for data leak and data integrity.

2.2.2 Take Good Notice of Prior Art Inadequacies In Identity and Authentication

[...]

2.2.3 Theory is Ever Highly Relevant to Applied Cryptography

[...]

2.2.4 Key Management Is Key

[...]

2.2.5 Cryptographic Techniques Merely Shift IT Controls to Fewer Hands

[...]

2.2.6 Do Not Design From a Risk Analysis

[...]

3. The Dogbane Phantom Project

It is not certain whether the name "Dogbane" applies to a given software project. The term nevertheless refers to a definite strategy for an application software development model. This model may well be applied with concrete software source code. As an internal project name, "Dogbane" initially referred to an application development model associated with the [[SAKEM]] security scheme for remote user registration. There is indeed a dual link between the two: the Dogbane application development model is security focused and the SAKEM fielding would require secure distributed interactive applications anyway.

Nonetheless, the mix of two orthogonal concepts in a single project name made less sense when the SAKEM usage evolved into something loosely related to the original SAKEM scheme. The new approach to user registration with a service provider got the internal project name BASC, which stands for Bootstrapping an Authenticated Session Configuration.

Accordingly, the Dogbane project name refers only to a distributed interactive application development model. The security focus has a pervasive influence on the Dogbane development model, but the model should be described for its own sake. There is yet another reason for the abstract level still present in the meaning of "Dogbane". The deliverables in the Dogbane project completion could well consist of specific software design and development guidelines plus tools selection recommendations. In this perspective, any concrete examples would be provided merely to show the relevance and practicality of these guidelines and recommendations.

It is reasonable for the Dogbane development model to stay at the abstract level of guidelines and recommendations, with possible actual software project example(s). In part, this is justified by the inevitable development productivity gains offered by sticking with known or proven tools. Above all, the security influence in the Dogbane model will occasionally look counterproductive. IT security and operational efficiency are frequently in opposition, no matter how wisely the IT security advocate might present his/her case. It thus makes sense to adapt development model guidelines and recommendations for security vs efficiency trade-offs.

3.1 Adaptation of LAMP - Linux Apache MySQL PHP

The acronym LAMP refers to a selection of software development tools and server system components for interactive web site programming and operation, with explicit reference to:

3.1.1 Linux

There is no doubt that Linux is a good choice for a server operating system. The Dogbane development model brings no special requirements in this respect, except for the security focus. See the section "3.3.1 A Well Protected Linux Box".

3.1.2 Reduction of Apache Functionality

The annex "A. HTTP Protocol Features Subset" describes a subset of the HTTP specifications that apply to [the Dogbane project]. This is an application of the design principle explained in section "2.1.2 Make It Features Lean". This should make it possible to bypass the Apache web server functionality overhead, both in terms of unnecessary configuration flexibility and in terms of software interfacing, e.g. CGI, between the web server and the application software. In addition, this enables the close integration of the stateful application logic with the HTTP server function, which is considered stateless.

With the security focus of the Dogbane project, the web server functionality is limited to the SSL or TLS protocol according to [RFC2818]. The web server software is thus mainly a matter of SSL or TLS protocol support, and integration with stateful application logic elements.

3.1.3 MySQL Usage

There is nothing very special about the MySQL usage in the Dogbane development model compared with other projects based on the LAMP combination. However, the Dogbane project design principles include the rule that "the application logic is in the database". In some other web application development models, the recourse to a database is a matter of efficiency or convenience. With the application logic in the database, user actions allowed through the web should be easily reproduced with (relatively) unsophisticated database utilities, e.g. command line interface, with equivalent data validation, transaction coherency, user access privilege checking, and the like. Compliance with this design rule may not be perfect, but the web user interface logic should not add functional sophistication that would breach the rule without explicit justification.

3.1.4 The PHP language

A proof of concept Dogbane application has been programmed for verifying the feasibility of a few of the broader security scheme elements, using the LAMP combination with the PHP language integration in the Apache web server software. This programming task started from a simple PHP script and evolved into a complex hybrid PHP/C++ software which could have grown into something hardly manageable. The proof of concept was nonetheless a success since it validated the feasibility of specific security scheme elements. [...]

However, we found that PHP was missing the combination of object orientation and type safety offered by a C++ only solution. The hands-on programming experimentation with the various aspects of an interactive web site provided confidence that the vast library of PHP functions is not so important for the Dogbane secure application development model. This conclusion is partially justified by the design principle in section "2.1.2 Make It Features Lean".

3.2 The Dogbane Application User Interface Model

Basically, the Dogbane application model is an interactive session allowing the user to update an application-oriented database maintained by the service organization. This sounds like the traditional "electronic data processing" model, except that the end-user interface is a web browser. The web user model, by default, is based on persistent "resources" made available to unauthenticated users, which is quite remote from an interactive session that is usually tailored to the user roles, permissions, and privileges.

3.2.1 Interactive Session Within a Browser Window

With as few exceptions as possible, the HTML contents received from the server is self-contained and provides links and buttons only for the purpose of advancing the interactive session. Once any link or button is activated, the HTML contents becomes stale, and the next received HTML contents indicates the possible user actions. This may seem obvious and inescapable, but it is a departure from some of the basic browser functions, e.g. the "back" button, the opening of a link in a separate window, the use of pop-up windows for error messages, and page decoration with links to home page, site map, search facility, and the like.
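The notion that HTML contents becomes stale once acted upon can be sketched in C++ as follows. This is only an illustration: the text above does not say how staleness is detected, and the sequence number echoed by every link or button, as well as the names InteractiveSession, issue_page, and accept, are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-session state. A sequence number is assumed to be
// embedded in every link and button of the HTML contents sent to the user.
class InteractiveSession {
public:
    // Called whenever new HTML contents is generated for the session.
    // Returns the sequence number to embed in that contents.
    std::uint64_t issue_page() { return ++current_; }

    // Called when a request arrives carrying an echoed sequence number.
    // Only the most recently issued contents may advance the session,
    // and only once; anything else is treated as stale.
    bool accept(std::uint64_t echoed) {
        assert(consumed_ <= current_);
        if (echoed != current_ || consumed_ == current_) return false;
        consumed_ = current_;  // the contents just acted upon is now stale
        return true;
    }

private:
    std::uint64_t current_ = 0;   // sequence number of the latest contents
    std::uint64_t consumed_ = 0;  // latest sequence number already acted upon
};
```

With such a check, a request replayed through the browser "back" button, or a link opened in a second window, carries an outdated or already consumed number and can be answered with the current state of the session instead of being executed.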

3.2.2 The "Workplace Notebook" Model of Interactive Session

Users should be allowed to multitask, but the Dogbane user interface does not rely on the operating system or browser windowing capability to offer the multitasking function. Instead, the user controls a logical "workplace notebook" organized as a set of concurrent pages, each being a rectangular screen area typically with vertical scrolling and limited horizontal scrolling. Within a workplace notebook page, horizontal scrolling should be required only when tabular representation of data deserves a total column width larger than the assumed screen size.

The user application menu and the list of current active notebook pages are two special notebook pages. Otherwise, a notebook page is simply an interaction with the database. If the interaction is listed as currently active, it exists for the application server, subject to inactivity time-out expiration.

The layout of notebook pages within the HTML contents organization is left as a work in progress, with little emphasis put on the initial implementation. Among other characteristics [TBD], a notebook page is either displayed or hidden. Only displayed pages are subject to user action. Hidden pages are nonetheless active and could receive application notifications (error messages) which shall not annoy the user unless she triggers a page status change from hidden to displayed.
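The displayed/hidden behavior and the inactivity time-out described in this section can be sketched in C++ as below. The class name NotebookPage, its member functions, and the choice of std::chrono::steady_clock are assumptions for illustration; the actual page characteristics are still marked [TBD] above.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical model of one workplace notebook page.
class NotebookPage {
public:
    NotebookPage(std::string title, Clock::time_point now)
        : title_(std::move(title)), last_activity_(now) {}

    void show() { displayed_ = true; }
    void hide() { displayed_ = false; }
    bool displayed() const { return displayed_; }
    const std::string& title() const { return title_; }

    // Application notifications are always queued, even for hidden pages.
    void notify(const std::string& message) { pending_.push_back(message); }

    // Queued notifications are handed out only while the page is displayed,
    // so a hidden page never interrupts the user.
    std::vector<std::string> drain_notifications() {
        std::vector<std::string> out;
        if (displayed_) out.swap(pending_);
        return out;
    }

    // Only displayed pages are subject to user action.
    bool user_action(Clock::time_point now) {
        if (!displayed_) return false;
        last_activity_ = now;
        return true;
    }

    // True once the page has been inactive for longer than the time-out.
    bool expired(Clock::time_point now, Clock::duration timeout) const {
        assert(timeout >= Clock::duration::zero());
        return now - last_activity_ > timeout;
    }

private:
    std::string title_;
    Clock::time_point last_activity_;
    std::vector<std::string> pending_;
    bool displayed_ = false;
};
```

In this sketch the time stamps are passed in by the caller rather than read from the clock, which keeps the expiration rule easy to exercise in isolation.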

3.2.3 Typical Sections of a "Workplace Notebook" Page

A typical workplace notebook page contains sections related to an interaction with the database. It would contain a few sections, each of which might be absent in a given instance.

3.3 Security Constraints on the Dogbane Development Model

This section inventories the various impacts of the security focus on the Dogbane development model. Being just an inventory means that the security mechanisms are merely listed, and not defined. [[Further revisions of the present document should include references to where the design of each security mechanism may be found.]]

3.3.1 A Well Protected Linux Box

The Dogbane application should be run in a well protected Linux box.

This has to be a physical Linux box, and not a virtualized operating system. A definite justification is the role of this box in the system-wide key management scheme required in the section "3.3.4.3 Cryptographic Support for Identity Authentication Management". If a system-wide key management scheme calls for controlled and auditable key storage in a given physical box, this box is not allowed to turn virtual without a clear breach of a cornerstone key management principle.

3.3.2 A Relational DBMS

The relational DBMS technology allows the management of user identities and permissions along with the database schema definition. If this capability is applied for each end-user, the user and permission management becomes closely tied to the specifics of a given DBMS implementation (SQL standardization does not extend to every aspect of user and permission management).

The recourse to DBMS user management applied at the granularity of individual end-user would duplicate the end-user management for TLS that is required in any event. This requirement is tied to the web user authentication solution selected for the Dogbane model, see section "3.3.3.3 A Paradoxical Client TLS Operation Mode".

Similarly, the reliance on DBMS facilities for permission management would interfere with the approach to identity authentication management intended for the Dogbane model. See section "3.3.4.1 Identity Authentication Management". An incompatibility between the intended approach and the RDBMS technology lies in the intended notion of an "introducer" role, whose ramifications can hardly be supported by typical RDBMS user management facilities.

In summary, the relational DBMS security features do not represent a specific contribution to the security focus influence on the Dogbane model. This is perhaps typical with other interactive web site development approaches.

3.3.3 A Customized Web Server

[...]

3.3.3.1 A TLS Profile

[...]

3.3.3.2 A Web Server Configuration Such That No Firewall Is Needed

[...]

3.3.3.3 A Paradoxical Client TLS Operation Mode

TLS client X.509 security certificates are seldom used. In this situation, the IT security scene fails to exploit the significant potential for identity theft prevention provided by the basic public key cryptography operations. In principle, although neither in the TLS formal specification nor in any other widely deployed protocol, basic public key operations need no security certificates. The paradoxical client TLS operation mode intended for the Dogbane model uses basic public key cryptography operations with a dummy or meaningless X.509 security certificate, created for the sole purpose of TLS interoperability.

This TLS operation mode was first introduced in the informative reference [PKC-ONLY]. The normative reference [AIXCM] provides a formal specification for this approach. The TLS server configuration may have to take these references into account whenever the service operation context adopts this specific TLS operation mode.

Furthermore, when security certificates were first proposed, they offered third party trust management, with the introduction of certification authorities. The avoidance of X.509 security certificates comes with the loss of this third party trust management. It has to be replaced by first party trust management, i.e. the Dogbane application service provider inherits the task of managing client identities all by itself. This is covered notably in the section "3.3.4.1 Identity Authentication Management".
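First party trust management can be pictured as a server-side table that binds each enrolled client public key to a user identity, consulted after the TLS handshake instead of the certificate fields. The C++ sketch below is a simplified illustration only: the names ClientKeyRegistry, enroll, and identify are hypothetical, the public key is assumed to be available as an opaque byte string, and neither [PKC-ONLY] nor [AIXCM] is reproduced here. Proof of possession of the private key is left to the TLS handshake itself, and the integrity protection of the table is out of scope.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Encoded client public key as extracted from the dummy certificate.
// The encoding is an assumption; any canonical byte representation works.
using PublicKeyBytes = std::vector<unsigned char>;

// Hypothetical registry maintained by the service provider itself.
class ClientKeyRegistry {
public:
    // Binds a public key to a user identity at registration time.
    // Returns false if the key is already bound, so that an existing
    // association is never silently overwritten.
    bool enroll(const PublicKeyBytes& key, const std::string& user_id) {
        assert(!user_id.empty());
        return table_.emplace(key, user_id).second;
    }

    // Looks up the identity for the key presented in a TLS session.
    // The certificate contents other than the key plays no role.
    // Returns nullptr for an unknown key.
    const std::string* identify(const PublicKeyBytes& key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<PublicKeyBytes, std::string> table_;
};
```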

3.3.4 IAM, Identity and Access Management

The acronym IAM stands for "Identity and Access Management", as a specialized field of the computer and software industry. For our purpose, we break down IAM into its two main issues:

3.3.4.1 Identity Authentication Management

This specific security aspect influencing the Dogbane application development model is the one that started it all. As hinted in the introductory section "3. The Dogbane Phantom Project", the BASC project evolved from the first design of the Dogbane model where the original SAKEM project fulfilled the identity authentication management function.

Briefly, in the present context, the identity authentication management function is concerned with the initial establishment of cryptographic key material in the context of user provisioning. This is distinct from the routine (and automated) remote user authentication, which is done e.g. during the TLS session negotiation phase. Our description assumes that user authentication is based on cryptographic methods, and not merely relying on secret user passwords (even encrypted passwords would not fit the description). Both SAKEM and BASC associate cryptographic key material with a user identification, respectively a symmetric secret key with SAKEM, and an asymmetric private-public key pair with BASC. With these provisioning schemes, there are no facilities for user password resets by agents of service providers, as is common practice in identity authentication management solutions. Instead, the initial user registration is ([[hopefully]]) streamlined and efficient to the point where it becomes suitable for registration renewal after loss of cryptographic key material by the end-user.

3.3.4.2 Roles, Authorization, and Permission Management

Roles, authorization, and permission management is at once security critical, labor intensive, and tied to an organization's administrative structure and culture.

If the Dogbane development model is to be applied in an environment where security is taken seriously, then a role, authorization and permission management solution has to be part of the solution.

3.3.4.3 Cryptographic Support for Identity Authentication Management

The details of cryptographic mechanisms for BASC are [[currently not covered in the present document]]. As with the rest of this document section "3.3 Security Constraints on the Dogbane Development Model", this subsection merely indicates how the cryptographic mechanisms influence the Dogbane application development model.

For interactive applications following the Dogbane model, but not being part of the "3.3.4.1 Identity Authentication Management", the cryptographic mechanisms for identity authentication management are devoted essentially to the database integrity for the association between a client public key (used in the TLS protocol) and the end-user identity. This is the case whenever the interactive application adheres to the security framework briefly presented here, which is not strictly necessary: interactive applications that are security critical for an organization serve as the core usage scenario for the Dogbane development model and the security framework, but a usage scenario is not a definitive design criterion. In the case of the application needed for the "3.3.4.2 Roles, Authorization, and Permission Management", adherence to the security framework is nonetheless a strong design criterion (otherwise there is little justification for yet another role management software).

The interactive applications part of the "3.3.4.1 Identity Authentication Management" are tied to the BASC user registration scheme and as such require strong confidentiality protection, both for cryptographic parameters and non-cryptographic user identification information. Such interactive applications are self-served by the integration of the Dogbane application development model and the security framework briefly presented here.

As a consequence, the server systems running some interactive applications handling identity authentication must be part of a well controlled cryptographic key management scheme. As hinted in section "2.2.5 Cryptographic Techniques Merely Shift IT Controls to Fewer Hands", these servers are part of a key hierarchy that shifts IT controls towards a cornerstone system-wide master key. For a given organization relying on the security framework, there are two options: it either asserts itself as "the root", or it outsources the global trust dissemination function to a shared service operation. We recommend the latter and we bring an intellectual property ingredient to make this option more palatable, i.e. to prevent ineffective critical mass strategy due to market fragmentation. See the annex "B. Global Trust Policy Considerations" for more information.

This section refers to military-grade-equivalent cryptography for confidentiality and integrity/authentication, but neither traffic flow confidentiality nor denial of service resistance. Furthermore, the security framework briefly presented here does not fit well with existing IT security certification schemes (either NIST or "Common Criteria") that are better suited to well-delineated IT security components than to our overall framework, which combines a key management scheme and an application development model. In a sense, this is perhaps a manifestation of the tension between innovation and standardization.

4. The HTML Object Oriented Authoring Utility

4.1 Overview

As an initial attempt at C++ software for direct generation of HTML contents, the "HTML Object Oriented Authoring" utility was created from scratch. The project objectives are:

The source files for the core html object oriented implementation are http://www.connotech.com/html-ooa/html-ooa-incl.h and http://www.connotech.com/html-ooa/html-ooa-impl.cpp. The program logic for the word processor facilities is in the source files http://www.connotech.com/html-ooa/html-ooa-doc.h and http://www.connotech.com/html-ooa/html-ooa-doc.cpp.

This small project sorely lacks a reasonable user interface since the author writes C++ source code in order to create a document. One alternative would be some type of HTML filter capability as commonly found in wiki entry editing tools. It is far from evident that the author productivity can get to an acceptable level with this approach, and change control is almost certainly a nightmare. Another alternative would be to create a visual HTML editing tool, which would be a me-too project. Finally, macro preprocessing that would make the document source more readable has been rejected because the C++ object orientation discipline and good software design should allow the document source to remain somewhat author friendly.

For instance, the present document source is in the file http://www.connotech.com/html-ooa/doc_proj_medley_notebook.cpp, and a small software source file simply turns the document source into an output file, see http://www.connotech.com/html-ooa/html-ooa-main.cpp.

The complete HTML object oriented authoring utility is available as a source code software package prepared according to the GNU build system conventions in the file http://www.connotech.com/html-ooa/doc-proj-medley-notebook-0.1.tar.gz.
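To give a feel for what authoring a document as C++ source may look like, the following minimal sketch builds a small HTML tree and renders it. It does not reproduce the interfaces declared in html-ooa-incl.h; the names Node, Text, Element, and escape_html are hypothetical and chosen for this illustration only. Void elements such as BR are not handled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Escapes the characters that are significant in HTML text and
// in double-quoted attribute values.
std::string escape_html(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Common base for every node of the document tree.
class Node {
public:
    virtual ~Node() = default;
    virtual std::string render() const = 0;
};

// Leaf node holding an elementary text portion (no attributes).
class Text : public Node {
public:
    explicit Text(std::string t) : text_(std::move(t)) {}
    std::string render() const override { return escape_html(text_); }
private:
    std::string text_;
};

// Element node with attributes and an ordered list of children.
class Element : public Node {
public:
    explicit Element(std::string tag) : tag_(std::move(tag)) {
        assert(!tag_.empty());
    }
    Element& attr(const std::string& name, const std::string& value) {
        attrs_.emplace_back(name, value);
        return *this;
    }
    Element& add(std::unique_ptr<Node> child) {
        children_.push_back(std::move(child));
        return *this;
    }
    Element& text(const std::string& t) {
        return add(std::unique_ptr<Node>(new Text(t)));
    }
    std::string render() const override {
        std::string out = "<" + tag_;
        for (const auto& a : attrs_)
            out += " " + a.first + "=\"" + escape_html(a.second) + "\"";
        out += ">";
        for (const auto& c : children_) out += c->render();
        return out + "</" + tag_ + ">";
    }
private:
    std::string tag_;
    std::vector<std::pair<std::string, std::string>> attrs_;
    std::vector<std::unique_ptr<Node>> children_;
};
```

A paragraph would then be written as a few chained calls, for example Element p("P"); p.attr("class", "note").text("a < b"); followed by p.render(). The C++ type system catches structural mistakes at compile time, which is part of the argument made above against macro preprocessing.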

4.2 HTML Style Sheets, Separation of Form and Content

The HTML standard and practice evolved towards a clear separation of form and contents, the latter using the HTML syntax while visual formatting is left in separate files called Cascading Style Sheets (CSS) [ref]. At first glance when looking at the HTML and CSS source, the benefit of this separation may not be clear. This is due in part to the many possible sources of visual formatting parameters applicable to a given portion of an HTML source element.

The HTML object oriented authoring project is an opportunity to explore the separation of HTML contents and CSS visual formatting.

The style sheet for the present document is http://www.connotech.com/html-ooa/stylesheet.css.

The structured representations of HTML contents and CSS visual formatting parameters are different. An HTML document is a tree with attributes affixed to non-leaf nodes and to leaf nodes, except elementary text portions. CSS visual formatting is organized as selectors, property names, and property values. Selectors are conditions for applying CSS properties to HTML elements. This CSS organization suggests a more or less tabular representation for internal processing. The CSS source file syntax is basically an abbreviated tabular representation of selectors, property names, and property values.
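The tabular view of CSS can be made concrete with a short C++ sketch in which each row holds one selector, one property name, and one property value, and the CSS source syntax is obtained by grouping consecutive rows that share a selector. The names CssRow and to_css are assumptions for illustration and are not taken from the utility's source files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the table: a selector, a property name, a property value.
struct CssRow {
    std::string selector;
    std::string property;
    std::string value;
};

// Serializes the table into CSS source syntax. Consecutive rows with the
// same selector are abbreviated into a single declaration block.
std::string to_css(const std::vector<CssRow>& rows) {
    std::string out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        assert(!rows[i].selector.empty() && !rows[i].property.empty());
        const bool opens =
            (i == 0) || rows[i].selector != rows[i - 1].selector;
        const bool closes =
            (i + 1 == rows.size()) || rows[i].selector != rows[i + 1].selector;
        if (opens) out += rows[i].selector + "{";
        out += rows[i].property + ":" + rows[i].value;
        out += closes ? "}" : ";";
    }
    return out;
}
```

For example, the three rows (P, color, black), (P, margin, 0), and (:before, content, "") serialize to P{color:black;margin:0}:before{content:""}.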

For the purpose of the HTML object oriented authoring project, it has been determined that there is little benefit in maintaining a structured internal data representation for CSS visual formatting. A cross-reference reporting function, i.e. listing the CSS specifications applicable to each HTML element, would be a requirement justifying the maintenance of structured internal CSS visual formatting data, but that's about it.

4.2.1 HTML Style Sheet and Visual Contents Integrity

In the section "2.2.1 Be Paranoid in Software and System Engineering", it is stressed that the designer should be on the watch for potential vulnerabilities for data leak and data integrity. If the HTML object oriented authoring project is applied to HTML contents generated in the Dogbane development model, default CSS style sheets do represent a potential attack route.

For the service provider in the Dogbane scheme, there should be confidence that the end user sees the HTML contents without integrity vulnerabilities. The default style sheet in the web browser environment can alter the visual contents. It is the fine-grained expressiveness of CSS selectors that creates practical attack opportunities. An IT security vulnerability report could be filed in the specialized databases of IT security incidents and vulnerabilities. The solution for the paranoid is to specify explicit CSS properties for every HTML element subject to visual contents alteration.

The following HTML markup portion is an example of such a countermeasure that may be added near the beginning of the HTML source:

<STYLE type="text/css">:before{content:""}:after{content:""}</STYLE>

Similar explicit HTML markup may be required to block the CSS property {display:none} in default CSS files that might surreptitiously remove HTML contents portions.
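In a C++ authoring tool, the generated-content countermeasure shown above could be emitted by a small helper rather than typed by hand. The function name generated_content_guard is hypothetical; the sketch only reproduces the STYLE element given in this section from a list of pseudo-element selectors and does not attempt the {display:none} case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a STYLE element that forces empty generated content for each
// selector passed in (typically the :before and :after pseudo-elements).
std::string generated_content_guard(const std::vector<std::string>& selectors) {
    std::string out = "<STYLE type=\"text/css\">";
    for (const std::string& s : selectors) {
        assert(!s.empty());
        out += s + "{content:\"\"}";
    }
    out += "</STYLE>";
    return out;
}
```

Called with the two selectors :before and :after, the helper returns exactly the markup portion quoted above.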

5. The STOA Secure Application Server Development

5.1 Server and Software Development Environments

5.1.1 A Linux Installation

The need for a Linux installation in this context is focused on the following criteria:

Controlled, reproducible, and auditable installation.
This extends to simplicity, reliance on command lines and scripts for configuration and management. Software installation from the source code is a natural choice in this context.
Limited network connectivity.
This is because the system is intended to be operated directly on a public IP address. During software development, limited network connectivity is perhaps a positive ingredient of "software development assurance" in the sense used in the field of software certification (e.g. IT security with the Common Criteria or civil aviation with RTCA DO-178B).
Deployment efficiency.
This criterion implies secondary ones such as the use of a mainstream CPU architecture (e.g. i686 instead of x86_64), the availability of binary packages for the latest versions of tools and utilities (when installation from source is postponed for the sake of expeditiousness), a recent version of the GCC package, and some trade-offs with respect to "textbook" security.
Preferably some security assurance.
Somewhat paradoxically, this is almost a secondary criterion. The security focus in the STOA development model dictates arrangements of system, software, network services, user interaction principles, and application design in a way that makes many typical vulnerabilities simply not applicable. If we started with a Linux installation strengthened for the generic vulnerability pattern applicable to the general purpose computing environment, we would suffer the very IT security hindrance that causes the vulnerabilities to last over time (e.g. file and directory permission management is time-consuming - an installation that is stricter in this respect is less efficient than a looser one).

The Crux Linux distribution seems to fit these requirements. See http://crux.nu/Main/HomePage.

5.1.2 Software Architecture Design

The STOA secure server implements the web server protocol defined in [RFC2818]. The application logic coded in C++ should be integrated in the web server.

The open source software packages http://www.tntnet.org/download/cxxtools-1.4.8.tar.gz and http://www.tntnet.org/download/tntnet-1.6.3.tar.gz seem to provide the required base for the intended server. The package tntnet uses either OpenSSL or GnuTLS for TLS protocol support, the latter being the default. See http://ftp.gnu.org/pub/gnu/gnutls/gnutls-2.4.2.tar.bz2.

A basic idea behind the STOA development model is to integrate the https server and the session-oriented application logic. We present a software design in which the https server and the application logic are implemented in separate but nonetheless closely related processes, with a shared memory segment being the interprocess mechanism providing the core functionality. Obviously, synchronization cannot be avoided, and semaphores and interprocess signalling complement the shared memory usage.

The software design relies on shared memory with a further assumption: all processes accessing the shared memory are compiled with the same compiler and can, and should, refer to common declarations of data structures for objects located in the shared area. Care must be taken with pointer values: although the shared memory is located at the same address in every process address space, and pointers to shared data objects are valid for any process, a pointer value referring to a non-shared memory location is valid only in a single process.

In the case of the C++ language, the above observation applies to pointers and references that the programmer explicitly declares in the software source, and also to implicit pointers, notably those related to virtual functions and virtual base classes. It is thus a requirement to avoid C++ objects with either of these characteristics in the shared memory organization. Furthermore, the C++ notions of object construction and object destruction do not fit well with data objects occurring in a shared memory area with a lifespan unrelated to any process initialization or termination. If this further restriction is accepted as a coding convention for shared memory (it could be argued that the C++ constructors and destructors are useful even for shared memory objects since they can be invoked explicitly), the C++ language definition offers the union dummy_t { int x; an_object_t t; }; as a convenient programming construct to test the object (struct or class) type an_object_t against the requirements. This is because a union member has the exact desired limitations on virtual functions, virtual base classes, constructors and destructors.
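The union test above relies on the union member restrictions of the C++ language definition current when this was written. Compilers following C++11 or later accept such unions (the special member functions of the union are deleted instead), so the declaration alone no longer rejects an unsuitable type. As a minimal sketch, assuming a C++11 compiler, a comparable compile-time check may be expressed with standard type traits. The names is_shareable, request_t and handler_t are hypothetical and serve only the illustration.

```cpp
#include <type_traits>

// Hypothetical helper: true when T meets the coding convention for
// objects placed in the shared memory area.
template <typename T>
struct is_shareable
    : std::integral_constant<bool,
          // No virtual functions, no virtual base classes, no user-provided
          // constructors, destructor or copy operations.
          std::is_trivial<T>::value &&
          // Predictable layout: same access control for all data members,
          // no reference members, again no virtual functions or bases.
          std::is_standard_layout<T>::value> {};

// Plain data: acceptable in the shared area.
struct request_t {
    int      session_id;
    unsigned length;
};

// Carries an implicit pointer (the virtual function table pointer):
// not acceptable in the shared area.
struct handler_t {
    virtual void run() {}
    virtual ~handler_t() {}
    int state;
};

static_assert(is_shareable<request_t>::value,
              "request_t must be plain data");
static_assert(!is_shareable<handler_t>::value,
              "handler_t carries implicit pointers");
```

Note that neither the union test nor this check detects explicit pointer members: a struct holding an int* is trivial and standard-layout, so the care about pointer values remains a matter of coding discipline.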

A software engineering project was led under the assumption that a C++ compiler might support virtual functions for data objects stored in a shared memory region. This development was halted when the compiler development team rejected the request to implement this non-standard language feature, which is hard to specify in any event. See the bug report and the discussion that followed at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21251.

At an abstract level, the interprocess communications scheme comprises a single queue of request/reply objects to the server, and one queue of request/reply objects to each of the processes associated with the current sessions. With any of these queues, the receiving process is signalled when a request/reply object is ready. The interprocess scheme makes no distinction between request and reply semantics, hence the term "request/reply".
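To make the abstract scheme more concrete, here is a minimal sketch of one such queue as a fixed-capacity ring of fixed-size objects, free of pointers and virtual functions so that it could reside in the shared area. The type and function names, the field layout, and the single-producer single-consumer assumption are all hypothetical. Synchronization and the signalling of the receiving process are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size request/reply object (illustrative fields only).
struct reqrep_t {
    std::uint32_t session_id;   // session the object belongs to
    std::uint32_t kind;         // opaque to the queue: request or reply
    std::uint32_t payload_off;  // offset of optional payload, 0 if none
    std::uint32_t payload_len;  // payload length in bytes
};

// Ring of N slots; one slot is kept empty to tell "full" from "empty".
template <std::size_t N>
struct reqrep_queue_t {
    std::uint32_t head;   // next slot to read
    std::uint32_t tail;   // next slot to write
    reqrep_t      slot[N];
};

template <std::size_t N>
void queue_init(reqrep_queue_t<N>& q) { q.head = 0; q.tail = 0; }

// Returns false when the queue is full.
template <std::size_t N>
bool queue_put(reqrep_queue_t<N>& q, const reqrep_t& r) {
    assert(q.head < N && q.tail < N);   // be paranoid about shared state
    std::uint32_t next = static_cast<std::uint32_t>((q.tail + 1) % N);
    if (next == q.head) return false;
    q.slot[q.tail] = r;
    q.tail = next;
    return true;
}

// Returns false when the queue is empty.
template <std::size_t N>
bool queue_get(reqrep_queue_t<N>& q, reqrep_t& out) {
    assert(q.head < N && q.tail < N);
    if (q.head == q.tail) return false;
    out = q.slot[q.head];
    q.head = static_cast<std::uint32_t>((q.head + 1) % N);
    return true;
}
```

The payload is referenced by an offset rather than a pointer, which is one possible way to keep the object meaningful in every process; this is a choice of the sketch, not a design decision recorded in this document.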

A request/reply object may point to data objects allocated dynamically in the shared memory, in which case the receiving process should free the object. This implies a specialty memory allocator. This is intended to allow relatively small request/reply objects in the queues, and things such as web page contents in dynamically allocated buffers. The design should put an upper limit on the size of dynamically allocated shared memory objects, so that larger web pages overflow into temporary files supported by the file system. At this level of abstraction where the actual queueing mechanism is unspecified, it is unknown whether request/reply objects and/or queues are located in the shared memory or implemented with native operating system functions.

The creation of an https session is triggered when the https server receives a valid network packet that cannot be identified with a currently active session. A registration mechanism must be specified for the application logic process(es) to be launched upon session initiation. It remains TBD whether this registration applies irrespective of, or dependent on, some authentication classification to be defined for the https server. Once the initial application logic process is launched, the session is associated with a queue, so the application logic process should be able to launch a successor for itself and then die or sleep. These design issues remain TBD.

The URL encoding technique must be used to implement session continuity among the http incoming messages to the server. This statement refers to one of the known techniques for introducing sessions in the state-less http protocol and precludes the use of web browser cookies for this purpose. The URL encoding technique requires mutually agreed conventions between the server and application logic processes since the latter create the dynamic URLs in web pages and the former parses these URLs to sort out the various sessions.
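As an illustration of such a convention, the following sketch assumes that dynamic URLs take the form /s/<token>/<resource>. This path layout, the prefix, and the function names are assumptions made for the example; the actual convention is not specified in this document. The first function stands for the application logic side, the second for the server side.

```cpp
#include <cassert>
#include <string>

// Hypothetical convention: "/s/<token>/<resource>".
const std::string kSessionPrefix = "/s/";

// Application logic side: build a dynamic URL carrying the session token.
std::string make_session_url(const std::string& token,
                             const std::string& resource) {
    // The token must not contain the separator, or parsing is ambiguous.
    assert(!token.empty() && token.find('/') == std::string::npos);
    return kSessionPrefix + token + "/" + resource;
}

// Server side: split an incoming path into token and resource.
// Returns false when the path does not follow the convention.
bool parse_session_url(const std::string& path,
                       std::string& token, std::string& resource) {
    if (path.compare(0, kSessionPrefix.size(), kSessionPrefix) != 0)
        return false;                       // no session prefix
    std::string::size_type start = kSessionPrefix.size();
    std::string::size_type slash = path.find('/', start);
    if (slash == std::string::npos || slash == start)
        return false;                       // missing or empty token
    token    = path.substr(start, slash - start);
    resource = path.substr(slash + 1);
    return true;
}
```

A path rejected by the parser corresponds to a request that cannot be identified with an active session. The sketch says nothing about how tokens are generated or validated, which is a security matter in its own right.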

The session mechanism is not a priori tied to https, but it is so by a deliberate design decision. Presumably, this shortens the development cycle for the STOA development model by sparing the test of a more generic solution than the final one, and integrates whatever TLS imposes as restrictions (e.g. the non-availability of the http server virtual host feature) into the main development effort. It should be noted that the one-to-one correspondence between a TLS session and an http session is somewhat arbitrary. Accordingly, the later relaxation of this correspondence might reveal some implementation errors which otherwise would remain harmless.

[reference implementation, i.e. proof-of-concept] [also point to the Apache configuration tricks in use].

Here is a list of rationales for the separation of the server process from the application logic processes.

Software modularity
Self-explanatory.
Easier application logic upgrade
The application logic process upgrades may be introduced for new user sessions without stopping current sessions using a prior version.
Potential for multi-threaded process avoidance
It should be apparent that multi-threaded applications are less in demand if the https networking function is isolated from individual database application logic processes. Single threaded software is easier to design, write and debug. The better responsiveness generally associated with multi-threading is provided by the specific interprocess communications scheme.
Software product isolation for licensing compliance
The Free Software Foundation licenses apply to a software product which may be delineated by a documented interface, so that one side of the interface might be non-free, provided the interface documentation and development tools support allow the replacement of the proprietary portion with an independently developed implementation (not necessarily with the same set of functionalities). In the case at hand, the server side is characterized mainly by its networking requirements and the application logic side is mainly a database application. The latter is more likely to fall in the non-free world according to local requirements. This issue is mentioned less as a rationale than as a derived requirement: the interface needs to be fully documented for license compliance whenever one side has such licensing requirements based on the non-free status of the other side.

5.1.3 Software Architecture Implementation Issues

The IPC model between the server process and application processes is somewhat inspired by an embedded development documented in the informative reference [ABCD_PROTOK]. In this reference, an embedded system kernel uses a unified queueing mechanism as an alternative to the select() polling loop with multiple file descriptors invariably found in networking software in the Unix/Linux world. The ABCD Proto-Kernel abcd_wait_two_queues() function should be ignored since it is not relevant to the Unix/Linux world where file descriptor set management provides greater flexibility. The STOA IPC model shared memory and its specialty memory allocator may conveniently re-use the implementation of a Buddy System Memory Allocation described in section 5.1 of this reference.
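For readers unfamiliar with buddy system allocation, the sketch below shows the two pieces of arithmetic at its core in the textbook form: rounding a request up to a power-of-two block, and locating the buddy of a block. It is not taken from [ABCD_PROTOK]. The constants and function names are assumptions chosen for the example. The upper size limit ties in with the earlier remark that larger web pages should overflow into temporary files.

```cpp
#include <cassert>
#include <cstddef>

// Assumed parameters, for illustration only.
const std::size_t kMinBlock = 64;   // smallest block in bytes (power of two)
const unsigned    kMaxOrder = 10;   // largest block: 64 << 10 = 65536 bytes

// Smallest order whose block holds 'size' bytes, or -1 when the request
// exceeds the upper limit (the caller would then use a temporary file).
int order_for_size(std::size_t size) {
    std::size_t block = kMinBlock;
    for (unsigned order = 0; order <= kMaxOrder; ++order, block <<= 1) {
        if (size <= block) return static_cast<int>(order);
    }
    return -1;
}

// Offset of the buddy of the block at 'offset' with the given order.
// Offsets are relative to the start of the shared arena, so the result
// has the same meaning in every process.
std::size_t buddy_of(std::size_t offset, unsigned order) {
    assert(order <= kMaxOrder);
    std::size_t block = kMinBlock << order;
    assert(offset % block == 0);    // blocks are aligned to their own size
    return offset ^ block;          // flip the bit that tells buddies apart
}
```

When a block is freed and its buddy is also free, the two merge into one block of the next order, starting at the lower of the two offsets.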

A reasonable implementation scenario relies on Linux kernel pipes strictly for signalling, and an associated ABCD Proto-Kernel queue in the shared memory segment for the fixed-size message objects. Synchronization with futexes appears attractive in terms of functionality and performance.
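The use of a pipe strictly for signalling can be sketched as follows, using the POSIX calls pipe(), write(), read() and poll(). The wakeup_t type and the function names are hypothetical. A single byte written to the pipe means "look at your queue" and carries no other information; the message object itself is assumed to travel through the queue in shared memory.

```cpp
#include <cassert>
#include <poll.h>     // poll, struct pollfd
#include <unistd.h>   // pipe, read, write, close

// Hypothetical wrapper around the two ends of a signalling pipe.
struct wakeup_t {
    int rd;   // read end, polled by the receiving process
    int wr;   // write end, used by the sending process
};

bool wakeup_open(wakeup_t& w) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    w.rd = fds[0];
    w.wr = fds[1];
    return true;
}

// Sender side: call after the object has been put in the queue.
bool wakeup_signal(const wakeup_t& w) {
    assert(w.wr >= 0);
    char byte = 1;
    return write(w.wr, &byte, 1) == 1;
}

// Receiver side: wait up to timeout_ms milliseconds for a signal.
// Returns true when a signal was received, consuming one byte.
bool wakeup_wait(const wakeup_t& w, int timeout_ms) {
    struct pollfd p;
    p.fd = w.rd;
    p.events = POLLIN;
    p.revents = 0;
    if (poll(&p, 1, timeout_ms) <= 0) return false;   // timeout or error
    if (!(p.revents & POLLIN)) return false;
    char byte;
    return read(w.rd, &byte, 1) == 1;
}

void wakeup_close(wakeup_t& w) {
    close(w.rd);
    close(w.wr);
}
```

Because the read end is an ordinary file descriptor, it fits in the same descriptor set as the network sockets of the server. Between unrelated processes the pipe would have to be inherited at process creation or replaced by a named pipe, a point the sketch leaves open.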

6. [Suffering the Pain of Software Development]

This document section reports high level observations about various sources of inefficiencies in software development activities. The IT industry often falls short of efficiently fulfilling its promises. From a management perspective, this is reflected in "total cost of ownership" analyses that reveal true costs of IT solutions much higher than anticipated. Here, we attempt to document the genuine causes of inefficiencies in selected areas of the software development cycle.

6.1 Linux Kernel Installation Blues

The requirements for a customized Linux installation are explained in section "5.1.1 A Linux Installation". This typically requires the customization of the Linux kernel as an important task. This is not an activity manifestly within the scope of the present document, but too significant to be left without some trace in documentation. Thus a separate document has been prepared, i.e. the informative reference [KERNEL_BUILD]. The installation of software packages once a basic Linux installation is done, based on the aggregate of requirements, is relatively straightforward. The Crux Linux distribution makes very little difference between package installation from source and installation from a binary package, and is thus well suited to a software development environment.

The Linux kernel customization activity has many significant benefits in terms of fulfillment of design requirements, hardware support, assurance of controlled system functions (e.g. a system less vulnerable to security holes in unused kernel features), and maybe reliability. The Linux installation is sometimes inefficient due to technical difficulties that may arise at any stage in the overall procedure, either during kernel or software package installation. Nonetheless, it appears as a globally efficient activity given the great rewards of the Linux technology potential. Also, we can hardly refer to a technical difficulty traceable to a software quality issue, at least for the components on which we rely for our software development projects.

6.2 The GNU Build System - Why?

In the early days of computerized data processing, the software development cycle was straightforward: the developer wrote source code, then compiled, linked, and ran it until the software was considered tested. The end-user organization acquired computer hardware to run the program, transferred the software executable version to this hardware, and finally enjoyed the software benefits. (Admittedly, using software was typically quite unintuitive at that time, but this is not relevant.) In those days, the leading computer manufacturers supported dialects of programming languages that were compatible with their respective computer hardware and operating systems.

Nowadays, the end-user already uses some computer hardware and has adopted software usage patterns that dictate many aspects of application software. That was the personal computer revolution, and then came the networking revolution. For our purpose, we need not pay too much attention to the world wide web aspect of the networking revolution; the ubiquitousness of distributed applications is what matters.

In the scope of this document, efficient software development is Linux-based and server-centric. The end-user has a browser somewhere and the public Internet or another IP network lies between the application server software and the end-user. The former compile-link-and-run cycle now involves distributing the application server software to a Linux/Unix/BSD system of a variant not known at the time of software design and development. Nowadays, programming languages are formally specified in standards, and development tools are almost universally drawn from a pool of free software projects.

Throughout the evolution, software development cycles became increasingly effective in fulfilling application requirements in ever diversifying fields of human activity. But the software development mechanics consistently stayed in a state not far from chaos. Thus, inescapably the GNU build system stands in our way (somewhat as a barrier to entry) towards the completion of the projects described in the present document.

It may be noted that the Java programming language takes a very different approach to software portability. The preceding observations are somewhat oblivious of the Java alternative. In the case of the Microsoft market dominance in some definitions of the IT sector, some market-based de-facto standardization may occur, but its value seems seriously threatened by the Microsoft decision to terminate the distribution of the operating system version preferred by many end-users and organizations.

This high level introduction of the GNU build system is deemed valuable as a justification for efforts devoted to it, i.e. fulfilling the design principle stated in section "2.1.1 If You Can't Explain It in Simple Words ...". In summary, the GNU build system is needed in modern software development projects because the Linux/Unix/BSD systems are the industry workhorse for computer server systems. Technically, while the GNU build system is generally of poor quality, it seems to be unrivalled as an effective strategy for software package distribution: it wins by default. The following subsection covers some aspects of the GNU build system that we found noteworthy for understanding its role and major implications for our projects.

6.3 The GNU Build System - High Level Technical Comments

Dependencies on operating system features and external packages at the various stages in the development process are more or less straightforward. Dependencies within the various components of the GNU build system are more intricate. For instance, the role of the aclocal executable is not readily evident from the documentation. It appears to work as a glue between automake and autoconf at installation time. Likewise, it took us too long to find out that autoreconf was available to somehow handle such interdependencies within the GNU build system.

There seem to be many assumed rules applicable to all the bits and pieces that the GNU build system touches, e.g. that 777, 644, or 666 are meaningful file permissions. The basic assumption that anything less cumbersome than a handwritten Makefile is a valuable improvement may be unconvincing to newcomers.

Despite bad documentation and shaky implementation safeguards, the GNU build system has a purpose; let us try to explain it and put it in perspective. Basically, there is a developer and a client to whom software has to be delivered. The GNU build system provides a couple of potential interface levels between these two for software delivery. Concurrent support of more than one interface level can lead to ambiguities and inefficiencies. Here is a list of conceivable interface levels:

Raw Source Code
This software delivery interface level provides very little assurance that the software can be used. Countless times a proprietary software vendor supplied source code that couldn't be compiled or linked or run, or that didn't match the installed executable version at the client side.
Input to the GNU Build System
At this interface level, to be credited to the GNU build system, the client is in a position to change every single implementation aspect of the software, efficiently. Many output files created by the GNU build system are text files that can be edited, but their contents can be large, intricate, and redundant.
./configure Delivery Interface
In theory, the ./configure delivery interface approach could be served by other tools, but for all practical purposes it should be credited to the GNU build system. Basically, the idea is to identify and test the target system characteristics, starting from minimal assumptions about Linux/Unix/BSD systems, and create a Makefile that can turn the delivered source code into the software executable version. This delivery interface option is well suited to moderately expert clients having a wide range of compatible runtime environments, with minimal support burden for the developer.
Individual Binary Package Distribution
At this delivery interface, no source code is needed. One package has to be prepared by the developer for each target environment, with possible grouping based on the software project dependencies on various target system characteristics.
Inclusion in Operating System Distributions
Linux distributions typically include a number of individual binary packages ready for installation. This is usually embedded in an automated, encompassing, and user friendly installation process.
Installed System
A computer system may be delivered with a Linux/Unix/BSD variant pre-installed with the individual binary package distribution.

In this overview, the GNU build system offers the two most flexible and efficient interface levels between a software developer and clients for software distribution in the context of free software. We are not aware of restrictions that would prevent the recourse to the GNU build system for non-free application software distribution.

In the above list, we omitted the distribution through a revision control database (cvs or subversion svn) since it can be seen either as a variant of the Input to the GNU Build System, or as a collaboration mechanism instead of a software delivery interface.

7. References

7.1 Normative References

[AIXCM]
Thierry Moreau, "Auto Issued X.509 Certificate Mechanism (AIXCM)", 6 August 2008 (available at http://www.connotech.com/public-domain-aixcm-00.txt).
[PKC-ONLY]
Thierry Moreau, "Explicit Meaningless X.509 Security Certificates as a Specifications-Based Interoperability Mechanism", CONNOTECH Experts-conseils inc., Document Number C004635, 2008/07/23 (available at http://www.connotech.com/pkc-only-meaningless-certs.pdf).
[RFC2818]
Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. (available at http://www.ietf.org/rfc/rfc2818.txt).

7.2 Informative References

[ABCD_PROTOK]
Thierry Moreau, "The ABCD Proto-Kernel™ Guide (Embedded Software Document)", CONNOTECH Experts-conseils inc., Document Number C002274, 2004/03/09 (available at http://www.connotech.com/abcd_proto_kernel/abcd_proto_kernel_1_2.pdf).
[KERNEL_BUILD]
Thierry Moreau, "Linux Kernel Building: What Can You say to Your Boss", CONNOTECH Experts-conseils inc., Document Number C004695, 2008/10/30 (available at http://www.connotech.com/doc_linux_kernel_build.html).

A. HTTP Protocol Features Subset

B. Global Trust Policy Considerations