
Designing Test Suites

Software Testing

Testing is an important part of the software development life cycle. Appropriate testing methods are necessary to ensure the reliability of the software being developed. According to the ANSI/IEEE 1059 standard, testing is the process of analyzing a software item to detect the differences between existing and required conditions (i.e., defects, errors, or bugs) and to evaluate the features of the software item. The purpose of testing is to verify and validate the software and to find any defects present in it, with the subsequent aim of fixing them.

  • Verification is the process of checking software for consistency and conformance by evaluating the results against pre-specified requirements.
  • Validation looks at the system’s correctness, i.e., it checks that what has been built is what the user actually wanted.
  • A defect is a variance between the expected and the actual result. A defect’s source may be traced to a fault introduced in the specification, design, or development (coding) phase.

Standards for Software Test Documentation

IEEE 829-1998, also known as the 829 Standard for Software Test Documentation, specifies the form of a set of documents for use in software testing [i]. Some other related standards are listed below.

  • IEEE 1008, a standard for unit testing
  • IEEE 1012, a standard for Software Verification and Validation
  • IEEE 1028, a standard for software inspections
  • IEEE 1044, a standard for the classification of software anomalies
  • IEEE 1044-1, a guide to the classification of software anomalies
  • IEEE 830, a guide for developing system requirements specifications
  • IEEE 730, a standard for software quality assurance plans
  • IEEE 1061, a standard for software quality metrics and methodology
  • IEEE 12207, a standard for software life cycle processes and life cycle data
  • BS 7925-1, a vocabulary of terms used in software testing
  • BS 7925-2, a standard for software component testing

Testing Frameworks

Testing frameworks make it easier to write and execute test cases for a given programming language or technology platform. Below, we mention a few of the many frameworks that are commonly used.

  • JUnit – for writing unit test cases in Java [ii]
  • Selenium – a suite of tools for automating web applications for testing purposes. It also has a plugin for Firefox [iii]
  • HP QC – HP’s web-based test management tool. It covers defining releases, specifying requirements, planning tests, executing tests, tracking defects, alerting on changes, and analyzing results; projects can also be customized [iv]
  • IBM Rational – Rational software offers a solution that supports businesses in designing, implementing, and testing software [v]

Need for Software Testing

There are many reasons to test software, such as:

  • Software testing identifies existing faults in the software. Removing such faults reduces the number of system failures, which improves the reliability and quality of the system.
  • Software testing can also improve other system qualities, such as maintainability, usability, and testability.
  • Testing may be required to meet legal requirements.
  • Testing may be required to meet industry-specific standards, such as the aerospace, missile, and railway signaling standards.

Test Cases and Test Suite

A test case describes an input and its corresponding expected output. Inputs are of two types: preconditions (circumstances that hold prior to the test case execution) and the actual inputs that are identified by some testing methods. A set of test cases is called a test suite.

Types of Software Testing

Testing is done at every stage of the software development life cycle, but with different objectives at different levels. There are different types of testing, such as stress testing, volume testing, configuration testing, compatibility testing, recovery testing, maintenance testing, documentation testing, and usability testing. In the following, we briefly discuss a few categories of testing [1].

Unit Testing

Unit testing is done at the lowest level. It tests the basic unit of software, that is, the smallest testable piece. Unit testing is of two types.

  • Black box testing: This is also known as functional testing, where the test cases are designed based on input-output values only. There are different types of black box testing, such as:

    Equivalence class partitioning: In this approach, the domain of input values to a program is divided into a set of equivalence classes. E.g., consider a program that determines whether an integer in the range 0 to 10 is even. There are three equivalence classes for this program: 1) negative integers, 2) integers in the range 0 to 10, and 3) integers larger than 10. Testing one representative value from each class is usually considered sufficient.

    Boundary value analysis: In this approach, the values at the boundaries of the equivalence classes are tested. E.g., for the example above, a test suite based on boundary values is { 0, -1, 10, 11 }.

  • White box testing: It is also known as structural testing. In this methodology, test cases are designed by examining the code, i.e., testing is based on knowledge of how the system is implemented. It includes analyzing data flow, control flow, information flow, coding practices, and exception and error handling within the system to test the intended and unintended software behavior. White box testing can be performed to validate whether the code implementation follows the intended design, to validate implemented security functionality, and to uncover exploitable vulnerabilities. This testing requires access to the source code. Though white box testing can be performed at any time in the life cycle after the code is developed, it is good practice to perform it during the unit testing phase.

Integration Testing

Integration testing is performed when two or more tested units are combined into a larger structure. The main objective of this testing is to check whether the different modules of a program interface with each other properly or not. This testing is mainly of two types:

  • Top-down approach
  • Bottom-up approach

In the bottom-up approach, each subsystem is tested separately and then the full system is tested. Top-down integration testing starts with the main routine and one or two subordinate routines in the system. After the top-level “skeleton” has been tested, the immediate subroutines of the “skeleton” are combined with it and tested.

System Testing

System testing aims to establish the end-to-end quality of the entire system. It is often based on the functional/requirement specifications of the system. Non-functional quality attributes, such as reliability, security, and maintainability, are also checked. There are three types of system testing:

  • Alpha testing is done by the developers who develop the software. It may also be done by the client or an outsider in the presence of a developer or tester.
  • Beta testing is done by a small number of end users before delivery. Any feedback or defects they report are handled as change requests and fixed.
  • User acceptance testing is another level of system testing, in which the system is tested for acceptability. It evaluates the system’s compliance with the client requirements and assesses whether the system is acceptable for delivery.

An error correction may introduce new errors. Therefore, after every round of error fixing, another round of testing, called regression testing, is carried out. Regression testing does not belong to unit testing, integration testing, or system testing; instead, it is a separate dimension to these three forms of testing.

Regression Testing

The purpose of regression testing is to ensure that bug fixes and new functionality introduced in a piece of software do not adversely affect the unmodified parts of the program [2]. Regression testing is an important activity in both the testing and maintenance phases. When a piece of software is modified, it is necessary to ensure that its quality is preserved. To this end, regression testing retests the software using test cases selected from the original test suite.


Write a program to calculate the square of a number in the range 1-100

#include <stdio.h>

int main(void) {
    int n, res;

    printf("Enter a number: ");
    scanf("%d", &n);

    if (n >= 1 && n <= 100) {
        /* Both statements belong to the in-range branch, so braces are required. */
        res = n * n;
        printf("\n Square of %d is %d\n", n, res);
    } else {
        /* n <= 0 or n > 100 */
        printf("Beyond the range");
    }

    return 0;
}


Inputs               Outputs
I1 :  -2        O1 :  Beyond the range
I2 :   0        O2 :  Beyond the range
I3 :   1        O3 :  Square of 1 is 1
I4 : 100        O4 :  Square of 100 is 10000
I5 : 101        O5 :  Beyond the range
I6 :   4        O6 :  Square of 4 is 16
I7 :  62        O7 :  Square of 62 is 3844

Test Cases

T1 : {I1, O1}
T2 : {I2, O2}
T3 : {I3, O3}
T4 : {I4, O4}
T5 : {I5, O5}
T6 : {I6, O6}
T7 : {I7, O7}
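
The test cases T1 to T7 can also be written as a table-driven test suite. The following is a minimal sketch, assuming the squaring logic is moved out of `main` into a function; the names `square_in_range`, `TestCase`, and `run_suite`, and the sentinel `OUT_OF_RANGE` that stands for the “Beyond the range” output, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OUT_OF_RANGE (-1)   /* hypothetical sentinel for "Beyond the range" */

/* Testable core of the program: n squared for 1..100, sentinel otherwise. */
int square_in_range(int n) {
    return (n >= 1 && n <= 100) ? n * n : OUT_OF_RANGE;
}

/* A test case pairs an input with its expected output. */
typedef struct {
    int input;
    int expected;
} TestCase;

/* The test suite: T1..T7 from the table of inputs and outputs. */
const TestCase suite[] = {
    {  -2, OUT_OF_RANGE },   /* T1 */
    {   0, OUT_OF_RANGE },   /* T2 */
    {   1, 1 },              /* T3 */
    { 100, 10000 },          /* T4 */
    { 101, OUT_OF_RANGE },   /* T5 */
    {   4, 16 },             /* T6 */
    {  62, 3844 },           /* T7 */
};
const size_t suite_size = sizeof suite / sizeof suite[0];

/* Run every test case and return the number of failures. */
size_t run_suite(const TestCase *cases, size_t count) {
    size_t failures = 0;
    assert(cases != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (square_in_range(cases[i].input) != cases[i].expected)
            ++failures;
    }
    return failures;
}
```

Calling `run_suite(suite, suite_size)` after every change to `square_in_range` is also a simple form of regression testing.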

Some Remarks

Beginners often have the misconception that one should be concerned with testing only after coding ends. Testing is, however, not just a phase towards the end; it is a continuous process. Testing effort should begin with the preparation of test cases once the requirements have been finalized. The Software Requirements Specification (SRS) document captures all the features expected from the system, and the requirements identified there should serve as the basis for preparing the test cases. Test cases should be designed in such a way that all target features can be verified. However, testing software is not only about proving that it works correctly. Successful testing should also point out the bugs present in the system, if any.


Estimation of Test Coverage Metrics and Structural Complexity

Control Flow Graph

A control flow graph (CFG) is a directed graph where the nodes represent different instructions of a program and the edges define the sequence of execution of such instructions. Figure 1 shows a small snippet of code (computing the square of an integer) along with its CFG. For simplicity, each node in the CFG has been labeled with the line numbers of the program containing the instructions. A directed edge from node #1 to node #2 in Figure 1 implies that after execution of the first statement, the control of execution is transferred to the second instruction.

int x = 10, x_2 = 0;
x_2 = x * x;
return x_2;


Figure 1: A simple program and its CFG

A program, however, does not always consist of only sequential statements. There can be branching and looping involved as well. Figure 2 shows what a CFG looks like when sequential, selection, and iteration statements appear in that order.

Figure 2: CFG for different types of statements

A real-life application can seldom be written in a few lines. In fact, it might consist of hundreds or thousands of lines. A CFG for such a program is likely to become very large, and it would contain mostly straight-line connections. To simplify such a graph, sequential statements can be grouped together to form a basic block. A basic block is a maximal sequence of program instructions I_1, I_2, …, I_n such that for any two adjacent instructions I_k and I_(k+1), the following holds true [ii, iii]:

  • I_k is executed immediately before I_(k+1)
  • I_(k+1) is executed immediately after I_k

The size of a CFG could be reduced by representing each basic block with a node. To illustrate this, let us consider the following example.

    sum = 0;
    i = 1;
    while (i <= n) {
        sum += i;
        ++i;
    }
    printf("%d", sum);
    if (sum > 0) {
        printf("Positive");
    }

The CFG with basic blocks for the above code is shown in Figure 3.

Figure 3: Basic blocks in a CFG

The first statement of a basic block is termed the leader. A node x in a CFG is said to dominate another node y (written as x dom y) if every possible execution path that goes through node y must pass through node x. The node x is then said to be a dominator [ii]. In the above example, line #s 1, 3, 4, 6, 7, 9, and 10 are leaders. The node containing lines 7 and 8 dominates the node containing line # 10. The block containing line #s 1 and 2 is said to be the entry block; the block containing line # 10 is said to be the exit block.

If any block (or sub-graph) in a CFG is not connected to the sub-graph containing the entry block, the block contains code that can never be reached while the program executes. Such unreachable code can be safely removed from the program. To illustrate this, let us consider a modified version of our previous code:

    sum = 0;
    i = 1;
    while (i <= n) {
        sum += i;
        ++i;
    }
    return sum;
    if (sum < 0) {
        return 0;
    }

Figure 4 shows the corresponding CFG. The sub-graph containing line #s 8, 9 and 10 is disconnected from the graph containing the entry block. The code in the disconnected sub-graph would never get executed, and therefore, can be discarded.

Figure 4: CFG with unreachable blocks


A path in a CFG is a sequence of nodes and edges that starts from the initial node (or entry block) and ends at a terminal node. The CFG of a program can have more than one terminal node.

Linearly Independent Path

A linearly independent path is any path in the CFG of a program that includes at least one new edge not present in any other linearly independent path. A set of linearly independent paths gives a clear picture of all possible paths that a program can take during its execution. Therefore, for path-coverage testing of a program, it suffices to consider only the linearly independent paths.

In Figure 3 we can find four linearly independent paths:

        1 - 3 - 6 - (7, 8) - 10
        1 - 3 - 6 - (7, 8) - 9 - 10
        1 - 3 - (4, 5) - 6 - (7, 8) - 10
        1 - 3 - (4, 5) - 6 - (7, 8) - 9 - 10

Note that 1 – 3 – (4, 5) – 3 – (4, 5) – 6 – (7, 8) – 10, for instance, would not qualify as a linearly independent path because there is no new edge not already present in any of the above four linearly independent paths.

McCabe’s Cyclomatic Complexity

McCabe applied graph-theoretic analysis to determine the complexity of a program module [vi]. The Cyclomatic complexity metric, as proposed by McCabe, provides an upper bound on the number of linearly independent paths that can exist for a given program module. The complexity of a module increases as the number of such paths in the module increases. Thus, if the Cyclomatic complexity of a program module is 7, there can be up to seven linearly independent paths in the module. For complete testing, each of those possible paths should be tested.

Computing Cyclomatic Complexity

Let G be a given CFG. Let E denote the number of edges and N denote the number of nodes. Let V(G) denote the Cyclomatic complexity of the CFG. V(G) can be obtained in any of the following three ways:

  • Method #1: V(G) = E - N + 2
  • Method #2: V(G) can be computed directly by visual inspection of the CFG: V(G) = Total number of bounded areas + 1. Note that structured programming always leads to a planar CFG.
  • Method #3: If LN is the total number of loops and decision statements in a program, then V(G) = LN + 1
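
Methods #1 and #3 are simple enough to express as code. The sketch below is illustrative only; the function names are hypothetical, and the worked example in the note after the block assumes a small CFG for a single if-else statement rather than any figure in this text.

```c
#include <assert.h>

/* Method #1: V(G) = E - N + 2 for a CFG with E edges and N nodes. */
int cyclomatic_from_graph(int edges, int nodes) {
    assert(edges >= 0 && nodes > 0);
    return edges - nodes + 2;
}

/* Method #3: V(G) = LN + 1, where LN is the total number of
   loops and decision statements. */
int cyclomatic_from_decisions(int loops_and_decisions) {
    assert(loops_and_decisions >= 0);
    return loops_and_decisions + 1;
}
```

For a single if-else statement, an assumed CFG has four nodes (condition, then-branch, else-branch, join) and four edges, so Method #1 gives 4 - 4 + 2 = 2. The same code has one decision statement, so Method #3 gives 1 + 1 = 2, and the two methods agree.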

In the case of object-oriented programming, the above equations apply to the methods of a class [viii]. Also, the value of V(G) so obtained is incremented by 1 to account for the entry point of the method. A quick summary of how different types of statements affect V(G) can be found in [ix], for example. Once the complexities of the individual modules of a program are known, the complexity of the program (or class) can be determined by [4], [ix]: V(G) = SUM( V(Gi) ) - COUNT( V(Gi) ) + 1, where COUNT( V(Gi) ) gives the total number of procedures (methods) in the program (class).
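
The program-level formula can be sketched as a short function. The function name `program_complexity` and the sample values in the note after the block are hypothetical and serve only to show the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* V(G) = SUM( V(Gi) ) - COUNT( V(Gi) ) + 1 over the modules of a program. */
int program_complexity(const int module_complexity[], size_t module_count) {
    assert(module_complexity != NULL && module_count > 0);
    int sum = 0;
    for (size_t i = 0; i < module_count; ++i)
        sum += module_complexity[i];
    return sum - (int)module_count + 1;
}
```

For three modules with complexities 3, 5, and 2, this gives (3 + 5 + 2) - 3 + 1 = 8.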

Optimum Value of Cyclomatic Complexity

A set of threshold values for Cyclomatic complexity has been presented in [vii], which we reproduce below.

V(G)     Module Category    Risk
1-10     Simple             Low
11-20    More complex       Moderate
21-50    Complex            High
> 50     Unstable           Very high
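
The threshold table maps directly onto a small lookup function, sketched below. The enum and function names are hypothetical; only the thresholds come from the table.

```c
#include <assert.h>

/* Risk levels from the threshold table. */
typedef enum { RISK_LOW, RISK_MODERATE, RISK_HIGH, RISK_VERY_HIGH } Risk;

/* Map a Cyclomatic complexity value V(G) >= 1 to its risk level. */
Risk risk_for_complexity(int vg) {
    assert(vg >= 1);
    if (vg <= 10) return RISK_LOW;        /* 1-10:  simple module       */
    if (vg <= 20) return RISK_MODERATE;   /* 11-20: more complex module */
    if (vg <= 50) return RISK_HIGH;       /* 21-50: complex module      */
    return RISK_VERY_HIGH;                /* > 50:  unstable module     */
}
```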

It has been suggested that the Cyclomatic complexity of a module should not exceed 10 [vi], [4]. A module that exceeds this limit becomes difficult for humans to understand and should be considered for redesign. Note that a high value of V(G) is possible for a module that contains a C-like switch-case statement with many cases. McCabe exempted such modules from the limit of 10 on V(G) [vi].


McCabe’s Cyclomatic complexity has certain advantages:

  • Independent of programming language
  • Helps in risk analysis during development or maintenance phase
  • Gives an idea about the maximum number of test cases to be executed (hence, the required effort) for a given module


However, Cyclomatic complexity does not reflect the cohesion and coupling of modules.

McCabe’s Cyclomatic complexity was originally proposed for procedural languages. One may look in [xi] to get an idea of how the complexity calculation can be modified for object-oriented languages. One may also wish to use the Chidamber-Kemerer metrics [x] (or any similar set of metrics), which have been designed for object-oriented programming.


Modeling Data Flow Diagrams

Data Flow Diagram

A Data Flow Diagram (DFD) provides a functional overview of a system. The graphical representation helps bridge the gap in understanding between user and system analyst, and between analyst and system designer. Starting from an overview of the system, it explores the detailed design of the system through a hierarchy. A DFD shows the external entities from which data flow into the process, as well as the other flows of data within the system. It also includes the transformations of data by the processes and the data stores from which data are read or to which data are written.

Graphical notations for Data Flow Diagram

Term               Notation                     Remarks
External entity    Rectangle                    Name of the external entity is written inside the rectangle
Process            Circle                       Name of the process is written inside the circle
Data store         Left-right open rectangle    Name of the data store is written inside the shape
Data flow          Directed arc                 The arc is labeled with the name of the data

Symbols used in DFD

  • Process: Processes are represented by a circle, with the name of the process written inside it. A process is usually named in such a way that the name represents its functionality. More detailed functionality can be shown in the next level if required. It is usually recommended to keep the number of processes less than seven [i]. If the number of processes exceeds that, some of them should be combined into a single process to reduce the count, and the combined process should be decomposed further in the next level [2].
  • External entity: External entities appear only in the context diagram [2]. They are represented by a rectangle, with the name of the external entity written inside the shape. External entities send data to be processed and also receive the processed data.
  • Data store: Data stores are represented by a left-right open rectangle. The name of the data store is written between the two horizontal lines of the open rectangle. Data stores are used as repositories from which data can flow into a process or into which data can flow out of a process.
  • Data flow: Data flows are shown as a directed edge between two components of a Data Flow Diagram. Data can flow from external entity to process, data store to process, in between two processes, and vice-versa.

Context diagram and leveling DFD

We start with a broad overview of the system, represented in the level 0 diagram, also known as the context diagram of the system. The entire system is shown as a single process, and the interactions of external entities with the system are also represented here. In the subsequent levels, we split the process into several sub-processes to represent the detailed functionality performed by the system. Data stores may appear at higher level DFDs.

Numbering of processes: If process “p” in the context diagram is split into three processes, say “p1”, “p2”, and “p3”, in the next level, then these are labeled 0.1, 0.2, and 0.3 in level 1, respectively. Let the process “p3” be split again into three processes “p31”, “p32”, and “p33” in level 2. Then, these are labeled 0.3.1, 0.3.2, and 0.3.3, respectively.

Balancing DFD: The data that flow into a process and the data that flow out of it need to match when the process is split in the next level [2]. This is known as balancing a DFD.

See the simulation [ii] and case study [iii] of this experiment to understand data flow diagrams in a more realistic context.


  1. External entities may appear only in the context diagram [2], i.e., only at level 0 of a DFD.
  2. Keep the number of processes at each level less than seven [i].
  3. Data flow is not possible between any two external entities and between any two data stores [i].
  4. Data cannot flow from an external entity to a data store and vice-versa [i].
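
Rules 2 to 4 can be checked mechanically when a DFD is stored as data. The sketch below is one way to encode them; the enum `DfdElement` and the function names are hypothetical. Rules 3 and 4 together amount to requiring that at least one end of every data flow is a process.

```c
#include <assert.h>
#include <stdbool.h>

/* The three kinds of DFD components that a data flow can connect. */
typedef enum { EXTERNAL_ENTITY, PROCESS, DATA_STORE } DfdElement;

/* Rules 3 and 4: entity-entity, store-store, and entity-store flows are
   not allowed, so a flow is valid only if one end is a process. */
bool flow_allowed(DfdElement from, DfdElement to) {
    return from == PROCESS || to == PROCESS;
}

/* Rule 2: keep the number of processes at each level less than seven. */
bool process_count_ok(int process_count) {
    assert(process_count >= 0);
    return process_count < 7;
}
```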


Modeling UML Class Diagrams and Sequence Diagrams

Structural and Behavioral Aspects

Developing a software system with an object-oriented approach (or any other approach, in general) depends heavily on understanding the problem. From a developer’s point of view, it is essential to have insight into the structural and behavioral aspects of a problem in order to design a solution for it. Class and sequence diagrams in UML, respectively, help with these aspects.

Class diagram

A class diagram is a graphical representation that describes a system in terms of its static structure [1]. It contains the system’s classes with their data members, operations, and the relationships between classes.


A class describes a set of objects with similar data members and member functions. In UML syntax, a class is drawn as a solid-outline rectangle with three horizontal compartments, which contain:

  • Class name: A class is uniquely identified in a system by its name, which is a textual string [2]. The name is written inside the first (top) compartment of the class rectangle.
  • Attributes: These are properties shared by all instances of a class. They are listed inside the second compartment of the class rectangle.
  • Operations: These are actions that can be performed by or upon the instances of a class or the class itself. They are listed inside the last (bottom) compartment of the class rectangle.


To build a structural model for an educational organization, “Course” can be treated as a class, which contains the attributes “courseName” and “courseID” along with the operations “addCourse()” and “removeCourse()” allowed to be performed over any object of that class.

Note that usually only the significant members and operations are depicted in a class diagram. For example, the Course class may have a member, say “Description”, which does not affect the structural aspects as such and is therefore omitted in Figure 1. In fact, when only the relationships among different classes matter, the attributes and operations may be skipped altogether.

Figure 1: A class with attributes and operations
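
C has no classes, so the following is only a rough analogue of the “Course” class: a struct holds the attributes, and functions play the role of the operations. The `Catalog` container, the field sizes, and the function signatures are assumptions introduced for this sketch; only the names “courseName”, “courseID”, “addCourse()”, and “removeCourse()” come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COURSES 8

/* Attributes of the Course class (second compartment). */
typedef struct {
    int  courseID;
    char courseName[32];
} Course;

/* Hypothetical container standing in for wherever courses are kept. */
typedef struct {
    Course courses[MAX_COURSES];
    size_t count;
} Catalog;

/* Operation addCourse(): returns false when the catalog is full. */
bool addCourse(Catalog *catalog, int id, const char *name) {
    assert(catalog != NULL && name != NULL);
    if (catalog->count == MAX_COURSES)
        return false;
    Course *c = &catalog->courses[catalog->count++];
    c->courseID = id;
    strncpy(c->courseName, name, sizeof c->courseName - 1);
    c->courseName[sizeof c->courseName - 1] = '\0';
    return true;
}

/* Operation removeCourse(): returns false when no course has this ID. */
bool removeCourse(Catalog *catalog, int id) {
    assert(catalog != NULL);
    for (size_t i = 0; i < catalog->count; ++i) {
        if (catalog->courses[i].courseID == id) {
            /* Fill the gap with the last course; order is not preserved. */
            catalog->courses[i] = catalog->courses[--catalog->count];
            return true;
        }
    }
    return false;
}
```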


Existing relationships in a system describe legitimate connections among the classes in that system. Such relationships can be of different types and are briefly discussed below:

  • Association: It is an instance-level relationship [i] that allows the objects at both ends of the association to exchange messages. A solid straight line connecting two class boxes represents an association. An association can be given a name, and the roles and multiplicity of the adjacent classes can be indicated at the end points of the line. An association can be uni-directional or bi-directional.


    In the structure model for a system of an organization, an employee (instance of “Employee” class) is always assigned to a particular department (instance of “Department” class) and the association can be shown by a line connecting the respective classes.

    Figure 2: Association between two classes

  • Aggregation: It is a special form of association that describes a part-whole [i] relationship between a pair of classes. That is, when a class holds instances of a related class, the relationship can be modeled as an aggregation.


    For a supermarket in a city, each branch runs some of the departments that they have. So, the relation among the classes “Branch” and “Department” can be designed as an aggregation. In UML, it can be indicated as shown in the Figure below.

    Figure 3: An example of aggregation

  • Composition [i]: It is a strong form of aggregation in which the whole completely owns its parts. The life cycle of a part depends on the whole.


    Let us consider a shopping mall that has several branches in different locations in a city. The existence of these branches completely depends on the shopping mall, since if the mall does not exist, none of its branches would exist in the city. This relation can be described as composition and can be shown as illustrated below.

    Figure 4: Composition of classes

  • Generalization/Specialization: It describes how one class is derived from another class. Derived classes inherit the properties of their parent class.


    Geometric_Shapes in Figure 5 describes how many sides a particular shape has. Triangle, Quadrilateral, and Pentagon are three classes that inherit the properties of the Geometric_Shapes class. So, the relationship of these classes with Geometric_Shapes is generalization. Equilateral_Triangle, Isosceles_Triangle, and Scalene_Triangle, in turn, inherit the properties of the Triangle class, as each of them has three sides. So, these are specializations of the Triangle class.

    Figure 5: Hierarchical relationship among classes

  • Multiplicity: It describes how many instances of one class can be related to an instance of another class in an association relationship.

    Notation for different types of multiplicity:

    Figure 6: Different types of multiplicities


    A vehicle can have two or more wheels

    Figure 7: Multiplicity relation between a vehicle and its wheels
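
A multiplicity constraint such as “two or more wheels” can be enforced in code when an object is created. The sketch below is a rough C analogue with hypothetical names (`Vehicle`, `Wheel`, `vehicle_init`); the upper limit `MAX_WHEELS` is only a storage limit of this sketch and is not part of the multiplicity itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_WHEELS 2    /* lower bound: a vehicle has two or more wheels */
#define MAX_WHEELS 18   /* storage limit of this sketch only             */

/* Hypothetical part class with a single illustrative attribute. */
typedef struct {
    int position;
} Wheel;

typedef struct {
    Wheel  wheels[MAX_WHEELS];
    size_t wheel_count;
} Vehicle;

/* Initialise a vehicle; reject wheel counts that violate the multiplicity. */
bool vehicle_init(Vehicle *v, size_t wheel_count) {
    assert(v != NULL);
    if (wheel_count < MIN_WHEELS || wheel_count > MAX_WHEELS)
        return false;
    v->wheel_count = wheel_count;
    for (size_t i = 0; i < wheel_count; ++i)
        v->wheels[i].position = (int)i;
    return true;
}
```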

Sequence diagram

It represents the behavioral aspects of a system [1]. A sequence diagram shows the interactions among different objects in a system [1] by means of message passing from one object to another with respect to time [2].

Elements in sequence diagram

A sequence diagram consists of the objects of a system and their life-line bar and the messages passed among them.


Objects appear at the top of a sequence diagram [1]. An object is shown in a rectangular box. The name of the object is followed by a colon “:” and the name of the class from which the object is instantiated. The whole string is underlined and appears inside the rectangle. Alternatively, only the class name or only the instance name may be used.

Life-line bar

A downward vertical line from object-box indicates the life-line of the corresponding object. A rectangular bar on life-line indicates that the concerned object is active at that point of time [1].


A message is shown as an arrow from the life-line of the sender object to the life-line of the receiver object and is labeled with the message name. The vertical ordering of the messages indicates the chronological order in which they are passed among the objects [1]. There can be different types of messages:

  • Synchronous messages: When a receiver starts processing such a message after receiving it, the sender needs to wait until the processing is over [iii]. A straight arrow with a closed and filled arrow-head from the sender’s life-line bar to the receiver end indicates a synchronous message [iii].
  • Asynchronous messages: In the case of asynchronous messages, the sender does not need to wait for the receiver to process the message [iv]. A function call that creates a thread can be represented as an asynchronous message in a sequence diagram [iv]. A straight arrow with an open arrow-head from the sender’s life-line bar to the receiver end indicates an asynchronous message [iii].
  • Return message: Return messages are used when the value of a function call needs to be returned to the object from which it was called. A dashed arrow with an open arrow-head from the sender’s life-line bar to the receiver end indicates a return message.
  • Response message: An object can send a message to itself [iv] (often called a self message). This type of message is used to show an interaction within the same object.

Figure 8: Different types of messages


Statechart and Activity Modeling

Statechart Diagrams

In Object Oriented Analysis and Design, a system is often abstracted by one or more classes with some well-defined behaviour and states. A statechart diagram is a pictorial representation of such a system, with all its states and the events that lead to a transition from one state to another.

To illustrate this, let us consider a computer. Some possible states that it could have are: running, shutdown, and hibernate. A transition from the running state to the shutdown state occurs when the user presses the “Power off” switch or clicks on the “Shut down” button displayed by the OS. Here, clicking on the shutdown button and pressing the power off switch act as external events causing the transition. Statechart diagrams are usually drawn to model the behaviour of complex systems. For a simple system, they are optional.

Building Blocks of a Statechart Diagram


A state is any “distinct” stage that an object (system) passes through in its lifetime. An object remains in a given state for a finite time until “something” happens that makes it move to another state. All such states can be broadly categorized into the following three types:

  • Initial: The state in which an object initially remains when created
  • Final: The state from which an object does not move to any other state [optional]
  • Intermediate: Any state, which is neither initial nor final

As shown in Figure 1, an initial state is represented by a circle filled with black. An intermediate state is depicted by a rectangle with rounded corners. A final state is represented by an unfilled circle with an inner black-filled circle.

Figure 1: Representation of initial, intermediate, and final states of a statechart diagram

Intermediate states usually have two compartments separated by a horizontal line, called the name compartment and the internal transitions compartment [iv]. They are described below:

  • Name compartment: Contains the name of the state, which is a short, simple, and descriptive string
  • Internal transitions compartment: Contains a list of internal activities performed as long as the system is in this state

The internal activities are indicated using the following syntax: action-label / action-expression. Action labels can be any condition indicator. There are, however, four special action labels:

  • Entry: Indicates activity performed when the system enters this state
  • Exit: Indicates activity performed when the system exits this state
  • Do: Indicates any activity that is performed while the system remains in this state or until the action expression results in a completed computation
  • Include: Indicates invocation of a sub-machine

Any other action label identify the event (internal transition) as a result of which the corresponding action is triggered. Internal transition is almost similar to self transition except that the former does not result in execution of entry and exit actions. That is, the system does not exit or re-enter that state. Figure 2 shows the syntax for representing a typical (intermediate) state

Figure 2: A typical state in a statechart diagram

States can also be classified as simple or composite (a state containing other states). Here, however, we shall deal only with simple states.


A transition is a movement from one state to another in response to an external stimulus (or an internal event). A transition is represented by a solid arrow from the current state to the next state. It is labeled by: event [guard-condition]/[action-expression], where

  • Event is a trigger that caused the concerned transition (mandatory) — Written in past tense [iii]
  • Guard-condition is (are) precondition(s), which must be true for the transition to happen [optional]
  • Action-expression indicates the action(s) to be performed as a result of the transition [optional]

It may be noted that if a transition is triggered while one or more of its guard conditions evaluate to false, the system stays in its present state. Also, not every event results in a state change. For example, if a queue is full, any further attempt to append will fail until the delete method is invoked at least once. Thus, the state of the queue does not change during this period.
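The queue example above can be sketched in Python. This is only an illustration: the class name `BoundedQueue`, the state names, and the capacity are assumptions made for the sketch, and guard conditions are modelled as plain `if` checks.

```python
class BoundedQueue:
    """Toy queue whose state is EMPTY, PARTIAL, or FULL (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    @property
    def state(self):
        # The state is derived from how many items are currently stored.
        if not self.items:
            return "EMPTY"
        if len(self.items) == self.capacity:
            return "FULL"
        return "PARTIAL"

    def append(self, item):
        # Guard condition: [queue is not full]. If the guard is false, the
        # event is ignored and the object stays in its present state.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def delete(self):
        # Guard condition: [queue is not empty].
        if not self.items:
            return None
        return self.items.pop(0)


q = BoundedQueue(capacity=2)
q.append("a")
q.append("b")
state_before = q.state        # the queue is now full
accepted = q.append("c")      # guard fails, so nothing changes
state_after = q.state
q.delete()
state_after_delete = q.state  # one delete makes room again
```

Here the rejected `append` leaves the queue in the same state, and a single `delete` moves it out of the full state.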


As mentioned in [ii], actions represent the behaviour of the system. While the system is performing an action for the current event, it does not accept or process any new event. The order in which different actions are executed is given below:

  1. Exit actions of the present state
  2. Actions specified for the transition
  3. Entry actions of the next state

Figure 3 shows a typical statechart diagram that uses all of the syntax described above.

Figure 3: A statechart diagram showing transition from state A to B
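The execution order listed above (exit actions, then transition actions, then entry actions) can be illustrated with a minimal sketch. The names `State` and `fire`, and the action label `notify`, are hypothetical and exist only for this example.

```python
class State:
    """A simple state that records its entry and exit actions in a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def on_entry(self):
        self.log.append(f"entry:{self.name}")

    def on_exit(self):
        self.log.append(f"exit:{self.name}")


def fire(current, target, guard, action, log):
    """Attempt a transition and return the state the system ends up in."""
    if not guard():
        return current                # guard is false: stay in the present state
    current.on_exit()                 # 1. exit actions of the present state
    log.append(f"action:{action}")    # 2. actions specified for the transition
    target.on_entry()                 # 3. entry actions of the next state
    return target


log = []
a, b = State("A", log), State("B", log)
blocked = fire(a, b, guard=lambda: False, action="notify", log=log)
log_after_blocked = list(log)
now = fire(a, b, guard=lambda: True, action="notify", log=log)
```

The first call has a false guard, so the system stays in state A and nothing is logged; the second call moves it to state B and logs the three steps in order.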

Guidelines for drawing Statechart Diagrams

The following steps can be followed, as suggested in [i], to draw a statechart diagram:

  • For the system to be developed, identify the distinct states that it passes through
  • Identify the events (and any precondition) that cause the state transitions. Often these would be the methods of a class as identified in a class diagram.
  • Identify what activities are performed while the system remains in a given state

Activity Diagrams

Activity diagrams fall under the category of behavioural diagrams in the Unified Modeling Language. An activity diagram is a high-level diagram used to visually represent the flow of control in a system. It has similarities with traditional flow charts. However, it is more powerful than a simple flow chart since it can represent various other concepts like concurrent activities, their joining, and so on [vii, viii].

Activity diagrams, however, cannot depict the message passing among related objects. As such, they cannot be directly translated into code. These kinds of diagrams are suitable for confirming the logic to be implemented with business users. They are typically used when the business logic is complex; in simple scenarios they can be avoided entirely [ix].

Components of an Activity Diagram

Below we describe the building blocks of an activity diagram.


An activity denotes a particular action taken in the logical flow of control. This can simply be the invocation of a mathematical function, the alteration of an object’s properties, and so on [x]. An activity is represented with a rounded rectangle, as shown in Table 1. A label inside the rectangle identifies the corresponding activity.

There are two special types of activity nodes: initial and final. They are represented with a filled circle and a filled circle with a border, respectively (Table 1). An initial node represents the starting point of a flow in an activity diagram. There can be multiple initial nodes, which means that invoking that particular activity diagram would initiate multiple flows. A final node represents the end point of all activities. Like an initial node, there can be multiple final nodes. Any transition reaching a final node would stop all activities.


A flow (also termed as edge or transition) is represented with a directed arrow. This is used to depict transfer of control from one activity to another or to other types of components, as we will see below. A flow is often accompanied with a label, called the guard condition, indicating the necessary condition for the transition to happen. The syntax to depict it is [guard condition].


A decision node, represented with a diamond, is a point where a single flow enters and two or more flows leave. The control flow can follow only one of the outgoing paths. The outgoing edges often have guard conditions indicating true-false or if-then-else conditions. However, they can be omitted in obvious cases. The input edge can also have guard conditions. Alternately, a note can be attached to the decision node indicating the condition to be tested.


A merge node is (also) represented with a diamond shape, with two or more flows entering and a single flow leaving. It represents the point where at least one incoming flow must arrive before further processing can continue.


Fork is a point where parallel activities begin. For example, when a student has been registered with a college, he/she can in parallel apply for student ID card and library card. A fork is graphically depicted with a black bar together with a single flow entering and multiple flows leaving out.


A join is depicted with a black bar with multiple input flows but a single output flow. It represents the synchronization of all concurrent activities. Unlike a merge, in case of a join, all of the incoming flows must be completed before any further progress can be made. For example, a sales order is closed only when the customer has received the product and the sales company has received its payment.
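Fork and join have a rough analogue in concurrent programming. The following sketch uses the student registration example given for the fork; the function names and the student name are hypothetical. Submitting two tasks plays the role of the fork, and waiting for both results plays the role of the join.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_for_id_card(student):
    return f"ID card issued to {student}"


def apply_for_library_card(student):
    return f"library card issued to {student}"


def register(student):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: one incoming flow, two concurrent outgoing flows.
        id_future = pool.submit(apply_for_id_card, student)
        lib_future = pool.submit(apply_for_library_card, student)
        # Join: result() blocks, so control continues only after both
        # concurrent activities have completed.
        results = [id_future.result(), lib_future.result()]
    return results


outcome = register("Asha")
```

A merge, by contrast, would continue as soon as any one incoming flow arrives, without waiting for the others.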


UML allows attaching a note to different components of a diagram to present some textual information. The information can simply be a comment or may be some constraint. A note can be attached to a decision point, for example, to indicate the branching criteria.


Different components of an activity diagram can be logically grouped into different areas called partitions or swimlanes. They often correspond to different units of an organization or different actors. The drawing area can be partitioned into multiple compartments using vertical (or horizontal) parallel lines. Partitions in an activity diagram are not mandatory.

The following Table shows the components commonly used in a typical activity diagram.

Component Graphical Notation
Activity Activity
Flow Flow
Decision Decision
Merge Merge
Fork Fork
Join Join
Note Note

Table 1: Typical components used in an activity diagram

Apart from the components stated above, there are a few other components as well (e.g., representing events, sending of signals, and nested activity diagrams), which are not discussed here. The reader may refer to [x], for example, for further details.

A Simple Example

Figure 4 shows a simple activity diagram with two activities. The Figure depicts two stages of a form submission. At first, a form is filled up with relevant and correct information. Once it is verified that there is no error in the form, it is submitted. The two other symbols shown in the Figure are the initial node (dark filled circle) and final node (outer hollow circle with inner filled circle). It may be noted that there can be zero or more final node(s) in an activity diagram [ix].

Figure 4: A simple activity diagram

Guidelines for drawing an Activity Diagram

The following general guidelines could be followed to pictorially represent a complex logic.

  • Identify tiny pieces of work being performed by the system
  • Identify the next logical activity that should be performed
  • Think about all those conditions that should be met, and all those constraints that should be satisfied, before one can move to the next activity
  • Put non-trivial guard conditions on the edges to avoid confusion


Identifying Domain Classes from the Problem Statements

Domain Class

In the object-oriented paradigm, the domain object model has become a subject of interest because it helps in comprehending the problem, which is a prerequisite for designing a good software system. The domain model, as a conceptual model, conveys an understanding of the problem description through its key component, the domain classes. Domain classes are abstractions of the key entities, concepts, or ideas presented in the problem statement [iv]. As stated in [v], domain classes are used for representing business activities during the analysis phase. Below we discuss some techniques that can be used to identify the domain classes.

Traditional Techniques for Identification of Classes

Grammatical Approach Using Nouns

This object identification technique was proposed by Russell J. Abbott and was made popular by Grady Booch [1]. It involves grammatical analysis of the problem statement to identify a list of potential classes. The logical steps are:

  1. Obtain the user requirements (problem statement) as a simple and descriptive English text. This basically corresponds to the use-case diagram for the problem statement.
  2. Identify and mark the nouns, pronouns, and noun phrases from the above problem statement.
  3. A list of potential classes is obtained based on the category of the nouns (details given later). For example, nouns that directly refer to any person, place, or entity in general correspond to different objects, and so do singular proper nouns. On the other hand, plural nouns and common nouns are candidates that usually map into classes.


This is one of the simplest approaches, and it can be easily understood and applied by a large section of the user base. The problem statement need not be in English; it can be in any other language.


The problem statement may not always help towards correct identification of a class. At times, it could give us redundant classes. At other times, the problem statement may use abbreviations for large systems or concepts, and therefore, the identified class may actually point to an aggregate of classes. In other words, it may not find all the objects.

Using Generalization

In this approach, all potential objects are classified into different groups based on some common behaviour. Classes are derived from these groups.

Using Subclasses

Here, instead of identifying objects, one goes for identification of classes based on some similar characteristics. These are the specialized classes. Common characteristics are taken from them to form the higher level generalized classes.

Steps to Identify Domain Classes from Problem Statement

We now present the steps to identify domain classes from a given problem statement. This approach is mostly based on the “Grammatical approach using nouns” discussed above, with some insights from [i].

  1. Make a list of potential objects by finding out the nouns and noun phrases from the narrative problem statement
  2. Apply subject matter expertise (or domain knowledge) to identify additional classes
  3. Filter out the redundant or irrelevant classes
  4. Classify all potential objects based on categories. We follow the category table as described by Ross (Table 5-3, pg 88, [1])
    Categories Explanation
    People Humans who carry out some function
    Places Areas set aside for people or things
    Things Physical objects
    Organizations Collection of people, resources, facilities and capabilities having a defined mission
    Concepts Principles or Ideas not tangible
    Events Things that happen (usually at a given date and time), or steps in an ordered sequence
  5. Group the objects based on similar attributes. While grouping we should remember that
    • Different nouns (or noun phrases) can actually refer to the same thing (examples: house, home, and abode)
    • Same nouns (or noun phrases) can refer to different things or concepts (example: I go to school every day / This school of thought agrees with the theory)
  6. Give related names to each group to generate the final list of top level classes
  7. Iterate over to refine the list of classes
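Steps 3 to 6 above can be sketched in a few lines of Python. This is a toy illustration, not a complete method: the noun list, the set of irrelevant nouns, and the synonym table are all assumptions standing in for the analyst's judgement, and step 1 (marking the nouns) is assumed to have been done by hand for a hypothetical library system.

```python
# Hypothetical output of step 1: nouns marked by hand in a problem statement.
candidate_nouns = ["member", "members", "book", "books", "library",
                   "librarian", "system", "borrower", "day"]

# Step 3: nouns the analyst judges redundant or irrelevant (an assumption).
irrelevant = {"system", "day"}

# Step 5: different nouns that refer to the same thing (an assumption).
synonyms = {"borrower": "member"}


def singular(noun):
    # Naive plural stripping; adequate only for this toy list.
    return noun[:-1] if noun.endswith("s") else noun


def identify_classes(nouns):
    classes = []
    for noun in nouns:
        name = singular(noun.lower())
        name = synonyms.get(name, name)      # merge synonyms into one group
        if name in irrelevant or name in classes:
            continue                         # filter irrelevant and duplicate nouns
        classes.append(name)
    # Step 6: give each group a class-style name.
    return [name.capitalize() for name in classes]


domain_classes = identify_classes(candidate_nouns)
```

In practice the filtering and grouping decisions come from domain knowledge, and the list is refined over several iterations (step 7).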

Advanced Concepts

Identification of domain classes might not be a simple task for novices. It requires expertise and domain knowledge to identify business classes from plain English text. The concepts presented here have been kept simple in order to familiarize students with the subject. A lot of research work has been done in this area and various techniques have been proposed to identify domain classes. Interested readers may look at the following paper for an advanced treatment of this subject matter.
I. Y. Song, K. Yano, J. Trujillo, and S. Lujan-Mora. “A Taxonomic Class Modeling Methodology for Object-Oriented Analysis,” In Information Modeling Methods and Methodologies, Advanced Topics in Databases Series, Ed. (J. Krogstie, T. Halpin, K. Siau), Idea Group Publishing, 2004, pp. 216-240.


E-R Modeling from the Problem Statements

Entity Relationship Model

The Entity-Relationship (ER) model is used to represent the logical design of a database to be created. In the ER model, real world objects (or concepts) are abstracted as entities, and the different possible associations among them are modeled as relationships.

For example, student and school — they are two entities. Students study in school. So, these two entities are associated with a relationship “Studies in”. As another example, consider a system where some job runs every night, which updates the database. Here, job and database could be two entities. They are associated with the relationship “Updates”.

Entity Set and Relationship Set

An entity set is a collection of all similar entities. For example, “Student” is an entity set that abstracts all students. Ram and John are specific entities belonging to this set. Similarly, a “Relationship” set is a set of similar relationships.

Attributes of Entity

Attributes are the characteristics describing any entity belonging to an entity set. Any entity in a set can be described by zero or more attributes. For example, any student has got a name, age, and address. At any given point of time a student can study only at one school. In the school he/she would have a roll number and a grade in which he/she studies. These data are the attributes of the entity set Student.


One or more attribute(s) of an entity set can be used to define the following keys:

  • Super key: One or more attributes which, when taken together, help to uniquely identify an entity in an entity set. For example, a school can have any number of students. However, if we know the grade and roll number, then we can uniquely identify a student in that school.
  • Candidate key: It is a minimal subset of a super key. In other words, a super key might contain extraneous attributes, which do not help in identifying an object uniquely. When such attributes are removed, the key so formed is called a candidate key.
  • Primary key: A database might have more than one candidate key. Any candidate key chosen for a particular implementation of the database is called a primary key.
  • Prime attribute: Any attribute that is part of a candidate key.
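The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is illustrative only: the entities and attribute names are made up, and a check on sample rows shows uniqueness only within that sample, whereas a real key is a constraint on every possible instance of the entity set.

```python
from itertools import combinations

# Hypothetical Student entities; attribute names and values are illustrative.
students = [
    {"grade": 5, "roll": 1, "name": "Ram"},
    {"grade": 5, "roll": 2, "name": "John"},
    {"grade": 6, "roll": 1, "name": "Ram"},
    {"grade": 5, "roll": 3, "name": "Ram"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: no proper subset is itself a super key."""
    attributes = sorted(rows[0])
    keys = []
    # Smaller attribute sets are tried first, so any super key that contains
    # an already found key is not minimal and is skipped.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys


keys = candidate_keys(students)
all_attrs_is_super_key = is_super_key(students, ("grade", "name", "roll"))
```

For this sample, the full attribute set is a super key, but only the pair of grade and roll is minimal, so it is the only candidate key and both of its attributes are prime.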

Weak Entity

An entity set is said to be weak if it is dependent upon another entity set. A weak entity cannot be uniquely identified only by its attributes. In other words, it does not have a super key.

For example, consider a company that allows employees to have travel allowance for their immediate family. So, here we have two entity sets: employee and family, which are related by the “Can claim for” relationship. However, the family entity does not have a super key — the existence of a family is entirely dependent on the concerned employee. So, it is meaningful only with reference to employee.

Entity Generalization and Specialization

Once we have identified the entity sets, we might find some similarities among them. For example, multiple persons interact with a banking system. Most of them are customers, while the rest are employees or other service providers. Here, customers and employees are persons, but with certain specializations. In other words, a person is the generalized form of the customer and employee entity sets. The ER model uses the “ISA” hierarchy to depict specialization (and thus, generalization).

Mapping Cardinalities

One of the main tasks of ER modeling is to associate different entity sets. Let us consider two entity sets E1 and E2 associated by a relationship set R. Based on how many entities in E2 an entity in E1 can be associated with, and vice versa, we can have the following four types of mappings:

  • One to one: An entity in E1 is related to at most a single entity in E2, and vice versa
  • One to many: An entity in E1 can be related to zero or more entities in E2. Any entity in E2 can be related to at most a single entity in E1.
  • Many to one: Zero or more entities in E1 can be associated with a single entity in E2. However, an entity in E1 can be related to at most one entity in E2.
  • Many to many: Any number of entities in E1 can be related to any number of entities in E2, including zero, and vice versa.
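As a rough illustration, the four mappings can be told apart by counting how many partners each entity has in a relationship set given as (E1, E2) pairs. The function name and sample data are hypothetical. Note also that the function reports only the tightest mapping consistent with the observed pairs, while the cardinality in a real design is a rule about what the relationship is allowed to contain.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a relationship set given as (e1, e2) pairs."""
    pairs = set(pairs)
    per_e1 = Counter(e1 for e1, _ in pairs)   # E2 partners of each E1 entity
    per_e2 = Counter(e2 for _, e2 in pairs)   # E1 partners of each E2 entity
    e1_multi = any(n > 1 for n in per_e1.values())
    e2_multi = any(n > 1 for n in per_e2.values())
    if e1_multi and e2_multi:
        return "many to many"
    if e1_multi:
        return "one to many"    # one E1 entity relates to several E2 entities
    if e2_multi:
        return "many to one"    # several E1 entities relate to one E2 entity
    return "one to one"


# Person owns car: a person can own several cars, a car has one owner.
owns = [("Asha", "car1"), ("Asha", "car2"), ("Ravi", "car3")]
# Student studies in school: many students, one school each.
studies_in = [("Ram", "School A"), ("John", "School A")]

owns_kind = mapping_cardinality(owns)
studies_kind = mapping_cardinality(studies_in)
```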

ER Diagram

From a given problem statement, we identify the possible entity sets, their attributes, and the relationships among different entity sets. Once we have this information, we represent it pictorially in what is called an entity-relationship (ER) diagram.

Graphical Notations for ER Diagram

Term Notation Remarks
Entity set Entity Name of the set is written inside the rectangle
Attribute Attribute Name of the attribute is written inside the ellipse
Entity with attributes Entity with attributes Roll is the primary key; denoted with an underline
Weak entity set Weak entity
Relationship set Relationship Name of the relationship is written inside the diamond
Related entity sets Entity relationship
Relationship cardinality Relationship cardinality A person can own zero or more cars but no two persons can own the same car
Relationship with weak entity set Weak entity relationship

Importance of ER modeling

Figure 1 shows the different steps involved in implementation of a (relational) database.

Figure 1: Steps to implement an RDBMS

Given a problem statement, the first step is to identify the entities, attributes, and relationships. We represent them using an ER diagram. Using this ER diagram, table structures are created along with the required constraints. Finally, these tables are normalized in order to remove any redundancy and maintain data integrity. Thus, to have data stored efficiently, the ER diagram should be drawn in as much detail and as accurately as possible.
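The step from ER diagram to table structures can be sketched with Python's built-in `sqlite3` module, using the student and school example from earlier. The table and column names are assumptions made for this illustration; the many-to-one “Studies in” relationship is represented by a foreign key in the student table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE school (
        school_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- "Studies in" is many to one, so each student row refers to one school.
    CREATE TABLE student (
        roll      INTEGER NOT NULL,
        grade     INTEGER NOT NULL,
        name      TEXT NOT NULL,
        school_id INTEGER NOT NULL REFERENCES school(school_id),
        PRIMARY KEY (school_id, grade, roll)
    );
""")
conn.execute("INSERT INTO school VALUES (1, 'School A')")
conn.execute("INSERT INTO student VALUES (1, 5, 'Ram', 1)")
conn.execute("INSERT INTO student VALUES (2, 5, 'John', 1)")

try:
    # Violates the foreign key constraint: school 99 does not exist.
    conn.execute("INSERT INTO student VALUES (3, 5, 'Asha', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The constraint derived from the relationship rejects a student who refers to a non-existent school, which is one way the ER design protects data integrity.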


Modeling UML Use Case Diagrams and Capturing Use Case Scenarios

Use case diagrams

Use case diagrams belong to the category of behavioural diagrams in UML. They aim to present a graphical overview of the functionality provided by the system. A use case diagram consists of a set of actions (referred to as use cases) that the concerned system can perform, one or more actors, and dependencies among them.


An actor can be defined as [1] an object or set of objects external to the system, which interacts with the system to get some meaningful work done. Actors can be humans, devices, or even other systems. For example, consider the case where a customer withdraws cash from an ATM. Here, the customer is a human actor. Actors can be classified into the following types [2], [i]:

  • Primary actor: They are principal users of the system who fulfill their goal by availing some service from the system. For example, a customer uses an ATM to withdraw cash when he/she needs it. Here, customer is the primary actor.
  • Supporting actor: They render some kind of service to the system. A bank representative, who replenishes the stock of cash, is such an example. It may be noted that replenishing the stock of cash is not the primary functionality offered by an ATM.

In a use case diagram, primary actors are usually drawn on the top left side of the diagram.

Use Case

A use case is simply [1] a functionality provided by a system.

Continuing with the example of the ATM, withdraw cash is a functionality that the ATM provides. Therefore, this is a use case. Other possible use cases include check balance, change PIN, and so on.

Use cases include both successful and unsuccessful scenarios of user interactions with the system. For example, authentication of a customer by the ATM would fail if a wrong PIN is entered. In such a case, an error message is displayed on the screen of the ATM.


Subject is simply [iii] the system under consideration. Use cases apply to a subject. For example, an ATM is a subject, having multiple use cases, and multiple actors interact with it. However, one should be careful of external systems interacting with the subject as actors.

Graphical Representation

An actor is represented by a stick figure and name of the actor is written below it. A use case is depicted by an ellipse and name of the use case is written inside it. The subject is shown by drawing a rectangle. Label for the system can be put inside it. Use cases are drawn inside the rectangle and actors are drawn outside the rectangle, as shown in Figure 1.

Figure 1: A use case diagram for a book store

Association between Actors and Use Cases

A use case is triggered by an actor. Actors and use cases are connected through binary associations indicating that the two communicate through message passing.

An actor must be associated with at least one use case. Similarly, a given use case must be associated with at least one actor. Associations among actors are usually not shown. However, one can depict the class hierarchy among actors.

Use Case Relationships

Three types of relationships exist among use cases:

  • Include relationship
  • Extend relationship
  • Use case generalization

Include Relationship

Include relationships are used to depict common behaviour that is shared by multiple use cases. This can be considered analogous to writing functions in a program in order to avoid writing the same code repeatedly. Such a function can be called from different points within the program.


For example, consider an email application. A user can send a new mail, reply to an email he has received, or forward an email. However, in each of these three cases, the user must be logged in to perform those actions. Thus, we could have a login use case, which is included by the compose mail, reply, and forward email use cases. The relationship is shown in Figure 2.

Figure 2: Include relationship between use cases


An include relationship is depicted by a dashed arrow with an «include» stereotype from the including use case to the included use case.
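The analogy between an include relationship and a shared function can be made concrete with a small sketch of the email example. The class and method names are hypothetical and chosen only for illustration; each including use case calls the shared login behaviour before doing its own work.

```python
class MailApp:
    """Toy email application; `trace` records the behaviour that ran."""

    def __init__(self):
        self.logged_in = False
        self.trace = []

    def login(self):
        # Included use case: the shared behaviour is written only once.
        if not self.logged_in:
            self.logged_in = True
            self.trace.append("login")

    def compose_mail(self):
        self.login()                      # «include» login
        self.trace.append("compose mail")

    def reply(self):
        self.login()                      # «include» login
        self.trace.append("reply")

    def forward_email(self):
        self.login()                      # «include» login
        self.trace.append("forward email")


app = MailApp()
app.compose_mail()
app.reply()
```

The login step runs as part of the first use case and is not repeated for the second, because the user is already logged in by then.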

Extend Relationship

Use case extensions are used to depict any variation to an existing use case. They are used to specify the changes required when any assumption made by an existing use case becomes false [iv, v].


Let us consider an online bookstore. The system allows an authenticated user to buy selected book(s). While the order is being placed, the system also allows the user to specify any special shipping instructions [vii], for example, to call the customer before delivery. This Shipping Instructions step is optional and not a part of the main Place Order use case. Figure 3 depicts this relationship.

Figure 3: Extend relationship between use cases.


An extend relationship is depicted by a dashed arrow with an «extend» stereotype from the extending use case to the extended use case.

Generalization Relationship

Generalization relationships are used to represent inheritance between use cases. A derived use case specializes some functionality that it has already inherited from the base use case.


To illustrate this, consider a graphical application that allows users to draw polygons. We can have a use case draw polygon. Now, a rectangle is a particular kind of polygon having four sides at right angles to each other. So, the use case draw rectangle inherits the properties of the use case draw polygon and overrides its drawing method. This is an example of a generalization relationship. Similarly, a generalization relationship exists between the draw rectangle and draw square use cases. The relationship is illustrated in Figure 4.

Figure 4: Generalization relationship among use cases.


A generalization relationship is depicted by a solid line with a hollow triangular arrowhead pointing from the specialized (derived) use case to the more generalized (base) use case.
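Generalization among use cases resembles class inheritance, so the polygon example can be sketched with Python classes. The class names and the returned strings are hypothetical; the point is only that each derived use case inherits from its base and overrides the drawing behaviour.

```python
class DrawPolygon:
    """Base use case: draw a polygon with a given number of sides."""

    def __init__(self, sides):
        self.sides = sides

    def draw(self):
        return f"polygon with {self.sides} sides"


class DrawRectangle(DrawPolygon):
    """Specialized use case: inherits from DrawPolygon and overrides draw()."""

    def __init__(self):
        super().__init__(sides=4)

    def draw(self):
        return "rectangle with 4 right angles"


class DrawSquare(DrawRectangle):
    """Further specialization: reuses the rectangle behaviour and refines it."""

    def draw(self):
        return "square: " + super().draw() + " and equal sides"


square = DrawSquare()
description = square.draw()
```

The square use case inherits the side count from the polygon through the rectangle, which mirrors the chain of generalization arrows in the diagram.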

Identifying Actors

Given a problem statement, the actors can be identified by asking the following questions [2]:

  • Who gets most of the benefits from the system? (The answer would lead to the identification of the primary actor)
  • Who keeps the system working? (This will help to identify a list of potential users)
  • What other software/hardware does the system interact with?
  • Any interface (interaction) between the concerned system and any other system?

Identifying Use cases

Once the primary and secondary actors have been identified, we have to find out their goals: what functionality can they obtain from the system? Any use case name should start with a verb, for example, “Check balance”.

Guidelines for drawing Use Case diagrams

Following general guidelines could be kept in mind while trying to draw a use case diagram [1]:

  • Determine the system boundary
  • Ensure that individual actors have well-defined purpose
  • Use cases identified should let the actors get some meaningful work done
  • Associate the actors and use cases; there should not be any actor or use case floating without any connection
  • Use include relationship to encapsulate common behaviour among use cases, if any

Also look at [ix], for example, for further tips.


Estimation of Project Metrics

Project Estimation Techniques

A software project is not just about writing hundreds or thousands of lines of source code to achieve a particular objective. The scope of a software project is comparatively quite large, and it can take several years to complete depending upon its scale. However, the phrase “quite large” gives only some (possibly vague) qualitative information. As in any other science and engineering discipline, one would like to measure how complex a project is. One of the major activities of the project planning phase, therefore, is to estimate various project parameters in order to take proper decisions. Some important project parameters that are estimated include:

  • Project size: What would be the size of the code written say, in number of lines, files, and modules?
  • Cost: How much would it cost to develop the software? Software may be just pieces of code, but one has to pay the managers, developers, and other project personnel.
  • Duration: How long would it be before the software is delivered to the clients?
  • Effort: How much effort from the team members would be required to create the software?

In this experiment we will focus on two methods for estimating project metrics: COCOMO and Halstead’s method.


COCOMO (Constructive Cost Model) was proposed by Boehm. According to him, a software project can be categorized into three types: organic, semidetached, and embedded. The classification is done by considering the characteristics of the software as well as the development team and environment. These product classes typically correspond to application, utility, and system programs, respectively. Data processing programs can be considered as typical application programs. Compilers and linkers, on the other hand, are some examples of utility programs. Operating systems and real-time system programs are examples of system programs. One can easily appreciate that it would take much more time and effort to develop an OS than, say, an attendance management system.

The concepts of organic, semidetached, and embedded systems are described below.

  • Organic: A development project is said to be of organic type, if
    • The project deals with developing a well understood application
    • The development team is small
    • The team members have prior experience in working with similar types of projects
  • Semidetached: A development project can be categorized as semidetached type, if
    • The team consists of some experienced as well as inexperienced staff
    • Team members may have some experience on the type of system to be developed
  • Embedded: A development project is of the embedded type if
    • It aims to develop software that is strongly coupled to machine hardware
    • The team size is usually large

Boehm suggested that estimation of project parameters should be done through three stages: Basic COCOMO, Intermediate COCOMO, and Complete COCOMO.

    Basic COCOMO Model

    The basic COCOMO model helps to obtain a rough estimate of the project parameters. It estimates effort and time required for development in the following way: 
    Effort = a * (KDSI)^b PM
    Tdev = 2.5 * (Effort)^c months

    where

    • KDSI is the estimated size of the software expressed in Kilo Delivered Source Instructions
    • a, b, c are constants determined by the category of software project
    • Effort denotes the total effort required for the software development, expressed in person months (PMs)
    • Tdev denotes the estimated time required to develop the software (expressed in months)

    The values of the constants a, b, and c are given below:

    Software project a b c
    Organic 2.4 1.05 0.38
    Semi-detached 3.0 1.12 0.35
    Embedded 3.6 1.20 0.32
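The two basic COCOMO formulas and the table of constants translate directly into code. The following is a minimal sketch; the function name `basic_cocomo` and the 32 KDSI project size are assumptions made for the example.

```python
# Constants (a, b, c) for each project category, taken from the table above.
COEFFICIENTS = {
    "organic":       (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded":      (3.6, 1.20, 0.32),
}


def basic_cocomo(kdsi, category):
    """Return (effort in person-months, development time in months)."""
    a, b, c = COEFFICIENTS[category]
    effort = a * kdsi ** b        # Effort = a * (KDSI)^b
    tdev = 2.5 * effort ** c      # Tdev = 2.5 * (Effort)^c
    return effort, tdev


# Hypothetical organic project of 32 KDSI.
effort, tdev = basic_cocomo(32, "organic")
```

For this hypothetical project the estimate comes to roughly 91 person-months of effort and about 14 months of development time. The same size in the embedded category gives a noticeably higher effort, since both a and b are larger.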

    Intermediate COCOMO Model

    The basic COCOMO model assumes that effort and development time depend only on the size of the software. In real life, however, many other project parameters influence the development process. The intermediate COCOMO model takes these factors into account by defining a set of 15 cost drivers (multipliers), as shown in the Table below [i]. For example, a project that makes use of modern programming practices would have lower estimates of effort and cost. Each of these 15 attributes can be rated on a six-point scale ranging from “very low” to “extra high” in their relative order of importance. Each attribute has an effort multiplier fixed as per the rating. The product of the effort multipliers of all 15 attributes gives the Effort Adjustment Factor (EAF).

    Cost drivers for Intermediate COCOMO (Source:
    Cost Drivers                                   Very Low  Low   Nominal  High  Very High  Extra High
    Product attributes
    Required software reliability                  0.75      0.88  1.00     1.15  1.40       -
    Size of application database                   -         0.94  1.00     1.08  1.16       -
    Complexity of the product                      0.70      0.85  1.00     1.15  1.30       1.65
    Hardware attributes
    Run-time performance constraints               -         -     1.00     1.11  1.30       1.66
    Memory constraints                             -         -     1.00     1.06  1.21       1.56
    Volatility of the virtual machine environment  -         0.87  1.00     1.15  1.30       -
    Required turnabout time                        -         0.87  1.00     1.07  1.15       -
    Personnel attributes
    Analyst capability                             1.46      1.19  1.00     0.86  0.71       -
    Applications experience                        1.29      1.13  1.00     0.91  0.82       -
    Software engineer capability                   1.42      1.17  1.00     0.86  0.70       -
    Virtual machine experience                     1.21      1.10  1.00     0.90  -          -
    Programming language experience                1.14      1.07  1.00     0.95  -          -
    Project attributes
    Application of software engineering methods    1.24      1.10  1.00     0.91  0.82       -
    Use of software tools                          1.24      1.10  1.00     0.91  0.83       -
    Required development schedule                  1.23      1.08  1.00     1.04  1.10       -

    (A dash indicates that no multiplier is defined for that rating.)

    EAF is used to refine the estimates obtained by basic COCOMO as follows:

    Effort_corrected = Effort * EAF
    Tdev_corrected = 2.5 * (Effort_corrected)^c
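The correction step can be sketched as follows. The function name, the basic effort of 100 PM, and the chosen ratings are assumptions for illustration; the multipliers 1.15 (high required reliability) and 0.71 (very high analyst capability) are taken from the cost driver table, and the remaining 13 drivers are assumed to be nominal (1.00).

```python
from math import prod


def intermediate_cocomo(basic_effort, multipliers, c):
    """Apply the Effort Adjustment Factor to a basic COCOMO effort estimate."""
    eaf = prod(multipliers)                    # product of all effort multipliers
    corrected_effort = basic_effort * eaf      # corrected effort = Effort * EAF
    corrected_tdev = 2.5 * corrected_effort ** c
    return eaf, corrected_effort, corrected_tdev


# Hypothetical ratings: two non-nominal drivers, thirteen nominal ones.
ratings = [1.15, 0.71] + [1.00] * 13
eaf, effort, tdev = intermediate_cocomo(100.0, ratings, c=0.38)
```

Here the EAF is 1.15 * 0.71 = 0.8165, so a capable analyst team more than offsets the extra cost of the higher reliability requirement, and the corrected effort drops from 100 PM to about 81.65 PM.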

    Complete COCOMO Model

    Both the basic and intermediate COCOMO models consider a software to be a single homogeneous entity — an assumption, which is rarely true. In fact, many real life applications are made up of several smaller sub-systems. (One might not even develop all the sub-systems — just use the available services). The complete COCOMO model takes these factors into account to provide a far more accurate estimate of project metrics.

    To illustrate this, let us consider a popular distributed application: the ticket booking system of the Indian Railways. There are computerized ticket counters in most of the railway stations of our country. Tickets can be booked/canceled from any such counter. Reservations for future tickets and cancellation of reserved tickets can also be performed. At a high level, the ticket booking system has three main components:

    • Database
    • Graphical User Interface (GUI)
    • Networking facilities

    Among these, development of the GUI can be considered an organic project, the database module a semi-detached one, and the networking module an embedded one. To obtain a realistic cost, one should estimate the cost of each component separately and then add them up.
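
    The component-wise estimate described above can be sketched as follows. The coefficients are the commonly cited basic COCOMO values for the three project classes. The component sizes (in KDSI) are made-up numbers for illustration only, not figures for the actual railway system.

```python
# Basic COCOMO coefficients (a, b) per project class, as commonly cited from Boehm (1981).
COEFFICIENTS = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_effort(kdsi, project_class):
    """Effort in person-months for one component: a * KDSI^b."""
    a, b = COEFFICIENTS[project_class]
    return a * kdsi ** b


# Hypothetical component sizes in KDSI (assumed for this sketch).
components = [
    ("GUI", 10.0, "organic"),
    ("Database", 20.0, "semi-detached"),
    ("Networking", 15.0, "embedded"),
]

# Estimate each component with the coefficients of its own class, then add them up.
per_component = {name: basic_effort(size, cls) for name, size, cls in components}
total_effort = sum(per_component.values())

for name, effort in per_component.items():
    print(f"{name}: {effort:.1f} person-months")
print(f"Total: {total_effort:.1f} person-months")
```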

    Advantages of COCOMO

    COCOMO is a simple model and should help one to understand the basic concepts of project metrics estimation.

    Drawbacks of COCOMO

    COCOMO uses KDSI, which is not a proper measure of a program’s size. Indeed, estimating the size of software is a difficult task, and any miscalculation could cause a large deviation in subsequent project estimates. Moreover, COCOMO was proposed in 1981 keeping the waterfall model of project life cycle in mind [2]. It fails to address other popular approaches such as the prototype, incremental, spiral, and agile models. In addition, a present-day software project does not necessarily involve coding every bit of functionality. Rather, existing software components are often reused and glued together to build new software. COCOMO is not suitable in such cases. COCOMO II was proposed later, in 2000, to address many of these issues.

    Halstead’s Complexity Metrics

    Halstead took a linguistic approach to determine the complexity of a program. According to him, a computer program consists of a collection of operands and operators. The definition of operands and operators could, however, vary from one person to another and from one programming language to another. Operands are usually the variables or constants of the implementation, that is, something upon which an operation can be performed. Operators are the symbols that affect the value of operands. Halstead’s metrics are computed from the operators and operands used in a computer program. Any given program has the following four parameters:

    • n1: Number of unique operators used in the program
    • n2: Number of unique operands used in the program
    • N1: Total number of operators used in the program
    • N2: Total number of operands used in the program

    Using the above parameters, one can compute the following metrics:

    • Program Length: N = N1 + N2
    • Program Vocabulary: n = n1 + n2
    • Volume: V = N * lg n
    • Difficulty: D = (n1 * N2) / (2 * n2)
    • Effort: E = D * V
    • Time to Implement: T = E / 18 (in seconds) [vi]

    The program volume V is the minimum number of bits needed to encode the program. It represents the size of the program while taking into account the programming language. 
    The difficulty metric indicates how difficult a program is to write or understand. 
    Here, effort denotes the “mental effort” required to develop the software or to recreate the same in another programming language [iv].
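
    The formulas above translate directly into code. The following Python sketch is only an illustration: the function name and the sample counts are assumptions, and lg is taken to be the base-2 logarithm.

```python
from math import log2


def halstead_metrics(n1, n2, N1, N2):
    """Compute Halstead's metrics from operator and operand counts.

    n1, n2: number of unique operators and unique operands
    N1, N2: total number of operators and total number of operands
    """
    length = N1 + N2                      # N = N1 + N2
    vocabulary = n1 + n2                  # n = n1 + n2
    volume = length * log2(vocabulary)    # V = N * lg n
    difficulty = (n1 * N2) / (2 * n2)     # D = (n1 * N2) / (2 * n2)
    effort = difficulty * volume          # E = D * V
    time_seconds = effort / 18            # T = E / 18
    return {
        "length": length,
        "vocabulary": vocabulary,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time_seconds": time_seconds,
    }


# Made-up counts for a tiny program: 4 unique operators, 4 unique operands,
# 8 operator occurrences and 8 operand occurrences.
metrics = halstead_metrics(n1=4, n2=4, N1=8, N2=8)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

    For these sample counts, N = 16 and n = 8, so V = 16 * 3 = 48, D = 4, E = 192, and T is roughly 10.7 seconds.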


Identifying the Requirements from Problem Statements


Sommerville defines “requirement” [1] as a specification of what should be implemented. Requirements specify how the target system should behave: they state what to do, but not how to do it. Requirements engineering refers to the process of understanding what a customer expects from the system to be developed, and documenting those expectations in a standard, easily readable and understandable format. This documentation serves as a reference for the subsequent design, implementation and verification of the system.

Before we start the planning, design and implementation of a software system for our client, it is important that we are clear about its requirements. If we do not have a clear vision of what is to be developed and which features are expected, we would face problems down the road, leading to customer dissatisfaction as well.

Characteristics of Requirements

Requirements gathered for any new system to be developed should exhibit the following three properties:

  • Unambiguity: There should not be any ambiguity about what a system to be developed should do. For example, consider that you are developing a web application for your client. The client requires that “enough” people should be able to access the application simultaneously. How many people is “enough”? That could mean 10 to you, but perhaps 100 to the client. This is an example of ambiguity.
  • Consistency: Requirements should not contradict one another. To illustrate this, consider the automation of a nuclear plant. Suppose that one of the clients says that if the radiation level inside the plant exceeds R1, all reactors should be shut down. However, another person from the client side suggests that the threshold radiation level should be R2. Thus, there is an inconsistency between the two end users regarding what they consider the threshold level of radiation.
  • Completeness: A particular requirement for a system should specify what the system should do and also what it should not do. For example, consider software to be developed for an ATM. If a customer enters an amount greater than the maximum permissible withdrawal amount, the ATM should display an error message and it should not dispense any cash.
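
To make the completeness example concrete, the sketch below encodes both halves of the ATM requirement: what the system does (show an error) and what it must not do (dispense cash). The function name, the limit of 20000 and the message texts are hypothetical.

```python
MAX_WITHDRAWAL = 20000  # hypothetical maximum permissible withdrawal amount


def request_withdrawal(amount):
    """Return (message, cash_dispensed) for a withdrawal request."""
    if amount > MAX_WITHDRAWAL:
        # Complete requirement: an error is shown AND no cash is dispensed.
        return ("Error: amount exceeds the maximum permissible withdrawal", 0)
    return ("Please collect your cash", amount)


message, dispensed = request_withdrawal(25000)
print(message, dispensed)
```

A requirement that mentioned only the error message would leave the behavior of the cash dispenser unspecified, which is exactly the gap completeness is meant to close.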

Categorization of Requirements

Based on the target audience or subject matter, requirements can be classified into different types, as stated below:

  • User requirements: They are written in natural language so that customers and analysts can verify that their requirements have been correctly identified
  • System requirements: They are written involving technical terms and/or specifications, and are meant for the development or testing teams

Requirements can be classified into two groups based on what they describe:

  • Functional requirements (FRs): These describe the functionality of a system — how a system should react to a particular set of inputs and what should be the corresponding output.
  • Non-functional requirements (NFRs): They are not directly related to the functionalities expected from the system. Instead, NFRs typically define how the system should behave under certain conditions. For example, an NFR can say that the system should work with 128MB RAM. Under such a constraint, an NFR can become more critical than an FR.

Non-functional requirements can be further classified into different types, such as:

  • Product requirements: For example, a specification that the web application should use only plain HTML and no frames
  • Performance requirements: For example, the system should remain available 24×7
  • Organizational requirements: For example, the development process should comply with SEI CMM level 4

Functional Requirements

Identifying Functional Requirements

Given a problem statement, the functional requirements can be identified by focusing on the following points:

  • Identify the high level functional requirements from the conceptual understanding of the problem. For example, a Library Management System, apart from anything else, should be able to issue and return books.
  • Identify the cases where an end-user gets some meaningful work done by using the system. For example, in a digital library a user might use the “Search Book” functionality to obtain information about the books of his/her interest.
  • If we consider the system as a black box, there would be some inputs to it and some output in return. This black box defines the functionalities of the system. For example, to search for a book, the user provides the title of the intended book as input and gets the book details and location as output.
  • Any high level requirement identified could have different sub-requirements. For example, the “Issue Book” module can behave differently for different categories of users or for a particular user who has issued the book thrice consecutively.
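
The black-box view of a functional requirement can be sketched in code as an input-to-output mapping. The example below is a minimal, hypothetical version of “Search Book”: the `Book` fields, the in-memory catalogue and the shelf locations are assumptions made for illustration, not part of any real library system.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    location: str


# Hypothetical catalogue used only for this sketch.
CATALOGUE = [
    Book("Software Engineering", "Ian Sommerville", "Shelf A3"),
    Book("The Mythical Man-Month", "Frederick P. Brooks", "Shelf B1"),
]


def search_book(title):
    """Input: a book title. Output: details and location of every matching book."""
    wanted = title.strip().lower()
    return [book for book in CATALOGUE if book.title.lower() == wanted]


results = search_book("software engineering")
for book in results:
    print(f"{book.title} by {book.author}, {book.location}")
```

Only the input (title) and the output (details and location) belong to the requirement. How the catalogue is stored or searched is a design decision.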

Preparing Software Requirements Specifications

Once all possible FRs and NFRs have been identified, and they are complete, consistent, and unambiguous, the Software Requirements Specification (SRS) is prepared. IEEE provides a template [iv] that can be used for this purpose. The SRS is prepared by the service provider and verified by its client. This document serves as a legal agreement between the client and the service provider. Once the system has been developed and deployed, if a proposed feature is found to be missing, the client can point this out from the SRS. Likewise, if after delivery the client asks for a new feature that was not mentioned in the SRS, the service provider can again point to the SRS. The scope of the current experiment, however, does not cover writing an SRS.