Lessons Learned

Below are important Lessons Learned. Many more Lessons Learned will be provided in the future.
  1. VxWorks tNetTask is not a Task
  2. 80386EX Random Divide By Zero Exception
  3. AOL Browser Accepts 'name' as 'id' for getElementById()
  4. Software Engineering Disasters
  5. Lost
  6. Cutting Corners
  7. Gun Safety LESSONS LEARNED: Rule #1
  8. Gun Safety LESSONS LEARNED: Rule #2
  9. Gun Safety LESSONS LEARNED: Rule #3

VxWorks tNetTask is not a Task

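It was learned that the VxWorks tNetTask is not a Task in the sense that a taskSpawn() task is. A function queued with netJobAdd() runs in the context of tNetTask and cannot be reprioritized, rescheduled, or delayed independently, as a task created with taskSpawn() can be. The proof is presented below: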

Two (2) SMSC LAN91C111 devices are utilized on a single Printed Circuit Board (PCB) for two Ethernet ports. The software employs the VxWorks RTOS and the LAN91C111 software driver provided by SMSC. The problem is that when the Ethernet cable of one of the devices is reconnected (or sometimes when it is disconnected), the task that performs the lengthy Auto-Negotiation process for that device executes and prevents the tasks of the other device from executing.

The netJobAdd() function call below resides in the section of the lan91c111Int() function that processes the INT_MDINT interrupt:

    netJobAdd ((FUNCPTR)AdapterReset, (int)pDrvCtrl,0,0,0,0);

Virtually every combination of the following methods has been employed in an effort to force the task of the Auto-Negotiate process (of the AdapterReset function) to be reprioritized, rescheduled, and/or delayed so that tasks of the other device can execute:

  1. Employ Round-Robin scheduling to allow tasks of the same priority to execute.
    This is done by calling the function kernelTimeSlice().
  2. Relocate task of Auto-Negotiate process to the end of the task ready queue.
    This is done by calling the function taskDelay(0) (zero time delay).
  3. Set task priority level of Auto-Negotiate process to be lower than other tasks.
    This is done by calling the function taskPrioritySet().
  4. Set spawned task priority level to be higher (and lower) than priority level of netJobAdd() tasks (e.g. task of Auto-Negotiate process).
    This is done by setting the priority level in the function taskSpawn().

The following observations were made while monitoring the LAN91C111 Interrupt pin and General Purpose Output Port pin of both devices on the oscilloscope before, during, and after execution of the Auto-Negotiate process within the associated task:

  1. The waveforms on the oscilloscope indicate that there is no interrupt contention between the devices.
  2. Setting the General Purpose pin on both devices before each intLock() function is called and clearing the bit after each intUnlock() function is called indicates that there is no contention between the two LAN91C111 devices while locking and unlocking interrupts.
  3. Setting the General Purpose pin on both devices before each semTake() function is called and clearing the bit after each semGive() function is called indicates that there is no contention between the two LAN91C111 devices while taking and giving semaphores.
  4. Each SMSC LAN91C111 interrupt and each netJobAdd task is assigned a unique number that can be identified on the oscilloscope. At various points in the software, the LAN91C111 General Purpose pin is toggled a number of times equal to the number assigned to the current interrupt or task. This identifies the most recent interrupt and the most recent task of each device, and shows which of the two occurred later. It has demonstrated that the Auto-Negotiate process task is indeed preventing the tasks of the other device from executing: after the Auto-Negotiate process begins for the reconnected device, the waveforms show that the tasks associated with the interrupts of the other device are not executed after those interrupts occur.

It is worth noting that even a simple taskDelay(x) in the task that eventually performs the Auto-Negotiate process prevents tasks of the other device from executing for as long as that task is delayed.

The oscilloscope photos that follow show three signals from the two LAN91C111 devices (#1 and #2) of the two Ethernet Channels (#1 and #2). The signals are acquired when the Ethernet cable of Channel #2 is reconnected, which causes the Auto-Negotiation process to run. A brief description of each signal follows:

  TOP:    Interrupt pin (INTR 0) of device #1 (active high)
  MIDDLE: General Purpose Output Port pin (nCNTRL) of device #1, which is cleared/set in software before/after each sendto() and recvfrom() function call
  BOTTOM: General Purpose Output Port pin (nCNTRL) of device #2, which is cleared/set in software before/after the Auto-Negotiation process (two while loops in the ProgramNationalPHY() function)

The photo below shows the signals that result when tNetTask calls AdapterReset(). Note that when the Auto-Negotiate process for device #2 is performed (when BOTTOM signal is low), the interrupts for device #1 (TOP signal) ARE NOT serviced and the sendto() and recvfrom() function calls for device #1 (MIDDLE signal) ARE NOT called:

The solution is to have the LAN91C111 ISR do a netJobAdd() of a function which then does a taskSpawn() of the AdapterReset() function. The photo below shows the signals that result when tNetTask calls a function which spawns a task for AdapterReset(). Note that when the Auto-Negotiate process for device #2 is performed (when BOTTOM signal is low), the interrupts for device #1 (TOP signal) ARE serviced and the sendto() and recvfrom() function calls for device #1 (MIDDLE signal) ARE called:

The required software changes to the module lan91c111End.c of the SMSC LAN91C111 software driver follow:

  1. In the section of code in the lan91c111Int() function that processes the INT_MDINT interrupt, change the netJobAdd() call:

    FROM: netJobAdd ((FUNCPTR)AdapterReset, (int)pDrvCtrl,0,0,0,0);
    TO: netJobAdd ((FUNCPTR)AdapterResetSpawn, (int)pDrvCtrl,0,0,0,0);

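  2. ADD the function AdapterResetSpawn() to the file lan91c111End.c as follows:
    void AdapterResetSpawn(LAN91C111END_DEVICE *Adapter)
    {
        /* Run AdapterReset() in a task of its own (priority 250) so that
           tNetTask returns at once and can service the other device. */
        taskSpawn ("resetask", 250, VX_FP_TASK, 0x2000, (FUNCPTR)AdapterReset,
                   (int)Adapter,0,0,0,0,0,0,0,0,0);
    }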
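
The behaviour observed above is consistent with netJobAdd() jobs being run one at a time, in order, by the single tNetTask. The toy model below is written in JavaScript purely for illustration; it is not VxWorks code. The names runJobQueue, ticks, and spawns are hypothetical, and the tick counts are invented. It treats the job queue as a FIFO drained by one worker and compares the original arrangement with the spawned-task fix:

```javascript
// Toy model, not VxWorks code: one worker drains a FIFO of jobs in order.
// Each job occupies the worker for `ticks` time units; a job may also hand
// long-running work to a separate task via `spawns` (hypothetical field).
function runJobQueue(jobs) {
  let clock = 0;
  const queueDone = {};   // time at which each queued job finished
  const spawned = [];     // work handed off to separate tasks
  for (const job of jobs) {
    clock += job.ticks;   // the worker is busy until this job returns
    queueDone[job.name] = clock;
    if (job.spawns) {
      spawned.push({ name: job.spawns.name, startedAt: clock, ticks: job.spawns.ticks });
    }
  }
  return { queueDone, spawned };
}

// Original arrangement: the lengthy reset runs inside the queue worker.
const original = runJobQueue([
  { name: "AdapterReset(dev2)", ticks: 3000 },
  { name: "receive(dev1)", ticks: 1 },
]);

// Fixed arrangement: the queued job only spawns a task, then returns.
const fixed = runJobQueue([
  { name: "AdapterResetSpawn(dev2)", ticks: 1,
    spawns: { name: "AdapterReset(dev2)", ticks: 3000 } },
  { name: "receive(dev1)", ticks: 1 },
]);

console.log(original.queueDone["receive(dev1)"]); // 3001
console.log(fixed.queueDone["receive(dev1)"]);    // 2
```

The model ignores priorities and preemption. It shows only that a job which returns quickly no longer holds up the jobs queued behind it.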

80386EX Random Divide By Zero Exception

It was learned that the 80386EX Microprocessor can produce random Divide By Zero Exceptions. The 80386EX Slave Programmable Interrupt Controller (PIC) must be initialized even if that internal Slave is not to be used.

One of the following erroneous conditions was randomly occurring after applying power to each of three separate and identical 80386EX boards:

  1. Divide By Zero Exception after 10 to 30 milliseconds of operation precisely when a valid PIC ISR is due to activate
  2. Immediate Termination
  3. Dual Port RAM (DPR) data bit 0, and sometimes bit 1 as well, read as zero after being written as one

It was initially believed that the issue pulling the DPR bit(s) low was also causing the Divide By Zero Exception by forcing the D5 bit of the Data Bus low while the valid PIC ISR vector number of 32 (20H) was on the Data Bus, thus producing vector number 0, the Divide By Zero Exception. A broken Emulator adapter was eventually discovered; it is most likely the reason the DPR bits were read as zero.
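
The arithmetic behind that initial hypothesis can be checked directly. A minimal sketch follows, with JavaScript used only as a calculator and forceBitLow a hypothetical helper name. Vector 32 (20H) has only bit 5 set, so a data bus with D5 stuck low would present vector 0:

```javascript
// Clear one data-bus bit, as a stuck-low line would (hypothetical helper).
function forceBitLow(value, bit) {
  return value & ~(1 << bit);
}

const MASTER_BASE_VECTOR = 0x20;               // vector 32 (20H)
const seen = forceBitLow(MASTER_BASE_VECTOR, 5);
console.log(seen);                             // 0, the Divide By Zero vector
```

Under the same assumption, only the first vector of the block would collapse to 0; vectors 21H through 27H would become 1 through 7.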

The solution to the problem is that the 80386EX Slave PIC must be initialized even if that internal Slave is not to be used. The Intel386 EX EMBEDDED MICROPROCESSOR USER'S MANUAL, Chapter 9 INTERRUPT CONTROL UNIT, describes the 82C59A Master and Slave Programmable Interrupt Controllers (PIC) residing within the 80386EX microprocessor. This manual describes the Master PIC Initialization Command Word 3 (ICW3) register bit S2 as follows:

    S2    0 = Internal slave not used
          1 = Internal slave is cascaded from the master's IR2 signal

When the Internal Slave PIC is not to be used, one might expect that by clearing bit S2 in the Master PIC ICW3 register, the Slave will be disabled and thus there is no need to initialize the Slave PIC. However, the NOTE above the description of the Master PIC ICW3 register states the following:

    NOTE Since the internal slave is cascaded from the master's IR2 signal, you must set the S2 bit.

Therefore, a more accurate description for the Master PIC ICW3 S2 bit value of zero would be:

    0 = Internal slave is not cascaded from the master's IR2 signal

Since the 80386EX Master PIC IR2 signal is supposedly hardwired to the Slave PIC INT signal and no option exists to disconnect it, the accurate description for the Master PIC ICW3 S2 bit would be:

    S2    Set this bit to guarantee proper device operation

It has been observed that if one incorrectly assumes that clearing the Master PIC ICW3 S2 bit will cause the Slave PIC to be ignored, Divide By Zero Exceptions and other spurious interrupts occur at random, at the moment a valid Master PIC interrupt is due and in place of it.

To guarantee proper operation of the 80386EX, it is critical that the Master PIC ICW3 S2 bit be set, that the Slave PIC be fully initialized by writing to the Slave ICW1, ICW2, ICW3, and ICW4 registers, and that all Slave PIC interrupts then be disabled by setting every bit in the Slave PIC OCW1 register to 1.

One may question why Intel did not give default values for the above registers, as they did for the PIC Port 3 Configuration Register (P3CFG) and the PIC Interrupt Configuration Register (INTCFG), to initialize and disable the 80386EX Master and Slave PICs.

The required assembly code to properly initialize and disable the 80386EX internal Master and Slave PICs follows:

;Initialize and Disable MASTER
MOV AL, 11H ;ICW1 ACCESSED, EDGE TRIGGERED FOR...
OUT I386EX_M_ICW1, AL ;MASTER ICW1
MOV AL, 20H ;BASE VECTOR OF 32 (20H) FOR...
OUT I386EX_M_ICW2, AL ;MASTER ICW2
MOV AL, 04H ;SLAVE ON MASTER IR2 ONLY FOR...
OUT I386EX_M_ICW3, AL ;MASTER ICW3
MOV AL, 01H ;FULLY NESTED, DISABLE AUTOMATIC EOI, FOR...
OUT I386EX_M_ICW4, AL ;MASTER ICW4
MOV AL, 0FFH ;DISABLE ALL INTERRUPT SOURCES FOR...
OUT I386EX_M_OCW1, AL ;MASTER OCW1
;Initialize and Disable SLAVE
MOV AL, 11H ;ICW1 ACCESSED, EDGE TRIGGERED FOR...
OUT I386EX_S_ICW1, AL ;SLAVE ICW1
MOV AL, 28H ;BASE VECTOR OF 40 (28H) FOR...
OUT I386EX_S_ICW2, AL ;SLAVE ICW2
MOV AL, 02H ;SLAVE ID 2 (CASCADED FROM MASTER IR2) FOR...
OUT I386EX_S_ICW3, AL ;SLAVE ICW3
MOV AL, 01H ;FULLY NESTED, DISABLE AUTOMATIC EOI, FOR...
OUT I386EX_S_ICW4, AL ;SLAVE ICW4
MOV AL, 0FFH ;DISABLE ALL INTERRUPT SOURCES FOR...
OUT I386EX_S_OCW1, AL ;SLAVE OCW1

AOL Browser Accepts 'name' as 'id' for getElementById()

It was learned that when the America Online (AOL) browser, which utilizes Microsoft Internet Explorer, processes a HyperText Markup Language (HTML) file that resides on the Internet, it accepts a 'name' (specified in the HTML code as name="...") in place of an 'id' (specified as id="...") and successfully executes the JavaScript function document.getElementById() with it, even though document.getElementById() requires an 'id'.
However, when the AOL browser is processing a file that resides on the computer hard drive, the browser will properly reject a 'name' for document.getElementById() which requires an 'id'.

Note that when Internet Explorer is executed directly (i.e. not by the AOL browser), a 'name' will be rejected for document.getElementById(). This is true when Internet Explorer is processing a file that resides on the Internet or on the computer hard drive.
Other browsers, such as Mozilla Firefox, have been shown to properly reject a 'name' for document.getElementById().

The HTML code to zoom in and out of the image below improperly employs the HTML code:
<img name="imgname" src="photos/1982GMC.jpg" height=400>
and thus should be rejected by all browsers but will be accepted by the AOL browser:

The HTML code to zoom in and out of the image below properly employs the HTML code:
<img id="imgid" src="photos/1982GMC.jpg" height=400>
and thus should be accepted by all browsers:

Proof of where the JavaScript code fails is now demonstrated.

In the JavaScript ZoomTest() function below, an alert() function is called after each line of JavaScript code that performs the zoom process. This is done to report each line of zoom code that completes execution. If a line of code should fail to execute due to an error, the code that follows the error will not be executed and the following message will not be reported:
Completed function: ZoomTest(id,state)

var origw=new Array();
var origh=new Array();

function ZoomTest(id,state)
{
  alert("Entered function: ZoomTest(id,state)");
  var factor;
  alert("Executed: var factor;");
  var elem=document.getElementById(id);
  alert("Executed: var elem=document.getElementById(id);");
  var ids=eval("document.images."+id);
  alert("Executed: var ids=eval(\"document.images.\"+id);");
  var name=id.toString();
  alert("Executed: var name=id.toString();");

  // If getElementById() rejected a 'name', elem is null and the next line fails.
  if (origw[name]==undefined) origw[name]=elem.width;
  alert("Executed: if (origw[name]==undefined) origw[name]=elem.width;");
  if (origh[name]==undefined) origh[name]=elem.height;
  alert("Executed: if (origh[name]==undefined) origh[name]=elem.height;");
  if (state=="norm")
  {
    alert("Executed: if (state==\"norm\")");
    ids.style.width=origw[name];
    alert("Executed: ids.style.width=origw[name];");
    ids.style.height=origh[name];
    alert("Executed: ids.style.height=origh[name];");
  }
  else if (elem.width>10 && elem.height>10)
  {
    alert("Executed: else if (elem.width>10 && elem.height>10)");
    factor=(state=="in")?1.05:0.95;
    alert("Executed: factor=(state==\"in\")?1.05:0.95;");
    ids.style.width=elem.width*factor;
    alert("Executed: ids.style.width=elem.width*factor;");
    ids.style.height=elem.height*factor;
    alert("Executed: ids.style.height=elem.height*factor;");
  }
  alert("Completed function: ZoomTest(id,state)");
}

The HTML code to zoom in and out of the image below improperly employs name="..." as:
<img name="imgtst" src="photos/1982GMC.jpg" height=400>
instead of id="..." as required for the document.getElementById() function which is called in the JavaScript ZoomTest() function (above) which is called in the HTML code (below) whenever any of the three buttons are clicked.

<form>
<input type="button" value="ZOOM IN" onClick="javascript:ZoomTest('imgtst','in');">
<input type="button" value="NORMAL" onClick="javascript:ZoomTest('imgtst','norm');">
<input type="button" value="ZOOM OUT" onClick="javascript:ZoomTest('imgtst','out');">
</form>
<img name="imgtst" src="photos/1982GMC.jpg" height=400>

When executed by the AOL browser, the last message reported when any button is clicked should be:
Completed function: ZoomTest(id,state)
However, for any other browser, the last message reported should be:
Executed: var name=id.toString();
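
Code that must behave identically in every browser can guard against this by confirming that the returned element really carries the requested 'id'. A minimal sketch follows, assuming a hypothetical helper named getByIdStrict (not part of any browser API). It takes the document object as a parameter so that it can be exercised outside a browser, and lenientDoc is an invented stand-in for a browser that matches on 'name':

```javascript
// Hypothetical helper: return the element only when its id attribute
// actually equals the requested id; otherwise return null.
function getByIdStrict(doc, id) {
  const elem = doc.getElementById(id);
  if (elem && elem.id === id) {
    return elem;
  }
  return null; // nothing found, or the match was made on 'name'
}

// Stand-in for a lenient browser that matches on 'name' as well as 'id'.
const lenientDoc = {
  elements: [
    { id: "", name: "imgtst" },
    { id: "imgid", name: "" },
  ],
  getElementById(key) {
    return this.elements.find((e) => e.id === key || e.name === key) || null;
  },
};

console.log(getByIdStrict(lenientDoc, "imgtst"));   // null
console.log(getByIdStrict(lenientDoc, "imgid").id); // imgid
```

In a page the call would be getByIdStrict(document, 'imgid'). A lookup by 'name' then returns null everywhere instead of succeeding in one browser only.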

Software Engineering Disasters

A Software Engineering Disaster is exposed at 2 minutes and 17 seconds into the video below.
In this Software Engineering Disaster, 28 of our brave soldiers were killed and about 100 were wounded or maimed for life because "a long time" was never defined when compensating for a known "software flaw": the internal clock error of the system would exceed one tenth of a second (i.e. 100 milliseconds) after running for more than 8 hours.
To compensate for the "software flaw", the system needed to be shut down and restarted after running for "a long time" (i.e. "8 hours"), but others interpreted "a long time" to mean more than "100 hours", which resulted in a time error of about 1/3 of a second (i.e. approximately 333 milliseconds).
Would "a long time" have been clearly defined, and would such "software flaws" have been thoroughly tested for, if the life or limbs of the sons or daughters of the persons responsible had depended on it?


Lost

The duplication of the identifier R (for ROZO) was there for everyone to see on the FMS:

But the pilot "assumed" the one he was to fly to was the closest and thus at the top of the list.
Such known issues with duplicates need to be emphasized so they can be recognized by incompetent and careless pilots in a rush.


Cutting Corners

The pilots gave 100% effort right up to the very end, even flying the plane upside down, but it was hopeless.
Why were those who falsified records not prosecuted and imprisoned for criminal negligence, falsifying records, and manslaughter?
But the man who detected and properly reported the faulty part that caused the plane to crash suffers.


Web site questions/comments? Contact: TDB Consulting
