<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>SiCortex Knowledge Base</title><link>http://sicortex.com</link><description></description><language>en-US</language><item><pubDate>Wed, 17 Sep 2008 18:26:05 GMT</pubDate><title>SC648 Update</title><link>http://sicortex.com/support/knowledge_base/hardware/sc648_update</link><description>
&lt;p&gt;
We are pleased to inform you that we have replaced the SC648 with an equivalently priced SC1458, packaged in the same cabinet. As with the SC648, the SC1458 can be configured with 2, 3, or 4 modules, and it can easily accommodate 9 modules.
&lt;/p&gt;

&lt;p&gt;
Note that the SiCortex software will continue to support SC648 systems.
&lt;/p&gt;
</description></item><item><pubDate>Wed, 17 Sep 2008 17:34:02 GMT</pubDate><title>Software Release Notes Version 3.0 Field Test</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/v3_0_field_test_release_notes</link><description>
&lt;p&gt;

This &lt;a href="/products/user_documentation/software_release_notes_3_0" target="_self"&gt;document&lt;/a&gt; describes, in this order:&lt;br /&gt;
 • New Features and Other Changes&lt;br /&gt;
 • Active Issues and their temporary workarounds&lt;br /&gt;
 • Fixed Issues&lt;br /&gt;
 • Tips and Tricks
&lt;/p&gt;
</description></item><item><pubDate>Tue, 16 Sep 2008 13:23:10 GMT</pubDate><title>Fix for ECC bug</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/fix_for_ecc_bug</link><description>
&lt;p&gt;
During our Field Test, we have discovered an issue with the kernel code concerning Error Detection And Correction which can cause nodes to fail by reporting incorrect uncorrectable ECC errors. This is more often found on larger systems (due to the greater number of nodes), but can also affect the SC072-PDS.
&lt;/p&gt;

&lt;p&gt;

To fix this error, download this &lt;a href="http://sicortex.com/_downloads/5.0.0.88.60314-r5-all.tgz" target="_self"&gt;file&lt;/a&gt;, then unpack and install it as follows:&lt;br /&gt;
&lt;b&gt;# cd /opt/sicortex/kernel/linux&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;# tar xzf 5.0.0.88.60314-r5-all.tgz&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;# rm default&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;# ln -s 5.0.0.88.60314-r5 default&lt;/b&gt;
&lt;/p&gt;
</description></item><item><pubDate>Thu, 11 Sep 2008 14:47:03 GMT</pubDate><title>Installing the V3.0 Field Test Software</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/sc072_v3_0_ft_software_installation_procedure</link><description>
&lt;p&gt;
For instructions, see &lt;a href="/media/files/sc072_v3_0_ft_software_installation_instructions" target="_self"&gt;SC072 V3.0-FT Software Installation Instructions&lt;/a&gt;.
&lt;/p&gt;
</description></item><item><pubDate>Fri, 05 Sep 2008 19:08:44 GMT</pubDate><title>Node NTP Synchronization Failure</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/ntp_node_synchronization_failure</link><description>
&lt;p&gt;
Applies to the SC5832, SC1458, and SC648 only.
&lt;/p&gt;

&lt;p&gt;
The nodes synchronize with the SSP, which synchronizes with an upstream NTP time server. If the SSP cannot synchronize to an official upstream NTP server, the nodes must still be able to synchronize directly with the SSP to keep time consistent. To allow this, edit the &lt;b&gt;/etc/ntp.conf&lt;/b&gt; file and add the line:
&lt;/p&gt;
&lt;pre&gt; server 127.127.1.0 &lt;/pre&gt;
&lt;p&gt;
immediately before the line:
&lt;/p&gt;
&lt;pre&gt; fudge 127.127.1.0 stratum 10&lt;/pre&gt;
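&lt;p&gt;
For illustration, the relevant part of &lt;b&gt;/etc/ntp.conf&lt;/b&gt; would then look roughly like the sketch below. It contains only the two lines named above; the comment is added here for explanation, and any other entries already in the file are assumed to stay unchanged. The address 127.127.1.0 is the NTP local clock driver, and the fudge line assigns it stratum 10.
&lt;/p&gt;
&lt;pre&gt;
# Use the local clock as a time source when no upstream server is reachable
server 127.127.1.0
fudge 127.127.1.0 stratum 10
&lt;/pre&gt;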
</description></item><item><pubDate>Fri, 05 Sep 2008 16:42:13 GMT</pubDate><title>Linking netcdf from Fortran</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/netcdf_linking_from_fortran</link><description>
&lt;p&gt;
The C and Fortran 77 interfaces to netcdf work, but the Fortran 90 interface does not. Attempts to link applications with netcdf using Fortran 90 will fail with missing symbols reported.
&lt;/p&gt;
</description></item><item><pubDate>Fri, 05 Sep 2008 13:51:48 GMT</pubDate><title>Avoiding a Lustre Race Condition</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/avoiding_a_lustre_race_condition</link><description>
&lt;p&gt;
A race condition can occur when the Lustre file system comes under heavy load. To avoid this problem, set the &lt;b&gt;sys.timeout&lt;/b&gt; value for the file system to &lt;b&gt;300&lt;/b&gt;. You can do this in one of three ways. The following examples assume that the file system is named &lt;b&gt;fubar&lt;/b&gt;, and the MDT device is &lt;b&gt;/dev/sda&lt;/b&gt;:
&lt;/p&gt;

&lt;p&gt;
During creation of the MDT
&lt;/p&gt;
&lt;div class="FeaturedInfoTop"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoTop --&gt;
&lt;div class="FeaturedInfo"&gt;
&lt;div class="Text"&gt;
&lt;pre&gt; mkfs.lustre --mgs --mdt --param sys.timeout=300 ... /dev/sda&lt;/pre&gt;
&lt;/div&gt; &lt;!-- .Text --&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfo --&gt;
&lt;div class="FeaturedInfoBottom"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoBottom --&gt;

&lt;p&gt;
On an existing, but unmounted, MDT
&lt;/p&gt;

&lt;div class="FeaturedInfoTop"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoTop --&gt;
&lt;div class="FeaturedInfo"&gt;
&lt;div class="Text"&gt;
&lt;pre&gt; tunefs.lustre --param sys.timeout=300 /dev/sda&lt;/pre&gt;
&lt;/div&gt; &lt;!-- .Text --&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfo --&gt;
&lt;div class="FeaturedInfoBottom"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoBottom --&gt;

&lt;p&gt;
On a running system, or via the mount script after the MDT has been mounted
&lt;/p&gt;
&lt;div class="FeaturedInfoTop"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoTop --&gt;
&lt;div class="FeaturedInfo"&gt;
&lt;div class="Text"&gt;
&lt;pre&gt; lctl conf_param fubar.sys.timeout=300&lt;/pre&gt;
&lt;/div&gt; &lt;!-- .Text --&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfo --&gt;
&lt;div class="FeaturedInfoBottom"&gt;
&lt;/div&gt; &lt;!-- .FeaturedInfoBottom --&gt;</description></item><item><pubDate>Thu, 04 Sep 2008 20:12:06 GMT</pubDate><title>Linking the BLACS library</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/linking_the_blacs_library</link><description>
&lt;p&gt;
Three libraries (libblacs.a, libblacsCinit.a, and libblacsF77init.a) make up the BLACS library. When you link the BLACS init routines, include either &lt;b&gt;-lblacsCinit&lt;/b&gt; or &lt;b&gt;-lblacsF77init&lt;/b&gt;, but not both, on the link line. Linking a Fortran program to the C entry point produces unexpected behavior.
&lt;/p&gt;
</description></item><item><pubDate>Thu, 04 Sep 2008 19:51:01 GMT</pubDate><title>MPI Sync Cycles %</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/papiex_derived_metric_mpi_sync_cycles</link><description>
&lt;p&gt;
The &lt;b&gt;MPI Sync Cycles %&lt;/b&gt; derived metric in the papiex output files, &lt;b&gt;job_summary.txt&lt;/b&gt; and &lt;b&gt;task_summary.txt&lt;/b&gt;, always reports &lt;b&gt;0&lt;/b&gt;. However, the raw values in the individual thread and task files are correct, so you can calculate the summary values manually from the raw &lt;b&gt;MPI Sync Cycles %&lt;/b&gt; value in each of the individual thread and task files.
&lt;/p&gt;
</description></item><item><pubDate>Thu, 04 Sep 2008 19:33:04 GMT</pubDate><title>Peephole Optimization</title><link>http://sicortex.com/support/knowledge_base/v3_0_field_test_release_note_updates/pathscale_compiler_peephole_optimizations</link><description>
&lt;p&gt;
Some of the PathScale compiler's peephole optimizations can generate incorrect code. If you encounter this problem, disable peephole optimization by including &lt;b&gt;-CG:ebo_level=0&lt;/b&gt; on the compile line.
&lt;/p&gt;
</description></item><item><pubDate>Mon, 04 Aug 2008 14:41:37 GMT</pubDate><title>Limiting access to cluster compute nodes (2.2b only)</title><link>http://sicortex.com/support/knowledge_base/sc072_pds/limiting_access_to_compute_nodes</link><description>&lt;a name="eztoc6590_0_0_1" id="eztoc6590_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Reconfigure VMWare to Add NAT&lt;/h3&gt;
&lt;ol&gt;

&lt;li&gt;Shut down the VMWare SSP.&lt;/li&gt;

&lt;li&gt;
On Red Hat as root, run vmware-config.pl and press Return until you see the question:&lt;br /&gt;
"Would you like to skip networking setup and keep your old settings as they are?&lt;br /&gt;(yes/no)". Enter &lt;b&gt;no&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;In response to the question: "Do you want networking for your virtual machines? (yes/no/help)", enter &lt;b&gt;yes&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;
In response to the question: "Would you prefer to modify your existing networking configuration using the &lt;br /&gt;wizard or the editor? (wizard/editor/help)", enter &lt;b&gt;wizard&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;The system displays the current network configuration listing the bridged interfaces and the question, "Do you want to be able to use NAT networking in your virtual machines? (yes/no)." Enter &lt;b&gt;yes&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;The system displays something like, "The NAT network is currently configured to use the private subnet 192.168.185.0/255.255.255.0. Do you want to keep these settings?" Enter &lt;b&gt;yes&lt;/b&gt; and note the network number and netmask.&lt;/li&gt;

&lt;li&gt;In response to the question: "Do you wish to configure another NAT network?" enter &lt;b&gt;no&lt;/b&gt;. The configure script builds the appropriate modules and restarts the services.&lt;/li&gt;

&lt;/ol&gt;
&lt;a name="eztoc6590_0_0_1" id="eztoc6590_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Reconfigure Red Hat&lt;/h3&gt;
&lt;ol&gt;

&lt;li&gt;
Edit /etc/hosts and add a line like: NET.2 ssp&lt;br /&gt;where NET is the network portion of the subnet address from step 6 of the previous procedure (192.168.185 in that example) and ".2" is the host number.&lt;/li&gt;

&lt;li&gt;
Edit /etc/sysconfig/iptables and add the following three lines to the end of the file:&lt;br /&gt;
*nat&lt;br /&gt;
--append POSTROUTING --out-interface eth1 -j MASQUERADE &lt;br /&gt;COMMIT&lt;/li&gt;

&lt;li&gt;Run "/etc/init.d/iptables restart" to activate the new settings.&lt;/li&gt;

&lt;/ol&gt;
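&lt;p&gt;
As a sketch, if the NAT subnet reported by vmware-config.pl is the 192.168.185.0/255.255.255.0 example quoted in the previous procedure (an assumption; substitute the subnet your own system reports), the additions to the two files would look roughly like this. The comment lines are added here only to label each file.
&lt;/p&gt;
&lt;pre&gt;
# /etc/hosts (added line; 192.168.185 is the assumed example subnet)
192.168.185.2 ssp

# /etc/sysconfig/iptables (appended at the end of the file)
*nat
--append POSTROUTING --out-interface eth1 -j MASQUERADE
COMMIT
&lt;/pre&gt;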
&lt;a name="eztoc6590_0_0_1" id="eztoc6590_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Reconfigure the VMWare SSP&lt;/h3&gt;
&lt;ol&gt;

&lt;li&gt;Start the VMWare SSP.&lt;/li&gt;

&lt;li&gt;Correct the virtual network for Ethernet 2 by right-clicking the button on the VMWare toolbar and selecting &lt;b&gt;NAT&lt;/b&gt; from the drop-down menu.&lt;/li&gt;

&lt;li&gt;Log in to the VMWare SSP.&lt;/li&gt;

&lt;li&gt;Edit /etc/conf.d/net and comment out the line: config_site=("dhcp").&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;
Add the following lines (where NET refers to the subnet address from step 6 of the NAT procedure above and DNSSERVERS refers to the space-separated IP addresses from the native Red Hat /etc/resolv.conf file):
&lt;/p&gt;

&lt;p&gt;

config_site="NET.2 netmask 255.255.255.0"&lt;br /&gt;
gateway="NET.1"&lt;br /&gt;
routes_site=("default via NET.1")&lt;br /&gt;
dns_servers="DNSSERVERS"&lt;br /&gt;dns_domain_site="your-domain"
&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;Optional. Edit /opt/sicortex/dnsmasq-hosts.sca to change the IP address for ssp-ws to NET.1 (where NET is the subnet address from step 6 of the NAT procedure above and ".1" is the host number).&lt;/li&gt;

&lt;/ol&gt;
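&lt;p&gt;
For illustration, here is roughly how the edited part of /etc/conf.d/net might look once the placeholders are filled in. This is a sketch under stated assumptions: it uses the 192.168.185.0 example subnet from step 6 of the NAT procedure, and the DNS server addresses (192.0.2.53 and 192.0.2.54) and the domain example.com are hypothetical stand-ins for the values in your own /etc/resolv.conf.
&lt;/p&gt;
&lt;pre&gt;
# /etc/conf.d/net on the VMWare SSP (sketch; all addresses are examples)
# config_site=("dhcp")
config_site="192.168.185.2 netmask 255.255.255.0"
gateway="192.168.185.1"
routes_site=("default via 192.168.185.1")
dns_servers="192.0.2.53 192.0.2.54"
dns_domain_site="example.com"
&lt;/pre&gt;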

&lt;p&gt;
Now, when you reboot the VMWare SSP, you will notice several failures which are all safe to ignore:
&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Didn't start Apache2 (because there is no fully qualified domain name)&lt;/li&gt;

&lt;li&gt;Failed to set clock (?)&lt;/li&gt;

&lt;li&gt;No automount maps&lt;/li&gt;

&lt;/ul&gt;
</description></item><item><pubDate>Wed, 11 Jun 2008 19:32:29 GMT</pubDate><title>SLURM compute partitions should contain nodes exclusively used for computations</title><link>http://sicortex.com/support/knowledge_base/software/marginal_mgtnet_links_can_cause_boot_failure/slurm_compute_partitions_should_contain_nodes_exclusively_used_for_computations</link><description>
&lt;p&gt;
The System ships with several sample partitions that are intended as examples only. The supplied partitions may not match your site’s I/O configuration and system administrative needs.
&lt;/p&gt;

&lt;p&gt;
For example, on the SC5832, the sample partition scx-comp defines a compute partition that excludes the I/O nodes m[0,2,4,6,32,34]n6.
&lt;/p&gt;

&lt;p&gt;
You can define your own partitions that match your site’s I/O and system administrative needs. For a compute partition, be sure to exclude all nodes used for I/O or system administrative tasks, such as head nodes, gateway nodes, file system servers, and so on.
&lt;/p&gt;

&lt;p&gt;
For details on creating partitions, see &lt;i&gt;Creating and Using Partitions&lt;/i&gt; in Chapter 5 of &lt;i&gt;The SiCortex® System Administration Guide&lt;/i&gt;.
&lt;/p&gt;
</description></item><item><pubDate>Wed, 11 Jun 2008 19:25:26 GMT</pubDate><title>Marginal MGTnet links can cause boot failure</title><link>http://sicortex.com/support/knowledge_base/software/marginal_mgtnet_links_can_cause_boot_failure</link><description>
&lt;p&gt;
Typically, SiCortex systems are configured with one (SC648) or four (SC5832) Gigabit Ethernet ports that connect the SSP to management nodes in the system (m[0,2,4,6]n6 on the SC5832, and m0n6 on the SC648).
&lt;/p&gt;

&lt;p&gt;

The SSP uses these links to serve the root file system to the nodes, either via NFS or by copying the file system image to the management nodes, which then serve the image to the other nodes via NBD.&lt;br /&gt;
When any of these Ethernet links is disconnected or otherwise unreliable, the SSP fails to deliver the file system image properly. When this happens, the other nodes distribute themselves across the available root file system servers.
&lt;/p&gt;

&lt;p&gt;
For the SC5832, this may result in fewer root file system servers than is desirable. For the SC648, this may prevent the system from booting.
&lt;/p&gt;

&lt;p&gt;
We recommend that system administrators do two things:
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;On the SSP — &lt;/b&gt;Make sure the GigE cables are firmly connected to port 2 on the processor module(s) and to the appropriate port(s) on the SSP, then confirm that the links are working.
&lt;/p&gt;

&lt;p&gt;
In the output of /sbin/ifconfig on the SSP, these links are named mgt[0,1,2,3]. You can use the Linux ethtool command to confirm that these links are up.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;On the nodes — &lt;/b&gt;Check the console log files for the gateway nodes (typically sc1-m0n6 on the SC648, and scx-m[0,2,4,6]n6 on the SC5832). The log should show the interface is being brought up in the usual Linux way.
&lt;/p&gt;
</description></item><item><pubDate>Tue, 03 Jun 2008 15:41:25 GMT</pubDate><title>Compiler Overview</title><link>http://sicortex.com/support/knowledge_base/software/compiler_overview</link><description>&lt;a name="eztoc4986_0_0_1" id="eztoc4986_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Compilers:&lt;/h3&gt;
&lt;p&gt;

C/C++: GNU (gcc/g++) and PathScale (pathcc/pathCC)&lt;br /&gt;
Fortran: PathScale (pathf95)
&lt;/p&gt;

&lt;p&gt;
By default, the compilers build code for the 64-bit MIPS 'n64' ABI. The compilers can also build for other ABIs, including the 32-bit 'n32', which you select with the '-mabi' compiler option.
&lt;/p&gt;

&lt;p&gt;
The standard options for gcc should include -mips64 -march=5kf. Both the GNU and PathScale compilers offer extensive tuning and optimization flags; -O3 is a good starting point. For debugging, use -g, which carries a negligible performance penalty.
&lt;/p&gt;

&lt;p&gt;
Standard compiler drivers such as mpicc, mpicxx, mpif77 and mpif90 exist, which automatically link with the MPI library.
&lt;/p&gt;
&lt;a name="eztoc4986_0_0_1" id="eztoc4986_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Math libraries&lt;/h3&gt;
&lt;p&gt;
The system ships with numerous standard math libraries, such as BLAS, FFT, etc. To use an optimized math library, which may use numerical approximations, link with 'libscm' &lt;i&gt;before&lt;/i&gt; linking with 'libm'. For dynamic executables that are already linked with libm and that you would like to run with the faster libscm, do the following:
&lt;/p&gt;

&lt;p&gt;

$ export LD_PRELOAD=libscm.so # for csh, use setenv&lt;br /&gt;
$ &lt;my serial application&gt;&lt;br /&gt;
$ srun .... &lt;my parallel application&gt;&lt;br /&gt;
$ unset LD_PRELOAD # for csh, use unsetenv
&lt;/p&gt;
&lt;a name="eztoc4986_0_0_1" id="eztoc4986_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Performance Tools&lt;/h3&gt;
&lt;p&gt;
The system ships with a suite of performance tools. Most tools do not require recompilation or relinking. The following is a partial list. For details, refer to the Programming Guide or the man pages.
&lt;/p&gt;

&lt;p&gt;
papiex: Provides hardware performance metrics, such as cache misses, TLB misses, etc, for serial, threaded and MPI programs. This is the tool of choice to start a tuning exercise. It also provides time spent in I/O and MPI.
&lt;/p&gt;

&lt;p&gt;
mpipex: Provides time spent in various MPI calls. Very low overhead; based on the open source mpiP library from LLNL.
&lt;/p&gt;

&lt;p&gt;
ioex: Gives details on I/O calls and file access patterns exhibited by the application.
&lt;/p&gt;

&lt;p&gt;
gptlex: Based on the open source GPTL library, provides a call-tree. Can cause significant overhead.
&lt;/p&gt;

&lt;p&gt;
TAU: General purpose tuning tool, from University of Oregon. Requires re-compilation.
&lt;/p&gt;

&lt;p&gt;
VAMPIR: MPI tuning tool with a nice GUI.
&lt;/p&gt;

&lt;p&gt;
hpcex/hpcproftt/hpcviewer: Statistical line profiler that correlates user code (function/line of source code), against events such as cache misses. Derived from HPCToolkit from Rice University.
&lt;/p&gt;

&lt;p&gt;
pfmon: Low-level tool for measuring performance metrics. Also used for system-wide performance measurements, such as measuring OS noise.
&lt;/p&gt;

&lt;p&gt;
DUMA: Memory debugger
&lt;/p&gt;
&lt;a name="eztoc4986_0_0_1" id="eztoc4986_0_0_1"&gt;&lt;/a&gt;&lt;h3&gt;Debuggers&lt;/h3&gt;
&lt;p&gt;
gdb and TotalView.
&lt;/p&gt;
</description></item><item><pubDate>Mon, 19 May 2008 13:54:20 GMT</pubDate><title>Handling oom/malloc Failures</title><link>http://sicortex.com/support/knowledge_base/software/handling_oom_malloc_failures</link><description>
&lt;p&gt;
Because we run individual nodes without swap space, memory demands must be met locally. To do this, set the Linux tuning parameters overcommit_memory and overcommit_ratio as follows. (For parameter details, see the proc(5) man page.)
&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;/proc/sys/vm/overcommit_memory = 2 &lt;/li&gt;

&lt;li&gt;/proc/sys/vm/overcommit_ratio = 90&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;
On our systems, these settings prevent applications from committing more than 90% of physical RAM. A malloc call fails if your application exceeds this limit, even if the application does not touch all of the memory it has allocated.
&lt;/p&gt;
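&lt;p&gt;
As a rough worked example, Linux derives the commit limit in mode 2 from the swap size plus overcommit_ratio percent of physical RAM. The 4 GB node memory used below is a hypothetical figure chosen only to make the arithmetic concrete; substitute the memory size of your own nodes.
&lt;/p&gt;
&lt;pre&gt;
commit limit = swap + physical RAM x overcommit_ratio / 100
             = 0 GB + 4 GB x 90 / 100
             = 3.6 GB
&lt;/pre&gt;
&lt;p&gt;
Under these assumptions, an allocation that would raise the total committed memory on the node above 3.6 GB is refused, whether or not the pages are ever touched.
&lt;/p&gt;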
&lt;a name="eztoc4765_0_1" id="eztoc4765_0_1"&gt;&lt;/a&gt;&lt;h2&gt;OOM Killer Basics&lt;/h2&gt;
&lt;p&gt;
The OOM killer tries to preserve the system by killing off applications, but it does not always choose the right one, and it may leave a node unusable. A node that becomes unusable does not affect the operation of the other nodes in the system because the unusable node’s fabric switch and links continue to operate normally.
&lt;/p&gt;

&lt;p&gt;
Typically, when an out-of-memory (OOM) condition occurs, the console log of an affected node contains messages such as:
&lt;/p&gt;
&lt;pre&gt; oom-killer: gfp_mask=0xd0, order=0&lt;/pre&gt;

&lt;p&gt;
When a node becomes unresponsive, check the tail of the Linux console log file for messages. The console log file is located on the SSP in
&lt;/p&gt;

&lt;p&gt;
/var/log/&lt;partition&gt;/&lt;partition&gt;-&lt;module&gt;n&lt;node&gt;.console.
&lt;/p&gt;
&lt;a name="eztoc4765_0_1" id="eztoc4765_0_1"&gt;&lt;/a&gt;&lt;h2&gt;Overcommit Parameters&lt;/h2&gt;
&lt;p&gt;
The overcommit_memory and overcommit_ratio parameters specify whether and how to overcommit physical memory.
&lt;/p&gt;

&lt;p&gt;
• overcommit_memory = &lt;0|1|2&gt;
&lt;/p&gt;

&lt;p&gt;

0: Heuristic overcommit. Any obvious overcommitment is refused, but root is allowed to allocate slightly more memory than other users.&lt;br /&gt;
1: Always allow applications to overcommit physical memory. Useful for some scientific applications, which allocate large amounts of memory but do not actually touch all of the allocated pages.&lt;br /&gt;
2: Never allow overcommitment of memory. Any request that would raise committed memory above overcommit_ratio percent of physical RAM is refused, and malloc fails.
&lt;/p&gt;

&lt;p&gt;
• overcommit_ratio = &lt;##&gt;
&lt;/p&gt;

&lt;p&gt;
The percentage of physical memory the application is allowed to commit when overcommit_memory = 2.
&lt;/p&gt;
&lt;a name="eztoc4765_0_1" id="eztoc4765_0_1"&gt;&lt;/a&gt;&lt;h2&gt;Setting the Overcommit Parameters&lt;/h2&gt;
&lt;p&gt;
You can change the settings of the overcommit parameters in two ways.
&lt;/p&gt;
&lt;a name="eztoc4765_0_1_1" id="eztoc4765_0_1_1"&gt;&lt;/a&gt;&lt;h3&gt;System-wide change — persistent through subsequent reboots&lt;/h3&gt;
&lt;p&gt;
Edit the vm.overcommit_memory and vm.overcommit_ratio settings in the /opt/sicortex/rootfs/default/etc/sysctl.conf file, then reboot the System.
&lt;/p&gt;
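&lt;p&gt;
For example, to make the values recommended at the start of this article persistent, the two entries in /opt/sicortex/rootfs/default/etc/sysctl.conf would look roughly like the following sketch. It assumes the file uses the standard sysctl.conf "name = value" syntax; other entries in the file are left unchanged.
&lt;/p&gt;
&lt;pre&gt;
# Never overcommit; limit commitments to 90% of physical RAM
vm.overcommit_memory = 2
vm.overcommit_ratio = 90
&lt;/pre&gt;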
&lt;a name="eztoc4765_0_1_2" id="eztoc4765_0_1_2"&gt;&lt;/a&gt;&lt;h3&gt;System-wide change — in effect until next reboot&lt;/h3&gt;
&lt;p&gt;

As root, reset the overcommit parameters at runtime, for example:&lt;br /&gt;
srun -p &lt;partition&gt; -N &lt;all&gt; bash -c "echo &lt;0|1|2&gt; &gt; /proc/sys/vm/overcommit_memory"&lt;br /&gt;
srun -p &lt;partition&gt; -N &lt;all&gt; bash -c "echo &lt;value&gt; &gt; /proc/sys/vm/overcommit_ratio"
&lt;/p&gt;
&lt;a name="eztoc4765_0_1" id="eztoc4765_0_1"&gt;&lt;/a&gt;&lt;h2&gt;Lustre Considerations&lt;/h2&gt;
&lt;p&gt;
If you allow overcommitting memory and see processes killed due to OOM errors, unmount and then remount the Lustre file system to regain the file system space used by those processes.
&lt;/p&gt;
</description></item><item><pubDate>Wed, 09 Apr 2008 16:23:58 GMT</pubDate><title>Lustre — Mounting Multiple Sicortex File Systems</title><link>http://sicortex.com/support/knowledge_base/software/lustre_collisions_when_mounting_multiple_file_systems</link><description>
&lt;p&gt;
A collision occurs when two nodes in different SiCortex systems that have the same network id try to mount the external Lustre file system at the same time. Such collisions cause the mount to fail with the message "transport endpoint shutdown", which appears in dmesg on the external server, in the SSP’s syslog, and in the node console logs.
&lt;/p&gt;

&lt;p&gt;
There are two remedies:
&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Avoid having multiple SiCortex systems mount the same external Lustre file system at the same time.&lt;/li&gt;

&lt;li&gt;Change the internal netblock on the SiCortex systems so that each node has a unique network id. For details, see &lt;i&gt;Configuring I/O Nodes and Network Interfaces&lt;/i&gt; in Chapter 2 of &lt;i&gt;The SiCortex System Administration Guide&lt;/i&gt;.&lt;/li&gt;

&lt;/ul&gt;
</description></item><item><pubDate>Wed, 09 Apr 2008 16:18:51 GMT</pubDate><title>Lustre — Reading and Writing the Same File from Two Different Nodes</title><link>http://sicortex.com/support/knowledge_base/software/lustre_reading_and_writing</link><description>
&lt;p&gt;
Writing a file from one node while simultaneously reading it from another causes the file lock to ping-pong rapidly between the two, degrading performance.
&lt;/p&gt;

&lt;p&gt;
On a Lustre file system, avoid simultaneously reading a file (such as doing tail -f) while it is being written to from another node.
&lt;/p&gt;
</description></item><item><pubDate>Mon, 07 Apr 2008 17:58:24 GMT</pubDate><title>Quick Start Guide Errata</title><link>http://sicortex.com/support/knowledge_base/sc072_pds/quick_start_guide_errata</link><description></description></item><item><pubDate>Mon, 07 Apr 2008 17:56:01 GMT</pubDate><title>How do I bring up a virtual console?</title><link>http://sicortex.com/support/knowledge_base/sc072_pds/booting</link><description>
&lt;p&gt;
To bring up a virtual console, do the following:
&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;Log in as user root with the password sicortex.&lt;/li&gt;

&lt;li&gt;Enter scboot -p sca to start the SC072's CPUs.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;
If the scboot command produces IOX or Gateway failures, type Ctrl-C and reissue the command. Resetting the MSP (module service processor) and nodes takes about two minutes.
&lt;/p&gt;

&lt;p&gt;
Note that the /local directory is shared by all 72 nodes and the workstation.
&lt;/p&gt;
</description></item><item><pubDate>Mon, 07 Apr 2008 17:48:38 GMT</pubDate><title>How do I power on the system?</title><link>http://sicortex.com/support/knowledge_base/sc072_pds/power_on</link><description>
&lt;p&gt;

You may need to press RESET (the front left button) to produce the required boot beep.&lt;br /&gt;
The beep comes 5 seconds after you press RESET, provided that the rear power switch and the front power (LED) are already switched on.&lt;br /&gt;
At power-on, you can ignore errors about memory dump space and Eth1; the system logs a benign connection failure if no cable is present at boot time.&lt;br /&gt;
After several minutes of startup messages, you can log in to the Red Hat x86 board.&lt;br /&gt;
User: root, password: sicortex
&lt;/p&gt;
</description></item></channel></rss>
