Minutes of CRAG Meeting #26 held on 15th January 2007
=====================================================

Time and Place:
---------------
13:00 - 14:00, Monday 15th January 2007, Room E1, Physics and Astronomy

Present:
--------
John Brodholt (Earth Sciences - Chair)
Clovis Chapman (Computer Science and Condor Administrator)
Clare Gryce (Research Computing Coordinating Manager - Secretary)
Simon Clifford (Chemistry - ex Prism Systems Administrator)
William Hay (IS-RC Team)
Russell Knighton (IS-RC Team)
Sally Price (Chemistry)
Jeremy Yates (Physics and Astronomy and Research Computing Deputy Coordinating Manager)
Ben Waugh (Physics and Astronomy)

1) Apologies, Minutes, Actions
==============================

Apologies
---------
Apologies had been received from Paul Kellam.

Minutes
-------
The minutes of the last meeting (#25) had been circulated, approved and published on the RC website.

Actions from last meeting
-------------------------
ACTION 21-2: Clare to draft statement on policy relating to misuse of facilities or unacceptable behaviour by users.
Ongoing, awaiting discussion by 'IS-RC Liaison Group'. No problems meanwhile other than the recent issue on interactive use of Altix (see below). One Miracle user is running long jobs on the Keter front end. IS-RC to let Clare know if the behaviour persists and becomes a problem.

ACTION 22-1: IS-RC to add all new users to appropriate announcement lists, and retrospectively add existing users who have not subscribed.
Ongoing. Some have been done by IS-RC. Clare still to do retrospective addition of all users to Forum list. CG to chase up IS-RC actions with Austin.

ACTION 24-3: Item for agenda of next meeting - job size (CPUs) guidelines on Altix and possible migration to Keter.
Pending. Carried forward to next meeting (see agenda item 9).

ACTION 24-5: Jeremy to talk to Dario about possible migration of VASP jobs to Keter.
Ongoing. Jeremy has talked to Dario and is awaiting a response. LCN users are currently running VASP jobs on Keter. Done.

ACTION 25-1: Clare to email user advising on interactive use of Altix.
Done.

2) Condor Status Report & stats
===============================
Nov and Dec stats outstanding. Verbal report from Clovis: quite a lot of demand, and Sancerre (the main submission machine) is getting overloaded. Sancerre needs more RAM and some hard disc to be able to manage queues effectively. Possible use of the 'Condor A' machine that hosts the Condor Mirror? Accelerate migration of management to IS?

ACTION 26-1: Clovis to get quote on necessary upgrades to Clare.
ACTION 26-2: Clare to liaise with Andrew Dawson about time for migration of Condor service.

3) C^3 Status Report & stats
============================
October   74%
November  76%

No stats for Dec. due to an SGEE bug (triggered by certain user actions). Usage for Dec was over 60% (without unreported jobs).

Some power problems post Xmas. It looks like these could be due to overheated/burned cables, though there are some questions over the power draw of the IBM servers. Looking at removing a few nodes from the system.

4) Altix Status Report & stats
==============================
No stats for the last three months; these had been generated by hand. A script for reporting is being created. System is generally full. Should the current unlimited time for <3 CPUs be reviewed?

5) Keter Status Report and stats
================================
October   83%
November  60%
December  63%

Still some jobs waiting as users have not specified queue info on submission.

ACTION 26-3: IS-RC to send email to Keter announce list reminding all users about online info on job submission.

6) Prism Status Report
======================
Has been rebuilt by IS-RC, post diagnosis of firmware problem. Now to be tested with test jobs. Russell Knighton will primarily be looking after the Prism within the IS-RC team.

ACTION 26-4: Russell to set up mailing list for Keter users.
ACTION 26-5: Russell to look into allocation mechanism for Prism.
ACTION 26-6: Russell/Clare to check current service contract for Prism.

7) User requests
================
CRAG requests, October 2006 to 10th January 2007:

C^3:     3
Altix:   4
Keter:   4
Condor:  0
Prism:   0

No problems, all accounts set up.

8) Procurement update
=====================
ITT should be issued around the beginning of Feb. CRAG will need to start to consider queues and allocation mechanisms in the next few months.

9) Migration of jobs from Altix to Keter
========================================
Jeremy has been looking at all the versions of VASP that are being used. Dario's version (v13) is not compiling on Keter; optimisation is affecting results. Also looked at versions being used by LCN. V28 is the most robust of the recent versions, but still cannot be optimised for Keter.

Then looked at activity on Altix; there are also problems here. Tried to remake V28 but no luck; it looks like it hasn't been recompiled since the upgrade from RHL to SUSE. Problems with library calls, some of which have been deleted/moved. Now in touch with SGI to sort it out. To monitor and review once a recommendation for VASP users can be made.

ACTION 26-7: Clare/Jeremy to email user groups with status and remind users to raise issues arising with IS-RC/Clare/Jeremy.
ACTION 26-8: Jeremy to start VASP mailing list.

10) News from RCSC and Forum
============================
None.

11) AOB and next meeting date
=============================
Next meeting: Monday 12th March.

LIST OF CURRENT AND ONGOING ACTIONS
===================================
ACTION 21-2: Clare to draft statement on policy relating to misuse of facilities or unacceptable behaviour by users.
Ongoing, awaiting discussion by 'IS-RC Liaison Group'.

ACTION 22-1: IS-RC to add all new users to appropriate announcement lists, and retrospectively add existing users who have not subscribed.
Ongoing. Some have been done by IS-RC. Clare still to do retrospective addition of all users to Forum list. CG to chase up IS-RC actions with Austin.

ACTION 24-3: Item for agenda of next meeting - job size (CPUs) guidelines on Altix and possible migration to Keter.
Ongoing, pending resolution of VASP problems.

ACTION 26-1: Clovis to get quote on necessary upgrades to Condor RAM to Clare.

ACTION 26-2: Clare to liaise with Andrew Dawson about time for migration of Condor service to IS.

ACTION 26-3: IS-RC to send email to Keter announce list reminding all users about online info on job submission.

ACTION 26-4: Russell to set up mailing list for Keter users.

ACTION 26-5: Russell to look into allocation mechanism for Prism.

ACTION 26-6: Russell/Clare to check current service contract for Prism.

ACTION 26-7: Clare/Jeremy to email user groups with status and remind users to raise issues arising with IS-RC/Clare/Jeremy.

ACTION 26-8: Jeremy to start VASP mailing list.