Date: 2015 Nov 23 2:30-3:30PM.
Attendees: G. Oakham, D. Rogers, T. Xu, R. Thomson, R. Gornea, D. Gillberg
W. Hong, M. Hu, S. Wang, M. Hamstra
1. Review minutes and approval
Michael asked that the minutes of Oct 21st be approved as posted on the physics website if there are no objections.
2. Common IT issues Discussion and Follow-up
Michael reported that the cluster has been running very well since ZFS tank usage on ngoma was lowered to around 65%, which confirms that NFS performs well when usage stays under the threshold (80%). Michael gave an example: Nelson (a medical physics PhD student) used to lose about 10% of his submitted jobs, but all of his jobs ran on the cluster once ngoma zpool usage dropped to 65% after the cleanup (Wade compressed ROOT data and Michael moved data from ngoma to another location).
Michael indicated that SGE crashed a couple of times on the cluster head node two weeks ago, but it is stable now.
Dave asked what the problem with the cluster was last weekend. Michael said the NFS issue happened again on cisk on Wednesday because a student in the medical group filled the home area to 100% in a short period with cluster jobs. Cisk returned to normal, stable operation shortly after the student cleaned up the home area.
Rowan asked what the problem with Qt4 is (her group can’t run Qt4 on egs01/02, only on atlas04/04 now). Wade answered that Michael was moving all data on data060 (which contains the Qt4 package) to a different location to free disk space for the DEAP group. egs01/02 run SL5.5, which doesn’t support compiling Qt4, which is why they point to Qt4 on data060. Rowan asked why SL5.5 is kept on egs01/02. Wade recalled that the only reason is that MATLAB runs on those machines, and noted that Carleton has a campus-wide MATLAB license. Rowan asked whether MATLAB on egs01/02 can be covered by the university license; if so, egs01/02 should be upgraded to SL6.5. Stephen promised to check after the meeting.
3. Ever-Green Project Discussion
Michael first presented the disk inventory and the cluster nodes. According to the slides, there is around 50 TB of free disk space in total. Wade pointed out that only 80% of the raw capacity should be used on any ZFS tank; Michael said the total free disk space should be less than 50 TB since he used 80% of the raw capacity. Tusker has around 41 TB available. Tusker holds the faculty and staff home folders, so the 41 TB can’t be used as cluster data area, only as storage. Cisk (student home area) has 4.6 TB available. Michael indicated that the home tank is slightly oversubscribed; he will move some data to tusker in the next couple of weeks to bring usage below 80%. Ngoma (cluster data area) has only around 6 TB of disk space available. Michael said the challenge is that most data on ngoma can’t be moved elsewhere because it is in use by the cluster.
Michael said the DEAP group requested 5 TB of space for their project analysis. Rowan mentioned that her group would need around 1 TB for cluster analysis soon. Wade asked Mike Hamstra whether DEAP really needs 5 TB for analysis or just for storage. M. Hamstra said DEAP is running jobs on Compute Canada for now, but Mark prefers to move everything to Carleton gradually, and the 5 TB would be for analysis, not just storage.
Michael showed the cluster node inventory (620 cores in total: egs, 212 cores; atlas, 352 cores; ottawahospital, 32 cores; theory, 24 cores).
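As a rough check of the figures above, the inventory arithmetic can be sketched in a few lines of Python. The numbers are the ones quoted in these minutes; the function and variable names are hypothetical and chosen only for illustration, and the 80% ceiling is treated as a simple multiplier on capacity.

```python
# Figures quoted in the minutes; names are illustrative only.
CORES = {"egs": 212, "atlas": 352, "ottawahospital": 32, "theory": 24}
USAGE_CEILING = 0.80          # usage threshold mentioned for ZFS tanks
NGOMA_FREE_TB = 6.0           # approximate free space on ngoma
REQUESTS_TB = {"DEAP": 5.0, "EGS": 1.0}  # space requested for analysis


def total_cores(cores):
    """Sum the cores contributed by each group's nodes."""
    return sum(cores.values())


def usable_tb(raw_tb, ceiling=USAGE_CEILING):
    """Capacity that can be filled before crossing the usage ceiling."""
    return raw_tb * ceiling


def remaining_after_requests(free_tb, requests):
    """Free space left once every listed request is granted."""
    return free_tb - sum(requests.values())


if __name__ == "__main__":
    print("total cores:", total_cores(CORES))
    print("ngoma free after requests (TB):",
          remaining_after_requests(NGOMA_FREE_TB, REQUESTS_TB))
```

With these figures, the two requests together would take up roughly all of the free space currently reported on ngoma.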
Michael said data storage is under high pressure in the current situation. As a follow-up to the last computer meeting, one option is to upgrade the 1 TB disks on ngoma to 2 TB disks, which requires around 10K in total for 48 disks; another option is to purchase a new disk server (new chassis plus disks). Dave said the total amount of money from all groups should be collected for the Ever-Green project. Gerald indicated that the Ever-Green project deadline for placing orders should be early February instead of the end of February. Michael said the end of February was chosen in order to wait for Wade’s purchase plan and make a single big purchase. Dave said we should keep to our own timeline. Michael said we could set the deadline at the end of January. Tong suggested getting quotes ready before the end of January; if the Faculty of Science can’t make its purchase by then, we can go ahead with our own plan.
For compute nodes, Rowan said the EGS group would like to contribute money. Mike also said DEAP would like to have its own interactive node and compute node. Wade asked why an interactive node is needed. Mike said he would like to install some particular software that his group needs. Wade explained to Mike that an interactive node could be shared among the groups. Razvan said it would be good to set up a dedicated queue for the DEAP group.
Gerald asked Michael to write down each group’s requirements and make a proposal for the Ever-Green project. Michael asked roughly how much money we have. Gerald asked Michael to prepare a proposal to purchase three 24-core compute nodes plus disk storage, with two options, within a total budget of 40K. Michael said he would work on the proposal with Wade and Stephen after the meeting and send it to Gerald for review.
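For the disk upgrade option, a back-of-the-envelope estimate might look like the sketch below. It uses only the figures quoted above (48 disks, 1 TB to 2 TB, around 10K in total). It deliberately ignores ZFS redundancy and filesystem overhead, which these minutes do not specify, so the real usable gain would be smaller. All names are hypothetical.

```python
# Rough estimate for the ngoma upgrade option; figures are from the minutes.
# Assumption: redundancy (parity/mirroring) and overhead are NOT modelled.
DISK_COUNT = 48
OLD_DISK_TB = 1.0
NEW_DISK_TB = 2.0
TOTAL_COST = 10_000           # "around 10K", currency as quoted
USAGE_CEILING = 0.80          # usage threshold mentioned for ZFS tanks


def added_raw_tb(count, old_tb, new_tb):
    """Extra raw capacity gained by swapping every disk."""
    return count * (new_tb - old_tb)


def added_below_ceiling_tb(count, old_tb, new_tb, ceiling=USAGE_CEILING):
    """Extra capacity that can be filled while staying under the ceiling."""
    return added_raw_tb(count, old_tb, new_tb) * ceiling


def cost_per_disk(total_cost, count):
    """Average cost of one replacement disk."""
    return total_cost / count


if __name__ == "__main__":
    print("added raw TB:", added_raw_tb(DISK_COUNT, OLD_DISK_TB, NEW_DISK_TB))
    print("added TB under ceiling:",
          round(added_below_ceiling_tb(DISK_COUNT, OLD_DISK_TB, NEW_DISK_TB), 1))
    print("cost per disk:", round(cost_per_disk(TOTAL_COST, DISK_COUNT), 2))
```

A comparable estimate for the new disk server option would need a quote for the chassis and disks, which is not yet available.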
4. IT Infrastructure Changes
Michael showed the 2015-2016 CMO budget (now the department budget), which includes one item (18 disks) for the virtualization project. It is about time to buy those disks and start moving servers to the virtualization platform; the first infrastructure server to move will be our new mail server. Wade indicated that we don’t need to buy all 18 disks now since we can’t use them all yet. We should buy what we need now, which is also better for disk warranty. Michael agreed and said we might buy 6 disks for now; he will get a new quote and send it for approval.
Michael gave an update on testing of the new mail server: all major components are done on both the server side (SMTP server: Postfix; IMAP server: Dovecot; anti-spam and anti-virus; policy server; webmail) and the client side (Thunderbird, Alpine, web access, mobile device configuration). Michael said he would work with Wade on a migration plan soon. Wade said we could probably do the mail server migration in early January.
5. AOB
Razvan asked about the date for the new physics website. Wade said it would probably be in 2 or 3 months. Razvan asked whether there would be any change for Joanne Martin, since she is responsible for web content management. Wade said she is expected to remain in the same role. Tong asked who would take care of migrating the web content to the new website. Wade said it is possible to hire co-op students to do that. Razvan said he totally agreed.
Razvan also asked about progress on the web-based calendar (ownCloud). Michael said he could work on ownCloud after the mail migration is completed. Wade said he still has some issues with user authentication against the school Active Directory.
6. Proposed next meeting
Michael asked whether a meeting should be scheduled next month, since it is the holiday season. Gerald said there would be no meeting in December.