Date: 2014 October 8 2-3PM.
Attendees: G. Oakham, D. Rogers, P. Kalyniak, R. Thomson, A. Bellerive, T. Gregorie.
Staff: Wade, Michael, Stephen
1. Office moves
Due to space pressures, Stephen will be moving to a single office
with his current large office being used to house Theory RAs.
Yeremia has moved to his new office in Earth Sciences.
Mikhail departed at the end of September and will move his
computers out of the room previously shared with Stephen and
Yeremia by the end of the week.
2. Research Compute Cluster
The Theory group, specifically Pat, has some funds available to be
applied to supplement the departmental research computing cluster
with the primary objective of supporting their computational
requirements. The available funds would provide about an additional
16 cores based on recent server purchases in the Faculty of Science.
A discussion was held to determine if there was interest from other
research groups to also supplement the computing infrastructure to
leverage purchasing power to obtain better pricing. There appears
to be interest, but the timing may not be optimal, so it was
suggested that a proposal be developed with pricing to purchase the
computing for the Theory group and that this information be shared
with the committee to further gauge interest in a larger purchase.
The funds need to be spent by the end of the calendar year,
so the action item is to develop a configuration and obtain a quote
or quotes by the end of October.
Dave is going to follow up with his colleagues at the Ottawa
Hospital to determine if there is interest in supplementing their
contribution to the compute cluster. Gerald had spoken to Kevin and
reported that there were no requirements at this time from EXO.
Dave also expressed interest in the Intel Phi co-processor and has
funds earmarked to obtain a server with these co-processors. His
colleagues at NRC are looking to evaluate running EGS code on the
Phi co-processors. Pending these results, Dave will determine how
best to supplement the cluster for his group's computing requirements.
A broader issue raised by Rowan concerned the aging
compute cluster and whether there is a plan to keep the cluster
current and relevant or allow it to die slowly. With the current
CFI and NSERC rules, funding for new local computing infrastructure
is generally not permitted. As a consequence and with high demand
for computing, Rowan has submitted a request to the Compute Canada
Resource Allocation committee for computing resources on Compute
Canada's available platforms. A determination is expected by the
end of the year.
Gerald proposed an evergreening approach similar to the current
departmental computing infrastructure approach adopted for the past
few years with the CMO.
Another discussion focused on the efficacy of the current queuing
discipline implemented for batch queuing. Currently there is a
single monolithic queue with first-in, first-out (FIFO) ordering.
Because large-scale job submissions fill this queue, resource
owners have been blocked from using their own resources. Wade
proposed a return to the resource owner-based priority queuing
that was implemented before. At that time, however, there were
only two principal stakeholders, whereas now there are more
resource owners, specifically the Ottawa Hospital,
EXO, SNO, and soon Theory. Alain also suggested that the queuing
system be flexible to accommodate quick short jobs from
non-resource owner groups as well, without being relegated to the
bottom of the queues. Gerald asked for a proposal for the next
Computer Committee meeting.
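As an illustration only, the owner-priority idea with an allowance
for short jobs could be sketched as below. This is not the proposal
requested for the next meeting and is not tied to any particular
batch system. The Job fields, the 30-minute threshold, the group name
"OtherGroup", and the three-tier ordering are all assumptions made
for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold: jobs estimated at or below this are "short".
SHORT_JOB_MINUTES = 30


@dataclass
class Job:
    job_id: int
    group: str         # group that submitted the job
    est_minutes: int   # user-supplied runtime estimate
    submit_order: int  # arrival position in the queue


def priority_key(job, node_owner):
    """Return a sort key; lower tuples are dispatched first.

    Tier 0: jobs from the group that owns the node.
    Tier 1: short jobs from any other group.
    Tier 2: everything else.
    Within a tier, jobs keep first-in, first-out order.
    """
    if job.group == node_owner:
        tier = 0
    elif job.est_minutes <= SHORT_JOB_MINUTES:
        tier = 1
    else:
        tier = 2
    return (tier, job.submit_order)


def dispatch_order(queue, node_owner):
    """Return job ids in the order this node would run them."""
    ordered = sorted(queue, key=lambda j: priority_key(j, node_owner))
    return [j.job_id for j in ordered]


queue = [
    Job(1, "SNO", 600, 0),
    Job(2, "OtherGroup", 10, 1),
    Job(3, "Theory", 120, 2),
]
print(dispatch_order(queue, "Theory"))
print(dispatch_order(queue, "SNO"))
```

On a Theory-owned node the Theory job runs first even though it
arrived last, and the short job from the non-owner group is placed
ahead of the long SNO job rather than at the bottom of the queue.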
A request from one of Tong's graduate students to run OpenMPI
parallel jobs was also raised, followed by a short discussion of
how to accommodate it. It was suggested that the student
should perhaps look at using HPCVL, which has support for OpenMPI.
3. Current IT issues
Alain reported that tyr and thor are very slow, especially with
processes like firefox that have been running for a long time. He
suggested setting up a daemon to kill such heavy processes once they
have been running for a while.
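A minimal sketch of the selection logic such a daemon might use is
given below. Nothing here was decided at the meeting: the Proc
record, the 24-hour and 1024 MB thresholds, and the watched process
list are assumptions for illustration, and gathering the process
table on tyr and thor is deliberately left out because it depends on
the operating system.

```python
import os
import signal
from typing import NamedTuple

# Hypothetical policy values; none of these were set at the meeting.
MAX_AGE_S = 24 * 3600   # how long a process may run before it is a candidate
MAX_RSS_MB = 1024       # resident memory above which it counts as "heavy"
WATCHED = {"firefox"}   # process names the policy applies to


class Proc(NamedTuple):
    pid: int
    user: str
    name: str
    elapsed_s: int   # seconds since the process started
    rss_mb: float    # resident memory in megabytes


def select_stale(procs, max_age_s=MAX_AGE_S, max_rss_mb=MAX_RSS_MB,
                 watched=WATCHED):
    """Return pids of watched processes that are both old and heavy."""
    return [
        p.pid
        for p in procs
        if p.name in watched
        and p.elapsed_s >= max_age_s
        and p.rss_mb >= max_rss_mb
    ]


def terminate(pids, dry_run=True):
    """Send SIGTERM to each pid; with dry_run, only report the action."""
    for pid in pids:
        if dry_run:
            print(f"would send SIGTERM to pid {pid}")
        else:
            os.kill(pid, signal.SIGTERM)


# Example process table with made-up entries.
table = [
    Proc(101, "userA", "firefox", 3 * 24 * 3600, 2048.0),  # old and heavy
    Proc(102, "userB", "firefox", 600, 2048.0),            # heavy but recent
    Proc(103, "userC", "pine", 5 * 24 * 3600, 50.0),       # old but not watched
]
terminate(select_stale(table))
```

Keeping the default as a dry run lets the policy be checked against
real usage before any process is actually killed.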
Alain reported a SunRay lab issue in which users lock their
screens and leave, which prevents users in the next class from
logging in at those terminals. Wade suggested that this was
an implementation issue and that the user screen locks should be
disabled.
Another issue raised was the slowness of pine due to the spam
filter defined for pine. With large inboxes in excess of 200
messages, it takes a long time to open mail with pine. Dave
indicated that Bill had fixed this for him previously by removing
the pine-defined spam filter(s). The proposed solution is to remove
the pine pattern filter for spam. With SpamAssassin and
greylisting, the volume of spam has been greatly reduced.
4. CMO
Gerald asked for evergreening proposals for the current CMO funds
for the current academic cycle.
5. State of current departmental infrastructure
Alain asked a question about the adequacy of tyr and thor. It was
agreed that they were long past their best-before date. Wade
indicated that there needs to be an overhaul of many of the
services. This includes the bastion and interactive nodes (tyr and
thor), the SunRay service, mail service, web service, etc. There are
also planned new services such as a web-based calendar booking system
and a locally hosted ownCloud service similar to Dropbox.
Gerald questioned what resources would be required to do this. Wade
indicated that these resources are available through the
evergreening plan of the departmental infrastructure for the past
couple of years, but due to time constraints this plan had not been
realized. The plan was to virtualize many of these services on
virtual machines, removing the dependency on the physical
infrastructure. The cloudshare service used for hiring new Physics
faculty is an example of a service developed on a
virtual server. Older hardware will be phased out as these services
are migrated.
6. Proposed Agenda for next meeting on November 5th 2-3PM
o Evergreening of Research Computing Cluster Proposal
o Queuing Discipline Proposal
o Departmental IT Infrastructure Changes Proposal