October 4, 2017

In August the world’s premier interdisciplinary conference on data science, data mining, knowledge discovery, large-scale data analytics, and big data, KDD2017, came to Halifax, chaired by Stan Matwin, Director of the Institute for Big Data Analytics, in collaboration with Shipeng Yu of LinkedIn and a team of organizers including Local Chair Evangelos Milios. This was no small accomplishment; the conference has typically been hosted in some of the biggest cities in the world, including Beijing, Sydney, New York and Paris. Being invited to host this year’s conference was a significant recognition of the strength of our team in this field and the strength of the proposal they had put forward. It was also a big event for the city of Halifax and the Province of Nova Scotia. In his opening remarks Mayor Mike Savage conferred upon Stan and Evangelos the honorary title of Ambassadors for the City of Halifax. As the conference progressed, researchers in both academia and industry in this province benefited from access to the latest research and the leading practitioners in the field.

For 5 days over 1600 participants from 51 countries met for a wide range of activities and networking opportunities. This was the largest KDD conference held outside of the United States and the third largest overall, after KDD2016 in San Francisco and KDD2014 in New York. It was held at the World Trade and Convention Centre on Argyle Street, with the largest events taking place in the Scotiabank Centre and the Cunard Centre on Marginal Road.

The conference continued its tradition of a strong tutorial and workshop program on leading issues of data mining during the first two days, including some hands-on tutorials on practical data science tools. The last three days were devoted to contributed technical papers, describing both novel, important research contributions and deployed, innovative solutions. An outstanding lineup of industry speakers shared their expertise in deploying industrial data mining solutions. Three keynote talks, by Cynthia Dwork, Bin Yu and Renée J. Miller, touched on some of the hard, emerging issues before the field of data mining. A KDD panel brought together industry experts to discuss the future of artificially intelligent assistants. There was also a “Meet the Editors” panel to make KDD researchers aware of opportunities for publishing in academic journals. A new feature this year was the series of “Meet the Experts” sessions: informal roundtable meetings where graduate students and early career researchers could discuss research programs and career development plans with established experts.

Recognizing the importance of student participation, KDD2017 awarded a record amount of $145,000 to support student travel. Students were also offered the opportunity to work at the conference, doing a range of essential tasks such as working at the registration desk, putting swag bags together, helping with events, or helping delegates to find their way around. In exchange for 8 hours of work, they received free access to all conference activities. Some 40 graduate students from at least three Nova Scotia universities took this opportunity to participate in the conference. Many students also benefitted from the Broadening Participation in Data Mining (BPDM) workshop, receiving mentoring in how to more effectively pursue their ambitions in the field. Zahraalsadat Alavizadeh, currently doing a Masters in Electrical Engineering at Dalhousie, spoke to industry representatives about trends, job requirements and how to become more attractive to employers. The Google mentor helped her to format and optimize her CV. She left with a fresh idea about changing her field of study to Computer Science for her PhD. Another Dalhousie student, Sara Khanchi, currently working on her PhD in Computer Science, was given helpful advice on how to present a pitch to others and gained some valuable insight into the issue of whether to pursue a career in academia or industry.

Some of the research presented at KDD2017 has already begun to receive attention in the media. IT World Canada mentioned a research paper on embedding-based online local event detection in geo-tagged Tweet streams. Business Insider published a feature on the presentation from Amazon Labs researchers on their work on an algorithm that can learn about styles from images and recreate similarly styled garments from scratch. The Daily Californian and Berkeley News highlighted the presentation by UC Berkeley student Rebecca Portnoff on her creation of two algorithms designed to scan through online sex advertisements and identify human trafficking rings. ZDNet profiled the analogy gap research conducted by Carnegie Mellon University researchers and presented at the conference.

Halifax has proven itself to be ready to host premier international events in the high tech area; it is hoped that more such world-class events will follow. KDD2018 will be held in London, England, in August, 2018.

July 5, 2016

Congratulations to Erico Neves De Souza, Kristina Boerder, Stan Matwin and Boris Worm for their recent publication in PLOS ONE.  Their paper can be found online at: 

The abstract is below:

A key challenge in contemporary ecology and conservation is the accurate tracking of the spatial distribution of various human impacts, such as fishing. While coastal fisheries in national waters are closely monitored in some countries, existing maps of fishing effort elsewhere are fraught with uncertainty, especially in remote areas and the High Seas. Better understanding of the behavior of the global fishing fleets is required in order to prioritize and enforce fisheries management and conservation measures worldwide. Satellite-based Automatic Information Systems (S-AIS) are now commonly installed on most ocean-going vessels and have been proposed as a novel tool to explore the movements of fishing fleets in near real time. Here we present approaches to identify fishing activity from S-AIS data for three dominant fishing gear types: trawl, longline and purse seine. Using a large dataset containing worldwide fishing vessel tracks from 2011–2015, we developed three methods to detect and map fishing activities: for trawlers we produced a Hidden Markov Model (HMM) using vessel speed as observation variable. For longliners we have designed a Data Mining (DM) approach using an algorithm inspired from studies on animal movement. For purse seiners a multi-layered filtering strategy based on vessel speed and operation time was implemented. Validation against expert-labeled datasets showed average detection accuracies of 83% for trawler and longliner, and 97% for purse seiner. Our study represents the first comprehensive approach to detect and identify potential fishing behavior for three major gear types operating on a global scale. We hope that this work will enable new efforts to assess the spatial and temporal distribution of global fishing effort and make global fisheries activities transparent to ocean scientists, managers and the public.
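To illustrate the trawler approach mentioned in the abstract, here is a minimal sketch of a two-state Hidden Markov Model that labels each speed reading in a vessel track as "fishing" or "steaming" using Viterbi decoding. It is not the authors' implementation: the state names, the Gaussian emission model, and every numeric parameter below are assumptions made up for the example, and a real model would be fitted to labeled S-AIS data.

```python
import math

# Hidden states for a single vessel track (names are illustrative).
STATES = ("fishing", "steaming")

# Hypothetical parameters, chosen only for illustration (not from the paper).
START = {"fishing": 0.5, "steaming": 0.5}
TRANS = {
    "fishing": {"fishing": 0.9, "steaming": 0.1},
    "steaming": {"fishing": 0.1, "steaming": 0.9},
}
# Assumed Gaussian emission (mean, std dev) of speed in knots for each state.
EMIT = {"fishing": (3.5, 1.0), "steaming": (10.0, 2.0)}


def log_gauss(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi(speeds):
    """Return the most likely state sequence for a list of speeds (knots)."""
    if not speeds:
        return []
    # Log score of the best path ending in each state at the first reading.
    score = {s: math.log(START[s]) + log_gauss(speeds[0], *EMIT[s]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at reading t + 1
    for x in speeds[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + log_gauss(x, *EMIT[s])
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    track = [11.0, 10.5, 9.8, 3.2, 3.6, 4.0, 3.1, 10.2, 11.1]
    for speed, label in zip(track, viterbi(track)):
        print(f"{speed:5.1f} kn  {label}")
```

The self-transition probabilities of 0.9 make the decoder favour staying in the same state, so a label changes only when the speed evidence outweighs the cost of switching.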

April 27, 2016

On Monday, April 25, Dr. Marina Sokolova took part in a panel organized by the Interdisciplinary Research Group in Organizational Communication at the University of Ottawa. Other invited speakers were Lewis Eisen from the Treasury Board of Canada and Karim Bechane from the Canadian Food Inspection Agency. The topic was "The Challenge of Managing Information in Organizations: From Big Data to Thick Data".

March 1, 2016

Problems should be solved. Pipe dreams should be pursued. Pitch us your data project and we could put a team led by one of Canada’s top data scientists to work for your small or medium-sized business for six months — for free.

For more details go to the competition page.

December 7, 2015

A team of 3 students from the Institute for Big Data Analytics has walked away with the prize for the "Most Innovative Use of Big Data", and a cheque for $800, from the Sports Hack 2015 competition, held Nov 27-29 in Vancouver, Toronto and Halifax and sponsored by a range of big-name tech companies, universities and the CFL. Forty-one teams competed in the challenge to produce an app to encourage fan engagement using datasets from the CFL. Our team created an app that tracked the Tweets of CFL fans, providing them with rankings of their activity levels and a points system to reward their activity and their participation in mini-games about the football matches, making use of both machine learning techniques and sentiment analysis. The app could also provide CFL organizations with information about their fans' activities and locations, and give them the opportunity to advertise to fans and to convert fan points into merchandise of the organizations or their sponsors. It would be fan-centric rather than team-centric. The contest was a big challenge, given the hard work, the lack of sleep, and the tight competition.

Gurcan Gercek, Hossein Sarshar and Behrouz Haji Soleimani

November 16, 2015

Congratulations to Masters student Hossein Sarshar and Research Assistant Pedram Adibi from the Institute for Big Data Analytics, and their colleague, Dalhousie alumnus Mehran Zamani, who, as Team Sol-Ops, won first prize in the Smart Energy Apps Challenge sponsored by Innovacorp, HRM, NSCC, Shiftkey Labs and Dalhousie University. The challenge was to create an app which made use of the data collected by Halifax Solar City on a number of parameters relating to installed solar hot water systems. Team Sol-Ops created an app which made use of techniques of computer science, data science and engineering to provide customers with an analysis of their usage of hot water, and recommendations for changing that behaviour to maximize the environmental and economic benefits they could receive from their systems. It took 6 weeks to create the winning app, which beat 10 other teams to claim the $6000 first prize. Success in the competition may also be a stepping stone to other opportunities, as the team finds itself in conversations with investors and other organizations who are taking an interest in the commercial potential of their creation.

July 14, 2015

Rob Warren gave a talk on the Panel on Linked Open Data at the Digital Humanities Conference, Sydney, Australia, June 29 - July 3.

His paper explored the notion of place, feature and geometry in the context of the Great War using Linked Open Data. Previous work had covered the translation of obsolete military coordinates through APIs (Application Programming Interfaces). He reviewed their use as an efficient and effective means of indexing archival documents about the war. Most war diaries, operations orders and dispatches in British and Dominions records refer to locations using both named features and coordinates. This permits the geo-referencing of each statement within a document to find the current location in question, while segmenting the document according to its different spatial components.

March 17, 2015

Data-Driven Augmented Reality for Museum Exhibits and Lost Heritage Sites.
Museums and the Web 2015
Palmer House Hilton, Chicago, IL, USA
April 8-11, 2015

We review the possibilities, pitfalls, and promises of recreating lost heritage sites and historical events using augmented reality and "Big Data" archival databases. We define augmented reality as any means of adding context or content, via audio/visual means, to the current  physical space of a visitor to a museum or outdoor site. Examples range from simple prerecorded audio to graphics rendered in real time and displayed using a smartphone.

Previous work has focused on complex multimedia museum guides, whose utility remains to be evaluated as enabling or distracting. We propose the use of a data-driven approach where the exhibits' augmentation is not static but dynamically generated from the totality of the data known about the location, artifacts, or event. For example, at Bletchley Park, reenacted audio conversations are played within rooms as visitors walk through them. These can be called "virtual contents," as the audio recordings are manufactured. Given that a number of documentary sources, such as meeting minutes, are available concerning the events that occurred within the site, a dynamic computer-generated script could add to the exhibits.

Visitors' experiences can therefore react to their movements, provide a different experience each time, and be factually correct without requiring any expensive redesign. Furthermore, the use of a data-driven approach allows for the updating of exhibits on the fly as researchers create or curate new data sources within the museum. If artifacts need to be removed from an exhibit, pictures, descriptions, or three-dimensional printed copies can be substituted, and the augmented reality of visitor experience can adapt accordingly.

March 11, 2015

Over 1,000 of the world's leading-edge researchers and practitioners in big data are coming to Halifax for the 2017 Conference on Knowledge Discovery and Data Mining.

Stan Matwin, Canada Research Chair (Tier 1) at the Faculty of Computer Science, Dalhousie University and the director of the Institute for Big Data Analytics, announced today, March 9, that Halifax was the successful bidder. The conference, with Dr. Matwin as the general chair and Evangelos Milios of Dalhousie University as the local chair, will be held in the new Halifax Convention Centre, which opens in 2017.

The bid was led by the Institute for Big Data Analytics and the Halifax Convention Centre, in collaboration with a local host committee of academic, government and industry representatives.

"This announcement is further evidence that the Institute for Big Data Analytics has established Dalhousie and Halifax as leaders in the global fields of data science and big data analytics," said Dalhousie president Richard Florizone.

"This is a great opportunity for Dalhousie -- as well as other local organizations and institutions -- to showcase the world-class research, ongoing collaboration and pool of talent we have here in the region to national and international audiences," said Dr. Matwin.

Halifax now ranks among the top cities that have previously hosted the conference, including:

-- Sydney, Australia

-- New York City, New York

-- Beijing, China

-- Paris, France

"We're proud to partner and collaborate with our local experts to host this conference and showcase Nova Scotia's strengths in big data to the world," said Scott Ferguson, president and CEO of Trade Centre Limited, the Crown Corporation that manages the convention centre. "This is an exciting opportunity for our industry, academic and research communities to highlight their work and connect with their global counterparts."

Nova Scotia has a booming information technology sector and the province is quickly establishing itself as an international hub of excellence in big data research. Hosting this conference will allow the local sector to benefit from top academic research and industrial presence aimed at promoting collaboration and growth in big data analytics.

First established in 1995, the Knowledge Discovery and Data Mining conference is the premier international forum for data mining and big data research, bringing together practitioners from academia, industry, and government to share their ideas, research results and experiences. Sponsors of note include Microsoft, Yahoo, Deloitte, Accenture, Facebook, LinkedIn, Google and IBM. The 2016 conference will take place in San Francisco.

March 4, 2015

The 18th International Conference on Discovery Science (DS 2015) will be held in Banff (Canada), on 4-6 October 2015, and provides an open forum for intensive discussions and exchange of new ideas among researchers working in the area of Discovery Science. The scope of the conference includes the development and analysis of methods for discovering scientific knowledge, coming from machine learning, data mining, and intelligent data analysis, as well as their application in various scientific domains.  We welcome papers that focus on the analysis of different types of complex data, such as structured, spatio-temporal and network data. We particularly welcome papers addressing applications. Finally, we would like to encourage contributions from the areas of computational scientific discovery, mining scientific data, computational creativity and discovery informatics.
For more information: