Search Computing Course - Projects
List of available projects (see PDF):
- SECO APPLICATIONS
-
GOOGLE-TAB: Registration of Google Fusion Tables
Project type: design and development
Number of students: 2
Tutor(s): M. Brambilla, A. Bozzon
Description: Google Fusion Tables (http://www.google.com/fusiontables/Home/) is a modern data management and publishing web application that makes it easy to host, manage, collaborate on, visualize, and publish data tables online. Its API provides programmatic access to Tables, Columns, Styles, and Templates. The proposed project consists of accessing thousands of public tables (possibly through direct support of the Google Fusion Tables group) and applying SeCo registration techniques to them in order to test: 1. whether the registration heuristics work properly; 2. how many tables can be clustered into a good-quality ER model; 3. whether SeCo exploratory techniques can build new and interesting results on top of Google Fusion Tables.
-
BIO-SECO: Advanced tools for biomedical-molecular data query and visualization
Project type: design and development
Number of students: 2-3, max 2 teams
Tutor(s): M. Masseroli, A. Bozzon
Description: The purpose of this project is to design and develop a tool for bioinformatics data query, exploration, and analysis. The starting point will be the User Interface currently developed in the Search Computing project (http://www.search-computing.org/WWWExploratorySearchDemo/), which will need to be adapted to suit the needs of an advanced biomedical-molecular data analysis interface. In particular, students will 1) investigate the main requirements for a biomedical-molecular tool; 2) build a Search Computing Web application upon the jQuery and JavascriptMVC frameworks by extending the Search Computing User Interface currently available; 3) make the application robust by providing it with advanced features, such as pluggable input validation rules and custom data type management; 4) enhance the application with visualization widgets (e.g. the table and xy plot widgets already available); 5) test the application on a real-world biomedical-molecular data analysis scenario.
-
SECO-RE: Design and Prototype a Search Computing Application: Real Estate
Project type: design
Number of students: 2-3
Tutor(s): A. Bozzon, M. Brambilla
Description: The goal of this project is to design and prototype a vertical search application focused on Real Estate Search in the London area. Participants are required to analyze the real estate search engines currently operating in the UK (e.g. http://www.nestoria.co.uk/, http://www.zoopla.co.uk/, http://www.home.co.uk/) and to perform a comparative evaluation of the discovered engines. The evaluation must address 1) the offered search and result browsing features (data schema, search entry points, etc.); 2) the additional information provided with each result or result set (e.g. census data, aggregated data, crime records, nearby sources, etc.); 3) advanced functionalities. The results of the analysis will be used to design a novel RE application that overcomes the limitations of the existing ones. The design must include a list of the data sources to be included (for each data source, students must detail how the integration with existing data would occur), the design of the user experience, and a mock-up of the user interface.
-
SECO-TN: Design and Prototype a Search Computing Application: Leisure in Trentino
Project type: design
Number of students: 2
Tutor(s): A. Bozzon, M. Brambilla
Description: The goal of this project is to design and prototype a vertical application focused on "leisure" activities in the "Trentino Alto Adige" area. The idea is that the application helps users make up their mind when they don’t really know what to do or where to go. Students are required to scout a list of data sources exposing data about activities, events, and places located in Trentino. Students must 1) detail, for each data source, how the integration with existing data would occur; 2) design the user experience for the new application; and 3) produce a mock-up of the user interface. Participants can draw inspiration from existing, similar applications, such as http://www.goby.com/ or http://www.zvents.com/.
-
- NATURAL LANGUAGE PROCESSING
-
NLP-QA: Classification and Information Extraction on a huge natural language question/answer dataset
Project type: research, design, development
Number of students: 2
Tutor(s): A. Bozzon, S. Quarteroni
Thesis: yes
Description: The Yahoo! Answers (answers.yahoo.com) service allows users to post questions of any category in natural language and get answers from other users. In the project, students will 1. analyze the huge corpus of questions and answers released by Yahoo!; 2. develop machine learning models (based on e.g. Support Vector Machines, Conditional Random Fields) to classify questions according to their categories; 3. apply information extraction techniques to interpret answers according to known response schemas (e.g. Restaurant name, Address, Rating, Price, etc.).
-
NLP-GEO: Geo-related Information Extraction from natural language using YAGO
Project type: design, development
Number of students: 1-2, max 2 teams
Tutor(s): S. Quarteroni
Thesis: yes
Description: YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago/) is a huge repository of facts derived automatically from Wikipedia (en.wikipedia.org) and WordNet (princeton.edu/wordnet). Many of these facts represent geographical information that could be valuable for extracting information from natural language queries that combine different data services in the Search Computing project (e.g. "sushi bar near Milan city center"). In this project, students will devise robust methods to 1. identify geographical entities (e.g. Milan = city); 2. identify ways to refer to geographical entities (e.g. "near Milan", "Milan city center").
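As an illustration of the two NLP-GEO tasks, the following is a minimal sketch that assumes a tiny hand-written gazetteer in place of YAGO. The names `GAZETTEER`, `PREFIX_CUES`, `SUFFIX_CUES`, and `find_geo_references` are hypothetical and are not part of YAGO or of the SeCo code base; a real solution would need far more robust matching.

```python
import re
from typing import List, Tuple

# Hypothetical mini-gazetteer; in the project these facts would come from YAGO.
GAZETTEER = {
    "milan": "city",
    "piazza duomo": "square",
    "lombardy": "region",
}

# Hypothetical spatial cues that may precede or follow a place name.
PREFIX_CUES = ["near", "close to", "not far from", "in"]
SUFFIX_CUES = ["city center"]


def find_geo_references(query: str) -> List[Tuple[str, str, str]]:
    """Return (matched text, place name, entity type) triples found in the query."""
    prefix = "|".join(re.escape(cue) for cue in PREFIX_CUES)
    suffix = "|".join(re.escape(cue) for cue in SUFFIX_CUES)
    results = []
    for place, entity_type in GAZETTEER.items():
        # Optional cue before the place name, optional cue after it.
        pattern = (
            rf"\b(?:(?:{prefix})\s+)?{re.escape(place)}"
            rf"(?:\s+(?:{suffix}))?\b"
        )
        for match in re.finditer(pattern, query, flags=re.IGNORECASE):
            results.append((match.group(0), place, entity_type))
    return results


if __name__ == "__main__":
    print(find_geo_references("sushi bar near Milan city center"))
```

The gazetteer lookup covers task 1 (entity identification), while the cue lists cover task 2 (referring expressions); both are deliberately naive.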
-
NLP-WOZ: Design and development of a Wizard-of-Oz experiment
Project type: design, development
Number of students: 2
Tutor(s): S. Quarteroni
Thesis: yes
Description: Devising a search system that interacts in natural language and guides users through the query process is a complex task, especially when users can submit complex queries (e.g. "I need a 3-star hotel close to a Japanese restaurant, not far from Piazza Duomo"). Some hypotheses should be tested before the full-fledged system is available (What kinds of queries will users submit? What kind of behavior will be most appreciated by users?). For these reasons, part of the system can be simulated beforehand by a human operator (the Wizard) in what is called a Wizard-of-Oz interface. Experimenting with such simulations leads to 1. data collection; 2. an understanding of implementation strategies. In this project, students will develop a Wizard-of-Oz interface to investigate query submission to the SeCo system.
-
- INFORMATION VISUALIZATION AND EXTRACTION, SOCIAL SEARCH
-
VIZ-BIG: Dynamic generation of visualizations for big data
Project type: design and (limited) development of code as a proof-of-concept
Number of students: 2-3
Tutor(s): M. Matera, A. Bozzon
Description: This project focuses on the dynamic generation of visualizations for huge data sets retrieved through search services. Students will be required to understand to what extent the task of generating visualizations can be facilitated by the adoption of ready-to-use public resources (for example the visualization APIs available on Many Eyes, http://www-958.ibm.com/software/data/cognos/manyeyes/, or any other API collection), and how such resources can be integrated into an existing prototype that is able to analyze data and extract “interesting” properties to guide the selection of the most adequate visualization. In particular, students will need to: i) understand the set of “visual-oriented” properties that can be extracted from data, according to a model that we have already defined; ii) identify techniques to exploit such extracted properties to select the most adequate class of visualization tools; iii) define techniques to invoke the corresponding APIs to achieve a “concrete” presentation for the visualization model.
-
SOC-SRC: SocialSearch: harnessing social relationships for information seeking
Project type: design, development
Number of students: up to 3 teams of 1-2
Tutor(s): A. Bozzon, M. Brambilla
Description: The goal of this project is to create an information seeking application working on top of the best-known social networks. The application will 1) allow a user to send a question to a social network-based audience and gather the returned answers; 2) select the most suitable social network according to the question focus; 3) select the proper individuals or groups within that network, according to their personal or social profile (e.g. hometown, interests, skills, etc.); 4) store the retrieved answers in a database for later querying. The application must exploit existing APIs in order to interact with the social platforms. The project can be undertaken by 3 distinct groups: each group will address one social platform among Facebook, LinkedIn, and Doodle, and will implement a different search scenario.
-
- SERVICE DESIGN
-
PHDS: Profiling of Heterogeneous Data Sources
Project type: technical
Number of students: 2
Tutor(s): S. Vadacca, D. Barbieri
Thesis: yes (design and development of adaptive strategies)
Description: The Search Computing infrastructure allows the execution of multi-domain queries, which are translated into executable query plans expressed in Panta Rhei. The key components of the Panta Rhei model are the two strategy nodes, whose aim is to orchestrate the invocation of heterogeneous data sources according to pluggable join strategies. Adaptive strategies could be plugged in to react to unpredicted events, such as failures, unexpected numbers of results, unexpected response times, clustered results, etc. The aim of the work is to profile heterogeneous data sources by designing and implementing a log-based environment. Real-time vs. batch analysis of the log will be discussed.
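To make the idea of log-based profiling concrete, here is a minimal batch-style sketch. The log record layout (`source`, `response_ms`, `results`, `failed`) and the function `profile_sources` are assumptions made for illustration only; the actual SeCo log format is not described here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple


class LogRecord(NamedTuple):
    """One service invocation; the field names are hypothetical."""
    source: str
    response_ms: float
    results: int
    failed: bool


def profile_sources(records: Iterable[LogRecord]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-source statistics from a batch of log records."""
    by_source = defaultdict(list)
    for record in records:
        by_source[record.source].append(record)

    profiles = {}
    for source, calls in by_source.items():
        ok = [c for c in calls if not c.failed]
        profiles[source] = {
            "calls": len(calls),
            "failure_rate": (len(calls) - len(ok)) / len(calls),
            # Averages are computed over successful calls only.
            "mean_response_ms": mean(c.response_ms for c in ok) if ok else float("nan"),
            "mean_results": mean(c.results for c in ok) if ok else float("nan"),
        }
    return profiles


if __name__ == "__main__":
    log = [
        LogRecord("hotels", 100.0, 10, False),
        LogRecord("hotels", 300.0, 20, False),
        LogRecord("hotels", 0.0, 0, True),
        LogRecord("restaurants", 50.0, 5, False),
    ]
    for name, stats in profile_sources(log).items():
        print(name, stats)
```

This sketch processes the whole log in one pass, which corresponds to the batch option; a real-time variant would instead update running counters as each record arrives.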
-
P2P: Peer to Peer Integration of Heterogeneous Data Sources
Project type: technical/architectural
Number of students: 2-3
Tutor(s): S. Vadacca, L. Tettamanti, D. Braga, D. Barbieri
Thesis: yes
Description: We want to investigate the adoption of a peer-to-peer architecture in the context of Search Computing. The following aspects will be considered: (i) motivations for the distributed deployment; (ii) how to register services and take routing aspects into account; (iii) how to use one instance of the SeCo engine as the server of another instance; (iv) query planning driven by the distribution of the instances; (v) distributed caching; (vi) server-side deployment vs. client-side deployment.
-
MQA: Multi-Query Allocation of Limited Information Accesses: a Bankruptcy Perspective
Project type: technical/architectural
Number of students: 1
Tutor(s): S. Vadacca, D. Braga
Thesis: no
Description: We want to investigate the case of concurrent queries accessing data providers with limited availability (e.g. web services, public APIs, etc.). Each query may require access to more than one data provider in order to get an integrated result. The problem can be modeled as a bankruptcy situation, where multiple users concurrently claim a share of a limited estate. Moreover, there is more than one estate, and their combined allocation can affect the degree of satisfaction of the claimants. The aim of the project is to evaluate existing approaches to bankruptcy situations, adapted to the case of multiple estates with allocation dependency. The solution will be complemented with optimization steps that better exploit the available data providers.
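For readers unfamiliar with bankruptcy problems, the sketch below shows two classical division rules, the proportional rule and constrained equal awards, applied to a single estate. It assumes the estate is the number of calls a provider allows and each claim is the number of calls a query requests; the function names and the sample figures are hypothetical, and the multi-estate dependency that the project targets is not covered.

```python
from typing import Dict


def proportional_rule(estate: float, claims: Dict[str, float]) -> Dict[str, float]:
    """Split the estate in proportion to each claim."""
    total = sum(claims.values())
    if total <= estate:
        # No shortage: every claimant receives its full claim.
        return dict(claims)
    return {name: estate * claim / total for name, claim in claims.items()}


def constrained_equal_awards(estate: float, claims: Dict[str, float]) -> Dict[str, float]:
    """Give every claimant the same amount, capped at its own claim."""
    awards = {}
    remaining = estate
    # Serve claimants from the smallest claim to the largest, so that
    # whatever a small claimant leaves unused is shared by the others.
    pending = sorted(claims.items(), key=lambda item: item[1])
    while pending:
        share = remaining / len(pending)
        name, claim = pending.pop(0)
        awards[name] = min(claim, share)
        remaining -= awards[name]
    return awards


if __name__ == "__main__":
    # Hypothetical example: a provider allows 60 calls, three queries ask for 100 in total.
    requested = {"q1": 10, "q2": 40, "q3": 50}
    print(proportional_rule(60, requested))
    print(constrained_equal_awards(60, requested))
```

The two rules favor different claimants: the proportional rule scales every query down by the same factor, while constrained equal awards fully serves small requests first.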
