Gerald Friedland, Adjunct Assistant Professor

Closed (1) Machine Learning for Multimedia Big Data

Closed. This professor is continuing with Spring 2017 apprentices on this project; no new apprentices needed for Fall 2017.

Our research, the Multimedia Commons project, is an NSF-sponsored collaboration among Yahoo, Lawrence Livermore National Lab, UC Berkeley, Amazon, and the International Computer Science Institute to develop research tools and resources around a massive corpus of more than 99 million images and nearly a million videos (multimediacommons.org).

We are looking for a few undergraduate researchers to work on:
(1) unsupervised event detection on a dataset of 100 million images and videos;
(2) audio/music processing (content-based recommendation); and
(3) multimedia retrieval systems.

Undergraduate researchers will learn to design and implement experiments using the tools and machine learning frameworks used in state-of-the-art multimedia computing research.

The student is expected to meet with their mentor at least twice a week and to attend the weekly lab meeting, at which we discuss the results and direction of ongoing experiments. In addition, the apprentice will be responsible for reading several primary research articles and presenting them to the research group, in order to increase their familiarity with the field.

The emphasis of this position will be on exposing the apprentice to the experimental research process through hands-on experience, and on developing skills and knowledge of techniques that may be used in future academic and research endeavors. Successful research apprentices complete their assignments in a timely manner, maintain open communication with other members of the research group and with the research coordinator, ask questions when they need help or guidance, and actively ensure (by communicating with the research coordinator) that they are getting the experience they want from the URAP program.

Day-to-day supervisor for this project: Jaeyoung Choi, Ph.D. candidate

Qualifications:
- Major is preferably computer science, statistics, or another science connected to large-scale data analysis.
- Requires introductory linear algebra (e.g., Math 54) and introductory probability and statistics (e.g., Stat 20). Previous experience and/or coursework in machine learning is strongly preferred.
- We have a strong preference for sophomores or juniors who can participate for at least one year.
- GPA of at least 3.5
- Required: Python, Linux
- Desirable: Machine Learning, Intro to Artificial Intelligence

Weekly Hours: more than 12 hrs

Off-Campus Research Site: Students can work either at BIDS or at ICSI (1947 Center Street).

Related website: http://mmcommons.org/

Closed (2) Piloting Audio Annotation Techniques

Closed. This professor is continuing with Spring 2017 apprentices on this project; no new apprentices needed for Fall 2017.

AudioNet will be a corpus of audio annotations on the ~800,000 videos in the open YFCC100M corpus at http://mmcommons.org, including annotations relevant to particular video analysis and retrieval tasks like automatic scene classification, event detection, and sentiment analysis. Sounds like "crowd cheering" or "fire alarm" can be used to detect situations that may be difficult to identify using visual means alone.

We are currently in the planning phase for the full (very ambitious!) project, in which we need to test out different approaches to obtaining and organizing the highest-quality, most useful annotations in the most efficient way.

Student researchers will help coordinate experiments with several different annotation scenarios, collecting data to determine which technique(s) and annotation schemes we will use for our large-scale effort. In particular, student researchers will be responsible for experiments in using crowdsourcing platforms to obtain annotations under various conditions.

While some work can be done from home, you will need to be present regularly at our ICSI lab space (in downtown Berkeley) to work directly with your supervisor.

Day-to-day supervisor for this project: Julia Bernd, Staff Researcher

Qualifications: Experience creating crowdsourcing tasks required.

Weekly Hours: 3-6 hrs

Off-Campus Research Site: International Computer Science Institute, 1947 Center St., Berkeley

Related website: http://mmcommons.org/audionet

Closed (3) Dark Data Flows and Personal Privacy

Closed. This professor is continuing with Spring 2017 apprentices on this project; no new apprentices needed for Fall 2017.

We are investigating the global network of unseen programs that gather and combine every scrap of personal data they can, without clear public knowledge. The “Dark Data” project seeks to illuminate this complex ecosystem through various experiments that attempt to measure and perturb unseen data pools by selectively adding or retrieving information, and to examine the effects of these hidden data flows on people’s employment, economic, and social activities. For example, we're looking at how different behavioral advertising mechanisms can lead to discriminatory ad placement: http://possibility.cylab.cmu.edu/adfisher/.

A critical piece of the project is education and outreach. Firstly, we are seeking to empower the general public by illuminating what these data pools are and what they do. Secondly, we want to enrich computer science education by developing classroom exercises modeled on our experiments (to be published at http://teachingprivacy.org).

This is a collaborative project between several researchers at UC Berkeley and the International Computer Science Institute. We are looking for student researchers to assist with several aspects, including designing, running, and analyzing experiments to study how companies collect and use information about people, and developing educational materials.

While some work can be done from home, you will need to be present regularly at our ICSI lab space (in downtown Berkeley) to work directly with your supervisor.

Day-to-day supervisor for this project: Michael Tschantz, Staff Researcher

Qualifications: Proficiency in Python and Linux is preferred. Experience with API programming, machine learning, and/or classroom teaching is also a plus.

Weekly Hours: 9-12 hrs

Off-Campus Research Site: International Computer Science Institute, 1947 Center St., Berkeley

Closed (4) Multimedia Big Data Field Studies

Applications for Fall 2017 are now closed for this project.

Once scientists have an idea for a field study, they must develop and execute a plan for recording data. This process is often very time-consuming: recruitment and recording setups can be complex, and rules about experiments with human subjects must be followed. However, in the age of big data, there are some new alternatives. People are increasingly sharing all kinds of data about the world. They do this for their own reasons, not to support field studies, but it presents a great opportunity for scientists. The Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset comprises 99.2 million images and nearly 800,000 videos from Flickr, all shared under Creative Commons licenses. Even larger datasets are expected in the future. To enable scientists to leverage this data for field studies, we are working on a new framework that extracts the required information from this huge dataset in a format usable by researchers who are not experts in big data retrieval and processing. Multimedia researchers have shown, in separate pieces, what is possible with consumer-produced big data, but no one has yet followed through by creating a comprehensive field study framework that supports scientists in other disciplines.


There are two possible roles.

For the first role, the undergraduate must apply with a specific and convincing idea for a field study using the dataset, and will implement it under the guidance of the team. The student will gain practical experience with machine learning at a larger scale and contact with another research discipline of their choosing.

The second role is more directly connected to the framework and involves developing a novel frontend for intuitive data search and data cleaning. The undergraduate will start with an existing Solr search backend and make it accessible for Multimedia Big Data Field Studies through an easy-to-use web page. The student will gain experience in accessing a Solr search backend and working with the YFCC100M dataset.

Day-to-day supervisor for this project: Dr. Mario Michael Krell, Post-Doc

Qualifications:
- All new applicants are required to be seniors.
- Major is preferably computer science, statistics, or another science connected to large-scale data analysis.
- Applicants are required to have taken the basic machine learning course. It is desirable but not essential to have taken one advanced course in machine learning.
- For the first role, the undergraduate needs to apply with an idea for a field study they want to implement.
- The second role will be filled by only one applicant, who is required to have experience in search, databases, machine learning, and frontend development.
- Applicants whose application does not clearly reference this project (Multimedia Big Data Field Studies) or does not address the required qualifications will not be considered.
- GPA of at least 3.5
- Required: Python, Linux, Machine Learning, Intro to Artificial Intelligence

Weekly Hours: more than 12 hrs

Off-Campus Research Site: ICSI (1947 Center Street)

Related website: http://mmcommons.org/