Seminars and Events
Volgenau School of Engineering: Spring Graduate Student Orientation
Wednesday, January 16, 2013
Oral Defense of Doctoral Dissertation: Delay-Based Methods for Robust Geolocation of Internet Hosts
Tuesday, January 22, 2013
Accurate geolocation of IP addresses is increasingly important in many applications, such as targeted delivery of localized content over the Internet, prevention of Internet, detection and prevention of cyberattacks and cyberterrorism, etc. The current geolocation algorithms can be divided into several classes according to the data that is used for determining the geographic location.
Round-trip delay measurements have not proved very reliable for geolocation in the past, because network congestion, queuing delay, and circuitous routes make the correlation between distance and delay non-linear. This thesis contributes to the advancement of two classes of delay-based geolocation methods. The first contribution is a family of pure delay-based algorithms based on a general class of proximity measures. When such measures are carefully chosen to discard the data which contains little information about the geographical location of a target IP address, the resulting algorithms have improved accuracy over the existing pure delay-based schemes.
The second contribution is the development of a statistical geolocation scheme based on the application of kernel density estimation to delay measurements amongst a set of landmarks. An estimate of the target IP location is then obtained by maximizing the likelihood of the distances from the target to the landmarks, given the measured delays. This is achieved by an algorithm which combines gradient ascent and force-directed methods. We compare the proposed geolocation schemes with the previous methods by developing a measurement framework using the PlanetLab infrastructure. Our experimental results show that the proposed geolocation algorithms have superior accuracy to the prior art.
Oral Defense of Doctoral Dissertation: Detecting Polymorphic and Mutated Malicious Access in Online Ad Serving Systems
Thursday, January 31, 2013
With the emergence of full-scale electronic commerce, the World Wide Web has been quickly and aggressively realized as an effective advertising medium. Indeed, advertising itself has become an important commodity on the web. The current model allows unethical and dishonest intermediaries to defraud the system by simulating fake traffic to increase revenue.
The development of intelligent data analysis methods for fraud detection can be well motivated from an economic point of view. Additionally, as the number of fraud cases increases, the reputation of Ad Networks suffers. This dissertation focuses on two aspects of the malicious access problem largely unexplored by the research community. These include heuristic analysis and the complexity of detecting polymorphic and mutated fraudulent traffic patterns, as well as the closed-world assumption generally associated with such problems, where some identities of a given malicious hit are necessarily represented in the training set.
This research investigates the application of novel machine learning techniques to the detection of heterogeneous and complex malicious traffic through a novel heuristic analysis approach aimed at detecting the underlying patterns of malicious hits within a given Click-Stream span.
This thesis proposes a heuristic-based feature selection and classification technique using Artificial Neural Networks (ANN) to detect polymorphic and mutated traffic inflation attacks. A heuristic ANN based feature selector has been investigated. The novelty of the proposed feature selection approach is that it integrates the capability of the wrapper approach to find better feature subsets by combining the filter’s ranking score with the wrapper-heuristic’s score to take advantage of both filter and wrapper heuristics. In addition, hybrid training algorithms for ANN have also been developed to uncover indicators of fraudulent patterns using heuristic analysis and previous conversion data elements. The indicators are used to create a set of patterns which profile legitimate traffic and indicate outliers and anomalies.
Experiments on real-life fraud data demonstrate the advantage of the proposed approaches over some of the most frequently used detection techniques. This advantage is demonstrated by a significant decrease in false positives and by the general stability of the results.
CS Seminar: Dynamically Heterogeneous Cores Through 3D Resource Pooling
Monday, February 11, 2013
3D die stacking is a recent technological development which makes it possible to create chip multiprocessors using multiple layers of active silicon bonded with low-latency, high-bandwidth, and very dense vertical interconnects. 3D die stacking technology provides very fast communication, as low as a few picoseconds, between processing elements residing on different layers of the chip. The rapid communication network in a 3D stack design, along with the expanded geometry, provides an opportunity to dynamically share on-chip resources among different cores. This research describes a dynamically heterogeneous processor architecture that leverages 3D stacking technology. Unlike prior work in the 2D plane, the extra dimension makes it possible to share resources at a fine granularity between vertically stacked cores. As a result, each core can grow or shrink resources, as needed by the code running on the core. This architecture, therefore, enables runtime customization of cores at a fine granularity and enables efficient execution at both high and low levels of thread parallelism. This architecture achieves performance gains of up to 2X, depending on the number of executing threads, and gains significant advantage in energy efficiency.
Speaker's Bio
Houman Homayoun is an Assistant Professor in the Department of Electrical and Computer Engineering at George Mason University. He also holds a courtesy appointment with the Department of Computer Science. Prior to joining George Mason University, he spent two years at the University of California, San Diego, as a National Science Foundation Computing Innovation (CI) Fellow awarded by the Computing Research Association (CRA) and the Computing Community Consortium (CCC). Houman's research is on power-temperature and reliability-aware memory and processor design optimizations and spans the areas of computer architecture, embedded systems, circuit design, and VLSI-CAD, where he has published more than 30 technical papers on the subject, including some of the earliest work in the field to address the importance of cross-layer power and temperature optimization in memory peripheral circuits. He is currently leading a number of research projects, including the design of next generation 3D heterogeneous multicores, low power hybrid SRAM-NVM memory hierarchy design, reliability-aware cache design, and power management in data centers. Houman was a recipient of the four-year University of California, Irvine Computer Science Department chair fellowship. He received his PhD degree from the Department of Computer Science at the University of California, Irvine in 2010, an MS degree in computer engineering in 2005 from University of Victoria, Canada and his BS degree in electrical engineering in 2003 from Sharif University of Technology.
GRAND Seminar: Heterogeneous Face Recognition
Tuesday, February 12, 2013
A chief benefit of face recognition technology is the extensive collection of face photographs available to populate target galleries. From sources such as driver's licenses, passports, and mug shots, a (generally) high quality gallery seed exists for a large percentage of the developed world's population. While these gallery images are visible light photographs, many face recognition scenarios exist where the probe images to be matched against such galleries are only available from some alternate imaging modality. For example, in environments with adverse illumination conditions (such as nighttime), face images must be captured in the infrared spectrum. In other cases, a lack of a face image requires the use of a forensic sketch to depict a subject. The task of matching face images across image modalities is called heterogeneous face recognition. In this talk, the problem of heterogeneous face recognition will be introduced. Different approaches for performing heterogeneous face recognition will be presented, including methods for (i) directly measuring the similarity between heterogeneous face images, and (ii) using prototype similarities to perform matching without needing a direct comparison. The following research topics will also be discussed: (i) the effect of demographics (race, gender, and age) on face recognition performance, (ii) studies on training face recognition systems for time lapse invariance, and (iii) designing facial features and matching algorithms for matching caricature sketches to photographs.
Speaker's Bio
Brendan Klare is a scientist at Noblis. He received the Ph.D. degree in Computer Science from Michigan State University in 2012, and received the B.S. and M.S. degrees in Computer Science and Engineering from the University of South Florida in 2007 and 2008. From 2001 to 2005 he served as an airborne ranger infantryman in the 75th Ranger Regiment, U.S. Army. Brendan has authored several papers on the topic of face recognition, and was the recipient of the Honeywell Best Student Paper Award at the 2010 IEEE Conference on Biometrics: Theory, Applications and Systems (BTAS). His other research interests include pattern recognition and computer vision.
SWE Seminar: Applying Empirical Software Engineering to Software Architecture: Challenges and Lessons Learned
Thursday, February 14, 2013
The software architecture community has developed numerous methods, techniques, and tools to support the architecture process (analysis, design, and review). Historically, most advances in software architecture have been driven by talented people and industrial experience, but there is now a growing need to systematically gather empirical evidence about the advantages or otherwise of tools and methods rather than just rely on promotional anecdotes or rhetoric.
The aim of this presentation is to promote and facilitate the application of the empirical paradigm to software architecture. To this end, Davide Falessi will describe several challenges and lessons learned when assessing software architecture research that used controlled experiments, replications, expert opinion, systematic literature reviews, observational studies, and surveys. Before presenting scientific details, Forrest Shull will introduce the Fraunhofer Institute of Experimental Software Engineering, a not-for-profit applied research and technology transfer organization.
Speaker's Bio
Dr. Davide Falessi joined the Fraunhofer Center for Experimental Software Engineering in Maryland (CESE) in 2012 as a Research Scientist in the Measurement and Knowledge Management Division. He currently serves as a program committee member in several international conferences including ESEM, WICSA, ICSR, SEKE, PROFES, EASE, and MTD. His main research interest is in devising and empirically assessing scalable solutions for the development of complex software-intensive systems with a particular emphasis on architecture, requirements, and quality. He received the PhD and the “Laurea” degrees in Computer Engineering from the University of Rome “Tor Vergata”.
Dr. Forrest Shull is a senior scientist at the Fraunhofer Center for Experimental Software Engineering in Maryland (CESE), a nonprofit research and tech transfer organization, where he leads the Measurement and Knowledge Management Division. At Fraunhofer CESE, he has been a lead researcher on projects for NASA's Office of Safety and Mission Assurance, the NASA Safety Center, the U.S. Department of Defense, the National Science Foundation, the Defense Advanced Research Projects Agency (DARPA), and companies such as Motorola and Fujitsu Labs of America. He is the Editor-in-Chief of IEEE Software.
CS Distinguished Lecture Series: “A Walk in the Dark: Random Walks and Network Discovery”
Friday, February 15, 2013
We rely on a wide variety of digital networks in our daily lives, to provide a rich set of services in commerce, government, communications, and to feel connected. These networks include the Internet, the World Wide Web, on-line social networks such as Facebook, Twitter, etc., and cellular networks. They are large, complex, very richly structured, and constantly changing over time. Moreover, because of their size and complexity, very little is known about them.
In this talk we focus on the problem of how to discover the structure of these networks. Traditional methods for network discovery and exploration include crawling the network using breadth first search (BFS). We will show, however, that such methods introduce significant biases unless almost all of the network is crawled. Instead, we focus on random walks as a method for exploring the network. We will show that random walks can be used to solve a number of network discovery tasks including characterizing degree distributions, identifying important nodes, finding short paths, locating content, etc., while exploring only a very small portion of the network. Last, we do this in the context of networks whose underlying graphs are either directed or undirected.
Speaker's Bio
Professor Towsley's research spans a wide range of activities from stochastic analyses of queueing models of computer and telecommunications to the design and conduct of measurement studies. He has performed some of the pioneering work on the exact and approximate analyses of parallel/distributed applications and architectures. More recently, he pioneered the area of network tomography and the use of fluid models for large networks. He has published extensively, with over 150 articles in leading journals.
PhD Computer Science, University of Texas (1975), BA Physics, University of Texas (1971). Professor Towsley first joined the faculty at the University of Massachusetts in the Department of Electrical and Computer Engineering in 1976 and moved to the Department of Computer Science in 1986. He was named University Distinguished Professor of Computer Science in 1998. Professor Towsley was a Visiting Scientist at the IBM T.J. Watson Research Center (1982-83, 2003), INRIA and AT&T Labs - Research (1996-97), and Cambridge Microsoft Research Lab (2004), and a Visiting Professor at the Laboratoire MASI, Paris (1989-90).
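One of the tasks named in the abstract, characterizing degree distributions from a random walk, can be illustrated with a small sketch. This is not the speaker's method, only a standard textbook correction under stated assumptions: the graph is undirected and connected, and the function name `random_walk_mean_degree` and the toy graph are hypothetical. A simple random walk visits nodes in proportion to their degree, so each visit is weighted by 1/degree to undo that bias.

```python
import random


def random_walk_mean_degree(neighbors, start, steps, rng):
    """Estimate the mean degree of an undirected graph from a random walk.

    `neighbors` maps each node to a list of adjacent nodes. Visits are
    weighted by 1/degree (a harmonic mean) because the walk oversamples
    high-degree nodes.
    """
    node = start
    inverse_degree_sum = 0.0
    for _ in range(steps):
        node = rng.choice(neighbors[node])      # move to a uniform random neighbor
        inverse_degree_sum += 1.0 / len(neighbors[node])
    return steps / inverse_degree_sum


# Toy star graph: one hub connected to four leaves.
STAR = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}

estimate = random_walk_mean_degree(STAR, "hub", 1000, random.Random(0))
print(estimate)  # 1.6, the true mean degree (4 + 1 + 1 + 1 + 1) / 5
```

On this star graph the walk must alternate between hub and leaf, so the estimate is exact; on a general graph it is only an approximation that improves with the number of steps.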
Computer Science Colloquium: Improving Text Retrieval Applications in Software Engineering: A Case on Concept Location
Tuesday, February 26, 2013
The source code of large-scale, long-lived software systems is difficult for developers to change. Finding a place in the code where to start implementing a change, also known as concept location, can be particularly challenging. Recent approaches based on Text Retrieval leverage the textual information found in the identifiers and comments in source code in order to guide developers during this task. In addition, text retrieval has been used to address many other software engineering tasks, but wide adoption in industry and education is still ahead of us. Among the factors that deter broader adoption, researchers have observed the difficulty developers have in formulating good queries in unfamiliar software and the quality of identifiers present in software. In order to support developers in writing better queries, we propose two query reformulation techniques. One is based on feedback provided by the developers, whereas the other one is completely automated and employs machine learning techniques to learn from past user queries. Empirical validation shows that the queries reformulated using our approaches lead to better results in concept location, compared to the original queries and to previous techniques. In order to improve the quality of identifiers in source code, we define a catalog of the most common identifier problems, called lexicon bad smells, and propose a series of refactoring operations in order to correct them. We show that the refactored identifiers lead to an improvement in the results of Text Retrieval-based concept location. In addition, we also investigated how to present to developers the information retrieved from source code during concept location. We studied the use of automated text summarization techniques to
Speaker's Bio
Sonia Haiduc is a PhD candidate at Wayne State University, in Detroit, MI, USA, where she performs research in software engineering. Her research interests include software maintenance and evolution, program comprehension, and source code search. Her work has been published in several highly selective software engineering venues. She has also been awarded the Google Anita Borg Memorial Scholarship for her research and leadership.
Computer Science Colloquium: Solving the Search for Source Code
Thursday, February 28, 2013
Programmers frequently use keyword searches to find source code in large repositories. However, to do this effectively, programmers must specify keyword queries that capture implementation details of their desired code. I propose that code search should be about behavior, not about keywords.
In this talk, I will present an approach to code search that allows programmers to provide inputs and outputs that define the behavior of their desired code. This approach indexes source code repositories by symbolically analyzing the programs and program fragments and transforming them into constraints representing their behavior. Results are identified using an SMT solver, which, given an input/output specification and the constraint representation of a program fragment, determines if the fragment matches the desired behavior. While promoting code reuse, my approach enables reuse where it was not possible before: the constraints can be relaxed, identifying code that approximately matches the specification. Further, the solver can then guide the instantiation of the code to produce the desired behavior. I will illustrate the generality of the approach by showing its instantiation in subsets of three languages, the Java String library, Yahoo! Pipes mashups, and SQL select statements. I will conclude by sharing my vision for new research directions related to this semantic approach to code search.
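The idea of searching by behavior rather than by keyword can be shown with a deliberately simplified sketch. It swaps the talk's technique (symbolic analysis plus an SMT solver over constraints) for plain execution of each candidate on the given examples, so it illustrates only the input/output query interface, not the approach itself. The repository contents and the names `matches` and `search_by_behavior` are hypothetical.

```python
def matches(candidate, examples):
    """Return True if the candidate maps every example input to its output."""
    for args, expected in examples:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A candidate that cannot handle the input does not match.
            return False
    return True


def search_by_behavior(repository, examples):
    """Return the names of all functions consistent with the examples."""
    return [name for name, fn in repository.items() if matches(fn, examples)]


# A toy "repository" of string functions standing in for indexed code.
REPOSITORY = {
    "upper": str.upper,
    "strip": str.strip,
    "title": str.title,
    "capitalize": str.capitalize,
}

# The query is an input/output pair, not a keyword.
print(search_by_behavior(REPOSITORY, [(("hello world",), "Hello world")]))
# ['capitalize']
print(search_by_behavior(REPOSITORY, [(("  hi  ",), "hi")]))
# ['strip']
```

Executing candidates only finds exact matches; the constraint-based formulation described in the abstract is what allows relaxed, approximate matches.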
Speaker's Bio
Kathryn Stolee is a Ph.D. candidate and NSF Graduate Research Fellow in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. She has been awarded an ESEM Distinguished Paper Award and two departmental outstanding research awards. Her research is in software engineering with a focus on program analysis. Extracting useful information from software artifact repositories is a broad theme of her research, most recently through semantic code search.
SANG Seminar: A Comparative Study of Android and iOS for Accessing Internet Streaming Services
Friday, March 01, 2013
Android and iOS devices are leading the mobile device market. While various user experiences have been reported from the general user community about their differences, such as battery lifetime, display, and touchpad control, few in-depth reports can be found about their comparative performance when receiving the increasingly popular Internet streaming services.
Today, video traffic is starting to dominate Internet mobile data traffic. In this talk, focusing on Internet streaming accesses, we set out to analyze and compare the performance of Android and iOS devices when they access Internet streaming services. Starting from the analysis of a server-side workload collected from a top mobile streaming service provider, we find that Android and iOS use different approaches to request media content, leading to different amounts of received traffic on Android and iOS devices when the same video clip is accessed. Further studies on the client side show that different data requesting approaches (standard HTTP request vs. HTTP range request) and different buffer management methods (static vs. dynamic) are used in Android and iOS media players, and their interplay has led to our observations.
Speaker's Bio
Yao Liu is a Ph.D. student in the Computer Science Department at George Mason University.
GRAND Seminar: Discovery of Novel Patterns in Massive Time Series Data
Tuesday, March 05, 2013
Massive amounts of data are generated daily at a rapid rate. As a result, the world is faced with unprecedented challenges and opportunities in managing the ever-growing data, and much of the world's supply of data is in the form of time series. One obvious problem in handling time series databases is their typically massive size: gigabytes or even terabytes are common, with more and more databases reaching the petabyte scale. Most classic data mining algorithms do not perform or scale well on time series data due to their unique structure. In particular, the high dimensionality, very high feature correlation, and the typically large amount of noise that characterize time series data present a difficult challenge. As a result, time series data mining has attracted an enormous amount of attention in the past two decades.
This presentation gives an overview of my contributions in the field of time series data mining. The first part of the presentation discusses time series data mining fundamentals - more specifically, the two aspects that hugely determine the efficiency and effectiveness of most time series data mining algorithms: data representation and similarity measure. The second part of the presentation will focus on the discovery of novel and non-trivial patterns in time series data, including frequently encountered (or repeated) patterns, rare (or anomalous) patterns, or latent structure.
Speaker's Bio
Dr. Jessica Lin is an Associate Professor in the Department of Computer Science at George Mason University. She received her PhD degree from University of California, Riverside in June, 2005. Her research interests encompass broad areas of data mining, especially data mining for large temporal and spatiotemporal databases, text, and images. Over the years, she has collaborated with researchers from various domains including medicine, earth sciences, manufacturing, national defense, and astronomy. Her research is partially funded by NSF, US Army and Intel Corporation.
Empirical Evaluation of the Statement Deletion Mutation Operator
Thursday, March 07, 2013
Mutation analysis is widely considered to be an exceptionally effective criterion for designing tests. It is also widely considered to be expensive in terms of the number of test requirements and in the amount of execution needed to create a good test suite. This paper posits that simply deleting statements, implemented with the statement deletion (SDL) mutation operators in Mothra, is enough to get very good tests. A version of the SDL operator for Java was designed and implemented inside the muJava mutation system. The SDL operator was applied to 40 separate Java classes, tests were designed to kill the non-equivalent SDL mutants, and then run against all mutants.
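The statement deletion idea can be sketched in a few lines. The abstract concerns Java and the muJava system; the following is only a Python analogue built on the standard `ast` module, not muJava's implementation. It models deletion by replacing one statement with `pass` so that every block stays syntactically valid. The function names, the `clamp` example, and the tests are hypothetical, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast
import copy


def _statement_slots(tree):
    """List (block, index) for every deletable statement, in walk order."""
    slots = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for index, stmt in enumerate(block):
                    skip = (ast.FunctionDef, ast.ClassDef, ast.Pass)
                    if isinstance(stmt, ast.stmt) and not isinstance(stmt, skip):
                        slots.append((block, index))
    return slots


def sdl_mutants(source):
    """Return one mutant source string per statement, with it deleted."""
    tree = ast.parse(source)
    mutants = []
    for slot_number in range(len(_statement_slots(tree))):
        mutant = copy.deepcopy(tree)
        block, index = _statement_slots(mutant)[slot_number]
        block[index] = ast.Pass()  # deletion modeled as a no-op
        mutants.append(ast.unparse(mutant))
    return mutants


def is_killed(mutant_source, function_name, tests):
    """A mutant is killed if any test observes a wrong result."""
    namespace = {}
    exec(mutant_source, namespace)
    return any(namespace[function_name](*args) != expected
               for args, expected in tests)


ORIGINAL = """
def clamp(value, low, high):
    if value < low:
        value = low
    if value > high:
        value = high
    return value
"""

TESTS = [((-1, 0, 10), 0), ((11, 0, 10), 10), ((5, 0, 10), 5)]
MUTANTS = sdl_mutants(ORIGINAL)
print(len(MUTANTS))                                       # 5
print(all(is_killed(m, "clamp", TESTS) for m in MUTANTS))  # True
```

Here the three tests kill all five mutants; dropping the first test would let the mutants that remove the lower-bound check survive, which is the signal a tester uses to add tests.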
Speaker's Bio
Lin Deng is a first year PhD student at George Mason University. He received the BS degree in Computer Science from Renmin University of China, and the MS degree in Computer and Information Science from Gannon University. He worked as a computer engineer at State Intellectual Property Office of China for four years before joining the MS program of Gannon University. His research interests are in software testing and security.
CS Distinguished Lecture Series: Automatic Annotation of Protein Function
Wednesday, March 20, 2013
Speaker's Bio
Lydia E. Kavraki is the Noah Harding Professor of Computer Science and Bioengineering at Rice University. She also holds an appointment at the Department of Structural and Computational Biology and Molecular Biophysics at the Baylor College of Medicine in Houston. Kavraki received her B.A. in Computer Science from the University of Crete in Greece and her Ph.D. in Computer Science from Stanford University working with Jean-Claude Latombe. Her research contributions are in physical algorithms and their applications in robotics (robot motion planning, hybrid systems, formal methods in robotics, assembly planning, micromanipulation, and flexible object manipulation), as well as in computational structural biology, translational bioinformatics, and biomedical informatics (modeling of proteins and biomolecular interactions, large-scale functional annotation of proteins, computer-assisted drug design, and systems biology). Kavraki has authored more than 180 peer-reviewed journal and conference publications and is one of the authors of a robotics textbook titled "Principles of Robot Motion" published by MIT Press. She is heavily involved in the development of The Open Motion Planning Library (OMPL), which is used in industry and in academic research in robotics and bioinformatics. Kavraki is currently on the editorial board of the International Journal of Robotics Research, the ACM/IEEE Transactions on Computational Biology and Bioinformatics, the Computer Science Review, and Big Data. She is also a member of the editorial advisory board of the Springer Tracts in Advanced Robotics. Kavraki is the recipient of the Association for Computing Machinery (ACM) Grace Murray Hopper Award for her technical contributions.
She has also received an NSF CAREER award, a Sloan Fellowship, the Early Academic Career Award from the IEEE Society on Robotics and Automation, a recognition as a top young investigator from the MIT Technology Review Magazine, and the Duncan Award for excellence in research and teaching from Rice University. Kavraki is a Fellow of the Association for Computing Machinery (ACM), a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow of the World Technology Network (WTN). Kavraki was elected a member of the Institute of Medicine (IOM) of the National Academies in 2012. She has also been a member of the Academy of Medicine, Engineering and Science of Texas (TAMEST) since 2012. Current projects at Kavraki's laboratory are described under http://www.kavrakilab.org and http://www.cs.rice.edu/~kavraki.
Oral Defense of Doctoral Dissertation: Decision Guidance for Sustainable Manufacturing
Thursday, March 21, 2013
Sustainable manufacturing has significant impacts on a company’s business performance and competitiveness in today’s world. A growing number of manufacturing industries are initiating efforts to address sustainability issues; however, to achieve a higher level of sustainability, manufacturers need methodologies for formally describing, analyzing, evaluating, and optimizing sustainability performance metrics for manufacturing processes and systems. Currently, such methodologies are missing.
This dissertation developed the Sustainable Process Description and Analytics (SPDA) formalism and a systematic decision guidance methodology to fill the research gaps. The methodology provides step-by-step guidance for sustainability performance analysis and decision optimization using the SPDA formalism. The SPDA formalism provides unified syntax and semantics for querying, what-if analysis, and decision optimization; is modular, extensible, and reusable; supports built-in process and sustainability metrics modeling that enable users using data from production, energy management, life cycle assessment reference for modeling and analysis; is easy to use by manufacturing and business users; and also provides a reduction procedure that enables the translations of the SPDA query into specialized models such as optimization or simulation model for decision guidance. Two real world sustainable manufacturing case studies have been performed to demonstrate the use of formalism and the methodology.
CS Colloquium-NEW TIME: Automatic Program Repair Using Genetic Programming
Monday, March 25, 2013
"Everyday, almost 300 bugs appear...far too many for only the Mozilla programmers to handle" --Mozilla developer, 2005
Software quality is a pernicious problem. Although 40 years of software engineering research has provided developers considerable debugging support, actual bug repair remains a predominantly manual, and thus expensive and time-consuming, process. I will describe GenProg, a technique that uses evolutionary computation to automatically fix software bugs. My empirical evidence demonstrates that GenProg can quickly and cheaply fix a large proportion of real-world bugs in open-source C programs. I will also briefly discuss the atypical evolutionary search space of the automatic program repair problem, and the ways it has challenged assumptions about software defects.
Speaker's Bio
Claire Le Goues is a Ph.D. candidate in Computer Science at the University of Virginia. Her research interests lie in the intersection of software engineering and programming languages, with a particular focus on software quality and automated error repair. Her work on automatic program repair has been recognized with Gold and Bronze designations at the 2009 and 2012 ACM SIGEVO "Humies" awards for Human-Competitive Results Produced by Genetic and Evolutionary Computation and several distinguished and featured paper awards.
CS Colloquium: Searching for Relevant Functions and Their Usages in Millions of Lines of Code
Thursday, March 28, 2013
Different studies show that programmers are more interested in finding definitions of functions and their uses than variables, statements, or ordinary code fragments. Therefore, developers require support in finding relevant functions and determining how those functions are used. Unfortunately, existing code search engines do not provide enough of this support to developers, thus reducing the effectiveness of code reuse. We provide this support to programmers in a code search system called Portfolio that retrieves and visualizes relevant functions and their usages. We have built Portfolio using a combination of models that address surfing behavior of programmers and sharing related concepts among functions. Currently, Portfolio is instantiated on two large source code repositories with thousands of projects spanning 270 million C/C++ and 440 million Java lines of code. In order to evaluate Portfolio, we conducted two experiments: first, an experiment with 49 professional C/C++ programmers to compare Portfolio to Google Code Search and Koders using a standard methodology for evaluating Information Retrieval-based engines; and second, an experiment with 19 Java programmers to compare Portfolio to Koders. The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders. We also demonstrate that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
Speaker's Bio
Dr. Denys Poshyvanyk is an Assistant Professor at the College of William and Mary in Virginia. He received his Ph.D. degree in Computer Science from Wayne State University in 2008. He also obtained his M.S. and M.A. degrees in Computer Science from the National University of Kyiv-Mohyla Academy, Ukraine and Wayne State University in 2003 and 2006, respectively. Since 2010, he has been serving on the steering committee of the International Conference on Program Comprehension (ICPC). He was elected chair of the ICPC steering committee in 2012. He serves as a program co-chair for the 21st IEEE International Conference on Program Comprehension (ICPC'13). He also served as a program co-chair for the 18th and 19th International Working Conference on Reverse Engineering (WCRE 2011 and WCRE 2012). Dr. Poshyvanyk received an NSF CAREER award and several Best Paper Awards, including ICPC’06, ICPC'07, ICSM’10, and SCAM’10. His research interests are in software engineering, software maintenance and evolution, program comprehension, reverse engineering, software repository mining, source code analysis, and metrics.
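The abstract mentions PageRank as one ingredient in ranking functions. A minimal sketch of PageRank by power iteration is given below, assuming (this is not stated in the abstract) that it is applied to a function call graph in which an edge points from caller to callee. The graph, the function name `pagerank`, and the damping value are illustrative choices, and Portfolio's actual models are not reproduced here.

```python
def pagerank(call_graph, damping=0.85, iterations=50):
    """Rank functions in a call graph by power iteration.

    `call_graph` maps every function name to the list of functions it
    calls; functions that call nothing must still appear as keys.
    """
    functions = sorted(call_graph)
    n = len(functions)
    rank = {f: 1.0 / n for f in functions}
    for _ in range(iterations):
        new_rank = {f: (1.0 - damping) / n for f in functions}
        for caller in functions:
            callees = call_graph[caller]
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # A function with no outgoing calls spreads its rank evenly.
                share = damping * rank[caller] / n
                for f in functions:
                    new_rank[f] += share
        rank = new_rank
    return rank


# Hypothetical call graph: every function ends up calling `log`.
CALL_GRAPH = {
    "main": ["parse_args", "load_config", "log"],
    "parse_args": ["log"],
    "load_config": ["read_file", "log"],
    "read_file": ["log"],
    "log": [],
}

RANKS = pagerank(CALL_GRAPH)
for name in sorted(RANKS, key=RANKS.get, reverse=True):
    print(f"{name:12s} {RANKS[name]:.3f}")
```

In this toy graph `log` receives the highest score because every other function calls it, while `main`, which nothing calls, receives the lowest.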
GRAND Seminar: Multi-target Tracking by Rank-1 Tensor Approximation
Friday, March 29, 2013
Multi-target tracking (MTT) is an important problem in computer vision and has many applications. We introduce a novel framework for MTT using the rank-1 tensor approximation and propose an L1 norm tensor power iteration solution. In particular, a high-order tensor is constructed based on trajectories in the time window, with each tensor element representing the affinity of the corresponding trajectory. The assignment variables are the L1 normalized vectors, which are used to approximate the rank-1 tensor. Our approach provides a flexible and effective formulation where both pairwise and high-order association energy can be used expediently. We also show the close relation between our formulation and the multi-dimensional assignment (MDA) model. To solve the optimization in the rank-1 tensor approximation, we propose an algorithm that iteratively powers the intermediate solution followed by an L1 tensor normalization. Aside from effectively capturing high-order motion information, the proposed solver runs efficiently with proven convergence. The experimental validations are conducted on two challenging datasets, and our method demonstrates promising performance on both of them.
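To make the "power the intermediate solution, then L1-normalize" step more concrete, here is a heavily simplified sketch on a small third-order tensor. It assumes a non-negative affinity tensor stored as nested lists and uses one plain L1-normalized vector per tensor mode; the tracking-specific assignment constraints of the method described in the talk are not modeled, and all names are hypothetical.

```python
def _l1_normalize(values):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else values


def l1_power_iteration(tensor, iterations=100):
    """Simplified multiplicative power iteration with L1 normalization.

    tensor[i][j][k] is a non-negative affinity. The function returns
    vectors x, y, z (each summing to 1) that the iteration drives toward
    a large value of sum_ijk tensor[i][j][k] * x[i] * y[j] * z[k].
    """
    n1, n2, n3 = len(tensor), len(tensor[0]), len(tensor[0][0])
    x = [1.0 / n1] * n1
    y = [1.0 / n2] * n2
    z = [1.0 / n3] * n3
    for _ in range(iterations):
        # Multiply each entry by its tensor "response", then renormalize.
        x = _l1_normalize([
            x[i] * sum(tensor[i][j][k] * y[j] * z[k]
                       for j in range(n2) for k in range(n3))
            for i in range(n1)])
        y = _l1_normalize([
            y[j] * sum(tensor[i][j][k] * x[i] * z[k]
                       for i in range(n1) for k in range(n3))
            for j in range(n2)])
        z = _l1_normalize([
            z[k] * sum(tensor[i][j][k] * x[i] * y[j]
                       for i in range(n1) for j in range(n2))
            for k in range(n3)])
    return x, y, z


# Toy 2x2x2 affinity tensor with one dominant entry at index (0, 1, 1).
affinity = [[[1.0, 1.0], [1.0, 5.0]],
            [[1.0, 1.0], [1.0, 1.0]]]
x, y, z = l1_power_iteration(affinity)
```

On this toy input the three vectors concentrate on indices 0, 1, and 1, that is, on the entry with the highest affinity.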
Speaker's Bio: Haibin Ling received the B.S. degree in mathematics and the M.S. degree in computer science from Peking University, China, in 1997 and 2000, respectively, and the Ph.D. degree in Computer Science from the University of Maryland, College Park, in 2006. From 2000 to 2001, he was an assistant researcher at Microsoft Research Asia, Beijing, China. From 2006 to 2007, he worked as a postdoctoral scientist at the University of California Los Angeles. After that, he joined Siemens Corporate Research, Princeton, NJ, as a research scientist. Since fall 2008, he has been an Assistant Professor at Temple University. Dr. Ling's research interests include computer vision, medical image analysis, human computer interaction, and machine learning. He received the Best Student Paper Award at the ACM Symposium on User Interface Software and Technology (UIST) in 2003.
Inter-Disciplinary Computing Seminar: Using GIS to Optimize Liver Transplant Locations in the US and Other Health GIS Projects
Wednesday, April 17, 2013
The first part of the talk will discuss the main topics covered by health GIS research, illustrated with case studies from my own research. The second half will describe my work with Naoru Koizumi and others at GMU on optimizing the geography of liver transplantation in the US. Specifically, this research addresses geographic disparities in access to and outcomes of transplantation, a persistent and widely discussed problem among transplant researchers and members of the transplant community. One of the alleged causes of these disparities in the United States is the administratively determined organ allocation boundaries that limit organ sharing across regions. This talk will describe the work of our research team in applying mathematical programming models to construct alternative liver allocation boundaries that achieve more geographic equity in access to transplants than the current system. The performance of the optimal boundaries was evaluated and compared to that of the current allocation system using discrete event simulation.
CS Distinguished Lecture Series: Machine Learning Approaches to Network and Social Media
Friday, April 19, 2013
Across the sciences, a fundamental setting for representing and interpreting information about entities, the structure and organization of communities, and changes in these over time, is a stochastic network that is topologically rewiring and semantically evolving over time and space. While there is a rich literature in modeling invariant networks, until recently, little has been done toward modeling the dynamic processes underlying rewiring networks, and on recovering such networks when they are not observable.
In this talk, I will present some recent developments in analyzing what we refer to as the dynamic tomography of evolving networks. I will present new algorithms for estimating the topological structures of latent evolving networks underlying nonstationary time-series of nodal attributes, along with theoretical results on the asymptotic sparsistency of the proposed methods; and I will present a family of new Bayesian models and scalable inference algorithms for estimating the trajectories of latent multi-functionality of nodal states, and for learning community structures, in both static and evolving social and biological networks. Case studies of a breast cancer network and the Facebook network will be presented to highlight current capabilities and future challenges.
Speaker's Bio: Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds, and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley.
His current work involves 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) application of statistical learning in social networks, data mining, and vision. Professor Xing has published over 150 peer-reviewed papers, and is an associate editor of the Journal of the American Statistical Association, Annals of Applied Statistics, the IEEE Transactions on Pattern Analysis and Machine Intelligence, the PLoS Journal of Computational Biology, and an Action Editor of the Machine Learning journal. He is a recipient of the NSF Career Award, the Alfred P. Sloan Research Fellowship in Computer Science, the United States Air Force Young Investigator Award, and the IBM Open Collaborative Research Faculty Award.
Inter-Disciplinary Computing Seminar: Motor Primitives with Spindle-like Properties Predict the Ability to Learn Different Types of Motor Adaptations
Wednesday, April 24, 2013
The mammalian brain generates motor commands to initiate movement. Through interactions with our environment this motor output is adapted in order to reduce the error between the planned and actual movement. One readily learned motor adaptation involves the predictive compensation for changes in the physical dynamics of the environment. These dynamics result in time-varying physical perturbations that are a function of the motion state, such as velocity. Early in compensating for velocity-dependent perturbations, subjects exhibit a force profile with a shape resembling a muscle spindle firing pattern (a transient, velocity-dependent peak followed by a static, position-dependent offset). As adaptation progresses, this force profile transforms into one that is almost entirely velocity-dependent (task goal specific), as the static ‘tail’ fades and the transient peak increases. Analysis of the initial and final force profiles suggests that these high-dimensional force profiles can be represented as simple combinations of position and velocity signals linearly modified with stiffness and viscosity gains. We therefore constructed a simple neural network model with spindle-like primitives (having positive position and velocity dependence) and a simple gradient descent learning rule to test whether such a model would show primitive-dependent early learning but task-dependent late learning. This model accurately predicts that (1) the direction of initial learning is generally biased towards the center of the primitive distribution, (2) the asymptotic learning state mostly reflects the task goal, and (3) the adaptation rate for learning goals in the directions parallel or orthogonal to the center of the primitive distribution is enhanced or slowed, respectively, compared to other directions.
These results suggest that adaptation to predictable force perturbations takes place in a spindle-like coordinate system that dictates the degree of difficulty for learning different types of perturbations and predicts the evolution of adaptive responses.
Speaker's Bio: Dr. Wilsaan Joiner is an Assistant Professor in the Department of Bioengineering. He received his PhD in Biomedical Engineering from the Johns Hopkins University School of Medicine in 2007. From 2007 to 2012, he was a postdoctoral fellow at Harvard University and the National Eye Institute (The Laboratory of Sensorimotor Research). Dr. Joiner’s Sensorimotor Integration Laboratory at the Volgenau School of Engineering conducts translational research investigating human sensory integration, motor learning and control using computational and experimental approaches. Ongoing projects include the influence of eye movements and internal monitoring signals in guiding goal-directed movements and the neural processes underlying motor adaptation and memory consolidation.
Oral Defense of Doctoral Dissertation: Towards Power-Efficient Internet Streaming to Mobile Devices
Wednesday, April 24, 2013
Internet streaming services are very popular today. In fact, video traffic now accounts for more than 51 percent of total Internet traffic. With the pervasive adoption of mobile devices over the past several years, Internet streaming services are receiving a rapidly growing number of requests from them. As a result, more than 50 percent of the data consumed by mobile devices is video streaming traffic. However, streaming delivery to mobile devices is more challenging than delivery to desktop computers.
In this dissertation, we first empirically investigate Internet mobile streaming practices. We investigate mobile streaming from various perspectives, including hardware and software heterogeneity, different characteristics of mobile videos, and different user access patterns. The results give us an in-depth understanding of current Internet mobile streaming services. A critical constraint on mobile devices receiving Internet streaming services is their limited battery capacity. Among the different sources of power consumption, transmission is very significant: for a mobile device receiving streaming services, about 30 to 40 percent of the power is consumed by the wireless network interface card (WNIC) for streaming data transmission. To prolong battery lifetime, it is therefore important to reduce the power consumed by the WNIC. For this purpose, we design and implement new schemes that effectively reduce battery power consumption while maintaining good streaming quality on mobile devices. In particular, we focus on P2P streaming and client-server streaming, as they are widely used. We aim to reduce battery power consumption in two respects: (1) how the data is received, and (2) how much data is received. Our techniques have been implemented, and experimental results show they are effective in reducing battery power consumption on mobile devices without degrading streaming quality.
Oral Defense of Doctoral Dissertation: A Cost-Effective Distributed Architecture for Content Delivery and Exchange Over Emerging Wireless Technologies
Thursday, April 25, 2013
Opportunities in education are lacking in many parts of developed nations and are missing in most parts of developing nations. This is, in significant part, due to shortages of classroom instructional resources such as quality teaching staff, hardware and software. Distance education (DE) has proved to be a successful teaching approach and overcomes some of the barriers imposed by classroom instruction, primarily those due to the shortage of teachers. Many DE software tools have been developed and are in use today. Most require high network capacity, which is not supported by common long-distance wireless network infrastructures in many parts of the world. To address obstacles related to network infrastructures of developing countries for content delivery and exchange, this research develops the design of a cost-effective distributed architecture for content delivery and exchange over emerging limited-capacity wireless technologies. The design of the proposed target architecture includes an overlay network with distributed peer-to-peer systems. Simulation is used to explore parameters and metrics in order to validate the effectiveness and scalability of this architecture. An n-tier hierarchical training model to train local instructors by experienced instructors, using DE resources supported by this architecture, is discussed as a way to mitigate the teacher shortages in developing countries. The situation in Bangladesh is used to provide examples, based on the author’s familiarity with it.
Oral Defense of Doctoral Dissertation: Data Mining Framework for Metagenome Analysis
Tuesday, April 30, 2013
Advances in biotechnology have dramatically changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. The process of “metagenomics” involves sequencing the genetic material of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Researchers are trying to determine the collective microbial community or population of microbes that coexist across different environmental and clinical samples. Several researchers and clinicians have embarked on studying the pathogenic role played by the microbiome (i.e., the collection of microbial organisms within the human body) with respect to human health and disease conditions.
There is a critical need to develop new methods that can analyze metagenomes and correlate heterogeneous microbiome data to clinical metadata. The lack of such methods is an impediment to identifying the function and presence of microbial organisms within different samples, reducing our ability to elucidate the microbial-host interactions and discover novel therapeutics. From another perspective, comparing metagenomes across different ecological samples allows for the characterization of biodiversity across the planet. The goal of this dissertation is to develop novel data mining algorithms that allow for the accurate and efficient analysis of metagenome data obtained from different environments.
Specific contributions have included the development of a suite of clustering algorithms for handling large-scale targeted and whole metagenome sequences. We develop a novel locality sensitive hashing (LSH) based method for clustering metagenome sequence reads. Our method achieves efficiency by approximating the pairwise sequence comparison operations by using a randomized hashing technique. We incorporate this clustering approach within a computational pipeline (LSH-Div) to estimate the species diversity within an ecological sample. We also developed an algorithm called MC-MinH that uses the min-wise hashing approach, along with a greedy clustering algorithm to group 16S and whole metagenome sequences. We represent unequal length sequences using contiguous subsequences or k-mers, and then approximate the computation of pairwise similarity using independent min-wise hashing. Further, MC-MinH was extended as a distributed algorithm implemented within the Map-Reduce based Hadoop platform. The distributed clustering algorithm can perform a greedy iterative clustering as well as an agglomerative hierarchical clustering and can handle large volumes of input sequences.
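The k-mer and min-wise hashing idea described above can be sketched as follows. This is a minimal illustration and not the MC-MinH implementation: the salted SHA-1 hash family, the parameter values (k = 4, 64 hash functions, similarity threshold 0.5), and the rule of comparing each read only with the first member of a cluster are assumptions chosen to keep the example short.

```python
import hashlib


def kmers(sequence, k):
    """Return the set of contiguous length-k subsequences (k-mers)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def _hash(kmer, seed):
    """One member of a salted hash family (an assumption of this sketch)."""
    digest = hashlib.sha1(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(sequence, k=4, num_hashes=64):
    """Keep the minimum hash value per hash function.

    The sequence must contain at least k characters.
    """
    items = kmers(sequence, k)
    return [min(_hash(km, seed) for km in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching minima, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def greedy_cluster(sequences, threshold=0.5, k=4, num_hashes=64):
    """Assign each sequence to the first cluster whose representative
    (its first member) is similar enough; otherwise start a new cluster."""
    signatures = [minhash_signature(s, k, num_hashes) for s in sequences]
    clusters = []
    for idx, sig in enumerate(signatures):
        for cluster in clusters:
            if estimated_similarity(signatures[cluster[0]], sig) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy reads: the first two share all their 4-mers despite unequal length.
reads = ["ACGTACGTACGTACGT", "ACGTACGT", "TTTTGGGGCCCC"]
clusters = greedy_cluster(reads)
```

Because signatures have a fixed length regardless of read length, comparing two reads costs the same no matter how long they are, which is where the efficiency of the hashing approach comes from.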
We also developed a novel sequence composition-based taxonomic classifier using extreme learning machines, referred to as TAC-ELM. This algorithm uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. TAC-ELM, when combined with BLAST (Basic Local Alignment Search Tool), has shown improved taxonomy classification results.
In order to make these computational tools accessible to a broad group of researchers, we have also developed a web-based analysis portal. The portal implements a laboratory information management system (LIMS) database using the open source Drupal content management system to store and retrieve the multi-modal microbiome data. For analysis and development of workflows, we use the Galaxy platform. This provides a web-based, user-friendly platform for integrating the tools developed to create user-customized pipelines and a batch-based job submission system.
To summarize, this dissertation has contributions in the area of metagenome sequence clustering and classification which can be easily integrated within computational workflows for species diversity estimation and large-scale microbiome analysis.
Oral Defense of Doctoral Dissertation: An Autonomic Framework for Integrating Security and Quality of Service Support in Databases
Thursday, May 02, 2013
The back-end databases of multi-tiered applications are a major data security concern for enterprises. The abundance of these systems and the emergence of new and different threats require multiple and overlapping security mechanisms. Therefore, providing multiple and diverse database intrusion detection and prevention systems (IDPS) is a critical component of the defense-in-depth strategy for DB information systems. At the same time, an e-business application is expected to process requests with a certain service quality to maintain current customers and attract new ones. To meet both objectives, it is necessary to use a combination of IDPSs that best meets the security and QoS requirements of the system stakeholders for each workload intensity level. Due to the dynamic variability of the workload intensity, it is not feasible for human beings to continuously reconfigure the system. It is therefore important that current systems be built with adaptive capabilities that can dynamically respond to changes in their surroundings. This research presents a self-optimizing and self-protecting database system environment that captures dynamic and fine-grained tradeoffs between security and QoS by using a multi-objective utility function. The utility function considers the performance impact of IDPSs on the overall system under a certain workload, the detection and false detection rates of the IDPSs, and high-level stakeholder preferences and constraints. The model was validated in a simulated environment. The feasibility of the approach is also demonstrated in an experimental environment.
Oral Defense of Doctoral Dissertation: Emergency Communications via Handheld Devices
Thursday, May 02, 2013
Ensuring effective communications during emergencies is an important issue for any functional government. One way to address this issue is to ensure the availability of emergency responders capable of making the appropriate decisions and taking timely actions with sufficient resources. Many XML-based languages such as the Emergency Data Exchange Language (EDXL) and the associated Common Alerting Protocol (CAP) have been designed to provide a basis for such communications.
To ensure that messages are delivered in a timely manner, I propose some role- and task-based ontological enhancements for these languages. I address this availability problem further by proposing a role-based Availability Emergency Responder Framework (AERF). The AERF ensures that a list of personnel for a particular role in an organization is always reachable to handle an emergency call. I develop a working prototype of the AERF for a local hospital that handles emergency cases. The prototype demonstrates the feasibility and security of the AERF and addresses the availability of emergency responders based on their assigned roles.
In order to inform the general public of nearby emergencies, the Department of Homeland Security initiated the Commercial Mobile Alert System (CMAS), which utilizes existing commercial telecommunication infrastructures to broadcast emergency alert text messages to all mobile users in an area affected by an emergency. One of the limitations of the Cell Broadcast Service (CBS) is that the smallest area to which CMAS can broadcast its message is a cell site, which is, in most cases, quite large for small-scale emergencies.
I propose an enhancement to CMAS that uses CMAS as a transport protocol to distribute small-scale emergency alerts to areas that are smaller than a cell site. I also suggest an appropriate enhancement to the CAP 1.2 message structure for CMAS emergency alerts. Another limitation of CMAS messages is the maximum message size of 90 characters of clear text. I propose a further enhancement to CMAS that uses a combination of different encoding techniques and emergency protocol standards, including the Common Alerting Protocol (CAP 1.2), to provide alert messages with meaningful and rich content. I show the viability of my solution using a prototype implementation that can generate and broadcast CMAS emergency alerts through the Emergency Response Alert System (ERAlert) to Android phones, where an Emergency Response Application (ERApp) intercepts, decodes, and displays meaningful alerts to users. Lastly, I propose a Navigation Assistance Framework (NAF) that allows emergency organizations to provide emergency information that can be filtered through the traffic patterns in order to help victims navigate out of the emergency and reach their intended destinations in a reasonable amount of time. I develop ERSimMon to simulate this capability on a small scale and show the effectiveness of my solution.
SWE Seminar: GuideArch: Guiding the Exploration of Architectural Solution Space Under Uncertainty
Thursday, May 02, 2013
A system's early architectural decisions impact its properties (e.g., scalability, dependability) as well as stakeholder concerns (e.g., cost, time to delivery). Choices made early on are both difficult and costly to change, and thus it is paramount that the engineer gets them "right". This leads to a paradox, as in early design, the engineer is often forced to make these decisions under uncertainty, i.e., not knowing the precise impact of those decisions on the various concerns. How could the engineer make the "right" choices in such circumstances? This is precisely the question we have tackled in this talk. We present GuideArch, a framework aimed at quantitative exploration of the architectural solution space under uncertainty. It provides techniques founded on fuzzy math that help the engineer with making informed decisions.
Speaker's Bio: Naeem Esfahani is a Ph.D. candidate in the Computer Science Department, Volgenau School of Engineering. He received his B.Sc. degree in Electrical and Computer Engineering from the University of Tehran in 2005. He also received an M.Sc. degree in Computer Engineering from Sharif University of Technology in 2008. His current research mainly focuses on Software Architecture, Self-Adaptive Software Systems, and Software Quality of Service Analysis & Improvement.
Project Presentation for Engineer Degree: A TeC Implementation on the Android Platform
Friday, May 03, 2013
This thesis describes a software system developed based upon the Team Computing (TeC) model. TeC belongs to ubiquitous computing and emphasizes the coordination of teams comprising software components, devices and human operation.
This particular system is implemented on the popular Android platform using Java. It allows teams designed by end users to be deployed and operated on Android devices. The system also implements a decentralized, lightweight protocol for self-healing.
Volgenau School of Engineering: Convocation
Thursday, May 16, 2013
Event Info: IN CASE OF RAIN, students should report directly to the Patriot Center floor, via the lower entrance on the "East" side of the Patriot Center (i.e., the loading dock door), to be directed to their seats. There will be no procession in the event of rain.
Graduates traditionally assemble at 1:15 p.m. in the Patriot Center parking lot at your department's designated area (indicated by department signs). It is very important that you be at the appropriate location by 1:15 p.m. so we can line up all graduates by degree program for the processional.
Oral Defense of Doctoral Dissertation: An Extensible Framework for Generating Ontology from Various Data Models
Thursday, May 30, 2013
In the Information Technology field, Ontology is concerned with the use of formal representation to describe concepts and relationships in a domain of knowledge. Using ontologies, organizations can facilitate processes such as integrating heterogeneous systems, assessing data quality, validating business rules, and discovering hidden facts. Ontology engineering, however, is not a trivial process. Developing ontologies is highly dependent on the availability and knowledge of ontology modelers and domain experts. Moreover, the development process is often lengthy and error-prone. In this dissertation, I developed an extensible framework for generating ontologies from data models. For this dissertation, the framework is limited to generating ontology from two types of data models: the Relational Database (RDB) and Object-Relational Database (ORDB) models. The framework, however, is extensible to support the generation of ontologies from other types of data models (e.g. XML). The derived ontology is expressed in the OWL Web Ontology Language, a W3C recommendation.
The proposed framework has been validated by implementing it as a prototype, and by examining the ontologies it generates from a syntactic and semantic perspective. For the semantic examination, domain requirements were used to compute the recall and precision for the ontologies generated by my framework and those generated by a similar tool. Moreover, the relative amount of terminological content (which I call the relative explicitness) of these ontologies was also measured using a methodology that I developed in my research. The results showed the ability of my framework to generate ontologies that are closely aligned with the domain.
SWE Seminar: Adaptive Program Repair via Program Equivalence: A Duality With Mutation Testing
Wednesday, June 05, 2013
Software bugs remain a compelling problem. Automated program repair is a promising approach for reducing cost, and many methods have recently demonstrated positive results. However, success on any particular bug is variable, as is the cost to find a repair.
This talk focuses on generate-and-validate repair methods that enumerate candidate repairs and use test cases to define correct behavior. We formalize repair cost in terms of test executions, which dominate most test-based repair algorithms. Insights from this model lead to a novel deterministic repair algorithm that computes a patch quotient space with respect to an approximate semantic equivalence relation. Generate-and-validate program repair is shown to be a dual of mutation testing, directly suggesting possible cross-fertilization. Evaluating on 105 real-world bugs in programs totaling 5MLOC and involving 10,000 tests, our new algorithm requires an order-of-magnitude fewer test evaluations than the previous state-of-the-art and is over five times more efficient monetarily. This talk presents work that is currently under submission.
Speaker's Bio: Westley Weimer is an Associate Professor of Computer Science at the University of Virginia. His main research interest lies in advancing software quality by using both static and dynamic programming language approaches. http://www.cs.virginia.edu/~weimer/
Oral Defense of Doctoral Dissertation: Scheduling Algorithms Optimizing Throughput and Energy for Networked Systems
Wednesday, June 26, 2013
Scheduling problems consider allocating limited resources under constraints among competing requests in order to fulfill their obligations. Practical resource management algorithms with provable performance guarantees are of great importance. In this dissertation, we study scheduling algorithms for resource management in networked systems. Mainly, we design, analyze, and implement two types of scheduling algorithms: (1) throughput-aware scheduling algorithms, and (2) energy-aware scheduling algorithms.
Throughput is a main metric that scheduling algorithms are designed to optimize. We study algorithms for network routers to schedule weighted packets with time constraints over a wireless fading channel. We design both offline and online algorithms to maximize weighted throughput, which is defined as the total value of the packets successfully sent before their respective deadlines.
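As a small worked example of the weighted throughput objective defined above, the sketch below schedules unit-length packets on a single link, one packet per time slot, using the classic greedy rule for this simplified offline setting: consider packets in order of decreasing value and place each in the latest free slot before its deadline. The fading channel studied in the dissertation is not modeled, and the packet data and names are hypothetical.

```python
def schedule_packets(packets):
    """Greedy offline schedule for unit-length packets with deadlines.

    packets is a list of (name, value, deadline) tuples, where deadline is
    a positive integer and slot t means "sent during time slot t".
    Returns the names in sending order and the weighted throughput.
    """
    if not packets:
        return [], 0
    horizon = max(deadline for _, _, deadline in packets)
    slots = [None] * (horizon + 1)  # index 0 is unused

    # Highest-value packets choose first.
    for name, value, deadline in sorted(packets, key=lambda p: -p[1]):
        # Take the latest free slot that still meets the deadline,
        # leaving earlier slots free for packets with tighter deadlines.
        for t in range(deadline, 0, -1):
            if slots[t] is None:
                slots[t] = (name, value)
                break
        # If no slot is free, the packet is dropped.

    sent = [s for s in slots[1:] if s is not None]
    return [name for name, _ in sent], sum(value for _, value in sent)


# Hypothetical packets: (name, value, deadline).
demo = [("a", 10, 1), ("b", 8, 1), ("c", 6, 2), ("d", 4, 2)]
order, throughput = schedule_packets(demo)
```

Here only two slots exist before the last deadline, so packets `a` and `c` are sent for a weighted throughput of 16, while `b` and `d` miss their deadlines.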
Energy consumption has become an important performance metric in designing scheduling algorithms for computing systems and networked systems. For network routers, we first study a problem of scheduling jobs with values and deadlines to maximize net profit, which is defined as the difference between the revenue obtained from the jobs sent before their respective deadlines and the cost of the total energy consumed in the process. Another model for network routers that we study is the trade-off between energy consumption and jobs’ flow time or stretch in an online setting. We design bi-criteria power-down strategies optimizing both and analyze their performance using competitive ratio. For Point-of-Presence (PoP) design, current IP core networks operate at a nearly constant power rate independent of the traffic load. Thus, the gap between the available network capacity and the temporal traffic demand presents opportunities for reducing network power consumption by deactivating network components without noticeably affecting network performance. We study a theoretical model for PoP design. The objective is to find an assignment between traffic links and chassis within a PoP such that the total energy cost of the PoP is minimized. We analyze the hardness of this model and design several approximation algorithms with provable near-optimal performance. For a given processor, each speed change involves time and energy overhead, as well as a negative impact on its lifetime reliability. Motivated by this observation, we study theoretical energy-aware scheduling problems by considering the number and the cost of the processor’s speed changes. Four related problems based on this framework are studied.
Oral Defense of Doctoral Dissertation: Money Laundering Evolution Detection, Transaction Scoring, and Prevention Framework
Wednesday, July 03, 2013
Money laundering is a major and ongoing global issue that authorities, using multiple systems, have not addressed with a dynamic approach. Empowered by the modern tools and resources available to them, money launderers are adopting more sophisticated schemes, spanning many countries, to avoid detection by anti-money laundering systems. Consequently, money laundering detection and prevention techniques must be multi-layered, multi-method, and multi-component to stay ahead of the evolving laundering schemes. Handling such a multifaceted problem involves a large amount of unstructured, semi-structured, and transactional data that streams at speeds requiring a high level of analytical processing to unravel business complexities and discover deliberately concealed relationships. Therefore, I developed the money laundering evolution detection framework (MLEDF) to capture the trail of the dynamic and evolving schemes. My framework uses sequence matching, case-based analysis, social network analysis, and, more importantly, complex event processing to link the fraud trails. My system captures a single scheme as an event in a trail in “real-time” and then, using detection algorithms, associates the captured event with other ongoing events.
A comprehensive anti-money laundering system must incorporate risk modeling that calculates the dynamic attributes of transactional relationships and the potential social relationships among seemingly unrelated entities from a financial perspective. Therefore, I developed an industry-wide system to score the risk that any given transaction is part of a larger money laundering scheme. This score should be valid across every financial domain, continuously updated, and not specific to the evaluating financial institution.
Additionally, I developed a transaction scoring exchange and money laundering prevention framework that uses a transaction messaging system and assigns scores to the transactions, where each score is derived from the dynamic risk of the transaction and the statically computed risk score. The transaction score is correlated with the static and dynamic risk scores in order to identify transaction scores pertaining to money laundering and to prevent transaction sequences from being executed. To produce valid results, the transaction score uses the dynamic risk score obtained from analytics on the results of the real-time detection algorithms.
The recommended money laundering prevention system relies upon the findings of an accurate detection system, supported by dynamic risk modeling systems for transaction scoring. My prevention framework includes a protocol to exchange information among the framework participants, and it incorporates two levels of cooperation and information sharing.
The three-level system developed in this study consists of multiple levels and multiple components, and it can be easily incorporated within the existing structure of financial institutions. My system allows financial investigators to overcome the long and time-consuming processes of their investigations, to prevent money laundering schemes, or at least to be aware of such schemes in their early stages.
I validated the accuracy of the money laundering evolution detection framework, the dynamic risk scoring, and the transaction scoring framework using a multi-phase test methodology. My tests used data generated from real-life cases, extrapolated to produce more varied scenarios of money laundering evolution and risk data by a generator of real-life schemes and patterns that I implemented.
Oral Defense of Doctoral Dissertation: On Using Meta-Modeling and Multi-Modeling to Address Complex Problems
Wednesday, July 03, 2013
Models, created using different modeling techniques, usually serve different purposes and provide unique insights. While each modeling technique might be capable of answering specific questions, complex problems require multiple models interoperating to complement/supplement each other; we call this Multi-Modeling. To address the syntactic and semantic challenges of this multi-modeling approach for solving complex problems, a systematic methodology for developing multi-modeling workflows is presented. The approach is domain specific: Identification of the domain and the supporting modeling techniques is the first step. Then a Domain Specific Multi-Modeling Workflow Language (DSMWL), supported by a Domain Ontology, is developed and then used to construct workflows that capture interoperations between various models. The domain ontology provides semantic guidance to effect valid model interoperation.
The approach is illustrated using a case study from the Drug Interdiction and Intelligence domain. The Joint Inter-Agency Task Force (JIATF) - South, an agency well known for interagency cooperation and intelligence fusion, receives large amounts of disparate data regarding drug smuggling efforts. Analysis of such data using various modeling techniques is essential in identifying the best Courses of Action (COAs). The proposed methodology is applied to the Drug Interdiction domain by performing domain analysis, developing a Domain Specific Multi-Modeling Workflow Language (DSMWL) and a Domain Ontology, and then using the DSMWL and the Domain Ontology to create workflows of model interoperations involving Social Networks, Timed Influence Nets, and Geospatial models.
Oral Defense of Doctoral Dissertation: Evolving Local Minima in the Protein Energy Surface
Wednesday, July 24, 2013
Proteins are the molecular tools of the living cell, and the path to unraveling their function is through modeling and understanding their structure. Many diseases occur when a protein loses its intended function due to its inability to form the appropriate structure with which it binds to other molecules in the cell. A holistic approach to protein modeling would be to characterize all possible structural states accessible by a protein under native, physiologic conditions. However, this task is infeasible. The question then becomes, how can we model the subset of these structural states most relevant to the function or dysfunction of a protein?
This thesis proposes a novel computational framework to obtain an expansive view of the protein conformational space relevant for function while controlling computational cost. The framework complements experimental and high-resolution computational methods which limit their focus to a single region of the conformational space. The framework employs the knowledge that functionally-relevant conformations are those low in energy and the framework incorporates the latest understanding of protein structure and energy from biophysics. Specifically, this thesis proposes a novel stochastic search framework for exploring a diverse ensemble of conformations which capture low-energy basins in the protein energy surface.
The proposed search framework employs a hybrid or memetic approach for explicit sampling of local minima in the protein energy surface. This hybrid search framework combines a global evolutionary search approach with a local search component to take advantage of the latest advances from the computational biology community. Specifically, the following questions are addressed to effectively model the protein conformational space: (1) How to balance limited computational resources between exploration of the conformational space in global search with exploitation of local minima in local search? The hybrid search framework combines a global evolutionary search to explore the breadth of the conformational space with a local search for efficiently exploiting local minima in the underlying energy surface. (2) How to sample new conformations at the global level? Two complementary approaches are investigated. One approach proposes an enhanced fragment selection method for sampling a new conformation based on an existing structure. The other approach employs a genetic algorithm to combine features from multiple existing structures to sample a new conformation. (3) How to employ energy to better discriminate between interesting conformations and noise in the conformational search space? A multi-objective decomposition of the energy function is employed to guide the search towards more biologically relevant, low-energy conformations by focusing on the energy terms with the most discriminatory power.
Work in this thesis shows that, by combining advanced algorithmic components with the latest understanding of protein biophysics, the proposed search framework is able to more effectively model functionally-relevant conformational states. A direct comparison between the proposed framework and a state-of-the-art coarse-grained sampling algorithm shows that the enhanced sampling strategies lead to a more comprehensive picture of the underlying protein energy surface. By taking this more comprehensive view, the framework is able to capture the protein native state as well as or better than methods relying primarily on protein-specific sampling strategies.
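As a rough illustration of the hybrid (memetic) idea described in this abstract, the sketch below pairs a global evolutionary step with a greedy local search on a toy one-dimensional function. Everything in it is an assumption made for illustration: the `energy` function, the mutation scale, the population size, and the function names are hypothetical and are not taken from the dissertation, which works on protein conformations rather than real numbers.

```python
import math
import random


def energy(x):
    # Toy rugged 1-D "energy surface" with several local minima.
    # Purely illustrative; not a protein energy function.
    return 0.1 * x * x + math.sin(3.0 * x)


def local_search(x, step=0.01, max_iters=1000):
    # Exploitation: greedy descent to a nearby local minimum.
    for _ in range(max_iters):
        best = min((x - step, x, x + step), key=energy)
        if best == x:
            break
        x = best
    return x


def memetic_search(pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    # Every individual is first relaxed to a local minimum.
    population = [local_search(rng.uniform(-5.0, 5.0)) for _ in range(pop_size)]
    for _ in range(generations):
        # Exploration: perturb each parent, then relax the child locally.
        children = [local_search(p + rng.gauss(0.0, 1.0)) for p in population]
        # Elitist selection keeps the lowest-energy individuals.
        population = sorted(population + children, key=energy)[:pop_size]
    return population[0]


best = memetic_search()
print(best, energy(best))
```

Because selection is elitist, the best energy in the population never gets worse from one generation to the next; how the budget is split between `generations` and `max_iters` is the exploration/exploitation trade-off raised in question (1) of the abstract.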
Oral Defense of Doctoral Dissertation: Latent Variable Models of Sequence Data for Classification and Discovery
Tuesday, August 20, 2013
The need to operate on sequence data is prevalent across a range of real world applications including protein/DNA classification, speech recognition, intrusion detection, and text classification. Sequence data can be distinguished from the more-typical vector representation in that the length of sequences within a dataset can vary and that the order of symbols within a sequence carries meaning. Although it has become increasingly easy to collect large amounts of sequence data, our ability to infer useful information from these sequences has not kept pace. For instance, in the domain of biological sequences, experimentally determining the order of amino acids in a protein is far easier than determining the protein's physical structure or its role within a living organism. This asymmetry holds over a number of sequence data domains, and, as a result, researchers increasingly rely on computational techniques to infer properties of sequences that are either difficult or costly to collect through direct measurement.
This work explores a number of latent variable models over sequence data. These models were designed to produce alternate representations of sequences that distill relevant information, making them both easier to process with traditional machine-learning tools and potentially improving on benchmarks over standard inference tasks such as classification and motif finding.
In this presentation, I will discuss two latent variable models that incorporate structure from the Profile Hidden Markov Model (HMM), a model commonly used to represent biological sequences. These methods both simplify and enrich the mechanisms by which standard Profile HMMs operate.
The first model relaxes the discrete Profile HMM hidden state space to a continuous one. Placing a regularizer that encourages sparsity on this new continuous space produces a new model that shares many characteristics with a set of successful techniques known as Sparse Dictionary Learning. This relaxation is the basis of our Relevant Subsequence Sparse Dictionary Learning (RS-DL) model. Applied to continuous sequences, RS-DL is effective at extracting human-recognizable motifs. In addition, subsequences extracted using RS-DL can improve on classification performance over standard nearest neighbor and dynamic time warping techniques.
The next model I discuss involves incorporating Profile HMM structure into a family of purely discriminative models. These models, which we call Subsequence Networks, are similar to convolutional neural networks, which have garnered state-of-the-art results in a number of tasks in computer vision. Subsequence Networks compare favorably to state-of-the-art sequence Kernel approaches on protein sequence classification problems while using a significantly different mode of operation.
CS Seminar: Knowledge Discovery for Computational Intelligence Research (KDCIR)
Wednesday, August 28, 2013
Data is crucial for computational modeling to solve complex real-world problems. Data acquisition through sensors, encoding and representation of the data, and extraction of patterns, information, and knowledge are all crucial steps in modeling. The evolving field of ‘Big Data’ combines artificial intelligence, signal processing, data representation, data analytics, and knowledge discovery for analytics and predictive modeling. It includes representing and analyzing structured and unstructured data drawn from various sources. Distributed computational paradigms such as multi-agents are argued to be efficient approaches to distributed knowledge discovery and modeling. The talk will summarize the KDCIR research motivated by problems from optimization, planning, training-performance modeling in elite sports (cycling), obesity/diabetes modeling, renal transplantation, stress/depression modeling and education analytics. Knowledge discovery from brain-computer interaction has also been studied to decode and learn from brain signals to provide a multichannel interaction for human-centered applications. The research and innovation from KDCIR projects will be summarized.
Speaker's Bio
Professor Dharmendra Sharma was the Dean of the Faculty of Information Sciences and Engineering at the University of Canberra from 2004 to 2012. He has held various senior leadership roles in universities for over eighteen years. He was awarded the position of University Distinguished Professor by the University of Canberra in 2012. Prof Sharma’s research background is in the Artificial Intelligence areas of Planning, Data Analytics and Knowledge Discovery, Predictive Modeling, Constraint Processing, Fuzzy Reasoning, Brain-Computer Interaction, Hybrid Systems and their applications to health, education, security, digital forensics and sports. He has published over 210 research papers and has supervised 23 higher-degree research students to completion. He has received several competitive research awards and recognitions for his research leadership initiatives. He is a Fellow of the Australian Computer Society, a Fellow of the South Pacific Computer Society, and a Senior Member of IEEE. Prof Sharma has served on several industry, academic, and research bodies including government advisory committees. He completed his PhD at the Australian National University and has been an academic for over 30 years. He may be contacted at Dharmendra.Sharma@canberra.edu.au.
Oral Defense of Doctoral Dissertation: Path Planning in Similar Environments
Friday, September 06, 2013
Path planning aims at navigating a robot from an initial configuration to a goal configuration without violating various constraints. The problem of path planning is theoretically intractable (PSPACE-hard), but in everyday life we (as human beings) navigate in our environment without much difficulty. This is partially due to the fact that most objects we encounter today are similar or identical to the objects we encountered yesterday or even years ago.
Environments with similar objects are quite common. For example, desks and chairs in a classroom or in an office may be moved around from one place to another frequently, but unfamiliar items are seldom introduced. A dynamic environment where obstacles are allowed to move can be considered as a continuous sequence of similar static environments due to motion coherence. We use the term “discrete similar-workspace problem” for static environments and “continuous similar-workspace problem” for dynamic environments. In this thesis, I designed path planners that address both problems. These planners significantly improve not only the efficiency but also the robustness over existing planners.
More specifically, I have developed a path planner which exploits similarity across different static environments. This planner can remember and reuse the computation for every obstacle encountered. To address the “continuous similar-workspace problem”, existing methods have explored the temporal coherence (i.e., similarity). However, all these methods repair blindly and periodically at fixed time intervals with little attempt to analyze the similarity across different time instances. This results in either redundant updates or failure to detect invalid edges and nodes. To address these issues, I designed two path planners for dynamic environments which can detect critical events such as the topological changes in configuration space for known environments or predicted collisions amidst obstacles with unknown motion. The experimental results show that our planners, which exploit similarity across different environments, not only provide significant time efficiency but also improve the chances of finding a solution.
SWE Seminar: Combinatorial Testing-Based Fault Localization
Wednesday, September 25, 2013
Combinatorial testing has been shown to be a very effective testing strategy. After a failure is detected by testing, the next task is fault localization, i.e., how to locate the fault that causes the failure. In this talk, I will discuss a fault localization approach we have developed that leverages the result of combinatorial testing.
Our approach consists of two major steps. In the first step, we identify failure-inducing combinations in a combinatorial test set. A combination is failure-inducing if its existence causes a test to fail. Based on the execution result of a combinatorial test set, we produce a ranking of suspicious combinations in terms of their likelihood of being failure-inducing. In the second step, we create a small group of tests from a given failure-inducing combination. In the group, one test is referred to as the core member, and it produces a failed execution. The other tests are referred to as the derived members, which are similar to the core member but produce passed executions. The traces of these test executions are then analyzed to locate the faults.
Experimental results show that our approach is very effective in that only a small number of additional tests are needed to locate the faults. To the best of our knowledge, our approach is the first effort to perform code-based fault localization based on combinatorial testing.
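The first step described above, ranking suspicious combinations from the pass/fail results of a test set, can be illustrated with a deliberately simplified sketch. The scoring rule used here (the fraction of tests containing a combination that failed) is an assumption chosen for brevity, not the ranking method from the talk, and the parameter names `a`, `b`, `c` and the function name are hypothetical.

```python
from itertools import combinations


def rank_suspicious_combinations(tests, t=2):
    """Rank t-way parameter-value combinations by a simple suspiciousness score.

    tests: list of (assignment, passed) pairs, where assignment maps
           parameter name -> value and passed is a bool.
    Score (an illustrative assumption): failed tests containing the
    combination divided by all tests containing it.
    """
    total = {}
    failed = {}
    for assignment, passed in tests:
        items = sorted(assignment.items())
        for combo in combinations(items, t):
            total[combo] = total.get(combo, 0) + 1
            if not passed:
                failed[combo] = failed.get(combo, 0) + 1
    scores = {combo: failed.get(combo, 0) / n for combo, n in total.items()}
    # Highest score first; ties broken by the combination itself.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical system that fails exactly when a == 1 and b == 0.
tests = [
    ({"a": 0, "b": 0, "c": 0}, True),
    ({"a": 0, "b": 1, "c": 1}, True),
    ({"a": 1, "b": 0, "c": 1}, False),
    ({"a": 1, "b": 1, "c": 0}, True),
    ({"a": 1, "b": 0, "c": 0}, False),
    ({"a": 0, "b": 0, "c": 1}, True),
    ({"a": 1, "b": 1, "c": 1}, True),
]
ranking = rank_suspicious_combinations(tests, t=2)
print(ranking[0])  # ((('a', 1), ('b', 0)), 1.0)
```

In this toy test set the combination `a=1, b=0` appears only in failing tests, so it ranks first; the second step of the approach would then build a core test and derived tests around that combination.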
Speaker's Bio
Yu Lei is currently an associate professor of computer science at the University of Texas at Arlington. He was a Member of Technical Staff at Fujitsu Network Communications, Inc. from 1998 to 2001. He obtained his PhD degree from North Carolina State University in 2002. His research interests are in the area of automated software analysis, testing and verification, with a special focus on combinatorial testing and concurrency testing. His current research is supported by the US National Institute of Standards and Technology.
SANG Seminar: Hardware in Cybersecurity: from the Weakest Link to Great Promises
Friday, October 04, 2013
It is well known that a hardware implementation can outperform a software implementation of the same application, including security primitives such as encryption, by up to several orders of magnitude. However, hardware implementation may also make security primitives vulnerable despite their mathematical soundness. In this talk, we will discuss the role of hardware in cybersecurity.
First, we will use the finite state machine (FSM) model to demonstrate that systems built with today's design flow and tools are vulnerable to a simple random walk attack. We further show that a malicious designer can embed a Hardware Trojan Horse (HTH) into the system to gain unauthorized control of the system. We then describe a practical circuit-level technique to guarantee the trustworthiness of the circuit implementation of a given FSM.
Second, we describe our recent work on the physical unclonable function (PUF), a unique feature embedded in the chip during the fabrication process. PUFs have many promising applications in security and trust, such as device authentication and secret key generation and storage. We will focus on the following questions: how to push the amount of PUF information we can extract to the theoretical upper bound; how to ensure that the PUF information is random (and thus secure against attacks); and how coding, including error correction coding, can impact hardware efficiency when implementing a PUF.
Finally, we will show very briefly a couple of projects on hardware-software co-design in building security and trust to demonstrate the great promise that hardware can bring to cybersecurity.
Speaker's Bio
Dr. Gang Qu is an Associate Professor in the Department of Electrical and Computer Engineering and the Institute for Systems Research at the University of Maryland, College Park. He is the co-director of the embedded system research lab and the wireless sensor lab. His current research interests are in VLSI design automation and wireless sensor networks, with a special focus on security and energy efficiency. He has published more than 100 journal articles and conference papers in these fields, with best paper awards at MobiCom (2001) and ASAP (2006). Dr. Qu is currently on the editorial boards of IEEE Transactions on Computers, IEEE Embedded Systems Letters, and Integration, the VLSI Journal.
VSE Seminar: Machine Learning Approaches for Annotating Biological Data
Wednesday, October 09, 2013
Biological systems are complex and not completely understood. A new generation of high-throughput (“Big Data”) technologies captures large volumes of complex, multi-modal data associated with these systems. Scientific discovery and advancement require extracting useful information from these datasets, which presents unique and challenging computing problems. Complexity within biological data arises from heterogeneity, incompleteness, missing information, noise, and inter-dependencies between the input and output domains.
In this talk, I will provide an overview of my contributions related to the development of accurate and efficient mining approaches for annotating these biological datasets. I will present a multi-task learning approach that seeks to leverage the hierarchical structure present within multiple biological archives for classification. I will also describe an approach for modeling sequential data. I will highlight how these approaches are integrated within computational pipelines to solve biological problems in the fields of metagenomics (or community genomics), protein function prediction and drug discovery.
Speaker's Bio
Huzefa Rangwala is an Assistant Professor in the Department of Computer Science & Engineering, George Mason University. He holds an affiliate appointment with the Bioengineering Department and the School of Systems Biology, George Mason University. He received his Ph.D. in Computer Science from the University of Minnesota in 2008. His research interests include machine learning, bioinformatics and high performance computing. He is the recipient of the NSF Early Faculty Career Award in 2013, the 2013 Volgenau Outstanding Teaching Faculty Award, the 2012 Computer Science Department Outstanding Teaching Faculty Award and the 2011 Computer Science Department Outstanding Junior Researcher Award. He is Mason's 2014 SCHEV Outstanding Faculty Award (Rising Star) nominee. His research is funded by NSF, NIH, DARPA, USDA and nVidia Corporation.
CS Interdisciplinary Seminar: Harvesting Geospatial Intelligence from Social Media Feeds
Wednesday, October 16, 2013
The remarkable success of online social media sites marks a shift in the way people connect and share information. Much of this information now contains some form of geographical content due to the proliferation of location-aware devices, thus fostering the emergence of geosocial media - a new type of user-generated geospatial information. Through geosocial media we are able, for the first time, to observe human activities in scales and resolutions that were so far unavailable. Furthermore, the wide spectrum of social media data and service types provides a multitude of perspectives on real-world activities and happenings, thus opening new frontiers in geosocial knowledge discovery. However, gleaning knowledge from geosocial media is a challenging task, as they tend to be unstructured and thematically diverse. In this presentation I will present work performed by Mason's geosocial team (http://geosocial.gmu.edu) on harvesting and analyzing information from such content, offering representative application examples and an overview of our system prototype.
VSE Seminar: Advancing Biomolecular Modeling and Simulation: A Probabilistic Approach for Characterizing Complex Systems in the Presence of Constraints
Thursday, October 17, 2013
A fundamental issue in our understanding of biology and treatment of disease concerns elucidating the underlying determinants of biological function in biomolecules. The main biomolecules at the center of most chemical reactions in the living and diseased cell, DNA, RNA, and proteins, are complex modular systems composed of many heterogeneous and often highly-coupled building blocks operating under physics-based constraints. In DNA and RNA, building blocks at the sequence level combine in non-trivial ways to give rise to complex functional signals. Modeling proteins adds additional algorithmic challenges due to the need for spatial reasoning to capture the dynamic nature of these systems as they flex their structures to modulate function. Yet, modeling is a central tool in understanding the molecular basis of many proteinopathies, such as cancer and neurodegenerative disorders. In this talk, I will provide an overview of algorithmic challenges and our contributions in physical algorithms for modeling states and state transitions in complex systems in the presence of constraints. The main focus of the talk will be on the novel probabilistic approach we have proposed for structure and motion computation. The underpinnings of this approach are in sampling-based robot motion planning and evolutionary computation to efficiently search high-dimensional configuration (solution) spaces with non-linear cost (objective) functions. The approach makes several innovations, including (1) novel use of connectivity and embeddings of the search space for an adaptive search of high exploration capability, (2) exploitation of the interplay between global and local search for better coverage, and (3) incorporation of multi-objective optimization to attenuate reliance on noisy cost functions. Comprehensive evaluations show that this approach, when informed by a representation of molecular geometry grounded in biophysics, is highly effective. 
The talk will conclude with selected applications demonstrating the ability of this work to formulate hypotheses and guide biological research, followed by an outline of my future research agenda.
Speaker's Bio
Amarda Shehu is an Assistant Professor in the Department of Computer Science with affiliated appointments in the Department of Bioengineering and the School of Systems Biology. Shehu received her Ph.D. in Computer Science from Rice University in Houston, TX, where she was an NIH fellow of the Nanobiology Training Program of the Gulf Coast Consortia. Shehu's general research interests are in the field of Artificial Intelligence. Her research contributions to date are in computational structural biology, biophysics, and bioinformatics with a focus on issues concerning the relationship between sequence, structure, dynamics, and function in biological molecules. Shehu's research is currently supported by the NSF, the Jeffress Trust Program in Interdisciplinary Research, and the Virginia Youth Tobacco Program. Shehu is also the recent recipient of an NSF CAREER award. She is a member of ACM, IEEE, Biophysical Society, International Society for Computational Biology, the American Chemical Society, and the Council on Undergraduate Research. Research and educational materials resulting from Shehu's work, including images, videos, publications, and software, can be found at: http://cs.gmu.edu/~ashehu.
SANG Seminar: There's Always Room for Improvement: Dissecting Bad Codes with Chatter and AVMeter
Friday, October 18, 2013
Malware classification and family identification are not new problems. However, the rapid evolution of the malware attack/defense ecosystem has enabled much more fruitful research. In this talk, our contributions to the domain will be summarized by presenting two systems that we at Verisign have named Chatter and AVMeter.
Chatter is a system for representing and leveraging the sequence of events in a malware execution. Whereas calculating and exposing low-level feature values might scale poorly or invite gamesmanship, Chatter tersely and efficiently captures execution patterns. By creating an alphabet/language to represent runtime behavior, techniques from n-gram processing are used to train a binary classifier that is capable of distinguishing different malware samples with high accuracy.
AVMeter is a system for evaluating the performance of antivirus scans and labels. Researchers rely heavily on these outputs in establishing ground-truth for their methods and companies use them to guide mitigation and disinfection efforts. However, there is a lack of research that validates the performance of these antivirus vendors. Utilizing malware samples that have been manually labeled by expert analysts, we reveal dramatic errors in the correctness, coverage, and consistency of current antivirus offerings. We invite the community to challenge assumptions about relying on AV scans and labels as a ground truth for malware analysis and classification.
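To make the n-gram idea behind Chatter concrete, here is a minimal sketch of turning an execution trace into n-gram counts that a classifier could consume as features. The single-letter event alphabet and the function name `ngram_counts` are hypothetical; the talk does not specify Chatter's actual alphabet, feature encoding, or classifier.

```python
from collections import Counter


def ngram_counts(events, n=2):
    """Count the contiguous n-grams in a sequence of event symbols."""
    return Counter(
        tuple(events[i:i + n]) for i in range(len(events) - n + 1)
    )


# Hypothetical trace: each letter stands for one kind of runtime event
# (for example, A = file write, B = registry change, C = network connect).
trace = "ABABC"
features = ngram_counts(trace, n=2)
print(features)
```

For the trace above, the bigram `('A', 'B')` occurs twice and `('B', 'A')` and `('B', 'C')` once each. Count vectors of this kind, computed per sample, are the sort of input a binary classifier could be trained on.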
Speaker's Bio
Aziz Mohaisen is a research scientist at VeriSign Labs. His research interests are broadly focused on the security, privacy, measurement, and analysis of complex and emerging network systems. His recent work has emphasized data-driven security and its applications in malware analysis, network routing, information sharing, and Internet-scale reputation. He obtained his Ph.D. in computer science from the University of Minnesota in 2012 where he wrote his dissertation on trustworthy social computing systems.
CS Distinguished Lecture Series: Understanding Global Climate Change: A Data Driven Approach
Wednesday, October 23, 2013
Climate change is the defining environmental challenge facing our planet, yet there is considerable uncertainty regarding the social and environmental impact due to the limited capabilities of existing physics-based models of the Earth system. This talk will present an overview of research being done in a large interdisciplinary project on the development of novel data driven approaches that take advantage of the wealth of climate and ecosystem data now available from satellite and ground-based sensors, the observational record for atmospheric, oceanic, and terrestrial processes, and physics-based climate model simulations. These information-rich datasets offer huge potential for monitoring, understanding, and predicting the behavior of the Earth's ecosystem and for advancing the science of climate change. This talk will discuss some of the challenges in analyzing such data sets and our early research results.
Speaker's Bio
Vipin Kumar is currently William Norris Professor and Head of Computer Science and Engineering at the University of Minnesota. His research interests include high performance computing and data mining, and he is currently leading an NSF Expedition project on understanding climate change using data driven approaches. He has authored over 250 research articles, and co-edited or coauthored 10 books including the widely used textbooks "Introduction to Parallel Computing" and "Introduction to Data Mining", both published by Addison-Wesley. Kumar co-founded the SIAM International Conference on Data Mining and served as a founding co-editor-in-chief of the Journal of Statistical Analysis and Data Mining (an official journal of the American Statistical Association). Kumar is a Fellow of the AAAS, ACM and IEEE. He received the 2009 Distinguished Alumnus Award from the Computer Science Department, University of Maryland College Park, and the 2005 IEEE Computer Society's Technical Achievement Award for his contributions to the design and analysis of parallel algorithms, graph-partitioning, and data mining. Kumar's foundational research in data mining and its applications to scientific data was honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD).
SWE Seminar: Architectural Decay in Software Systems: Symptoms, Causes, and Remedies
Tuesday, October 29, 2013
Engineers frequently neglect to carefully consider the impact of their changes to a software system. As a result, the software system's architectural design eventually deviates from the original designers' intent and degrades through unplanned introduction of new and/or invalidation of existing design decisions. Architectural decay increases the cost of making new modifications and decreases a system's reliability, until engineers are no longer able to effectively evolve the system. At that point, the system's actual architecture may have to be recovered from the implementation artifacts, but this is a time-consuming and error-prone process, and leaves critical issues unresolved: the problems caused by architectural decay will likely be obfuscated by the system's many elements and their interrelationships, thus risking further decay. In this talk, I will focus on pinpointing locations in a software system's architecture that reflect architectural decay. I will discuss the reasons why that decay occurs.
Specifically, I will present an emerging catalog of commonly occurring symptoms of decay -- architectural "smells". I will illustrate the occurrence of smells identified in the process of recovering the architectures of several real-world systems. Finally, I will provide a comparative analysis of a number of automated techniques that aim to recover a system's architectural design from the system's implementation.
Speaker's Bio: Joshua Garcia is a Ph.D. candidate in Computer Science at the University of Southern California (USC). He is a member of the Software Architecture Research Group at the Center for Systems and Software Engineering at USC. His research interests include architectural decay, software-architecture recovery, distributed event-based systems, and self-adaptive systems. He has worked for a variety of organizations including the NASA Jet Propulsion Laboratory, the Southern California Earthquake Center at USC, and Xerox Special Information Systems.
SANG Seminar: Wireless Security and Privacy: From Crypto-based Protection to Cross-Layer Security Enhancement
Friday, November 01, 2013
The past decade has witnessed an explosive deployment of wireless technologies. The vast expansion of connectivity by wireless networks, combined with the rapid evolution of highly programmable mobile devices, has had a strong impact on our lives. Security has arisen as a major concern in wireless networks. Many crypto-based solutions have been developed to provide the first line of defense, ranging from fundamental security services such as authentication and privacy, to the protection of the infrastructure and the various network components. Recently, cross-layer security enhancement has emerged as a promising approach to the unique challenge facing mobile wireless networks, namely that mobile devices may be compromised (stolen, reverse engineered, or forged). The idea is to extract unique and unforgeable information from wireless communications or mobile devices and supply such information to other layers for enhanced security. In this talk, we will discuss this transition and several promising research directions.
Speaker's Bio: Wenjing Lou received her Ph.D. in Electrical and Computer Engineering from the University of Florida. She is currently an associate professor in the Computer Science department at Virginia Polytechnic Institute and State University, and a co-director of the Complex Networks and Security Research (CNSR) lab. Prior to joining Virginia Tech in 2011, she was on the faculty of Worcester Polytechnic Institute from 2003 to 2011.
Her current research interests include wireless network security and data security and privacy in cloud computing. She is currently serving on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Wireless Communications Letters. She has served as TPC co-chair for the security symposium of several IEEE conferences, including IEEE Globecom 2007, IEEE ICCCN 2009, IEEE ICC 2010, IEEE PIMRC 2011, and IEEE Globecom 2012. She is the mini-conference co-chair for IEEE INFOCOM 2013, and the panel co-chair for IEEE INFOCOM 2014. She serves as TPC Area Chair for IEEE INFOCOM 2013 & 2014, IEEE ICNP 2013, and IEEE SECON 2014. She is the lead co-founder and TPC co-chair for the First IEEE Conference on Communications and Network Security (IEEE CNS), held in Washington DC in October 2013.
GRAND Seminar: From AND/OR Search to AND/OR Sampling
Monday, November 11, 2013
Sampling is one of the main approaches for approximate reasoning in graphical models. In this work we show that while sampling can be structure-blind, exploiting the graph structure can reduce sampling variance and hence speed up convergence. Specifically, I will show how “AND/OR importance sampling” can exploit problem decomposition, yielding significantly improved estimators. Moreover, combining AND/OR sampling with cutset sampling to reduce the effective dimensionality yields further variance reduction. Extensive empirical evaluation demonstrates the power of the new estimators, which often show an order-of-magnitude improvement over previous schemes. In particular, these schemes were part of a solver that won first place in the recent UAI 2010 competition and first place in the Pascal 2011 approximate reasoning challenge.
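To make the decomposition idea concrete, here is a minimal sketch, not the speaker's implementation. It assumes a toy model made of two independent components with unnormalized weights `f` and `g` (both hypothetical), a uniform proposal, and the goal of estimating the partition function Z = (sum of f) * (sum of g). The flat estimator averages the product of importance weights per sample, while the decomposed estimator averages each component separately and multiplies the averages, which is the basic way an AND node combines independent subproblems.

```python
import random

def estimate_partition(f, g, n_samples, rng):
    """Return (flat, and_or) importance-sampling estimates of
    Z = sum(f) * sum(g) for two independent components."""
    qx, qy = 1.0 / len(f), 1.0 / len(g)  # uniform proposal (an assumption)
    wx, wy = [], []
    for _ in range(n_samples):
        x = rng.randrange(len(f))
        y = rng.randrange(len(g))
        wx.append(f[x] / qx)  # importance weight of the x-component
        wy.append(g[y] / qy)  # importance weight of the y-component
    # Structure-blind estimator: average the joint weight of each sample.
    flat = sum(a * b for a, b in zip(wx, wy)) / n_samples
    # Decomposed estimator: average each component, then multiply.
    and_or = (sum(wx) / n_samples) * (sum(wy) / n_samples)
    return flat, and_or

# Hypothetical unnormalized weights for the two components.
f = [1.0, 4.0, 2.0, 8.0]
g = [3.0, 0.5, 6.0]
true_z = sum(f) * sum(g)
flat, and_or = estimate_partition(f, g, 2000, random.Random(0))
```

Both estimators are unbiased here because the two components are sampled independently; the decomposed one effectively reuses every x-sample with every y-sample, which is where the variance reduction comes from.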
Speaker's Bio: Rina Dechter is a professor of Computer Science at the University of California, Irvine. She received her PhD in Computer Science at UCLA in 1985, an MS degree in Applied Mathematics from the Weizmann Institute and a B.S. in Mathematics and Statistics from the Hebrew University, Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation including search, constraint processing and probabilistic reasoning.
Professor Dechter is the author of Constraint Processing (Morgan Kaufmann, 2003), has authored over 150 research papers, and has served on the editorial boards of Artificial Intelligence, the Constraint Journal, Journal of Artificial Intelligence Research (JAIR) and Journal of Machine Learning Research (JMLR). She was awarded the Presidential Young Investigator Award in 1991, has been a fellow of the American Association of Artificial Intelligence since 1994, was a Radcliffe Fellow in 2005-2006 and received the 2007 Association of Constraint Programming (ACP) Research Excellence Award. She has been Co-Editor-in-Chief of Artificial Intelligence since 2011.
SANG Seminar: Defeat Information Leakage from Browser Extensions Via Data Obfuscation
Friday, November 15, 2013
Today, web browsers have become the de facto platform for Internet users. This makes browsers the target of many attacks. Designed with security in mind from the very beginning, Chrome offers more protection against exploits via benign-but-buggy extensions. However, more and more attacks are being launched via malicious extensions, and there is no effective solution to defeat them. As users' sensitive information is often the target of such attacks, in this talk I will discuss how to proactively defeat information leakage with our iObfus framework. A prototype has been implemented, and iObfus works seamlessly with Chromium 25. Evaluation against malicious extensions shows the effectiveness of iObfus, while it introduces only trivial overhead for benign extensions.
Speaker's Bio: Wentao Chang is currently a PhD candidate in Computer Science at George Mason University. He received his B.S. degree (Hons.) in Computer Science and Technology from Nanjing University, China, in 2006 and his M.S. degree in Computer Science from George Mason University in 2010. His research interests include web browsing security, malware analysis and mobile security.
CS Seminar: Guiding Evolutionary Learning by Behavioral Evaluation
Monday, November 18, 2013
An intelligent agent can display behavior that is not directly related to the task it learns. Depending on the adopted AI framework and task formulation, such behavior is sometimes attributed to environment exploration, or ignored as irrelevant, or even penalized as undesired. We postulate here that virtually every interaction of an agent with its learning environment can result in outcomes that carry information which can be potentially exploited to solve the task. To support this claim, we present Pattern Guided Evolutionary Algorithm (PANGEA), an extension of genetic programming (GP), a genre of evolutionary computation that aims at synthesizing programs that display the desired input-output behavior. PANGEA uses machine learning to search for regularities in intermediate outcomes of program execution (which are ignored in standard GP), more specifically for relationships between these outcomes and the desired program output. The information elicited in this way is used to guide the evolutionary learning process by appropriately adjusting program fitness. An experiment conducted on a suite of benchmarks demonstrates that this architecture makes agent learning more effective than in conventional GP. In the paper, we discuss the possible generalizations and extensions of this architecture and its relationships with other contemporary paradigms like novelty search and deep learning. In conclusion, we extrapolate PANGEA to postulate a dynamic and behavioral learning framework for intelligent agents.
Speaker's Bio: Krzysztof (Chris) Krawiec is an associate professor in the Institute of Computing Science, Poznan University of Technology, Poland, and currently a Fulbright Visiting Scholar at the Computer Science and Artificial Intelligence Laboratory at MIT. His publications in evolutionary computation concern genetic programming, semantic genetic programming, coevolution and modularity, as well as applications in image analysis, medical decision support, games, and climate science. He cochaired the European Conference on Genetic Programming (EuroGP) in 2013 and 2014, and is a member of the Editorial Board of the Genetic Programming and Evolvable Machines journal and the president of the Polish Chapter of the IEEE Computational Intelligence Society.
CS Distinguished Lecture Series: Designing Software Systems that Comply with Privacy and Security Regulations
Wednesday, November 20, 2013
Properly protecting information is in all our best interests, but it is a complex undertaking. The fact that regulation is often written by non-technologists introduces additional challenges and obstacles. Moreover, those who design systems that collect, store, and maintain sensitive information have an obligation to design systems holistically within this broader context of regulatory and legal compliance.
There are questions that should be asked when developing new requirements for information systems. For example: How do we build systems to handle data that must be kept secure and private when relevant regulations tie our hands? When building a system that maintains health or financial records for a large number of people, what do we need to do to protect the information against theft and abuse, keep the information private, AND at the same time, satisfy all governing privacy/security laws and restrictions? Moreover, how do we know that we've satisfied those laws? How do we monitor for compliance while ensuring that we're monitoring the right things? And how do we accomplish all this in a way that can be expressed clearly to end-users and legislators (or auditors) so they can be confident we are doing the right things?
We've been working on technologies to make these tasks simpler, and in some senses, automatic. In this talk, I will describe some of the research that we have been conducting to address these problems.
Bio: Dr. Annie I. Antón is a Professor in and Chair of the School of Interactive Computing at the Georgia Institute of Technology in Atlanta. She has served the national defense and intelligence communities in a number of roles since being selected for the IDA/DARPA Defense Science Study Group in 2005-2006. Her current research focuses on the specification of complete, correct behavior of software systems that must comply with federal privacy and security regulations. She is founder and director of ThePrivacyPlace.org. Antón currently serves on various boards, including: the U.S. DHS Data Privacy and Integrity Advisory Committee, an Intel Corporation Advisory Board, and the Future of Privacy Forum Advisory Board. She is a former member of the CRA Board of Directors, the NSF Computer & Information Science & Engineering Directorate Advisory Council, the Distinguished External Advisory Board for the TRUST Research Center at U.C. Berkeley, the DARPA ISAT Study Group, the USACM Public Council, the Advisory Board for the Electronic Privacy Information Center in Washington, DC, the Georgia Tech Alumni Association Board of Trustees, the Microsoft Research University Relations Faculty Advisory Board, the CRA-W, and the Georgia Tech Advisory Board (GTAB). Prior to joining the faculty at Georgia Tech, she was a Professor of Computer Science in the College of Engineering at the North Carolina State University. Antón is a three-time graduate of the College of Computing at the Georgia Institute of Technology, receiving a Ph.D. in 1997 with a minor in Management & Public Policy, an M.S. in 1992, and a B.S. in 1990 with a minor in Technical and Business Communication.
Oral Defense of Doctoral Dissertation: Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning
Friday, November 22, 2013
Spam, phishing, and fraud detection are security applications that impact most people. Challenges for building spam, phishing, and fraud detection models include difficulty in obtaining annotated data, increased computational complexity for robust detection methods, annotation errors, and changes in the underlying data distribution. We address the above challenges as follows. Clustering and active learning are combined to make efficient use of annotated data, yielding state of the art spam detection performance using only 10 percent of the annotated data employed by previously published methods. Social Network Analysis (SNA) reputation features for mail transfer agents are introduced to evaluate paths from sender to receiver, increasing the detection rate by 70 percent (with the same false positive rate) for state of the art spam detection. Random projections with boosting achieve state of the art spam detection with a 75 percent reduction in computational cost for message classification. The Randomized Hough Transform - Support Vector Machine updates training set annotations, increasing the (precision, recall) F measure by 9.3 percent compared to a state of the art method for handling adversarial noise. Spectral clustering of URL n-grams and transductive semi-supervised learning are used to increase the detection rate by 100 percent (doubling the detection rate with the same false positive rate) for state of the art phishing detection under adversarial modification of message text. Reputation and similarity features are used to enhance the ability to withstand changes in underlying data distributions, producing a 13.5 percent increase in cost savings for state of the art fraud detection. Future research possibilities include the application of these methods to identify deception in social media channels.
Oral Defense of Doctoral Dissertation: Mitigating Denial-of-Service Attacks in Contested Network Environments
Monday, January 06, 2014
In an increasingly connected world, ensuring availability of the services and applications carried by computer networks is critical. As the most far-reaching computer network, the Internet is home to millions of services that have a profound influence on people’s lives and work. Additionally, mobile ad hoc networks (MANETs), often used to support critical military and civilian projects, rely on the availability of all participating nodes to fulfill their missions. However, the availability of these networks and services is constantly threatened by denial-of-service (DoS) attacks of growing intensity and sophistication. In this thesis, we study different DoS combating mechanisms to protect services and hosts in the contested Internet and MANET environments.
To mitigate distributed denial-of-service (DDoS) attacks launched by powerful botnets on the Internet, we propose a moving target mechanism that progressively separates benign clients from the attackers mingled among them. This is achieved by endowing the defense system with mobility to evade naive attackers, and by smartly shuffling clients to quarantine advanced attackers. We present two mobile defense architectures tailored for different threat and application models.
One architecture, named MOTAG, is built on secret moving network proxies that act as the intermediate layer between authenticated clients and the protected services. By only disclosing the active proxies to the authenticated clients and quickly replacing the attacked ones, this intermediate layer becomes a moving target that continues to escape from network flooding attacks.
Another architecture, enabled by the resource elasticity and network space of cloud computing, replicates open web servers to partition incoming clients’ workload. By dynamically instantiating new replica servers scattered in the cloud and re-shuffling clients’ assignments, we are able to quarantine flooding attacks targeting both network and computational resources. Under both architectures, advanced attackers may follow the moving targets to sustain their attacks. To isolate these following attackers from benign clients, we perform elaborate shuffling operations that intelligently redistribute clients among different proxies or replica servers. To guide the shuffling operations, we design an optimal dynamic programming algorithm that maximizes the expected number of benign clients saved by each shuffle. We also introduce a much faster greedy algorithm that can generate near-optimal shuffling plans in real time. Furthermore, a maximum-likelihood algorithm is employed to accurately estimate the number of following attackers. Results from extensive simulations show that our defense mechanism can save a vast majority of benign clients from persistent and intense DDoS attacks in a few rounds of shuffling. Experiments that study the overhead of the shuffling operation demonstrate that clients can be re-assigned among different proxies or replica servers in several seconds.
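The quantity a shuffling plan tries to improve can be sketched as follows. This is an illustration under stated assumptions, not the thesis's dynamic programming or greedy algorithm: it assumes the attackers are spread uniformly at random among all clients, so a group of a given size is attacker-free with hypergeometric probability C(N - size, M) / C(N, M), and every client in an attacker-free group counts as saved. The function name `expected_saved` and the example plans are hypothetical.

```python
from math import comb

def expected_saved(group_sizes, n_attackers):
    """Expected number of benign clients that end up on attacker-free
    replicas, assuming attackers are uniformly distributed among clients."""
    n_clients = sum(group_sizes)  # total clients, attackers included
    total = comb(n_clients, n_attackers)
    saved = 0.0
    for size in group_sizes:
        # Probability that none of the attackers falls into this group.
        p_clean = comb(n_clients - size, n_attackers) / total
        saved += size * p_clean
    return saved

# Compare two hypothetical plans for 12 clients, 2 of them attackers,
# spread over 3 replicas.
even_plan = expected_saved([4, 4, 4], 2)
skewed_plan = expected_saved([10, 1, 1], 2)
```

In this small case the even split saves more clients in expectation than the skewed one, which hints at why the choice of group sizes is worth optimizing on each round.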
In addition, this thesis introduces a capability-based mechanism that inhibits MANET DoS attacks in the context of multi-path routing. Existing solutions are limited because they assume a single path is used to route traffic for each flow. To prevent attacks that multiply their throughput by employing multiple different routes, we present CapMan, an enhanced capability-based mechanism that enforces per-flow limits across all employed routes. To ensure overall capability compliance, CapMan not only informs all intermediate nodes of a flow about the assigned capability, but also provides them with a global throughput perspective via periodic summary exchanges. Results show that CapMan is able to maintain flow-wide capability constraints consistently and in a distributed manner, even when multiple colluding insiders attempt to exacerbate the attack. Meanwhile, the impact of CapMan on well-behaved flows is shown to be small.
Test-Out Exam: INFS 501
Wednesday, January 15, 2014
Registration: Please register by sending an email to email@example.com with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*Whether spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are taking the test-out exams now.
Test-Out Exam: INFS 515
Wednesday, January 15, 2014
Registration: Please register by sending an email to firstname.lastname@example.org with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*Whether spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are taking the test-out exams now.
Test-Out Exam: INFS 519
Wednesday, January 15, 2014
Registration: Please register by sending an email to email@example.com with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*Whether spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are taking the test-out exams now.
Test-Out Exam: SWE 510
Wednesday, January 15, 2014
Registration: Please register by sending an email to firstname.lastname@example.org with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
Oral Defense of Doctoral Dissertation: A Dynamic Dialog System Using Semantic Web Technologies
Thursday, January 16, 2014
A dialog system or a conversational agent provides a means for a human to interact with a computer system. Dialog systems use text, voice and other means to carry out conversations with humans in order to achieve some objective. Most dialog systems are created with specific objectives in mind and consist of pre-programmed conversations.
The primary objective of this dissertation is to show that dialogs can be dynamically generated using semantic technologies. I show the feasibility by constructing a dialog system that is designed to control physical entry points and that can withstand attempts to misuse it by playing back previously recorded conversations. As a solution, my system randomly generates questions with a pre-specified difficulty level and relevance, thereby not repeating conversations. To do so, my system uses policies to govern the entrance rules, and Item Response Theory to select questions derived from ontologies relevant to those policies. Furthermore, by using contextual reasoning to derive facts from a chosen context, my dialog system can be directed to generate questions that are randomized within a topic, yet relevant to the task at hand. My system design has been prototyped using a voice interface. The prototype demonstrates acceptable performance on benchmark data.
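As a rough illustration of difficulty-driven question selection, the sketch below uses the one-parameter (Rasch) model from Item Response Theory, in which the probability of a correct answer is a logistic function of ability minus difficulty. The abstract does not say which IRT model the dissertation uses, so the Rasch model, the `pick_question` helper, the tolerance parameter, and the question bank are all assumptions made for illustration.

```python
import math
import random

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that a respondent with the given
    ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_question(bank, target_difficulty, asked, rng, tolerance=0.5):
    """Randomly pick a not-yet-asked question whose difficulty lies within
    `tolerance` of the target; return None when none is left."""
    candidates = sorted(
        q for q, b in bank.items()
        if q not in asked and abs(b - target_difficulty) <= tolerance
    )
    if not candidates:
        return None
    choice = rng.choice(candidates)
    asked.add(choice)  # remember it so the conversation is never replayed
    return choice

# Hypothetical question bank: question id -> difficulty on the ability scale.
bank = {"q1": -1.0, "q2": 0.2, "q3": 0.4, "q4": 1.5, "q5": -0.3}
asked = set()
rng = random.Random(42)
first = pick_question(bank, 0.0, asked, rng)
second = pick_question(bank, 0.0, asked, rng)
```

Tracking the `asked` set is what prevents a recorded conversation from being replayed verbatim, while the tolerance window keeps the random choice near the requested difficulty.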
Computer Science Department: GTA Orientation
Thursday, January 16, 2014
C4I Seminar Series: Efficient User Assets Management by Trade-based Asset Blocks and Dynamic Junction Tree for Combo Prediction Markets
Friday, January 24, 2014
We have often heard that the collective wisdom of an informed and diverse group usually outperforms individual experts on forecasting and estimation tasks. The key question is how best to aggregate those diverse judgments. In 2010, it wasn't known whether any system could reliably beat the simple average. Mason is among the teams that reliably did so for two years in IARPA's ACE forecasting challenge. In June 2013 we closed down our geopolitical prediction market to create SciCast, a new and improved science & technology market. This talk discusses ongoing improvements to the SciCast forecasting engine. Unlike other prediction markets, SciCast allows forecasters to make conditional forecasts: the chance that China's lunar rover would deploy can be made to depend on a successful soft lunar landing. To avoid a combinatorial explosion, SciCast uses Bayesian networks as the underlying probability model. But tracking the joint probability structure is not enough: markets also must track assets for each user, rewarding users for correct forecasts and ensuring there is no possible world where their assets go negative. Previously, we tracked assets using the same junction tree structure as the joint probability model. This approach provides fast computation of the minimum value and expected value. However, it wastes a lot of space: the majority of users trade sparsely relative to the total number of questions, and even more sparsely compared to the whole joint probability space. Therefore most of the asset junction tree remains untouched. Worse, every time a question is added or resolved, we have to update the asset tree for all users, just in case. We think a trade-based method can overcome this problem and be computationally efficient as well. It turns out that we can build asset blocks involving only the questions being traded, then collect them in an organized manner, such as by merging each sub-block into its superset.
Further, when computing a user's score and cash, we can construct an asset junction tree dynamically, based on the collection of asset blocks, and then use the asset junction tree for efficient computations. When questions are resolved, it is straightforward to update the user's asset blocks accordingly. Basically, for any asset block that contains the resolved questions, we realize the resolved state and truncate the block. In this presentation, I will explain in detail how the trade-based asset blocks are built and how to construct the corresponding asset junction tree dynamically. Computational examples will be demonstrated and compared with alternative methods. For general questions about prediction markets or other background knowledge, please visit https://scicast.org/.
Speaker's Bio: Dr. Wei Sun is a research assistant professor of the Sensor Fusion Lab and the C4I Center at George Mason University, where he works on stochastic modeling, probabilistic reasoning, optimization, decision support systems, data fusion and general operations research. Dr. Sun focuses his research on inference algorithms for hybrid Bayesian networks, nonlinear filtering, and information fusion. He is an expert in Bayesian inference and the developer of several efficient inference algorithms. He is also a contributor/committer of the open-source Matlab BN toolbox. Applications of his research include tracking, fusion, bioinformatics, classification, diagnosis, etc. Prior to joining GMU, Dr. Wei Sun was a Senior Analyst with United Airlines, Inc. and a professional Electrical Engineer in China. He is the recipient of GMU’s Academic Excellence Award in 2003 and a PhD Fellowship during 2003-2007.
GRAND Seminar: Re-Projection of Terabyte-sized 3D Images for Interactive Visualization
Tuesday, February 18, 2014
This talk will present the computational challenges and approaches to knowledge discovery from terabyte-sized images. The motivation comes from experimental systems for imaging and analyzing human pluripotent stem cell cultures and material science specimens at a spatial and temporal coverage that leads to terabyte-sized image data. The objective of such an unprecedented cell study and material imaging is to characterize specimens at high statistical significance in order to guide a repeatable growth of high quality stem cell colonies and to understand metallurgical processes. To pursue this objective, multiple computer and computational science problems have to be overcome, including image correction (flat-field, dark current and background), stitching, segmentation, tracking, re-projection, feature extraction and then representation of large images for interactive visualization and sampling in a web browser.
In this presentation, we will focus on the problem of re-projecting terabyte-sized 3D images for interactive visualization from multiple orthogonal viewpoints. The current solutions are limited to gigabyte-sized images using specialized hardware to achieve interactivity and are lacking the ability to share data for collaborative research. We overcome these limitations by pre-computing re-projected views of terabyte-sized images and by using the Deep Zoom framework for accessing multiple orthogonal views. Our approach is based on researching extensions to Amdahl’s law for Map-Reduce computations, establishing benchmarks for image processing on a Hadoop platform, and introducing a computer cluster node utilization coefficient for re-projection computations running on a computer cluster/cloud. The theoretical models of algorithmic complexity and cluster utilization at terabyte scale are applied to selecting an optimal computer cluster configuration. Additional interactive measurement capabilities are added as plugins to the open source OpenSeadragon project with the Deep Zoom capabilities. This presentation will conclude with illustrations of enabled scientific discoveries, as well as with several collaboration opportunities to create reference resources for scientific discoveries from terabyte-sized images.
Speaker's Bio: Peter Bajcsy received his Ph.D. in Electrical and Computer Engineering in 1997 from the University of Illinois at Urbana-Champaign (UIUC) and an M.S. in Electrical and Computer Engineering in 1994 from the University of Pennsylvania (UPENN). He worked for machine vision, government contracting, and research and educational institutions before joining the National Institute of Standards and Technology (NIST) in 2011. At NIST, he has been leading a project focusing on the application of computational science in biological metrology, and specifically stem cell characterization at very large scales. Peter’s area of research is large-scale image-based analyses and syntheses using mathematical, statistical and computational models while leveraging computer science fields such as image processing, machine learning, computer vision, and pattern recognition. He has co-authored more than 24 journal papers, eight book chapters and close to 100 conference papers.
CS Interdisciplinary Seminar: Combining Genetic Algorithms and Simulation to Search for Failure Scenarios in System Models
Wednesday, February 19, 2014
Large infrastructures, such as clouds, can exhibit substantial outages, sometimes due to failure scenarios that were not considered during system design. We define a method that uses a genetic algorithm (GA) to search system simulations for parameter combinations that result in system failures, so that designers can take mitigation steps before deployment. We apply the method to study an existing infrastructure-as-a-service cloud simulator. We characterize the dynamics, quality, effectiveness and cost of GA search, when applied to seek a known failure scenario. Further, we iterate the GA search to reveal previously unknown failure scenarios. We find that, when schedule permits and failure costs are high, combining GA search with simulation proves useful for exploring and improving system designs.
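The general idea of steering a genetic algorithm toward failure-inducing parameter combinations can be sketched as below. This is not the method or the cloud simulator from the talk: `simulate` is a hypothetical stand-in that returns a made-up failure-severity score, the three parameters and their bounds are invented, and the GA uses simple truncation selection, uniform crossover and Gaussian mutation as assumed design choices.

```python
import random

# Hypothetical parameter bounds: (load, timeout, retry_rate).
BOUNDS = [(0.0, 1.0), (0.1, 10.0), (0.0, 5.0)]

def simulate(params):
    """Stand-in for a system simulator; returns a failure-severity score.
    Toy rule: high load, short timeouts and many retries are worse."""
    load, timeout, retry_rate = params
    return load * retry_rate / (1.0 + timeout)

def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(ind, rng, rate=0.3):
    child = []
    for value, (lo, hi) in zip(ind, BOUNDS):
        if rng.random() < rate:
            # Gaussian perturbation, clamped to the parameter bounds.
            value = min(hi, max(lo, value + rng.gauss(0, 0.1 * (hi - lo))))
        child.append(value)
    return child

def ga_search(generations=30, pop_size=20, seed=0):
    """Search for the parameter combination with the worst severity."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best severity seen at the start of each generation
    for _ in range(generations):
        population.sort(key=simulate, reverse=True)
        history.append(simulate(population[0]))
        parents = population[: pop_size // 2]  # truncation selection; parents survive
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=simulate), history

best, history = ga_search()
```

Because the best half of each generation is carried over unchanged, the best severity found never decreases from one generation to the next; in a real study each call to `simulate` would be an expensive simulation run, which is why the cost of the search matters.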
Speaker's Bio: Kevin Mills is a senior scientist with the Information Technology Laboratory at the US National Institute of Standards & Technology. In 1996, he received a PhD in Information Technology from George Mason University, where he served on the adjunct faculty through 2006. From 1996 to 1999, he served as a program manager at DARPA. His current research interests include understanding and predicting behavior in complex information systems.
SWE Seminar: What's Wrong With the Program I Haven't Written Yet?
Friday, February 21, 2014
Software developers primarily rely on experience and intuition to make development decisions. I will describe speculative analysis, a new technique that helps developers make better decisions by informing them of the consequences of their likely actions. As a concrete example, I will consider collaborative development and the conflicts that arise when developers make changes in parallel. This is a serious problem. In industry, some companies hire developers solely to resolve conflicts. In open-source development, my historical analysis of over 140,000 versions of nine systems revealed that textual, compilation, and behavioral conflicts are frequent and persistent, posing a significant challenge to collaborative development. Speculative analysis can help solve this problem by informing developers early about potential and existing conflicts. Armed with this information, developers can prevent or more easily resolve the conflicts. I will demonstrate Crystal, a publicly available tool that detects such conflicts early and precisely. Crystal has inspired a collaboration with Microsoft, and some Microsoft teams now use a version of the tool in their everyday work.
Speaker's Bio: Yuriy Brun is an assistant professor at the University of Massachusetts, Amherst. Yuriy's research interests are in software system modeling, design, and development. Previously, he served as a CI Fellow at the University of Washington, received his PhD degree in 2008 from the University of Southern California, as an Andrew Viterbi Fellow, and received his MEng degree in 2003 from MIT. He was recognized with the IEEE TCSC Young Achiever in Scalable Computing Award in 2013, and his doctoral research was a finalist in the ACM Doctoral Dissertation Competition in 2008. His work on speculative analysis, the subject of his talk, won a 2011 ACM SIGSOFT Distinguished Paper Award and was the spotlight paper in the October 2013 issue of IEEE Transactions on Software Engineering. http://cs.umass.edu/~brun
GRAND Seminar: 3D Shape Retrieval Benchmarking and Contests
Tuesday, March 18, 2014
Benchmarking of 3D shape retrieval allows developers and researchers to compare the strengths of different methodologies on a standard dataset. Here we describe the procedures and issues involved in developing a benchmark. We then discuss some of the current 3D shape retrieval benchmarking efforts of our group and others. We also review the different performance evaluation measures that have been developed and used by researchers in the community. After that we give an overview of the 3D shape retrieval contest (SHREC) tracks run under the Eurographics Workshop on 3D Object Retrieval and give details of the tracks that we have organized for SHREC. Then we present some of the results based on the different SHREC contest tracks and the other NIST shape benchmark. Finally, I will describe some recent non-rigid 3D shape retrieval algorithms developed by our group.
Speaker's Bio: Afzal Godil is a project leader in the Information Technology Laboratory at the National Institute of Standards and Technology (NIST), where he has been for over seventeen years. Prior to that he worked at the NASA Langley and Lewis Research Centers. His main research interests are in 3D shape analysis and retrieval, digital human modeling, shape metrology, computer vision and computational methods. He has organized twelve Shape Retrieval tracks at various Eurographics Workshops on 3D Object Retrieval. He is also active in a variety of standards efforts, such as Web graphics, Anthropometry and Medical extension of X3D.
CS Interdisciplinary Seminar: Computer Vision and Robotic Methods for Building and Bridge Inspection
Wednesday, March 19, 2014
Bridges represent a critical component of infrastructure systems, and therefore condition monitoring via periodic inspection has long been a key part of bridge operations and maintenance practice. There are more than 578,000 bridges in the US alone, most of which must be inspected every two years, and so hundreds of millions of dollars per year are spent on the inspection of trillions of dollars in assets. In current practice, images captured during an inspection are not considered quantitative sources of information and are therefore underutilized.
This seminar will present a fundamental rethink of how we capture and handle inspection images, which has resulted in the utilization of computer vision to connect visually observable structural damage to changes in the underlying mechanical performance of structures. Included will be a discussion of the challenges of structural image segmentation and feature extraction, as well as current efforts to represent images as dynamic sources of information.
Speaker's Bio: David Lattanzi, Ph.D., P.E. is an Assistant Professor in the Sid and Reva Dewberry Department of Civil, Environmental, and Infrastructure Engineering. Dr. Lattanzi studies how to develop new methods of structural inspection through the use of artificial intelligence, computer vision, and robotics. A registered professional engineer, he has participated in post-disaster inspection and assessment work both domestically and abroad. He received his doctorate in structural engineering, as well as a concurrent M.S. in mechanical engineering, from the University of Washington, where he developed robotic bridge inspection techniques.
CS Seminar: The Building Blocks of Data Science
Wednesday, March 19, 2014
In this talk I will attempt to flesh out the core components of what is being called Data Science. The umbrella term “Data Science” incorporates elements of Computer Science, Information Theory and Statistics expressed in the language of optimization theory. Identifying these core elements will help in arriving at a declarative framework for Data Science and in decoupling its use from implementation. This in turn may help overcome the “Data Science Crunch”, where organizations own and have access to large quantities of data and appreciate its potential value, but lack the human talent and support framework to exploit it to its fullest.
Speaker's Bio: Prof. Sanjay Chawla, University of Sydney. Sanjay Chawla is a Professor in the School of Information Technologies, University of Sydney, Australia. His main area of research is data mining and machine learning. More specifically, he has been working on three problems of contemporary interest: outlier detection, imbalanced classification and adversarial learning. His research has been published in leading conferences and journals and has been recognized by several best-paper awards. He serves on the editorial boards of IEEE TKDE and Data Mining and Knowledge Discovery. Sanjay served as the Head of School from 2008 to 2011 and was an academic visitor at Yahoo! Labs, Bangalore in 2012. He received his PhD in 1995 from the University of Tennessee, USA.
SWE Seminar: Generating Test Data to Distinguish Conjunctive Queries without Comparisons
Thursday, March 20, 2014
The widespread use of databases in software systems has increased the importance of unit testing the queries that form the interface to these databases. Mutation analysis is a powerful testing technique that has been adapted to test database queries. But each of the existing mutation approaches to testing database queries has one or more of the following shortcomings: inability to recognize equivalent mutants, inability to generate test databases automatically, or inability to mutate all aspects of a query.
In this paper we address all three of these challenges by adapting results from the rich literature on query rewriting. We restrict attention to the class of conjunctive queries without comparisons. In return for this restriction, we give an algorithm that recognizes equivalent mutants, generates a test database that distinguishes each nonequivalent mutant, and applies to arbitrary mutations, as long as the mutation is also a conjunctive query without comparisons. The paper presents the test database generation algorithm and proves that it is sound and complete for conjunctive queries without comparisons. We then illustrate the algorithm on a sample query. We evaluate mutations of the query with the new technique and compare the results to existing mutation techniques for databases.
Speaker's Bio: Preetham Vemasani is a first-year PhD student at George Mason University. He received the BS degree in Computer Science and Information Technology from JNTU Hyderabad, India and the MS degree in Software Engineering from George Mason University. His research interests are in software testing and databases. He is currently working on 'Generating efficient test data for database queries' with Dr. Paul Ammann and Dr. Alexander Brodsky.
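To make the idea of a distinguishing test database concrete, here is a minimal sketch based on the classical canonical-database test for containment of conjunctive queries. It is not the algorithm from the paper, which the abstract does not spell out. The query encoding, the function names (`canonical_db`, `evaluate`, `contained_in`, `distinguishing_db`), and the sample queries are all assumptions made for illustration; the sketch also assumes every atom argument is a variable and every head variable occurs in the body.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]            # (relation name, variables)
Query = Tuple[Tuple[str, ...], List[Atom]]    # (head variables, body atoms)
Database = Dict[str, Set[Tuple[str, ...]]]    # relation name -> set of rows

def canonical_db(q: Query) -> Database:
    """Freeze each variable into a constant of the same name."""
    db: Database = {}
    for rel, args in q[1]:
        db.setdefault(rel, set()).add(args)
    return db

def evaluate(q: Query, db: Database) -> Set[Tuple[str, ...]]:
    """Evaluate a conjunctive query by backtracking over its body atoms."""
    head, body = q
    results: Set[Tuple[str, ...]] = set()

    def extend(i: int, binding: Dict[str, str]) -> None:
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for row in db.get(rel, ()):
            if len(row) != len(args):
                continue
            new = dict(binding)
            # Bind each variable, rejecting rows that clash with earlier bindings.
            if all(new.setdefault(v, c) == c for v, c in zip(args, row)):
                extend(i + 1, new)

    extend(0, {})
    return results

def contained_in(q1: Query, q2: Query) -> bool:
    """q1 is contained in q2 iff q2 returns q1's frozen head on q1's canonical database."""
    return q1[0] in evaluate(q2, canonical_db(q1))

def distinguishing_db(q: Query, mutant: Query) -> Optional[Database]:
    """Return a database on which q and mutant differ, or None if they are equivalent."""
    if not contained_in(q, mutant):
        return canonical_db(q)
    if not contained_in(mutant, q):
        return canonical_db(mutant)
    return None

# Hypothetical example: Q(x) :- R(x, y), S(y) and a mutant that drops S(y).
Q: Query = (("x",), [("R", ("x", "y")), ("S", ("y",))])
DROPPED: Query = (("x",), [("R", ("x", "y"))])
# Hypothetical equivalent mutant of DROPPED: a redundant extra atom.
REDUNDANT: Query = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
```

In this sketch, `distinguishing_db(Q, DROPPED)` yields a one-row database for `R` on which the two queries return different answers, while `distinguishing_db(DROPPED, REDUNDANT)` returns `None` because the two queries are equivalent.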
SWE Seminar: Mutation Subsumption Graphs
Thursday, March 20, 2014
Mutation testing researchers have long known that many generated mutants are not needed. This paper develops a graph model to describe redundancy among mutations. We define “true” subsumption, a relation that practicing test engineers would like to have, but cannot due to issues of computability. We also define dynamic subsumption and static subsumption as approximations of “true” subsumption. We explore the properties of the approximate subsumption relations in the context of a small example. We suggest possible uses for subsumption graphs.
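As an illustration of the dynamic approximation, the sketch below assumes the usual reading of dynamic subsumption with respect to a fixed test set: mutant A subsumes mutant B if at least one test kills A and every test that kills A also kills B. The kill data and the names `dynamically_subsumes` and `subsumption_edges` are hypothetical, and the sketch lists all subsumption pairs rather than building the reduced graph.

```python
from typing import Dict, Set, Tuple

def dynamically_subsumes(kills: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if some test kills a and every test that kills a also kills b."""
    return bool(kills[a]) and kills[a] <= kills[b]

def subsumption_edges(kills: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """All ordered pairs (a, b) such that a dynamically subsumes b."""
    return {
        (a, b)
        for a in kills
        for b in kills
        if a != b and dynamically_subsumes(kills, a, b)
    }

# Hypothetical kill data: mutant name -> names of the tests that kill it.
KILLS = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},
    "m3": {"t1", "t2", "t3"},
    "m4": set(),  # killed by no test, so it takes part in no subsumption pair
}
EDGES = subsumption_edges(KILLS)
```

With this data, any test that kills `m1` also kills `m2` and `m3`, so a test set that targets `m1` makes the other two redundant.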
Speaker's Bio: Bob Kurtz is a Senior Principal Engineer with Raytheon. Bob has more than 25 years of experience in developing software systems, with a focus on real-time embedded systems. Bob has a B.A. in Computer Science from the State University of New York Empire State College, and an M.S. in Software Engineering from George Mason University. Bob is currently pursuing a Ph.D. in Software Engineering at George Mason University.
CS Seminar: Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility
Friday, March 21, 2014
We propose Migratory Compression (MC), a coarse-grained data transformation, to improve the effectiveness of traditional compressors in modern storage systems. In MC, similar data chunks are re-located together, to improve compression factors. After decompression, migrated chunks return to their previous locations. We evaluate the compression effectiveness and overhead of MC, explore reorganization approaches on a variety of datasets, and present a prototype implementation of MC in a commercial deduplicating file system. We also compare MC to the more established technique of delta compression, which is significantly more complex to implement within file systems.
We find that Migratory Compression improves compression effectiveness compared to traditional compressors, by 11 percent to 105 percent, with relatively low impact on runtime performance. Frequently, adding MC to a relatively fast compressor like gzip results in compression that is more effective in both space and runtime than slower alternatives. In archival migration, MC improves gzip compression by 44–157 percent. Most importantly, MC can be implemented in broadly used, modern file systems.
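The relocate, compress, and restore cycle can be sketched in a few lines. This is only an illustration under strong assumptions: fixed-size chunks, a crude similarity key (the first 8 bytes of each chunk), and zlib as the compressor. None of these choices come from the talk, and the names `mc_compress` and `mc_decompress` are hypothetical.

```python
import zlib
from typing import List, Tuple

Recipe = List[Tuple[int, int]]  # (original chunk index, chunk length), in migrated order

def mc_compress(data: bytes, size: int = 64) -> Tuple[Recipe, bytes]:
    """Group similar chunks together, then compress the reordered stream."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Assumed similarity key: chunks with the same 8-byte prefix become neighbours.
    order = sorted(range(len(chunks)), key=lambda i: (chunks[i][:8], i))
    recipe = [(i, len(chunks[i])) for i in order]
    blob = zlib.compress(b"".join(chunks[i] for i in order))
    return recipe, blob

def mc_decompress(recipe: Recipe, blob: bytes) -> bytes:
    """Decompress, then move every chunk back to its original position."""
    moved = zlib.decompress(blob)
    restored = [b""] * len(recipe)
    offset = 0
    for index, length in recipe:
        restored[index] = moved[offset:offset + length]
        offset += length
    return b"".join(restored)
```

The recipe is what lets migrated chunks return to their previous locations after decompression. The sketch makes no claim about compression gains, which depend on the data and on a far better similarity measure than the one assumed here.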
Joint work with Xing Lin (Univ. of Utah) and Guanlin Lu, Philip Shilane, and Grant Wallace (EMC Corporation, Data Protection and Availability Division). This will be an expanded version of the talk presented by Xing at FAST'14.
Speaker's Bio: Fred Douglis holds a Ph.D. in computer science from U.C. Berkeley. He has worked in industrial applied research throughout his career, including Matsushita, AT&T, IBM, and currently EMC. He also has been a visiting professor at VU Amsterdam and Princeton University. He received an IBM Outstanding Technical Achievement award for his contributions to System S, productized as Infosphere Streams. His research interests include storage, distributed systems, and Internet tools and performance.
He served as EIC of IEEE Internet Computing from 2007-2010 and has been on its editorial board since 1999. He has published one book, 40 workshop or conference papers, 7 journal or magazine articles, and over 50 patents and patent applications.
GRAND Seminar: Some Expeditions in Predictive Modeling to Enable Systems Biology
Friday, March 28, 2014
With the data explosion being witnessed in biology, immense emphasis is being placed on developing systematic approaches to integrate the various types and sources of data to build models of complex biological processes and diseases.
In this talk, I will discuss our efforts to model complex biomedical phenotypes using predictive modeling approaches applied to large genome-wide data sets. The first part presents experiences and results from a collaborative-competitive effort to model and predict survival rates for breast cancer patients using a recently published set of gene expression, copy number aberration and clinical features.
In the second part, I will present our analysis of heterogeneous ensemble predictive methods that generally produce the best performance for complex biomedical prediction problems. These methods leverage the consensus and diversity among hundreds or even thousands of heterogeneous base predictors, and thus generally outperform even the best homogeneous ensemble methods, like boosting and random forests.
Speaker's Bio: Gaurav Pandey is an Assistant Professor in the Department of Genetics and Genomic Sciences at the Mount Sinai School of Medicine (New York) and is part of the newly formed Institute for Genomics and Multiscale Biology. He completed his Ph.D. in computer science and engineering from the University of Minnesota, Twin Cities in 2010, and subsequently completed a post-doctoral fellowship at the University of California, Berkeley. His primary fields of interest are computational biology, genomics and large-scale data analysis and mining, and he has published extensively in these areas.
CS Interdisciplinary Seminar: Computational Learning Sciences
Wednesday, April 09, 2014
The field of Learning Sciences conducts research on how people learn across a range of domains with and through the use of artifacts. This understanding is then used to design more productive learning environments. How people learn, our understanding of how people learn, and our ability to design learning environments are undergoing a tremendous transformation with the increasing digitization of artifacts and our practices. Increased digitization, in addition to other affordances, implies computational capabilities embedded in artifacts. How does this impact learning? This question gives rise to a new area of research I call Computational Learning Sciences (CLS). In this talk I start an exploration of this area and work towards a problem definition for it by presenting findings from a study of newcomer participation in a Java programming community. I emphasize the inherently socio-technical nature of learning and outline three relevant and useful avenues for CLS research: 1. Content Curation – through aggregation, recommendation, and crowdsourcing; 2. Collaboration Configuration – through analytics and modeling of learner and teacher activity; and 3. Competency Certification – through formative, dynamic and summative assessment.
Speaker's Bio: Aditya Johri studies the use of information technologies for learning and knowledge sharing, with a focus on cognition in informal environments. His research is funded through several NSF grants including an Early Career Award. He is a co-editor of the Cambridge Handbook of Engineering Education Research (CHEER), Cambridge University Press (2014). He received his Ph.D. in Learning Sciences and Technology Design from Stanford University. He can be reached at email@example.com. More information at: http://mason.gmu.edu/~ajohri3
Oral Defense of Doctoral Dissertation: Unsupervised Bayesian Musical Key and Chord Recognition
Wednesday, April 09, 2014
Many tasks in Music Information Retrieval can be approached using indirection in terms of data abstraction. Raw music signals can be abstracted and represented by using a combination of melody, harmony, or rhythm for musical structural analysis, emotion or mood projection, as well as efficient search of large collections of music. In this dissertation, we focus on two tasks: analyzing tonality and harmony of music signals. Our approach concentrates on transcribing Western popular music into its tonal and harmonic content directly from the audio signals. While the majority of the proposed methods adopt the supervised approach, which requires scarce manually transcribed training data, our approach is unsupervised: model parameters for tonality and harmony are estimated directly from the target audio data. First, raw audio signals in the time domain are transformed using an undecimated wavelet transform as a basis to build an enhanced 12-dimensional pitch class profile (PCP) in the frequency domain as features of the target music piece. Second, a bag of local keys is extracted from the frame-by-frame PCPs using an infinite Gaussian mixture, which allows the audio data to “speak for itself” without pre-setting the number of Gaussian components to model the local keys. Third, the bag of local keys is applied to adjust the energy levels in the PCPs for chord extraction.
From experimental results, we demonstrate that our approach – a much simpler one compared to most of the existing methods – performs just as well or outperforms many of the much more complex models for the two tasks without using any training data.
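For readers unfamiliar with pitch class profiles, the sketch below shows one simple way to fold spectral peaks into a 12-dimensional PCP. It assumes the peaks (frequency in Hz, energy) have already been extracted, uses equal temperament with A4 = 440 Hz, and does not reproduce the wavelet-based enhancement described in the dissertation. The function names and the example peaks are illustrative assumptions.

```python
import math
from typing import Iterable, List, Tuple

def pitch_class(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Map a frequency to a pitch class 0..11 (0 = C), assuming equal temperament."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12

def pitch_class_profile(peaks: Iterable[Tuple[float, float]]) -> List[float]:
    """Sum peak energy per pitch class and normalise the profile to sum to 1."""
    pcp = [0.0] * 12
    for freq_hz, energy in peaks:
        pcp[pitch_class(freq_hz)] += energy
    total = sum(pcp)
    return [value / total for value in pcp] if total > 0 else pcp

# Hypothetical peaks for one frame: C4, E4 and G4, with G twice as strong.
FRAME_PEAKS = [(261.63, 1.0), (329.63, 1.0), (392.00, 2.0)]
FRAME_PCP = pitch_class_profile(FRAME_PEAKS)
```

A frame-by-frame sequence of such profiles is the kind of feature that the key and chord models in the abstract operate on.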
CS Distinguished Lecture Series: Tor and Censorship: Lessons Learned
Friday, April 11, 2014
Tor is a free-software anonymizing network that helps people around the world use the Internet in safety. Tor's 5500 volunteer relays carry traffic for around a million daily users, including ordinary citizens who want protection from identity theft and prying corporations, corporations who want to look at a competitor's website in private, people around the world whose Internet connections are censored, and even governments and law enforcement.
The last year has included major cryptographic upgrades in the Tor software, dozens of research papers on attacking and improving the Tor design, mainstream press about government attempts to attack the Tor network, discussions about funding, FBI/NSA exploitation of Tor Browser users, botnet related load on the Tor network, and other important topics.
In this talk I'll aim to strike a balance between explaining Tor's "intellectual merit" side (all the neat research problems that Tor raises, and how we've positioned ourselves to get so much attention from academics) and Tor's "broader impact" side (the many ways that Tor has changed lives around the world).
Speaker's Bio: Roger Dingledine is project leader for The Tor Project, a US non-profit working on anonymity research and development. While at MIT he developed Free Haven, one of the early peer-to-peer systems that emphasized resource management while maintaining anonymity for its users. He works with the Electronic Frontier Foundation, the US Navy, Voice of America, the National Science Foundation, and other organizations to design and develop systems for anonymity and traffic analysis resistance. He organizes academic conferences on anonymity, speaks at such events as Blackhat, Defcon, Toorcon, and the CCC congresses, and also does tutorials on anonymity for national and foreign law enforcement. Roger was honored in 2006 as one of the top 35 innovators under the age of 35 by Technology Review magazine.
SANG Seminar: Tackling the Challenges of I/O Virtualization in Data Centers
Friday, April 25, 2014
Large-scale data centers leverage virtualization to achieve high resource utilization, scalability, and availability. While ideally the performance of an application running inside a virtual machine (VM) should be independent of co-located applications and VMs that share the physical resources, current systems have yet to achieve this goal.
In this talk, I will describe our efforts in addressing a number of challenges in order to achieve optimal I/O performance in such virtualized systems. Specifically, TRACON constructs mathematical models and scheduling algorithms to mitigate VM interference; Matrix leverages machine learning and optimization techniques to allocate VM resources in a way that minimizes cost while achieving good application performance; and Mortar enhances the hypervisor to pool together spare memory on each machine and expose it as a volatile data cache to improve virtual I/O performance.
Speaker's Bio: Howie Huang is an Assistant Professor in the Department of Electrical and Computer Engineering at the George Washington University. His research interests are in the areas of computer systems and architecture, including cloud computing, big data computing, high-performance computing and storage systems. His projects won the Best Poster Award at PACT'11 and the ACM Undergraduate Student Research Competition at SC'12, and were a finalist for the Best Student Paper Award at SC'11. He was a recipient of the NSF CAREER Award in 2014, the NVIDIA Academic Partnership Award in 2011, the IBM Real Time Innovation Faculty Award in 2008, and the School of Engineering and Applied Science Outstanding Young Researcher Award in 2014. He received his Ph.D. in Computer Science from the University of Virginia in 2008.
Oral Defense of Doctoral Dissertation: Quantitative Framework to Design Services with Intrusion Tolerant QoS
Friday, April 25, 2014
Large software systems can be designed as a set of loosely coupled services interacting with each other; simple services can be composed to form more complex services. But for services to be usable in production, they must satisfy non-functional requirements, especially security-related quality of service, in order to ensure confidentiality, integrity, and availability. Unfortunately, software vulnerabilities expose these services to malicious actors and make them susceptible to attacks. Due to the distributed and decentralized nature of services, publishing and guaranteeing security quality of service are crucial so that potential applications and clients can make use of the provided services. Moreover, intrusion prevention and detection are not perfect in securing services, due to the increased sophistication of malicious attacks. This has motivated the addition of an Intrusion Tolerant component to complement the line of defense for applications and services. Given the need to make services intrusion-tolerant, my research focuses on providing a Quantitative Framework for Intrusion Tolerant Services, a systematic approach to model, design and implement services with Intrusion Tolerant Quality of Service (IT-QoS). The approach relies on the recovery-based architecture of Self Cleansing Intrusion Tolerance and on a correlation component, based on a Semi-Markov Process, for computing IT-QoS metrics and discovering a mathematical dependency between those metrics and intrusion tolerance control parameters such as the service exposure window. For system architects at service providers, the framework would constitute the basis for ensuring differentiated levels of certain IT-QoS metrics, such as Secure Availability and Mean Time To Security Failure, which are indicators of the reliability of a service operating in the presence of cyber security threats.
Oral Defense of Doctoral Dissertation: An Analysis of a Model-based Evolutionary Algorithm: Learnable Evolution Model
Monday, April 28, 2014
An evolutionary algorithm (EA) is a biologically inspired metaheuristic that uses mutation, crossover, reproduction, and selection operators to evolve solutions for a given problem. Learnable Evolution Model (LEM) is an EA that has an evolutionary algorithm component that works in tandem with a machine learner to collaboratively create populations of individuals. The machine learner infers rules from best and least fit individuals, and then this knowledge is exploited to improve the quality of offspring. Unfortunately, most of the extant work on LEM has been ad hoc, and so there does not exist a deep understanding of how LEM works. And this lack of understanding, in turn, means that there is no set of best practices for implementing LEM. For example, most LEM implementations use rules that describe value ranges corresponding to areas of higher fitness in which offspring should be created. However, we do not know the efficacy of different approaches for sampling those intervals. Also, we do not have sufficient guidance for assembling training sets of positive and negative examples from populations from which the ML component can learn.
This research addresses those open issues by exploring three different rule interval sampling approaches as well as three different training set configurations on a number of test problems that are representative of the types of problems that practitioners may encounter. Using the machine learner to create offspring induces a unique emergent selection pressure separate from the selection pressure that manifests from parent and survivor selection; an outcome of this research is a partially ordered set of the impact that these rule interval sampling approaches and training set configurations have on this selection pressure that practitioners can use for implementation guidance. That is, a practitioner can modulate selection pressure by traversing a set of design configurations within a Hasse graph defined by partially ordered selection pressure.
Department of Computer Science: Graduation Celebration and Awards Dinner
Wednesday, May 14, 2014
Volgenau School of Engineering: Convocation
Thursday, May 15, 2014
Convocation is for Volgenau School of Engineering graduates only. Each graduate is individually recognized and photographed crossing the stage. All guests are welcome; tickets are not required. Convocation is a traditional ceremony where caps and gowns are worn by graduates and faculty; the gowns for graduates are green.
Where do I get my cap and gown?
Bachelor's and Master's Candidates
Bachelor's and master's candidates may purchase caps, gowns, and announcements and pick up commencement tickets at the Commencement Fair, March 4, 5, 6, from 11am – 7pm, in the University Bookstore. The contact phone number for the University Bookstore in the Johnson Center is (703) 993-2666.
Master's candidates must also wear a master's hood. Please state your field of study when purchasing your hood as different colors denote different fields of study.
PhD Candidates
Doctoral candidates and faculty should order regalia through the University Bookstore as soon as possible to avoid late delivery fees. Contact number is (703) 993-2666.
Faculty
Faculty should order regalia through the University Bookstore (703) 993-2666. Any late orders will be charged a $20 late fee in order to cover the cost of express shipping. Due to limited supply, it is important to get all rental orders in early to guarantee appropriate hood color. Please contact Karen Eiserman at 703-993-3833.
Department of Computer Science: Post-Convocation Reception
Thursday, May 15, 2014
Event Info: Immediately following the Volgenau School of Engineering Convocation Ceremony. All graduates from the Computer Science Department and their guests are welcome.
Oral Defense of Doctoral Dissertation: Management of Uncertainty in Self-Adaptive Software
Tuesday, May 20, 2014
The ever-growing complexity of software systems, coupled with the need to maintain their quality of service (QoS) characteristics even under adverse conditions and highly uncertain environments, has instigated the emergence of self-adaptive software systems. A self-adaptive software system has mechanisms that automate and simplify the management and modification of software systems after they are deployed (i.e., during run-time) to achieve certain functional or QoS goals.
While the benefits of such systems are plenty, their development has been shown to be significantly more challenging than that of static and predictable software systems. One key culprit is that self-adaptation is subject to uncertainty. Uncertainty can be observed in every facet of adaptation, albeit to varying degrees. It follows from the fact that the system's user, adaptation logic, and business logic are loosely coupled, introducing numerous sources of uncertainty. This challenges the confidence with which the adaptation decisions are made. A key observation is that while the level of uncertainty could vary, no self-adaptive software system is ever completely free of it.
This is precisely the challenge I addressed in this research. I have presented a general quantitative approach for tackling the complexity of automatically making adaptation decisions under uncertainty. I redefined the conventional definition of optimal solution to one that has the best range of behavior. In turn, the selected solution has the highest likelihood of satisfying the system's quality objectives, even if due to uncertainty, properties expected of the system are not borne out in practice.
In this dissertation, I begin by describing the problem and providing an overview of the solutions. Then, I define what I mean by uncertainty and position the approach with regard to the related work. Next, I provide the theoretical detail of the approach, which includes the formal specification of the problem and two (combinatorial and evolutionary) optimization techniques. I also describe some techniques for quantifying uncertainty and the required steps for effecting adaptation decisions at run-time. Finally, I discuss the implementation details of the framework. My experience with the approach, including experimental evaluation and porting the framework to a new problem domain, has been very positive. The evaluation results have strongly corroborated my hypotheses.
CS Interdisciplinary Seminar: Smart Cars for Safe Pedestrians
Wednesday, June 18, 2014
One of the most significant large-scale deployments of intelligent systems in our daily life nowadays involves driver assistance in smart cars.
Accident statistics show that roughly one quarter of all traffic fatalities world-wide involve vulnerable road users (pedestrians, bicyclists); most accidents occur in an urban setting. Devising an effective driver assistance system for vulnerable road users has long been impeded, however, by the "perception bottleneck", i.e. not being able to detect and localize vulnerable road users sufficiently accurately. The problem is challenging due to the large variation in object appearance, the dynamic and cluttered urban backgrounds, and the potentially irregular object motion. Topping these off are stringent performance criteria and real-time constraints. I give an overview of the remarkable computer vision progress that has been achieved in this area and discuss the main enablers: the algorithms, the data, the hardware and the tests.
Daimler has recently introduced an advanced set of driver assistance functions in its Mercedes-Benz 2013-2014 S-, E-, and C-Class models, termed “Intelligent Drive”, using stereo vision. It includes a pedestrian safety component which facilitates fully automatic emergency braking - the system works day and night. I discuss “Intelligent Drive” and future research directions, on the road towards accident-free driving.
Speaker's Bio: Dariu M. Gavrila received the PhD degree in computer science from the University of Maryland at College Park, USA, in 1996. Since 1997, he has been with Daimler R&D in Ulm, Germany, where he is currently a Principal Scientist. In 2003, he was further appointed professor at the University of Amsterdam, chairing the area of Intelligent Perception Systems (part time). Over the past 15 years, Prof. Gavrila has focused on visual systems for detecting humans and their activity, with application to intelligent vehicles, smart surveillance and social robotics. He led the multi-year pedestrian detection research effort at Daimler, which materialized in the Mercedes-Benz S-, E-, and C-Class models (2013-2014). He is frequently cited in the scientific literature and he received the I/O 2007 Award from the Netherlands Organization for Scientific Research (NWO) as well as several conference paper awards. His personal Web site is www.gavrila.net.
Oral Defense of Doctoral Dissertation: Profiling, Tracking, and Monetizing: An Analysis of Internet and Online Social Network Concerns
Wednesday, July 02, 2014
This dissertation explores concerns facing Internet and specifically Online Social Network users. The attacks we discuss can lead to identity theft, biased and tailored website content delivery, geolocation threats, monetization, and an overall lack of privacy. We introduce a profiling and tracking attack that correlates a user’s online persona that is captured from seemingly innocuous website traffic (e.g., operating system, search engine, browser, time spent on website, etc.) with that of the same user’s real Facebook profile through analytics captured from a custom Facebook Fan Page. We show how an adversary might identify the personally identifiable information of the user given only their online persona.
The protection of one’s identity is paramount, especially for users working in the intelligence community. As a result, these organizations are currently employing privacy preserving technologies as part of their standard network defenses to anonymize their outbound traffic. Our results show that while network-level anonymity systems are better at protecting end-user privacy than having no privacy preserving technology in place, they are unable to thwart de-anonymization attacks aimed at applications and private data of end-users. We demonstrate and substantiate our claims using a targeted experiment against actual operational scenarios of real-world users who are relying on an implementation of a privacy preserving technology. To this end, we execute multiple attacks associated with network monitoring, phishing, and Online Social Networks.
We also discuss how a user can be monetized through an attack vector such as spam. Spam is a profit-fueled enterprise, and cybercriminals are focusing more of their efforts at growing Online Social Networks. One of the common methods of monetizing Online Social Network spam is to entice users to click on links promising free gift cards and iPads. However, these links actually lead to ad networks that bombard users with surveys in an attempt to collect personal and contact information that they will sell to other marketers. To date, we lack a solid understanding of this enterprise’s full structure. We examined the survey scam process to determine the affiliates that are behind this lucrative scam by performing an analysis of five months of Facebook spam data. We provide the first empirical study and analysis of survey scams and demonstrate how to determine which ad networks are sponsoring this spam.
Lastly, we focus on why people act in an insecure way when handling their personal information, especially passwords and personal images. This is a major problem, as seen in revenge porn and sextortion related cases. Often the victim’s life has been negatively altered as a direct result of these types of cases. Using a combination of well-known human-computer interaction methods such as surveys and exit interviews combined with custom software (e.g., Cloudsweeper and Gmail Image Extractor), we show that users act differently if they visually see the threat associated with their security behavior. We analyze the results of Cloudsweeper, which is designed to scan Google Mail accounts, report any cleartext passwords and their associated monetary value, and provide the option for passwords to be encrypted and redacted. Additionally, we introduce for the first time the Google Image Extractor, which is designed to extract images from the user’s Google Mail account and gives users the opportunity to delete their images seamlessly. Our contributions will help determine if there is a need for such applications as well as determine if convenience or privacy prevails.
The main takeaway is that users were not educated about prevalent attack vectors for compromising client systems and violating user privacy. We show the extent to which information made freely available on the Internet can negatively impact the organization and users. Upon completion of the experiments, we compiled the results and presented them as security awareness briefings. The security awareness briefings in combination with our software tools will help mitigate some of the concerns we present and discuss in this dissertation.
Oral Defense of Doctoral Dissertation: An Approach to Analyzing and Recognizing Human Gait
Monday, July 21, 2014
Gait analysis has been an active area of research in computer vision for a long time. It is also important for rehabilitation science, where clinicians explore innovative ways to analyze the gait of different people. The traditional ways to study gait rely on 3D optical motion capture systems, which require cumbersome active/passive markers to be placed on a subject’s body.
The attachment of markers to the segments hinders natural patterns of movement and may lead to altered gait information. Automated gait analysis has been proposed as a solution to this problem. The aim of automated gait analysis is to provide information about the gait parameters and gait determinants from video without using markers. Gait is a repetitive, highly constrained and periodic activity. Different gait determinants are active in different phases of the gait cycle to minimize the excursion of the body’s center of gravity and help produce forward progression with the least expenditure of energy. The motion of limb segments encodes information about different phases of the gait cycle. However, estimating the motion of limbs from the videos is challenging since limbs are self-occluding and only apparent motion can be observed using the images. To add to the issue, the quality of the recorded video (color contrast, cluttered background) and clothing worn by the subject can play a significant role in the computation of that apparent motion.
In this thesis, we present novel methods using image flow to identify different phases (double support, mid swing, toe off and heel strike) of a gait cycle. We use the torso excursion information and lower legs rotational velocities to identify these phases. The top 30 percent of the subject’s body is used to estimate the torso instantaneous velocity. The zero values of the vertical component of the velocity identify the double support and mid swing phases. Utilizing these phases, we present approaches to approximate the lower leg motions using translation and rotational motion models. The zero values of the instantaneous rotational velocity of the lower legs determine the toe off and heel strike events. We also apply a modified version of a color tracking algorithm to track the hand position during the gait cycle. Some of the limb segments, such as upper legs and upper arms, get occluded by other segments during the majority of the gait cycle. We have presented a method to model the motion synergies between pairs of segments (upper leg–lower leg, foot–lower leg and upper arm–lower arm) using data from a 3D motion capture database. The estimated parameters of the non-linear dynamic models depend on the phases of the gait cycle being considered. We implement an Unscented Kalman Filter that estimates the angular position of the unobserved limb segments (upper leg, foot, upper arm and forearm) based on these models and the motion data obtained for the observed limb segments (lower leg, hand). We compare our results with those obtained from a 3D motion capture system and by manually labeling the images. The small error in the results demonstrates the sensitivity and specificity of our techniques. Finally, we use histograms of normal flow to represent the motion patterns of different regions of the body. We measure the motion similarity between two image frames using the cosine similarity measure for comparing two histograms.
Computing this measure between all the pairs of image frames in the two gait sequences gives a similarity matrix as a feature. These features are used in Support Vector Machines and Dynamic Programming together with the information about the phases of gait to compare two gait sequences. We demonstrate our approach on a publicly available gait dataset and present the analysis.
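As a rough illustration of the similarity matrix described above, the following sketch computes the cosine similarity between every pair of per-frame histograms from two sequences. The function names and the toy two-bin histograms are assumptions made for this example; they are not taken from the thesis, and the real features would be histograms of normal flow computed from video.

```python
import math


def cosine_similarity(h1, h2):
    """Cosine similarity between two equal-length histograms."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(a * a for a in h1))
    norm2 = math.sqrt(sum(b * b for b in h2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption for this sketch: an all-zero histogram matches nothing.
        return 0.0
    return dot / (norm1 * norm2)


def similarity_matrix(seq_a, seq_b):
    """Entry [i][j] compares frame i of seq_a with frame j of seq_b."""
    return [[cosine_similarity(ha, hb) for hb in seq_b] for ha in seq_a]


# Toy data: each inner list stands in for one frame's flow histogram.
sequence_a = [[1.0, 0.0], [0.0, 1.0]]
sequence_b = [[1.0, 0.0], [1.0, 1.0]]
matrix = similarity_matrix(sequence_a, sequence_b)
```

The resulting matrix has one row per frame of the first sequence and one column per frame of the second, which is the shape a classifier or a dynamic-programming alignment would consume.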
In summary, we establish that we can capture segmental data using a markerless gait analysis system. These data are sensitive, reliable and provide recognizable clinically relevant information about motion through all phases of gait.
CS Seminar: Computer Agents that Negotiate Proficiently with People
Tuesday, July 22, 2014
Negotiation and persuasion are tools for social influence that are endemic to human interaction, from personal relationships and business partnerships to political debate. The inclusion of people presents novel problems for the design of automated agents’ negotiation and persuasion strategies. People do not adhere to the optimal, monolithic strategies that can be derived analytically. Their negotiation behavior is affected by a multitude of social and psychological factors. In this talk I will show how combining machine learning techniques for opponent modeling with human behavioral models, formal decision-making and heuristics approaches enables agents to interact well with people. Applications include intelligent agents that help drivers reduce energy consumption, agents that support rehabilitation and employer-employee negotiation.
Speaker's Bio: Sarit Kraus (Ph.D. Computer Science, Hebrew University, 1989) is a Professor of Computer Science at Bar-Ilan University and an Adjunct Professor at the University of Maryland. Her research is focused on intelligent agents and multi-agent systems (including people). Kraus was awarded the IJCAI Computers and Thought Award, the ACM SIGART Agents Research award, and the EMET prize, and her paper with Prof. Barbara Grosz was a joint winner of the IFAAMAS influential paper award. She is an AAAI and ECCAI fellow and a recipient of the advanced ERC grant. http://u.cs.biu.ac.il/~sarit/index.html
Oral Defense of Doctoral Dissertation: Decision Guidance Query Language (DGQL), Algorithms and System
Friday, July 25, 2014
Decision optimization is widely used in many Decision Support and Guidance Systems (DSGS) to support business decisions such as procurement, scheduling and planning. In spite of rapid changes in customer requirements, the implementation of DSGS is typically rigid, expensive and not easily extensible, in stark contrast to the agile implementation of information systems based on the DBMS and SQL technologies. This dissertation introduces the Decision Guidance Query Language (DGQL) designed to (re-)use SQL programs for decision optimization with the goals of making DSGS implementation agile and intuitive, and leveraging existing investment in SQL-implemented systems. This dissertation addresses several related technical issues with DGQL: (1) how to annotate existing queries to precisely express the optimization semantics, (2) how to translate the annotated queries into equivalent mathematical programming (MP) formulations that can be solved efficiently using existing industrial solvers, and (3) how to develop specialized optimization algorithms for a class of multi-stage production problems modeled in DGQL.
The algorithms for the multi-stage production network utilize the fact that only part of the problem is dynamic, e.g., the demand for output products in a manufacturing process, whereas the rest of the problem is static, e.g., the connectivity graph of the assembly processes and the cost functions of machine assemblies. An online decomposition algorithm (ODA) is developed based on offline preprocessing of static assembly components to create an approximated cost function, which is used to decompose the original problem into smaller problems and significantly improve solution quality and time complexity. The preprocessing of each static assembly component involves discretizing assembly output, finding the corresponding optimal machine configuration, and constructing a piece-wise linear approximation of the assembly cost function. An adaptive preprocessing algorithm (APA) is introduced that considers only a small part of the discretized points by classifying outputs based on their predicted machine configuration. An initial experimental evaluation suggests that (1) machine generated MP models introduce little or no degradation in performance as compared with expertly crafted models, (2) ODA, using offline preprocessing, leads to an order of magnitude improvement in quality of solutions and optimization time as compared to MILP, and (3) APA shows significant improvement in preprocessing time with no reduction in the quality of the online solution.
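The preprocessing idea above (discretize the output range, evaluate the cost at each point, then interpolate) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_cost` is a hypothetical stand-in for the step that finds the optimal machine configuration at a given output level, and none of the names come from the dissertation.

```python
def build_breakpoints(cost_fn, max_output, steps):
    """Sample the cost function at evenly spaced output levels."""
    step = max_output / steps
    return [(i * step, cost_fn(i * step)) for i in range(steps + 1)]


def piecewise_linear(breakpoints, x):
    """Linearly interpolate the cost between neighbouring breakpoints."""
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the discretized output range")


def toy_cost(output):
    # Hypothetical convex cost; a real system would solve an
    # optimization problem here for each discretized output level.
    return output * output


points = build_breakpoints(toy_cost, max_output=4.0, steps=2)
approx_cost = piecewise_linear(points, 3.0)
```

With only two segments the approximation is coarse; more breakpoints tighten it at the price of more offline evaluations, which is the trade-off an adaptive scheme would try to reduce.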
Oral Defense of Doctoral Dissertation: A Highly Recoverable Filesystem for Solid State Drives
Thursday, July 31, 2014, 1:30 pm
Recovering deleted information from storage drives is a long-standing problem. Prior research has approached information recovery by developing file-carving techniques. However, two issues present significant challenges to on-going efforts: 1) prior knowledge of file types, including file headers and footers, is required to construct file carvers, and 2) fragmentation prevents file carvers from achieving successful recovery. More recently, solid-state drives (SSDs) have become more popular. SSDs provide several advantages over mechanical hard drives, such as smaller sizes, the lack of moving parts, and better performance. However, due to problems such as wear leveling and write amplification in SSDs, files are severely fragmented, which exacerbates the data recovery problem. In addition, SSDs use TRIM and garbage collection schemes to enhance their performance, which can permanently remove data immediately after executing a delete operation.
In this dissertation, I developed a framework for recovering deleted files without the knowledge of the file type amidst significant fragmentation. I developed the Recovery Filesystem by modifying an existing implementation of the exFAT filesystem running on top of FUSE. The central idea underlying the Recovery Filesystem is a special identifier embedded in each data block. The identifier monitors each block by mapping the data block to a single file regardless of the file status, existing or deleted. The block sequence number and creation timestamp are also maintained to facilitate the recovery process. In addition, I developed a garbage collection scheme for SSDs that maximizes data retention without sacrificing SSD performance.
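To make the per-block identifier idea concrete, here is a minimal sketch of how fragmented blocks could be regrouped into files when each block carries a file identifier, a sequence number, and a creation timestamp. The `Block` layout and the `recover_files` helper are hypothetical and chosen only for illustration; they do not reflect the Recovery Filesystem's actual on-disk format.

```python
from collections import defaultdict
from typing import NamedTuple


class Block(NamedTuple):
    file_id: int     # identifier mapping the block to a single file
    seq: int         # position of the block within that file
    created: float   # creation timestamp, used here only as a tie-breaker
    payload: bytes   # block contents


def recover_files(blocks):
    """Group blocks by file identifier and reassemble them in order."""
    by_file = defaultdict(list)
    for block in blocks:
        by_file[block.file_id].append(block)
    recovered = {}
    for file_id, group in by_file.items():
        ordered = sorted(group, key=lambda b: (b.seq, b.created))
        recovered[file_id] = b"".join(b.payload for b in ordered)
    return recovered


# Blocks as they might be found scattered across a drive, out of order.
scattered = [
    Block(file_id=2, seq=1, created=20.0, payload=b"y"),
    Block(file_id=1, seq=1, created=11.0, payload=b"lo"),
    Block(file_id=2, seq=0, created=19.0, payload=b"x"),
    Block(file_id=1, seq=0, created=10.0, payload=b"hel"),
]
files = recover_files(scattered)
```

Because the grouping relies only on the embedded identifier and sequence number, it needs no knowledge of file headers or footers, which is the property the abstract contrasts with file carving.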
The experiments conducted in this dissertation demonstrate that the Recovery Filesystem yields acceptable read/write performance results. In addition, file recovery experiments used to compare the Recovery Filesystem with open source recovery techniques demonstrate that the Recovery Filesystem provides significant advantages in the case of fragmented data.
Volgenau School of Engineering: New Graduate Student Orientation
Monday, August 18, 2014
Event Info: Volgenau School of Engineering is welcoming its newly admitted graduate students to a special orientation event.
It is highly recommended that both domestic and international students plan to attend. Essential information regarding university services for graduate students and program information from academic departments will be provided. Also, this is your opportunity to meet your peers and the administrative and academic staff members who will assist you during the pursuit of your graduate course work and degree.
Department of Computer Science: GTA Reception and Orientation
Tuesday, August 19, 2014
This orientation is mandatory for all CS department graduate teaching assistants.
Test-out Exam: INFS 501
Tuesday, August 19, 2014
Test-out Exam: INFS 515
Tuesday, August 19, 2014
Test-out Exam: INFS 519
Tuesday, August 19, 2014
Department of Computer Science: PhD Student Orientation and Reception
Tuesday, August 19, 2014
Test-out Exam: SWE 510
Tuesday, August 19, 2014
Department of Computer Science: GTA Teaching Workshop
Wednesday, September 03, 2014
This workshop is mandatory for all CS department graduate teaching assistants.
New Student Fall Welcome 2014: BS ACS & BS CS
Tuesday, September 09, 2014
Distinguished Lecture Series: Probabilistic Topic Models and User Behavior
Friday, September 26, 2014
Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents. Topic modeling ideas have been adapted to many domains, including images, music, networks, genomics, and neuroscience.
Traditional topic modeling algorithms analyze a document collection and estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users and for understanding how a collection is organized.
I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. We studied collaborative topic models on scientists' libraries (from Mendeley) and scientists' click data (from the arXiv). Collaborative topic models enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, they help organize the articles according to the discovered patterns of readership. For example, we can identify articles that are important within a field, and articles that transcend disciplinary boundaries.
More broadly, topic modeling is a case study in the large field of applied probabilistic modeling. I will survey some recent advances in this field. I will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.
Biography: David Blei is a Professor of Statistics and Computer Science at Columbia University. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.
David earned his Bachelor's degree in Computer Science and Mathematics from Brown University (1997) and his PhD in Computer Science from the University of California, Berkeley (2004). Before arriving at Columbia, he was an Associate Professor of Computer Science at Princeton University. He has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013).
Distinguished Lecture Series: Learning and Multiagent Reasoning for Autonomous Robots
Wednesday, November 12, 2014
Biography: Dr. Peter Stone is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, Fulbright Scholar, and University Distinguished Teaching Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs - Research. Peter's research interests include machine learning, multiagent systems, robotics, and e-commerce. In 2003, he won a CAREER award from the National Science Foundation for his research on learning agents in dynamic, collaborative, and adversarial multiagent environments. In 2004, he was named an ONR Young Investigator for his research on machine learning on physical robots. In 2007, he was awarded the prestigious IJCAI 2007 Computers and Thought award, given once every two years to the top AI researcher under the age of 35. In 2013 he was awarded the University of Texas System Regents' Outstanding Teaching Award and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers.
CS Distinguished Lecture Series: Greening Datacenters: Past, Present and Future
Friday, February 27, 2015
Datacenters host the server infrastructure that powers organizations of many sizes, from universities and enterprises to large Internet services. Collectively, datacenters consume a massive amount of power, representing a financial burden for datacenter operators, an infrastructure burden on power utilities, and an environmental burden on society. However, this problem could be worse if it were not for several advances made over the last decade, especially in the design of large datacenters. In this talk, I will overview the architecture of these datacenters, discuss the main advances made to date, and speculate about directions for the future.
Speaker's Bio: Dr. Ricardo Bianchini received his PhD degree in Computer Science from the University of Rochester in 1995. He is a Professor of Computer Science at Rutgers University, but is currently on leave working as Microsoft's Chief Efficiency Strategist. His main interests include cloud computing, and power/energy/thermal management of datacenters. In fact, Dr. Bianchini is a pioneer in datacenter energy management, energy-aware storage systems, energy-aware load distribution across datacenters, and leveraging renewable energy in datacenters. Dr. Bianchini has co-chaired the program committee of several conferences and workshops, and currently serves on the editorial board of four journals. He has published eight award papers, and has received the CAREER award from the National Science Foundation. He is currently an ACM Distinguished Scientist.
Distinguished Lecture Series: Confessions of an Accidental Greenie: From Green Destiny to the Green 500
Wednesday, April 29, 2015
Biography: Dr. Wu-chun Feng (or more simply, "Wu") is a professor of computer science and electrical & computer engineering at Virginia Tech, where he directs the Systems, Networking, and Renaissance Grokking (SyNeRGy) Laboratory. His research interests span many areas of high-performance networking and computing from hardware to applications software.
To the computer science and engineering community, he is perhaps best known for his systems-level research in high-performance networking, ranging from systems-area network architectures such as Quadrics and 10-Gigabit Ethernet (10GigE) to wide-area network frameworks and implementations in support of distributed computing such as adaptive flow control for TCP (i.e., DRS: Dynamic Right-Sizing) and hybrid circuit- and packet-switched networks (i.e., CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture) and the autonomic rate-adaptive protocols that run on them.
To the general scientific community, he is oftentimes referred to as "Mr. Green Destiny" or "The Green Destiny Guy." Green Destiny debuted in early 2002 as the first major instantiation of the Supercomputing in Small Spaces project. It was a 240-processor supercomputer with a footprint of five square feet and a power envelope of a mere 3.2 kilowatts. This supercomputer, which produced an admirable Linpack rating of 101 Gflops, operated without any unscheduled downtime for its two-year lifetime while running in an 85° F warehouse at 7,400 feet above sea level with no air conditioning, no air humidification, and no air filtration. Green Destiny garnered international attention in over 100 media outlets including BBC News, CNN, and The New York Times and led in part to Dr. Feng being named to HPCwire's Top People to Watch List in 2004.
Dr. Feng received a B.S. in Electrical & Computer Engineering and Music (Honors) and an M.S. in Computer Engineering from Penn State University in 1988 and 1990, respectively. He earned a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1996. His previous professional stints include IBM T.J. Watson Research Center, NASA Ames Research Center, Vosaic, University of Illinois at Urbana-Champaign, Purdue University, The Ohio State University, Orion Multisystems, and Los Alamos National Laboratory.