Applications of Synthetic High Dimensional Data

Author: Sobczak-Michalowska, Marzena
Publisher: IGI Global
Total Pages: 315
Release: 2024-03-25
Genre: Computers
ISBN:

The need for tailored data for machine learning models often goes unmet because using real-world data is considered too risky. Synthetic data, an algorithmically generated counterpart to operational data, helps overcome the constraints associated with sensitive or regulated information. In high-dimensional data, where the number of features and variables often exceeds the number of available observations, the emergence of synthetic data marks a significant shift. Applications of Synthetic High Dimensional Data delves into the algorithms and applications underpinning the creation of synthetic data, which in many cases surpasses the capabilities of authentic datasets. Beyond mere mimicry, synthetic data takes center stage in prioritizing the mathematical domain, becoming the crucible for training robust machine learning models. It serves not only as a simulation but also as a theoretical entity, permitting the consideration of unforeseen variables and facilitating fundamental problem-solving. This book navigates the multifaceted advantages of synthetic data, illuminating its role in protecting the privacy and confidentiality of authentic data. It also underscores the controlled generation of synthetic data as a mechanism to safeguard private information while maintaining a controlled resemblance to real-world datasets. This controlled generation preserves privacy and facilitates learning across datasets, which is crucial when dealing with incomplete, scarce, or biased data. Ideal for researchers, professors, practitioners, faculty members, students, and online readers, this book transcends theoretical discourse.

Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data

Author: Arkaprabha Ganguli
Publisher:
Total Pages: 0
Release: 2023
Genre: Electronic dissertations
ISBN:

The field of statistical machine learning has seen a surge in popularity of feature selection methods for ultra-high dimensional datasets due to their wide applicability in scientific domains ranging from genetics to astronomy. These applications typically involve a vast number of potential features and a quantitative response or outcome variable. It is also often observed or hypothesized that only a small subset of these features is truly associated with the response. Traditional feature selection algorithms are motivated by the need to uncover the true sparsity pattern buried in the ultra-high dimensional data. However, these methods may produce many false discoveries, providing poor scientific insight into the underlying relationship. Error-controlled methods are designed to address this issue by controlling the expected proportion of falsely identified features among the selected ones. In this thesis, we develop and study two novel feature selection methods for ultrahigh dimensional data with False Discovery Rate (FDR) control, with a real-world application in the context of diffusion magnetic resonance imaging (DMRI) tractography data.
In the first chapter, we propose a p-value-free FDR controlling method for feature selection. Most of the state-of-the-art methods in the literature for controlling FDR rely on p-values, which depend on specific assumptions about the data distribution and may be questionable in some high-dimensional settings. To overcome this problem, we propose a 'screening & cleaning' strategy consisting of assigning importance scores to the predictors, followed by constructing an estimate of the FDR. We study the theoretical properties of the method and demonstrate its superior performance compared to existing methods in an extensive simulation study. Finally, we apply the method to a gene expression dataset and identify important genes associated with drug sensitivity.
In the second chapter, we extend the feature selection method from a linear model to a non-linear and non-parametric setting by utilizing the Deep Learning (DL) framework. DL has been at the center of analytics in recent years due to its impressive empirical success in analyzing complex data objects. Despite this success, most existing tools behave like black-box machines, hence the increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning has emerged as a promising tool in this realm. However, recent developments do not accommodate ultra-high dimensional and highly correlated features or high noise levels. In this chapter, we propose a novel screening and cleaning method with the aid of deep learning for a data-adaptive multi-resolutional discovery of highly correlated predictors with a controlled FDR. Extensive empirical evaluations over a wide range of simulated scenarios and several real datasets demonstrate the effectiveness of the proposed method in achieving high power while keeping the false discovery rate at a minimum.
In the third and final chapter, we apply the proposed feature selection methods to the brain imaging tractography dataset. Our motivation comes from evidence from studies of dementia showing that some older adults continue to maintain their cognitive abilities despite signs of ongoing neuropathological diseases. Commonly referred to as cognitive reserve, this phenomenon has unclear neurobiological substrates, and a current understanding of corresponding markers is lacking. This study aims to investigate the immense system of structural connections between brain regions constituting subcortical white matter (WM) as potential markers of cognitive reserve. Diffusion MRI tractography is an established computational neuroimaging method to model WM fiber organization throughout the brain. Standard statistical analyses capable of leveraging the high dimensionality of tractography data face additional methodological complications beyond those encountered in typical feature selection problems. Our proposed methodology is specifically tailored to address these concerns. Extensive simulation studies on synthetic datasets mimicking the real tractography dataset demonstrate a substantial gain in power with minimal false discoveries, compared with state-of-the-art methods for feature selection. Our application to predicting cognitive reserve in a clinical aging neuroimaging tractography dataset produces anatomically meaningful discoveries in brain regions associated with risk and resilience to neurodegeneration.
Overall, this thesis presents novel and effective methods for feature selection in ultrahigh dimensional settings. Our proposed framework would benefit researchers and professionals who face the difficulty of choosing pertinent variables from vast, correlated datasets in diverse fields, ranging from finance and social sciences to biology.
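
For readers unfamiliar with FDR control, the abstract's notion of "the expected proportion of falsely identified features among the selected ones" can be illustrated with the classical p-value-based Benjamini-Hochberg step-up procedure. This is a minimal sketch of that standard baseline, not the thesis's p-value-free screening and cleaning method; the function names, the p-values, and the set of truly associated features are all made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Select features with the Benjamini-Hochberg step-up procedure.

    Returns the sorted indices of the selected features at target FDR level q.
    """
    m = len(p_values)
    # Rank features from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is
        # at or below its threshold rank * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Select every feature ranked at or below that largest passing rank.
    return sorted(order[:k_max])


def false_discovery_proportion(selected, true_features):
    """Fraction of selected features that are not truly associated."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(true_features)) / len(selected)


# Hypothetical p-values for 10 features (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_values, q=0.1)

# Assume, for illustration, that only features 0 to 3 are truly associated.
fdp = false_discovery_proportion(selected, true_features={0, 1, 2, 3})
print(selected, fdp)
```

FDR is the expectation of this false discovery proportion over repeated datasets, so a single run such as the one above can exceed the target level q even when the procedure controls FDR.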

Synthetic Data

Author: Jimmy Nassif
Publisher: Springer Nature
Total Pages: 186
Release: 2024-01-03
Genre: Computers
ISBN: 3031475607

The book concentrates on the impact of digitalization and digital transformation technologies on Industry 4.0 and smart factories, and on how the factory of tomorrow can be designed, built, and run virtually as a digital twin of its real-world counterpart before the physical structure is actually erected. It highlights the main digitalization technologies that have stimulated Industry 4.0, how these technologies work and integrate with each other, and how they are shaping the industry of the future. It examines how multimedia data, and digital images in particular, are being leveraged to create fully virtualized worlds in the form of digital twin factories and fully virtualized industrial assets. It uses BMW Group’s latest SORDI dataset (Synthetic Object Recognition Dataset for Industry), the largest industrial image dataset to date, and its applications at BMW Group and Idealworks as one of the main explanatory scenarios throughout the book. It discusses the need for synthetic data to train advanced deep learning computer vision models, and how such datasets will help create the “robot gym” of the future: training robots on synthetic images to prepare them to function in the real world.

BIG DATA ANALYTICS

Author: Parag Kulkarni
Publisher: PHI Learning Pvt. Ltd.
Total Pages: 206
Release: 2016-07-07
Genre: Language Arts & Disciplines
ISBN: 8120351169

The book is an unstructured data mining quest that takes the reader through different features of unstructured data mining while unfolding the practical facets of Big Data. It places particular emphasis on the machine learning and mining methods required for processing and decision-making. The text begins with an introduction to the subject and explores data mining methods and models along with their applications. It then goes into detail on other aspects of Big Data analytics, such as clustering, incremental learning, multi-label association, and knowledge representation. Readers are also introduced to business analytics as a means of creating value. The book ends with a discussion of areas open to further research.

PRICAI 2019: Trends in Artificial Intelligence

Author: Abhaya C. Nayak
Publisher: Springer Nature
Total Pages: 729
Release: 2019-08-23
Genre: Computers
ISBN: 3030299112

This three-volume set, LNAI 11670, LNAI 11671, and LNAI 11672, constitutes the thoroughly refereed proceedings of the 16th Pacific Rim Conference on Artificial Intelligence, PRICAI 2019, held in Cuvu, Yanuca Island, Fiji, in August 2019. The 111 full papers and 13 short papers presented in these volumes were carefully reviewed and selected from 265 submissions. PRICAI covers a wide range of topics such as AI theories, technologies and their applications in the areas of social and economic importance for countries in the Pacific Rim.

Database and Expert Systems Applications

Author: Roland Wagner
Publisher: Springer
Total Pages: 927
Release: 2007-08-23
Genre: Computers
ISBN: 354074469X

This volume constitutes the refereed proceedings of the 18th International Conference on Database and Expert Systems Applications held in September 2007. Papers are organized into topical sections covering XML, data and information, datamining and data warehouses, database applications, WWW, bioinformatics, process automation and workflow, knowledge management and expert systems, database theory, query processing, and privacy and security.

Database and Expert Systems Applications

Author: Sourav S. Bhowmick
Publisher: Springer
Total Pages: 890
Release: 2009-08-25
Genre: Computers
ISBN: 3642035736

This book constitutes the refereed proceedings of the 20th International Conference on Database and Expert Systems Applications, DEXA 2009, held in Linz, Austria, in August/September 2009. The 35 revised full papers and 35 short papers presented were carefully reviewed and selected from 202 submissions. The papers are organized in topical sections on XML and databases; Web, semantics and ontologies; temporal, spatial, and high dimensional databases; database and information system architecture, performance and security; query processing and optimisation; data and information integration and quality; data and information streams; data mining algorithms; data and information modelling; information retrieval and database systems; and database and information system architecture and performance.

Rough Sets and Intelligent Systems Paradigms

Author: Marzena Kryszkiewicz
Publisher: Springer
Total Pages: 394
Release: 2014-06-13
Genre: Computers
ISBN: 3319087290

This book constitutes the refereed proceedings of the Second International Conference on Rough Sets and Intelligent Systems Paradigms, RSEISP 2014, held in Granada and Madrid, Spain, in July 2014. RSEISP 2014 was held along with the 9th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2014, as a major part of the 2014 Joint Rough Set Symposium, JRS 2014. JRS 2014 received 40 revised full papers and 37 revised short papers which were carefully reviewed and selected from 120 submissions and presented in two volumes. This volume contains the papers accepted for the conference RSEISP 2014, as well as the three invited papers presented at the conference. The papers are organized in topical sections on plenary lecture and tutorial papers; foundations of rough set theory; granular computing and covering-based rough sets; applications of rough sets; induction of decision rules - theory and practice; knowledge discovery; spatial data analysis and spatial databases; information extraction from images.

Nature-Inspired Algorithms for Big Data Frameworks

Author: Banati, Hema
Publisher: IGI Global
Total Pages: 435
Release: 2018-09-28
Genre: Computers
ISBN: 1522558535

As technology continues to become more sophisticated, mimicking natural processes and phenomena becomes more of a reality. Continued research in the field of natural computing enables an understanding of the world around us, in addition to opportunities for manmade computing to mirror the natural processes and systems that have existed for centuries. Nature-Inspired Algorithms for Big Data Frameworks is a collection of innovative research on the methods and applications of extracting meaningful information from data using algorithms that are capable of handling the constraints of processing time, memory usage, and the dynamic and unstructured nature of data. Highlighting a range of topics including genetic algorithms, data classification, and wireless sensor networks, this book is ideally designed for computer engineers, software developers, IT professionals, academicians, researchers, and upper-level students seeking current research on the application of nature and biologically inspired algorithms for handling challenges posed by big data in diverse environments.

Understanding and Interpreting Machine Learning in Medical Image Computing Applications

Author: Danail Stoyanov
Publisher: Springer
Total Pages: 149
Release: 2018-10-23
Genre: Computers
ISBN: 3030026280

This book constitutes the refereed joint proceedings of the First International Workshop on Machine Learning in Clinical Neuroimaging, MLCN 2018, the First International Workshop on Deep Learning Fails, DLF 2018, and the First International Workshop on Interpretability of Machine Intelligence in Medical Image Computing, iMIMIC 2018, held in conjunction with the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2018, in Granada, Spain, in September 2018. The 4 full MLCN papers, the 6 full DLF papers, and the 6 full iMIMIC papers included in this volume were carefully reviewed and selected. The MLCN contributions develop state-of-the-art machine learning methods such as spatio-temporal Gaussian process analysis, stochastic variational inference, and deep learning for applications in Alzheimer's disease diagnosis and multi-site neuroimaging data analysis; the DLF papers evaluate the strengths and weaknesses of DL and identify the main challenges in the current state of the art and future directions; the iMIMIC papers cover a large range of topics in the field of interpretability of machine learning in the context of medical image analysis.