Saeed K. Rahimi, PhD, is an associate professor with the Graduate Programs in Software at the University of St. Thomas. He is also a cofounder of DWSoft Corporation and InfoSpan, two companies specializing in metadata management for data warehousing. He has been a database design and implementation consultant, providing services to industry and the federal government, for over thirty years. He has spoken at many national and international conferences and has published many scientific articles. Dr. Rahimi holds a BS in electrical engineering and a PhD, both in computer science, from the University of Minnesota.
Frank S. Haug is an adjunct professor with the Graduate Programs in Software at the University of St. Thomas, where he has taught graduate courses in software development, distributed database management systems, and data warehousing. He has over twenty-five years of experience in academia and industry, working in areas including software development, database design and implementation, and network administration, and has implemented projects across many technology platforms, DDBMSs, and programming languages. Mr. Haug holds a BA in mathematics and quantitative methods and computer science, and an MS in software engineering, both from the University of St. Thomas.
This book addresses issues related to managing data across a distributed database system. It covers both traditional database theory and current research, explaining the difficulties in providing a unified user interface and global data dictionary. The book gives implementers guidance on hiding discrepancies across systems and creating the illusion of a single repository for users. It also includes three sample frameworks, implemented using J2SE with JMS, J2EE, and Microsoft .NET, that readers can use to learn how to implement a distributed database management system. IT and development groups, as well as computer science and software engineering graduates, will find this guide invaluable.
"The chapters are clearly written and all the technical details are
thoroughly displayed." (Zentralblatt MATH, 2011)
CHAPTER 1
INTRODUCTION
Distributed (adjective) of, relating to, or being a computer network in which at least some of the processing is done by the individual workstations and information is shared by and often stored at the workstations.
—Merriam-Webster’s 11th Collegiate Dictionary
Database (noun) a [sic] usually large collection of data organized especially for rapid search and retrieval (as by a computer).
—Merriam-Webster’s 11th Collegiate Dictionary
Informally speaking, a database (DB) is simply a collection of data stored on a computer, and the term distributed simply means that more than one computer might cooperate in order to perform some task. Most people working with distributed databases would accept both of the preceding definitions without any reservations or complaints. Unfortunately, achieving this same level of consensus is not as easy for any of the other concepts involved with distributed databases (DDBs). A DDB is not simply “more than one computer cooperating to store a collection of data”—this definition would include situations that are not really distributed databases, such as any machine that contains a DB and also mounts a remote file system from another machine. Similarly, this would be a bad definition because it would not apply to any scenario where we deploy a DDB on a single computer. Even when a DDB is deployed using only one computer, it remains a DDB because it is still possible to deploy it across multiple computers. Often, in order to discuss a particular approach for implementing a DB, we need to use more restrictive and specific definitions. This means that the same terms might have conflicting definitions when we consider more than one DB implementation alternative. This can be very confusing when researching DBs in general and especially confusing when focusing on DDBs. Therefore, in this chapter, we will present some definitions and archetypical examples along with a new taxonomy. We hope that these will help to minimize the confusion and make it easier to discuss multiple implementation alternatives throughout the rest of the book.
1.1 DATABASE CONCEPTS
Whenever we use the term “DB” in this book, we are always contemplating a collection of persistent data. This means that we “save” the data to some form of secondary storage (usually some form of hard disk). As long as we shut things down in an orderly fashion (following the correct procedures as opposed to experiencing a power failure or hardware failure), all the data written to secondary storage should still exist when the system comes back online. We can usually think of the data in a DB as being stored in one or more files, possibly spanning several partitions or even several hard disk drives, even if the data is actually being stored in something more sophisticated than a simple file.
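To make the idea of persistence concrete, the following minimal sketch stores one value in a file-backed DB, closes the connection in an orderly fashion, and then reopens the file to read the value back. It is only an illustration: the use of Python's built-in `sqlite3` module, the table name `note`, the file name `demo.db`, and the helper `save_and_reload` are all assumptions made for this example and do not come from the book.

```python
import os
import sqlite3
import tempfile


def save_and_reload(value: str) -> str:
    """Store a value in a file-backed DB, close it, reopen it, and read it back."""
    # Hypothetical scratch directory standing in for "secondary storage".
    with tempfile.TemporaryDirectory() as db_dir:
        db_path = os.path.join(db_dir, "demo.db")

        # First session: write the data and shut down in an orderly fashion.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE note (body TEXT)")
        conn.execute("INSERT INTO note (body) VALUES (?)", (value,))
        conn.commit()  # ask the DB to make the change durable
        conn.close()

        # Second session: a brand-new connection still finds the saved data.
        conn = sqlite3.connect(db_path)
        (body,) = conn.execute("SELECT body FROM note").fetchone()
        conn.close()
        return body


reloaded = save_and_reload("hello")
```

The second connection shares nothing with the first except the file on disk, which is the sense in which the data is persistent.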
1.1.1 Data Models
Every DB captures data in two interdependent respects; it captures both the data structure and the data content. The term “data content” refers to the values actually stored in the DB, and usually this is what we are referring to when we simply say “data.” The term “data structure” refers to all the necessary details that describe how the data is stored. This includes things like the format, length, location details for the data, and further details that identify how the data’s internal parts and pieces are interconnected. When we want to talk about the structure of data, we usually refer to it as the data model (DM) (also called the DB’s schema, or simply the schema). Often, we will use a special language or programmatic facility to create and modify the DM. When describing this language or facility, authors sometimes refer to the facility or language as “the data model” as well, but if we want to be more precise, this is actually the data modeling language (ML)—even when there is no textual language. The DM captures many details about the data being stored, but the DM does not include the actual data content. We call all of these details in the DM metadata, which is informally defined as “data about data” or “everything about data except the content itself.” We will revisit data models and data modeling languages in Chapters 10 and 11.
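The distinction between data structure and data content can be sketched with a small relational example. This is an illustration under stated assumptions, not the book's own example: it uses Python's built-in `sqlite3` module, treats the SQL `CREATE TABLE` statement as the data modeling language, and uses a hypothetical `employee` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The ML in this sketch is SQL's data definition syntax; it creates the DM.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Data content: the values actually stored.
conn.execute("INSERT INTO employee (name) VALUES ('Ada')")

# Metadata ("data about data"): column names and declared types.
# PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

# The content is retrieved separately from the structure that describes it.
content = conn.execute("SELECT id, name FROM employee").fetchall()
```

Here `columns` describes how the data is organized without holding any stored values, while `content` holds only the values.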
1.1.2 Database Operations
Usually, we want to perform several different kinds of operations on DBs. Every DB must at least support the ability to “create” new data content (store new data values in the DB) and the ability to retrieve existing data content. After all, if we could not create new data, then the DB would always be empty! Similarly, if we could not retrieve the data, then the data would serve no purpose. However, these operations do not need to support the same kind of interface; for example, perhaps the data creation facility runs as a batch process but the retrieval facility might support interactive requests from a program or user. We usually expect newer DB software to support much more sophisticated operations than minimum requirements dictate. In particular, we usually want the ability to update and delete existing data content. We call this set of operations CRUD (which stands for “create, retrieve, update, and delete”). Most modern DBs also support similar operations involving the data structures and their constituent parts. Even when the DBs support these additional “schema CRUD” operations, complicated restrictions that are dependent on the ML and sometimes dependent on very idiosyncratic deployment details can prevent some schema operations from succeeding.
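The four data CRUD operations can be sketched as four small functions over a single table. This is a minimal sketch, assuming Python's built-in `sqlite3` module; the `item` table and the function names `create`, `retrieve`, `update`, and `delete` are hypothetical names chosen to mirror the acronym.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT)")


def create(label):
    """Create: store a new data value and return its generated id."""
    cur = conn.execute("INSERT INTO item (label) VALUES (?)", (label,))
    return cur.lastrowid


def retrieve(item_id):
    """Retrieve: return the stored label, or None if the id does not exist."""
    row = conn.execute("SELECT label FROM item WHERE id = ?", (item_id,)).fetchone()
    return row[0] if row else None


def update(item_id, label):
    """Update: replace the label of an existing item."""
    conn.execute("UPDATE item SET label = ? WHERE id = ?", (label, item_id))


def delete(item_id):
    """Delete: remove an existing item."""
    conn.execute("DELETE FROM item WHERE id = ?", (item_id,))
```

A typical sequence would be `item_id = create("draft")`, then `update(item_id, "final")`, then `retrieve(item_id)`, and finally `delete(item_id)`.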
Some DBs support operations that are even more powerful than schema and data CRUD operations. For example, many DBs support the concept of a query, which we will define as “a request to retrieve a collection of data that can potentially use complex criteria to broaden or limit the collection of data involved.” Likewise, many DBs support the concept of a command, which we will define as “a request to create new data, to update existing data, or to delete existing data—potentially using complex criteria similar to a query.” Most modern DBs that support both queries and commands even allow us to use separate queries (called subqueries) to specify the complex criteria for these operations.
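As a hedged illustration of these definitions, the sketch below issues one query and one command, each using a subquery to supply its criteria. It assumes Python's built-in `sqlite3` module and SQL as the request language; the `dept` and `emp` tables and their rows are invented purely for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (1, 'Ada', 1, 100), (2, 'Bob', 2, 80), (3, 'Cy', 1, 90);
""")

# Query: retrieve a collection of data, limited by a subquery on another table.
research_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM emp "
        "WHERE dept_id IN (SELECT id FROM dept WHERE name = 'Research') "
        "ORDER BY name"
    )
]

# Command: update existing data, again selecting the affected rows via a subquery.
conn.execute(
    "UPDATE emp SET salary = salary + 10 "
    "WHERE dept_id IN (SELECT id FROM dept WHERE name = 'Sales')"
)
bob_salary = conn.execute("SELECT salary FROM emp WHERE name = 'Bob'").fetchone()[0]
```

In both requests the inner `SELECT` is the subquery; it decides which rows the outer query or command applies to.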
Any DB that supports CRUD operations must consider concurrent access and conflicting operations. Anytime two or more requests (any combination of queries and commands) attempt to access overlapping collections of data, we have concurrent access. If all of the operations are only retrieving data (no creation, update, or deletion), then the DB can implement the correct behavior without needing any sophisticated logic. If any one of the operations needs to perform a write (create, update, or delete), then we have conflicting operations on overlapping data. Whenever this happens, there are potential problems—if the DB allows all of the operations to execute, then the execution order might potentially change the results seen by the programs or users making the requests. In Chapters 5, 6, and 8, we will discuss the techniques that a DB might use to control these situations.
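The rule described above can be sketched in a few lines: two operations are in conflict when they touch overlapping data and at least one of them writes. The second half of the sketch shows why this matters, since running the same read and write in a different order changes what the reader sees. Everything here is hypothetical illustration code (the `Operation` class, the `conflicts` and `run` helpers, and the account balance), not a technique taken from the later chapters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    kind: str        # "read" or "write" (create, update, and delete all count as writes)
    keys: frozenset  # identifiers of the data items the operation touches


def conflicts(a: Operation, b: Operation) -> bool:
    """True when the operations overlap and at least one of them is a write."""
    overlap = a.keys & b.keys
    return bool(overlap) and "write" in (a.kind, b.kind)


def run(order, balance=100):
    """Apply the steps in the given order; return the value the reader observed."""
    seen = None
    for step in order:
        if step == "read":
            seen = balance
        elif step == "add_10":
            balance += 10
    return seen


read_x = Operation("read", frozenset({"x"}))
read_xy = Operation("read", frozenset({"x", "y"}))
write_x = Operation("write", frozenset({"x"}))
write_z = Operation("write", frozenset({"z"}))

read_first = run(["read", "add_10"])
write_first = run(["add_10", "read"])
```

Two reads never conflict, even on overlapping data, and a write on unrelated data conflicts with nothing; only the overlapping read and write pair needs the DB to control the execution order.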
1.1.3 Database Management
When DBs are used to capture large amounts of data content, or complex data structures, the potential for errors becomes an important concern—especially when the size and complexity make it difficult for human verification. In order to address these potential errors and other issues (like the conflicting operation scenario that we mentioned earlier), we need to use some specialized software. The DB vendor can deploy this specialized software as a library, as a separate program, or as a collection of separate programs and libraries. Regardless of the deployment, we call this specialized software a database management system (DBMS). Vendors usually deploy a DBMS as a collection of separate programs and libraries.
1.1.4 DB Clients, Servers, and Environments
There is no real standard definition for a DBMS, but when a DBMS is deployed using one or more programs, this collection of programs is usually referred to as the DB-Server. Any application program that needs to connect to a DB is usually referred to as the DB-Client. Some authors consider the DB-Server and the DBMS to be equivalent—if there is no DB-Server, then there is no DBMS; so the terms DBMS-Server and DBMS-Client are also very common. However, even when there is no DB-Server, the application using the DB is still usually called the DB-Client. Different DBMS implementations have different restrictions. For example, some DBMSs can manage more than one DB, while other implementations require a separate DBMS for each DB.
Because of these differences (and many more that we will not discuss here), it is sometimes difficult to compare different implementations and deployments. Simply using the term “DBMS” can suggest certain features or restrictions in the mind of the reader that the author did not intend. For example, we expect most modern DBMSs to provide certain facilities, such as some mechanism for defining and enforcing integrity constraints—but these facilities are not necessarily required for all situations. If we were to use the term “DBMS” in one of these situations where these “expected” facilities were not required, the reader might incorrectly assume that the...
| Publication date (per publisher) | 13 February 2015 |
|---|---|
| Additional info | Charts: 282 B&W, 0 Color; Drawings: 0 B&W, 0 Color; Tables: 0 B&W, 0 Color |
| Language | English |
| Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
| | Mathematics / Computer Science ► Computer Science ► Networks |
| | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
| ISBN-13 | 9781118043530 |