Data Persistence

Content,

  1. Introduction to data persistence.
  2. Data, Files, Databases, and DBMSs.
  3. Application to files/DB
  4. ORM
  5. NOSQL and Hadoop.
  6. Information retrieval.

Data

Data is any sort of information that is stored in computer memory. This information can
be used later for a website, an application or any other client to store for future purpose. The most common information is User information in the form of user personal, address and banking information. As an example consider Facebook, it stores our personal data, images, posts, comments and many more things. Moreover Banking application also stores user data, their transactions details, funds summary etc. All this information is data, but when it put together and store in a structural way, it becomes informational data.

Database

Database (DB) are organized, they have a structure, and all the data they store it fits into that structure. Moreover, a database is an electronic system that allows data to be stored,  easily accessed, manipulated and updated.

Databases are quite similar to spreadsheets because they are mostly made up of tables which contain rows and columns like a spreadsheet. A database needs to be hosted or created on some special database platform, some famous Database platforms are:

  • PostgreSQL
  • MySQL
  • Microsoft Access
  • SQLite

Database Management System

A database management system is a software used to perform different operations, like addition, access, updating, and deletion of the data, like adding your name in the database for an online retail store as a customer. Furthermore a database management system acts as the backbone of a database and makes using a database a cakewalk as it makes access and management of data a lot easier.

Database Server

Database server is the term used to refer to the back-end system of a database application using client/server architecture. The back-end, or a database server, performs tasks such as data analysis, storage, data manipulation, archiving, and other non-user specific tasks.

Files vs Databases

Difference Between File and Database

A data file is a collection of related records stored on a storage medium such as a hard disk or optical disc. While adatabase is a collection of data organized in a manner that allows access, retrieval, and use of that data.

Data arrangement

A Data Access Arrangement (DAA) is an electronic interface within a computer and its modem to a public telephone line. A DAA is also sometimes called a Telephone Line Interface Circuit (or Module). Moreover, An already-approved DAA design into the modem is built by
most manufacturers of modems and other devices.

Types of Database Management Systems

  • Hierarchical databases.
  • Network databases.

The network model is a database model conceived as a flexible way of representing objects and their relationships. Moreover its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

  • Relational databases.

A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. A software system used to maintain relational databases is a relational database management system. Moreover Virtually all relational database systems use SQL for querying and maintaining the database.

  • Object-oriented databases.

An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented. Moreover Object-relational databases are a hybrid of both approaches.

  • Graph databases.

In computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

  • ER model databases.
  • Document databases.

A document-oriented database, or document store, is a computer program designed for storing, retrieving and managing document-oriented information. It is also known as semi-structured data. 

Data warehouse with big data

Data warehouse means the relational database, therefore storing, fetching data will be similar with normal SQL query. And big data is not following proper database structure, we need to use hive or spark SQL to see the data by using hive specific query. 100% dataloaded into data warehousing are using for analytics reports.

To see more about data warehouse and big data visit;

https://www.educba.com/big-data-vs-data-warehouse/

Application component

A central concept for application architecture is Application Component. The following view shows this concept in detail.

The SQL statements, Prepared statements, and Callable statements ,

What is SQL?

As it is the special purpose domain specific language for querying the data in Relational Database Management System (RDBMS), SQL refers to Structured Query Language .

Microsoft SQL Server, MySQL, Oracle etc. use SQL for querying with slight syntax differences.

SQL Language

Types of SQL statements

SQL statements are categorized into four different type of statements,

  1. DML (DATA MANIPULATION LANGUAGE)
  2. DDL (DATA DEFINITION LANGUAGE)
  3. DCL (DATA CONTROL LANGUAGE)
  4. TCL (TRANSACTION CONTROL LANGUAGE)
SQL Language

Let’s see one by one.



DML

We have four different SQL statements, In Data Manipulation Language(DML) .

  1. SELECT

    Based on some condition, Select statement is used to select the collection of records from the table.

    E.g. select * from student – Get all the records of student table.

    Select * from student where rank>5 – Get the records with the condition where students’ rank is greater than 5.

  2. INSERT

    Insert statement is used to insert the set of values into the table.

    E.g. Insert into Student (Rank, StudentName, Mark) Values(1,’Kumar’,450)

  3. UPDATE

    Update statement is used to update the existing values in the table, which is based on some condition.

    E.g., update student set StudentName=’Manoj’ where StudentName=’Kumar’ 

    The query given above will update the studentName from Manoj to Kumar where student Name Kumar.

  4. DELETE

    Delete statement is used to delete the existing record in the table, which is based on some condition.

    Eg., Delete from Student where StudentName=’Manoj’

    The query given above will delete the record which has StudentName Manoj.

DDL

We have three different SQL statements, In Data Definition Language (DDL).

  1. CREATE 

    Create statement is used to create the new table in an existing database.

    Eg., Create Table Student (Rank Int,StudentName varchar(50),Mark Float)

  2. ALTER

    Alter statement can add a column, modify a column, drop a column, rename a column or rename a table.

    Eg., Alter Table Student Add (StudentAddress varchar (100))

  3. DROP

    SQL DROP TABLE statement is used to remove a table definition and all the data, indexes, triggers, constraints and permission specifications for the table.

    Eg, Drop Student

DCL

In Data Control Language(DCL), it defines the control over the data in the database. We have two different commands, which are

  1. GRANT

    Grant is allowed to do the specified user to the specified tasks.

    Syntax

    GRANT privilege_name
    ON object_name
    TO {user_name |PUBLIC |role_name} 
    [WITH GRANT OPTION];

  2. REVOKE

    It is used to cancel previously granted or denied permissions.

    Syntax

    REVOKE privilege_name
    ON object_name
    FROM {user_name |PUBLIC |role_name}

TCL

In Transaction Control Language (TCL), the commands are used to manage the transactions in the database. These are used to manage the changes made by DML statements. It also allows the statements to be grouped together into logical transactions.

  1. COMMIT

    Commit command is used to permanently save any transaction into the database.

    Syntax Commit;

  2. ROLLBACK

    Rollback command is used to restore the database for the last committed state. It’s also used with save point to jump to the save point.

    Syntax

    Rollback to save point name

  3. SAVEPOINT 

    Save point command is used to temporarily save a transaction, so that you can roll back to that point whenever necessary.

    Syntax

    savepointsavepoint-name;

Callable Statements

The CallableStatement interface allows the use of SQL statements to call stored procedures. Furthermore stored procedures are programs that have a database interface. These programs possess the following:

They can have input and output parameters, or parameters that are both input and output.

Prepared statements

In database management systems, a prepared statement or parameterized statement is a feature used to execute the same or similar database statements repeatedly with high efficiency.



ORM

Advantages of ORM.

Eliminating the hassle for developers, they write correct and optimized SQL queries . They make the code easier to update, maintain, and reuse as the developer can think of, and manipulate data as objects.

How ORM Works.

Object-relational mapping (ORM) is a technique (a.k.a. design pattern) of accessing a relational database from an object-oriented language (Java, for example). Hibernate is a big, powerful engine that makes a connection to the database, executes necessary SQL SELECT requests, and retrieves the data.

POJO

POJO stands for Plain Old Java Object. It is an ordinary Java object, furthermore not bound by any special restriction other than those forced by the Java Language Specification and not requiring any class path. PFor increasing the readability and re-usability of a program, POJOs are used .

A POJO should not:

  1. Extend prespecified classes, Ex: public class GFG extends javax.servlet.http.HttpServlet { … } is not a POJO class.
  2. Implement prespecified interfaces, Ex: public class Bar implements javax.ejb.EntityBean { … } is not a POJO class.
  3. Contain prespecified annotations, Ex: @javax.persistence.Entity public class Baz { … } is not a POJO class.

POJOs basically defines an entity. Like in you program, if you want a Employee class then you can create a POJO as follows:

JAVA Beans

JavaBeans are classes that encapsulate many objects into a single object (thebean). Moreover it is a java class that should follow following conventions:

Must implement Serializable. It should have a public no-arg constructor. Furthermore all properties in java bean must be private with public getters and setter methods.

They are used to encapsulate many objects into a single object (the bean), so that they can be passed around as a single bean object instead of as multiple individual objects. a Java Object that is serializable, has a nullary constructor, and allows access to properties using getter and setter methods is a JavaBean.

Moreover JavaBeans are reusable software components for Java. Practically, they are classes written in the Java programming language conforming to a particular convention.

JPA

The Java Persistence API is a Java application programming interface specification which describes the management of relational data in applications using Java Platform, Standard Edition and Java Platform, Enterprise Edition. 

Java ORM Tools

  • Transparent persistence without byte code processing.
  • Object-oriented query language.
  • Object / Relational mappings.
  • Automatic primary key generation.
  • Object/Relational mapping definition.
  • HDLCA (Hibernate Dual-Layer Cache Architecture)
  • High performance.
  • J2EE integration.

Entity Developer is a powerful modeling and code generation tool for LinqConnect , Telerik Data Access, NHibernate, and ADO.NET Entity Framework. Can design an entity model from scratch or reverse-engineer an existing database. Furthermore the model is used to generate C# or Visual Basic code with predefined or custom code templates.

dotConnect is an enhanced data connectivity solution built over ADO.NET architecture and a development framework with a number of innovative technologies and support for such ORM solutions as Entity Framework and LinqConnect. dotConnect includes high performance data providers for the databases and cloud applications and offers a complete solution for developing data-related applications and web sites.

LINQ Insight is a powerful Visual Studio add-in for LINQ development that allows you to execute LINQ queries at design time directly from Visual Studio without starting a debug session and provides a powerful ORM profiler for Entity Framework, NHibernate, LINQ to SQL, and LinqConnect. Moreover it profiles the data access layer of your projects and tracks all the ORM calls and SQL queries from the ORM.

LinqConnect – a fast and easy to use ORM solution, it has developed closely to the Microsoft LINQ to SQL technology. Moreover In addition to LINQ to SQL features, LinqConnect provides its own advanced functionality. Moreover LinqConnect supports SQL Server, Oracle, MySQL, PostgreSQL, and SQLite.

NoSQL

A traditional database product would prefer more predictable, structured data. To expand as data or processing requirements grow, A relational database may require vertical and, sometimes horizontal expansion of servers. NoSQL is a whole new way of thinking about a database. Moreover NoSQL is not a relational database.

Advantages,

  • Large volumes of structured, semi-structured, and unstructured data.
  • Agile sprints, quick iteration, and frequent code pushes.
  • Object-oriented programming that is easy to use and flexible.
  • Efficient, scale-out architecture instead of expensive, monolithic architecture.

There are 4 basic types of NoSQL databases:

  • Key-Value Store – It has a Big Hash Table of keys & values {Example- Riak, Amazon S3 (Dynamo)}
  • Document-based Store- It stores documents made up of tagged elements. …
  • Column-based Store- Each storage block contains data from only one column, {Example- HBase, Cassandra}

Hadoop

Simply, Hadoop can be thought of as a set of open source programs and procedures (meaning essentially they are free for anyone to use or modify, with a few exceptions) which anyone can use as the “backbone” of their big data operations.

Core Hadoop Components

  • 1) Hadoop Common-
  • 2) Hadoop Distributed File System (HDFS) –
  • HDFS Use Case-
  • MapReduce Use Case:
  • Key Benefits of Hadoop 2.0 YARN Component-
  • YARN Use Case:
  • Pig Use Case-
  • Hive Use Case-

IR

Information retrieval is the activity of obtaining information system resources relevant to an information need from a collection. Moreover Searches can be based on full-text or other content-based indexing.

Tools for IR

http://www2.compute.dtu.dk/~pcha/IRtools/

REFERENCES

https://www.webopedia.com/TERM/D/database_server.html,

https://www.google.com/search?ei=6ECuXKihA4ui-QbxyajgAw&q=different+data+arrangements+&oq=different+data+arrangements+&gs_l=psy-ab.3…6852.14443..15263…1.0..0.129.1523.0j13……0….1..gws-wiz…….0i71j35i39j0j0i7i30j0i8i7i30.Sz7OjCAHX4Ahttps://www.educba.com/big-data-vs-data-warehouse

https://www.c-sharpcorner.com/blogs/types-of-sql-statements-with-example

https://www.geeksforgeeks.org/pojo-vs-java-beans/

https://www.devart.com/orm-solutions.html

https://www.bernardmarr.com/default.asp?contentID=1080 http://www2.compute.dtu.dk/~pcha/IRtools/

Leave a comment