HBase Schema

18 Tuesday Apr 2017

Posted by Prasad Khode in big data, Hadoop, HBase

“billions of rows * millions of columns * thousands of versions = terabytes or petabytes of storage” (The HBase project)

Apache HBase is an open source implementation of Google’s BigTable. It is built atop Apache Hadoop and is tightly integrated with it. It is a good choice for applications requiring fast random access to very large amounts of data.

HBase stores data in the form of distributed, sorted, multidimensional, persistent maps called tables. The table terminology makes it easier for people coming from the relational data management world to picture how data is organized in HBase. HBase is designed to manage tables with billions of rows and millions of columns.

The HBase data model consists of tables containing rows. Within each row, data is organized into column families, which group columns. This is where the similarities between HBase and relational databases end. Now we will explain what lies beneath HBase tables, rows, column families, and columns…

Installing Scala in RHEL / CentOS

29 Sunday Jan 2017

Posted by Prasad Khode in Uncategorized

Tags

Cent OS, Configuring, Installing, Java, linux, RHEL, scala

Scala requires a Java runtime, version 1.8 or later. Once Java is installed and configured, we can download the Scala distribution on RHEL or CentOS using this command:

wget http://www.scala-lang.org/files/archive/scala-2.12.1.tgz
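
Because Scala depends on Java 1.8 or later, it may be worth confirming the installed Java version before continuing. The following is a minimal sketch and not part of the original steps: `java_major` is a hypothetical helper name, and it assumes version strings shaped like `1.8.0_121` (legacy scheme) or `11.0.2` (newer scheme), as reported by `java -version`.

```shell
# Hypothetical helper: print the major Java version for a version string.
# The body runs in a subshell, so its variables do not leak to the caller.
java_major() (
    version="$1"
    first="${version%%.*}"          # text before the first dot
    if [ "$first" = "1" ]; then
        rest="${version#1.}"        # drop the leading "1."
        first="${rest%%.*}"         # legacy scheme: second component is the major
    fi
    # Strip any non-digit suffix such as "-ea"
    echo "${first%%[!0-9]*}"
)

# Only query the JVM when a java binary is actually on PATH.
if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the first quoted token.
    detected="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
    major="$(java_major "$detected")"
    if [ "${major:-0}" -ge 8 ]; then
        echo "Java $detected is recent enough"
    else
        echo "Java $detected is older than 1.8" >&2
    fi
else
    echo "java not found on PATH" >&2
fi
```

If the check reports an older version or no Java at all, install or configure Java first and then carry on with the steps below.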

Once the download is done, we extract the distribution into /usr/lib:

sudo tar -xf scala-2.12.1.tgz -C /usr/lib

Let's create a symbolic link to the Scala directory:

sudo ln -s /usr/lib/scala-2.12.1 /usr/lib/scala

Now we add the Scala bin directory to PATH:

export PATH=$PATH:/usr/lib/scala/bin
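
An `export` typed at the prompt only lasts for the current shell session. One way to make it permanent, sketched here under assumptions, is to append the line to a shell startup file. The helper name `add_path_line` and the choice of `~/.bashrc` are illustrative and not from the original steps; the helper skips the append when the exact line is already present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
# The body runs in a subshell, so its variables do not leak to the caller.
add_path_line() (
    line="$1"
    file="$2"
    touch "$file"                   # make sure the file exists
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
)
```

Assuming bash is the login shell, a call such as `add_path_line 'export PATH=$PATH:/usr/lib/scala/bin' "$HOME/.bashrc"` would persist the setting for future sessions. The single quotes keep `$PATH` unexpanded so it is evaluated each time the startup file is read.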

That's all we have to do. Now we can check our Scala installation using the command:

scala -version

It should print the installed Scala version in the terminal.

Save data to Cassandra tables using Apache Spark

06 Tuesday Sep 2016

Posted by Prasad Khode in big data, Cassandra, Hadoop, Java, Spark

Tags

Apache Cassandra, apache spark, big data, BigData, Cassandra, data, Hadoop, Java, push, rdd, save, save into table, Spark, spark java, Table

Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

How to use existing HBase table in Apache Phoenix

26 Tuesday Jul 2016

Posted by Prasad Khode in Apache Phoenix, HBase

Tags

Apache HBase, Apache Phoenix, HBase, Phoenix, reuse, SELECT, Table, View

Apache Phoenix is an open source relational database layer on top of a NoSQL store such as Apache HBase. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; upsert and delete rows singly and in bulk; and query data through SQL.

Installing Apache Solr

16 Monday May 2016

Posted by Prasad Khode in Hadoop, Solr

Tags

Configuring, Indexing, Installing, Search

Apache Solr:

Apache Solr is an open source search platform written in Java and powered by Apache Lucene. Solr is a standalone search server with a REST-like API. We index documents in it via JSON, XML, CSV, or binary over HTTP. We query it via HTTP GET and receive JSON, XML, CSV, or binary results.

HBase shell commands

25 Friday Mar 2016

Posted by Prasad Khode in Hadoop, HBase

Tags

commands, create, describe, HBase, list, namespace, shell, status, version, whoami

HBase is free, open-source software from the Apache Software Foundation. It is a cross-platform technology, so we can run it on Linux, Windows, or OS X machines, and it can also be hosted on Amazon Web Services and Microsoft Azure.

HBase is a NoSQL database that can run on a single machine or on a cluster of servers. It provides real-time data access, unlike many other big data technologies, which are batch-oriented. HBase tables can store billions of rows and millions of columns. HBase has a few key concepts, such as row key structure, column families, and regions.

Read data from Cassandra tables using Apache Spark

12 Thursday Nov 2015

Posted by Prasad Khode in Cassandra, Hadoop, Spark

Tags

Cassandra, Cassandra Spark Integration, Hadoop, Integration, READ, Spark

Read records from HBase table using Java

03 Tuesday Nov 2015

Posted by Prasad Khode in Hadoop, HBase, Java

Tags

CRUD, HBase, Java HBase, Java HBase Integration, Read Records, SELECT

The hbase-client JAR is used to connect to HBase from Java and is available in the Maven repository. The following dependency can be added to our pom.xml:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.0.1</version>
</dependency>

Once we have added the dependency, we need to create a Configuration object specifying core-site.xml and hbase-site.xml as resources.

Configuring Apache Phoenix in CDH 5.x using Cloudera Manager

14 Wednesday Oct 2015

Posted by Prasad Khode in Apache Phoenix, Hadoop, HBase

Tags

Apache Phoenix, Hadoop, HBase, JDBC, SQL Layer, Wrapper

Create MySQL Events / Schedulers

05 Wednesday Aug 2015

Posted by Prasad Khode in MySQL

Tags

Cron, Events, Mysql, Schedulers

A MySQL event is an operation that is executed at a specified or scheduled time. Events were added in MySQL 5.1.6. The MySQL event scheduler is a process that runs in the background and looks for events to execute. Before we create or schedule an event in MySQL, we first need to verify whether the scheduler is enabled. Issue the following command to turn on the scheduler…

Khode Prasad

~ Java, J2EE & Hadoop Engineer
