Crack Stats

Author

Chandan Routray

Undergraduate student at IIT Kharagpur, India

Setting Up RStudio Server on AWS EC2 Instance

Image Source:-http://aws.amazon.com/ec2/
Image Source:- http://www.rstudio.com

Amazon Web Services (AWS) EC2
EC2 stands for Elastic Compute Cloud. It is a web service from Amazon Web Services that provides resizable compute capacity in the cloud, and it is designed to make web-scale computing easier for developers. In this blog post, I am going to show how to set up RStudio Server on an EC2 instance, which will let you reach the instance from your own system with an SSH client and use RStudio from your web browser. For this purpose I am using the AWS Free Tier, which includes 750 hours of Linux and Windows t2.micro instances each month for one year. (Note: to stay within the Free Tier, use only EC2 micro instances.)

Setting up RStudio Server on an EC2 instance:

Step 1 : Go to http://aws.amazon.com/ec2/ and sign up. You will need a credit card with a minimum balance of $1 for this. Don’t worry if you don’t have one: visit https://www.entropay.com/, create a virtual credit card and use it to create the account.

Step 2 : Open the AWS Management Console from the My Account/Console tab and select EC2.

Step 3 : Click on Launch Instance to create a new EC2 instance.

Step 4 : It will open a wizard where we need to configure the instance. Select the Community AMIs tab, check the Ubuntu check box and select any 64-bit AMI.

Step 5 : Now select any Free Tier eligible micro instance and click on Next.

Step 6 : For the next two steps of the wizard, Configure Instance and Add Storage, leave the settings at their defaults.

Step 7 : Now we need to create a new security group. A security group is a set of firewall rules that control the traffic for your instance. In this step, you can add rules to allow specific traffic to reach your instance. For example, if you want to set up a web server and allow Internet traffic to reach your instance, add rules that allow unrestricted access to the HTTP and HTTPS ports. Add the rules shown in the screenshot to your instance. (Note: the last rule in that list, for port 8787, is the one used to access RStudio Server.) Now click on Review and Launch.

 

Step 8 : Review your settings and click on the Launch button. It will prompt you to download the key pair, which you will need to access your instance. Then click on Launch Instances.

 

Step 9 : On the next page you can see your instance loading. Its state will change from Pending to Running once it is active.

 

Step 10 : Now your instance is running, and you need an SSH client to access it. Download PuTTY (an SSH client) from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html and install it on your system. (Note: you can also access your instance from your browser; click on Connect to do so.)

Step 11 : After installing, open PuTTYgen on your system. Click on Load and open the key pair that you downloaded in Step 8. Then click on Save private key and save the generated key.

 

Step 12 : Now open PuTTY, go to the Session tab and in the Host Name field type ubuntu@YourPublicDNS. You can find your public DNS on the EC2 dashboard.

Step 13 : Now in PuTTY, go to Connection > SSH > Auth. Click on Browse and select the private key you created with PuTTYgen in Step 11. Finally click Open.

Step 14 : After completing the previous step, a new terminal window connected to your instance will open.

 

Now, to install RStudio Server on this instance, type the following commands in this window one by one:

# Refresh the package index first
sudo apt-get update
# Install R from the Ubuntu repositories
sudo apt-get install r-base
# Install the libraries needed by RCurl & XML, useful if you want to use R to connect to any web data/APIs.
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
# Install a few supporting packages
sudo apt-get install gdebi-core
sudo apt-get install libapparmor1
# Download and install RStudio Server
wget http://download2.rstudio.org/rstudio-server-0.97.336-amd64.deb
sudo gdebi rstudio-server-0.97.336-amd64.deb

 

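Step 15 : Now, to open RStudio Server, go to your web browser and navigate to the public DNS of your instance on port 8787. RStudio Server will ask you to sign in with a Linux user account that has a password on the instance; if you do not have one yet, you can create it with sudo adduser. The address looks similar to: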

http://ec2-XX-XX-XX-XXX.compute-1.amazonaws.com:8787
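If the page does not load, the usual suspect is the port 8787 rule from Step 7. The following is a minimal sketch in Python (standard library only) that builds the address from a public DNS name and checks whether the port accepts TCP connections. The function names rstudio_url and port_is_open are made up for this illustration, and the host name you pass in is whatever your own EC2 dashboard shows.

```python
import socket

# RStudio Server's port as opened in the security group in Step 7.
RSTUDIO_PORT = 8787


def rstudio_url(public_dns, port=RSTUDIO_PORT):
    """Build the browser address for RStudio Server on the given host."""
    return f"http://{public_dns}:{port}"


def port_is_open(host, port=RSTUDIO_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

Calling port_is_open with your instance's public DNS should return True once RStudio Server is installed and the security group allows port 8787. A False result suggests checking the firewall rule or whether the server is running.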

 

SQL And SQL Injection (A Web Attack Technique)

What is SQL?
SQL stands for Structured Query Language. It is a set of instructions used to interact with a database, and according to ANSI it is the standard language for relational database management systems. SQL commands are used to perform operations on a database, such as updating data in it or retrieving data from it. Some common relational database management systems on the market that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Some of the SQL commands that are common to all relational database management systems are Select, Insert, Update, Delete, Create and Drop. These commands can be used to do almost everything that one needs to do with a database.

What is SQL Injection?
Almost every website you visit has its own database, where it stores important information such as your user name, password and other things that neither you nor the website wants to reveal to anyone else. Not just websites: every organisation manages its own database to keep its data. SQL injection is one of the many web attack techniques used by hackers to steal data from organisations, and it is one of the most common application-layer attack techniques in use today. It takes advantage of improper coding in a website or web application that allows a hacker to inject SQL commands in the form of user input in a web form. In simple words, it arises because the fields available for user input allow SQL statements to pass through and query the database directly, through which a hacker can retrieve, update or even delete data.

A Simple Example
Here is what the HTML code for a form (a login panel) looks like:

<form>
User Name: <input type="text" name="username"><br>
Password : <input type="password" name="pwd">
</form>

So whenever you enter a “User Name” and “Password”, the form assigns their values to “username” and “pwd” respectively. The system then queries its database to check whether this user name exists and, if it does, whether the password you entered matches the one stored in the database. A sample SQL query to do this is:

SELECT * FROM userdatabase WHERE username = '$username' AND password = '$pwd'

Now, what a hacker does is inject a fragment of SQL into the login panel, for example by entering anything' OR 'x'='x as both the user name and the password. Our SQL query now reads like this:

SELECT * FROM userdatabase WHERE username = 'anything' OR 'x'='x' AND password = 'anything' OR 'x'='x'

This is a valid query, and its WHERE clause is true for every row: AND binds more tightly than OR, so the condition ends with OR 'x'='x', which is always true.

This will allow the hacker to bypass the login form without actually knowing a valid username/password combination!
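To see the attack and one common defence side by side, here is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table name userdatabase and its columns follow the query above; the user alice, her password and the function names are made up for illustration. The vulnerable function builds the query by pasting user input into the SQL text, while the safe one passes the input as parameters so it is never parsed as SQL.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userdatabase "
        "(id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
    )
    conn.execute(
        "INSERT INTO userdatabase (username, password) VALUES (?, ?)",
        ("alice", "s3cret"),
    )
    conn.commit()
    return conn


def login_vulnerable(conn, username, pwd):
    """Builds the SQL by string concatenation: open to injection."""
    query = (
        "SELECT * FROM userdatabase WHERE username = '" + username
        + "' AND password = '" + pwd + "'"
    )
    return conn.execute(query).fetchone() is not None


def login_safe(conn, username, pwd):
    """Uses placeholders: the input is treated as data, never as SQL."""
    query = "SELECT * FROM userdatabase WHERE username = ? AND password = ?"
    return conn.execute(query, (username, pwd)).fetchone() is not None


conn = make_db()
payload = "anything' OR 'x'='x"

print(login_vulnerable(conn, payload, payload))  # injected input is accepted
print(login_safe(conn, payload, payload))        # injected input is rejected
print(login_safe(conn, "alice", "s3cret"))       # real credentials still work
```

Parameterised queries are only one layer of defence. A real login system would also store password hashes rather than plain-text passwords, which this sketch skips to stay close to the example query.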

 

Web Scraping (Data Extraction from Web) Using iMacros

What is Web Scraping?
Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information or data from websites. Scraping programs usually simulate a user exploring the World Wide Web, either by implementing the low-level Hypertext Transfer Protocol (HTTP) themselves or by embedding a web browser such as Internet Explorer, Google Chrome or Mozilla Firefox. Sometimes websites set up barriers to prevent scraping or browser automation. In those cases, human examination and copy and paste prove to be the best web scraping technique.

In this blog post, I am going to show how to extract data from a website using the iMacros add-on for Mozilla Firefox. I am going to extract a table containing the population count of all the countries of the world from this website: http://data.worldbank.org/indicator/SP.POP.TOTL (Note: we use this site for demonstration purposes only).

Image source: http://www.imacros.net
  • Install the iMacros add-on for Mozilla Firefox (https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/).
  • After installing, you can see the iMacros icon on the top bar. Click on it to open the iMacros panel.
  • Recording a macro: The very first step for data extraction is to record a macro (which, in simple words, is a set of commands that the browser performs). Select the “Rec” tab in the iMacros panel, then push the “Record” button to start recording the macro. While the macro is recording, enter this URL: http://data.worldbank.org/indicator/SP.POP.TOTL in the address bar of the browser. After the page opens successfully, push the “Stop” button in the iMacros panel to stop recording the macro.
  • Playing the macro: You can see the macro you have just made in the iMacros panel. Double-click on it to start playing.
    As soon as you double-click the macro, it automatically opens the web page (http://data.worldbank.org/indicator/SP.POP.TOTL).
  • Editing the macro: To edit a macro, right-click on it and select “Edit Macro”. It will open the macro in the iMacros editor.
    Add these lines of code after the existing code in the macro:

    TAG POS=2 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
    SAVEAS TYPE=EXTRACT FOLDER=* FILE=population.csv
    

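    Let's analyse the code we have just added. The TAG command selects the second table on the page (POS=2 TYPE=TABLE) whatever its text is (ATTR=TXT:*) and extracts its text (EXTRACT=TXT). The SAVEAS command then writes the extracted data to a file named population.csv in the default folder (FOLDER=*).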

  • Finally, “Save & Close” the editor and play the edited macro. After it has played, open the default iMacros folder (usually C:\Users\user_name\Documents\iMacros\Downloads), where you will see a file named “population.csv”. Open it with a text editor and you will find all the data extracted from the table, separated by commas.
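For readers who would rather script the same idea, here is a minimal sketch in Python of what the two macro lines do conceptually: pick the N-th table on a page and save the text of its cells as comma-separated values. It is not iMacros and does not drive a browser. The class TableExtractor, the function table_to_csv and the sample HTML (with made-up rows) are illustrative assumptions, and only the standard library is used.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text for every <table>, grouped by table and row.

    Nested tables are not handled; this is a sketch for simple pages.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html, pos=1):
    """Return the pos-th table (1-based, like POS= in the macro) as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.tables[pos - 1])
    return out.getvalue()


# Made-up page: the first table is layout, the second holds the data.
SAMPLE_HTML = """
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Country A</td><td>1000</td></tr>
  <tr><td>Country B</td><td>2500</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML, pos=2))
```

To use this on a live page you would first download the HTML, for example with urllib.request, and pages that build their tables with JavaScript would still need a browser-based tool such as iMacros.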

 

You can do a lot of cool stuff using iMacros; find the full documentation at http://wiki.imacros.net/Main_Page

Make Responsive Websites With Bootstrap

Image Source :- http://getbootstrap.com/

What is Bootstrap?
Bootstrap is the most popular HTML, CSS, and JS framework for developing responsive, mobile-first projects on the web. It is a free collection of tools for creating websites and web applications. It contains HTML and CSS-based design templates for typography, forms, buttons, navigation and other interface components, as well as optional JavaScript extensions. In June 2014 it was the No. 1 project on GitHub, with 69,000+ stars and 25,000+ forks (Source: Wikipedia).

How to download and use Bootstrap?
To download the latest version of Bootstrap, visit http://getbootstrap.com/ and then unzip the downloaded archive. It usually contains three folders, namely css, js and fonts, with the following content:

bootstrap/
├── css/
│   ├── bootstrap.css
│   ├── bootstrap.min.css
│   ├── bootstrap-theme.css
│   └── bootstrap-theme.min.css
├── js/
│   ├── bootstrap.js
│   └── bootstrap.min.js
└── fonts/
    ├── glyphicons-halflings-regular.eot
    ├── glyphicons-halflings-regular.svg
    ├── glyphicons-halflings-regular.ttf
    └── glyphicons-halflings-regular.woff

(Source: www.getbootstrap.com)

To create your first Bootstrap website, link the Bootstrap CSS file in the head of your HTML document and load the Bootstrap JavaScript file (after jQuery) just before the closing body tag.

A sample Bootstrap HTML document:
Here is what a normal Bootstrap HTML document looks like.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Bootstrap 101 Template</title>

<!-- Bootstrap -->
<link href="css/bootstrap.min.css" rel="stylesheet">

<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<h1>Hello, world!</h1>

<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
</body>
</html>

(Source: www.getbootstrap.com)

SAS (Statistical Analysis System)

Image Source:- http://www.sas.com

What is SAS?
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, business intelligence, data management and predictive analytics. It offers a huge array of statistical functions, has a good GUI (Graphical User Interface) that helps people learn quickly, and provides awesome technical support. It is very powerful in the area of data management, allowing you to manipulate your data in almost any way. It can perform most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis). However, it is one of the most expensive software packages available for analytics.

Installing SAS University Edition
SAS University Edition is the free SAS software for students, teachers and professors. You can download SAS University Edition from http://www.sas.com/en_us/software/university-edition.html.

Image Source: http://www.sas.com

It requires a virtualization software package such as VMware Player or Oracle VirtualBox on a PC, Mac or Linux workstation.
You can download VMware Player at http://www.vmware.com/products/player/ and Oracle VirtualBox at https://www.virtualbox.org/

After installing either of the above virtualization packages, you need to import the OVA file of SAS University Edition that you downloaded earlier.

 

Using Git For Projects

Image Source:  http://git-scm.com/

If you have ever worked on a project, you will know about the problems and confusion that arise when multiple people edit the same files. In this blog post, I am going to talk about Git and why it is so widely used by people working on projects of any scale. So let’s start.

What is Git?
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git was initially designed and developed by Linus Torvalds for Linux kernel development in 2005, and has since become the most widely adopted version control system for software development.

What does Git do exactly?
Git allows a team of people to work together, all using the same files. It helps the team avoid the confusion that tends to happen when multiple people are editing the same files. It also lets you save your project at different versions, so that you can retrieve a previous version of your project without any problem.

How does Git work?
The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Most other systems store information as a list of file-based changes: they think of the information they keep as a set of files and the changes made to each file over time. Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini file system. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. If a file has not changed, Git doesn’t store it again; it simply links to the previous identical file it has already stored.
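The snapshot idea can be sketched in a few lines of Python. This is a toy model and not how Git is implemented internally: the class TinyRepo and its methods are invented for illustration. The one detail taken from Git itself is how a file's id is computed in SHA-1 repositories, as a hash over a short "blob" header plus the contents, which is why an unchanged file always maps to the same stored object.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Hash file contents the way Git names blobs in SHA-1 repositories."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class TinyRepo:
    """Toy snapshot store: each commit maps every path to a blob id."""

    def __init__(self):
        self.objects = {}  # blob id -> file contents, stored once
        self.commits = []  # list of snapshots: {path: blob id}

    def commit(self, files):
        """Record a snapshot of all files and return its commit number."""
        snapshot = {}
        for path, data in files.items():
            oid = blob_id(data)
            # Identical contents are already stored: just link to them.
            self.objects.setdefault(oid, data)
            snapshot[path] = oid
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, number):
        """Rebuild the files exactly as they were at the given commit."""
        return {p: self.objects[o] for p, o in self.commits[number].items()}


repo = TinyRepo()
first = repo.commit({"README": b"hello\n", "main.R": b"print(1)\n"})
second = repo.commit({"README": b"hello\n", "main.R": b"print(2)\n"})

# Two commits of two files each, but only three stored objects:
# README did not change, so the second snapshot links to the same blob.
print(len(repo.objects))
print(repo.commits[first]["README"] == repo.commits[second]["README"])
```

Real Git adds tree and commit objects, compression and packing on top of this, but the sketch shows why retrieving an old version is cheap: every commit already describes the whole project.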

Want to learn more about Git?
Visit this link to learn more about Git: http://try.github.com/

 

Shiny (RStudio): Web Application Framework For R

Image Source:- http://shiny.rstudio.com/

If you want to turn your analyses into an interactive web application, then Shiny is the best thing for you. It’s a package from RStudio that makes it incredibly easy to build interactive web applications without requiring any knowledge of HTML, CSS or JavaScript. Applications made with Shiny are automatically “live”, i.e. changing an input in the application updates the output automatically without requiring a reload of the browser (see examples at http://shiny.rstudio.com/).

How to install and use Shiny in R?
Shiny is available on CRAN, so you can install it like any other package: just type install.packages("shiny") in the R console. To use the Shiny package, include library(shiny) at the beginning of your code.

Basic Structure of a Shiny App
All Shiny applications have two components: a user-interface definition and a server script. The user-interface definition contains the code that builds the interface of the application, and it is always defined in a source file named ui.R. The server script interprets the input given by the user to produce the output (visuals), and it is always defined in a source file named server.R.

You can start learning the Shiny package at http://shiny.rstudio.com/tutorial/

 

 

Learning R: Datacamp.com vs Swirl Package

In one of my previous blog posts, I wrote about the R language and suggested two resources for learning it: the Swirl package and Datacamp.com. In this post, I am going to review both of them. I’ll compare them on the following attributes: price, courses, offline availability, difficulty level, user support and learning environment.

  • Price: Swirl and Datacamp are both free to learn with. To start learning with the Swirl package you need to install it from the R console, while for Datacamp you can register for free at www.datacamp.com and take a course.
  • Courses Available: Swirl offers four courses for beginners, namely R Programming, Data Analysis, Mathematical Biostatistics Boot Camp and Open Intro, while Datacamp offers Introduction to R, Data Analysis & Statistical Inference, Introduction to Computational Finance and Econometrics, and How to work with Quandl.
  • Offline Availability: Internet connectivity is needed to download the Swirl package, but after that the connection is not required unless you want to take a new course. For Datacamp, internet connectivity is a must throughout the course.
  • Difficulty Level: As both offer beginner courses, the difficulty level starts at novice. With both, you will find the difficulty increasing gradually.
  • User Support: Both of them provide good user support; you can get help at any point if you get stuck.
  • Learning Environment: Swirl needs an R environment to run, while at Datacamp you can just log in and start learning.

Presenting the Results : Working with D3.js (A JavaScript Library)

Image Source :- http://www.d3js.org

Even a great analysis is worthless if no one understands the results or if people simply choose to ignore them, and that depends on how the results are briefed or presented to the users. Your analysis report has to be very reader-friendly, and it should contain elements such as graphs, tables and charts, which convey information quickly and concisely.

D3.js is a JavaScript library for manipulating documents based on data. It helps you bring data to life using HTML, SVG (Scalable Vector Graphics) and CSS. It allows you to bind arbitrary data to the Document Object Model (DOM) and then apply data-driven transformations to the document. It is fast, supports large datasets and has dynamic behaviors for interaction and animation.

A few examples:

Image Source:- http://bl.ocks.org/mbostock/5944371
Image Source:-http://bost.ocks.org/mike/sankey/

How to download and use D3.js?
You can download the latest version at www.d3js.org, or you can link directly to the latest release by copying the following snippet:

<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>

 

 
