LaTeX for Complex Stuff

In this post, I will keep updating solutions to common problems we face while preparing LaTeX manuscripts.

Customise the width of any cell in a table: First load the pbox package. Then write the cell contents as \pbox{5cm}{blah blah}. The contents of the cell can also be forced onto the next line with a double backslash, i.e., \\

Customise the width of an entire table column: Instead of the l, c, r options of the tabular environment for that column, use p with a width. For example, \begin{tabular}{|r|c|p{4cm}} [Reference]

Present a table in landscape style [Ref]: Load the necessary packages in the preamble and wrap the table in the landscape environment:

\usepackage{float,lscape}
\begin{landscape}
\begin{table}
…table stuff…
\end{table}
\end{landscape}

Place text and Figure side by side:

\usepackage{graphicx} % provides \includegraphics
\usepackage{wrapfig}
\begin{wrapfigure}{r}{4cm} % first argument is placement (l, r), second is width
\includegraphics[]{abc.pdf}
\caption{}
\label{}
\end{wrapfigure}

Show complete paper reference (title, author name, etc) without citation:

\usepackage{bibentry}
\nobibliography*

Write the above two lines in the preamble, in the same order. Then, in the main document, write \bibentry{paperkey} wherever the full reference should appear.

Shrink a table if it moves outside the text area: 

Wrap the tabular in \resizebox{\textwidth}{!}{...} (from the graphicx package), as explained in this Stack Overflow answer.

Local Outlier Factor

Local Outlier Factor (LOF) is another density-based approach to identify outliers in a dataset. LOF is suited to datasets that contain a mixture of data distributions.

[Figure: dense and sparse distributions of points. Source: Google Images]
The above figure shows two different distributions: a dense cluster of points and a sparse distribution of points. In such datasets, outlier detection should be performed locally for each distribution, i.e., points within one distribution should not affect outlier detection in another. The LOF algorithm follows this intuition and calculates an anomaly score for each point as follows:

  1. For each data point X, let D^k(X) denote the distance from X to its k^{th} nearest neighbour, and let L_{k}(X) denote the set of points within distance D^k(X) of X
  2. Compute the reachability distance of each data point X with respect to a point Y as
    R_{k}(X, Y) = max(dist(X,Y), D^k(Y))
  3. Compute the average reachability distance AR_{k}(X) of data point X as
    AR_{k}(X) = MEAN_{Y \in L_{k}(X)} R_{k}(X, Y)
  4. In the final step, compute the LOF score of each point X as
    LOF_{k}(X) = MEAN_{Y \in L_{k}(X)} \frac{AR_{k}(X)}{AR_{k}(Y)}

To find the best value of k, it is good practice to follow an ensemble approach, i.e., calculate LOF scores for a range of k values and then combine the resulting outlier scores with a chosen method.
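The four steps can be sketched in Python with NumPy. This is only a minimal illustration under a few assumptions of mine: distances are Euclidean, a point is not counted as its own neighbour, there are no duplicate points (otherwise AR_k can be zero), and the ensemble combines scores by taking the maximum over k, which is just one possible choice. The function names lof_scores and ensemble_lof and the toy data are hypothetical.

```python
import numpy as np


def lof_scores(points, k):
    """LOF score of every point, following the four steps listed above."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; a point is not its own neighbour.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Step 1: D^k(X) and L_k(X) (ties at distance D^k(X) are included).
    d_k = np.sort(dist, axis=1)[:, k - 1]
    neighbours = [np.flatnonzero(row <= d) for row, d in zip(dist, d_k)]

    # Steps 2-3: AR_k(X) = mean over Y in L_k(X) of max(dist(X, Y), D^k(Y)).
    ar = np.array([
        np.maximum(dist[i, nb], d_k[nb]).mean()
        for i, nb in enumerate(neighbours)
    ])

    # Step 4: LOF_k(X) = mean over Y in L_k(X) of AR_k(X) / AR_k(Y).
    return np.array([(ar[i] / ar[nb]).mean() for i, nb in enumerate(neighbours)])


def ensemble_lof(points, ks):
    """Combine LOF scores over several k by taking the maximum (an assumed choice)."""
    return np.max([lof_scores(points, k) for k in ks], axis=0)


# Hypothetical data: a dense 5 x 5 grid plus one far-away point (index 25).
grid = [(x, y) for x in range(5) for y in range(5)]
data = grid + [(20.0, 20.0)]
scores = ensemble_lof(data, ks=[3, 4, 5])
print(int(np.argmax(scores)))  # index of the point with the highest score
```

Points inside the grid get scores close to 1, while the isolated point gets a much larger score, which is the behaviour the algorithm is designed to produce.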

 

References:

  1. Book: Outlier Analysis by Charu Aggarwal
  2.  Wikipedia
  3. Google Images

Hacks in Document and Presentation preparation

1. INSERT EQUATION IN PRESENTATION:

At times we need to insert mathematical formulae in presentations (PowerPoint or Keynote), and both PowerPoint and Keynote support this by default. But for those who are more comfortable with LaTeX, a simple and time-saving way is to typeset the equation in the LaTeXiT utility. Another advantage is that you can reopen a previously saved file in the utility for editing at any time. The two simple steps are:

  1. Paste a LaTeX equation, or write one from scratch, in the lower window of the utility
  2. Save the output as a PDF or image file. Note that you can change the default font size of the text in the same utility. Additional packages can be added to the preamble through the menu LaTeX -> Show preamble

[Screenshot of the LaTeXiT utility]

2. INSERT EDITED PDF IMAGE IN LATEX DOCUMENT:

Please find the detailed steps at this page of the blog.

Local Correlation Integral (LOCI) – Outlier Detection Algorithm

Local Correlation Integral (LOCI) is a density-based approach for outlier analysis. It is local in nature, i.e., it uses only nearby data points, in terms of distance, to compute the density of a point. This algorithm has one tunable parameter: \delta. Personally, I believe that k (the threshold multiplier in the last step) also needs to be tuned according to the data distribution. LOCI works in the following steps:

  1. Compute the density M(X,\epsilon) of data point X as the number of neighbours within distance \epsilon. Here, \epsilon defines the counting neighbourhood of data point X
    M(X,\epsilon) = COUNT_{(Y:dist(X,Y) \leq \epsilon; Y \in datapoints)} Y
  2. Compute the average density AM(X,\epsilon,\delta) of data point X as the mean of the densities of the neighbours of X within distance \delta. Here, \delta defines the sampling neighbourhood of X
    AM(X,\epsilon,\delta) = MEAN_{(Y:dist(X,Y) \leq \delta)} M(Y,\epsilon)

    The value of \epsilon is always set to half of \delta to enable fast approximation. Therefore, we only need to tune \delta for accuracy; \epsilon follows from it

  3. Compute the Multi-Granularity Deviation Factor (MDEF) at distance \delta as
    MDEF(X,\epsilon,\delta) = \frac{AM(X,\epsilon,\delta) - M(X,\epsilon)}{AM(X,\epsilon,\delta)}

    This factor shows the deviation of M from AM for X. Since this computation considers only local/neighbouring points, LOCI is referred to as local in nature. The larger the value of MDEF, the greater the outlier score. We use multiple values of \delta to compute MDEF, typically starting with a radius that contains 20 points and going up to a radius that spans most of the data.

  4. In this step, the deviation of M from AM is converted into a binary label, i.e., whether X is an outlier or not. For this, we use the \sigma(X,\epsilon,\delta) metric:
    \sigma(X,\epsilon,\delta) = \frac{STD_{(Y:dist(X,Y) \leq \delta)} M(Y,\epsilon)}{AM(X,\epsilon,\delta)}
    Here, STD refers to standard deviation.
  5. A data point X is declared an outlier if its MDEF value is greater than k \cdot \sigma(X,\epsilon,\delta), where k is chosen to be 3.
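As a rough illustration, the five steps can be written in Python with NumPy for a single sampling radius \delta. The assumptions here are mine: distances are Euclidean, every point counts itself in its own neighbourhood, and the function name loci_at_radius and the ring-shaped toy data are hypothetical. A fuller version would repeat the computation over several values of \delta.

```python
import numpy as np


def loci_at_radius(points, delta, k=3.0):
    """MDEF, sigma and outlier flag of every point for one sampling radius delta."""
    pts = np.asarray(points, dtype=float)
    eps = delta / 2.0  # epsilon is fixed to half of delta
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    # Step 1: M(X, eps), the number of points within eps of X
    # (assumption: X counts itself, so M is at least 1).
    m = (dist <= eps).sum(axis=1)

    # Steps 2 and 4: mean and standard deviation of M(Y, eps) over the
    # sampling neighbourhood {Y : dist(X, Y) <= delta}.
    in_sampling = dist <= delta
    am = np.array([m[mask].mean() for mask in in_sampling])
    sd = np.array([m[mask].std() for mask in in_sampling])

    mdef = (am - m) / am   # step 3
    sigma = sd / am        # step 4
    return mdef, sigma, mdef > k * sigma  # step 5


# Hypothetical data: 40 points on a circle of radius 5 plus one point at the centre.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([5.0 * np.cos(angles), 5.0 * np.sin(angles)])
data = np.vstack([ring, [[0.0, 0.0]]])
mdef, sigma, is_outlier = loci_at_radius(data, delta=6.0)
print(np.flatnonzero(is_outlier))  # indices of the flagged points
```

In this toy data the centre point has only itself in its counting neighbourhood, while every point it samples has several neighbours, so its MDEF is large relative to \sigma and it is the only point flagged.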

Reference:

  1. I learnt this algorithm from the book: Outlier Analysis by Charu Aggarwal

The Magic of Thinking Big

I got a chance to read the book “The Magic of Thinking Big” by David J. Schwartz. I started reading on 21 April 2016 and finished on 20 May 2016. I find this book helpful because it shows how to develop our inner selves to the fullest. The book supports each claim with real-world examples. I believe it is impossible to contradict the author. While reading, I noted down some points, and I pen them down here for my own revision. Each top-level bullet is a chapter title, and its sub-bullets are the points noted from that chapter.

  • Belief
    • Believe in yourself that you can grow/improve with time if you put an effort
    • Believe you are going to make a difference in this world if you would like
  • Cure yourself of Excusitis
    • Thinking guides intelligence: think always. Take some time out of your busy schedule to reflect on what you are doing, how you can improve things, and whether you are doing something meaningful or wasting time
    • Avoid excusitis. Types: health, intelligence, age, luck. Don’t make excuses of any of these types. Each excuse is like a disease that starts from a small issue but spreads over your entire life. Accept your fault at the outset.
  •  Build Confidence and Destroy Fear 
    • Action cures fear. Indecision and postponement fertilise fear
    • Hope is a start. It needs action to win victories
    • Destroy negative thoughts before they become mental monsters
    • Recall only best moments of your life
    • Don’t do things that result in guilt. Guilt has serious repercussions
    • Motions/actions are the precursors of emotions
  • How to think Big  
    • Know thyself/yourself
    • Never sell yourself short
    • Conquer the crime of self-deprecation. Concentrate on your assets
    • To think big, use words that produce big positive mental images
    • See Image as – Bright, hope, success, fun, and victory
    • Use big, positive, cheerful words and phrases to describe how you feel
    • Use bright, cheerful, favourable words and phrases to describe other people
    • Use positive language to encourage others
    • Use positive language to outline plans to others
    • Visualisation adds value to everything. A big thinker always visualizes what can be done in the future. He isn’t stuck with the present
    • Practice adding value to things, to people, and to yourself
    • Keep your eyes focused on big objective
    • Ask, “Is it really important”
    • Don’t fall into the triviality trap
  • How to think and dream creatively
    • When you believe in something, your mind automatically finds ways to get it 
    • Where there is a will, there is a way
    • Develop a weekly improvement programme
    • Devote 10 minutes every day before work: What can I do better today? What is the best I can achieve today?
    • Capacity is a state of mind! How much we can do, depends on  how much we think we can do
    • Don’t let traditions paralyze your mind. Be experimental!
    • Ask yourself, “How can I do more?”
    • Practice asking and listening. 
    • Stretch your mind. Associate with people who can help you think of new ideas
  • You are what you think you are 
    • Dress properly. It defines you to others and, most importantly, to yourself. It earns you respect and defines your identity. Your appearance talks to others
    • Look important and think your work is important
    • The way we think about our jobs determines how our subordinates think toward their jobs
    • Always give a pep-talk [confidence building]
    • Practice uplifting self-praise. Don’t  practise belittling self-punishment
    • Sell yourself on yourself. Create a commercial of yourself and repeat it several times every day
    • Always ask yourself, “Is this the way an important person thinks?”
  • Manage your environment 
    • You are a product of your environment
    • Always associate with positive people, and mix with diverse groups. This gives first-hand experience and broadens your horizons
    • Never get involved in gossip. It is thought poison
  • Make your attitudes your allies 
    • Build enthusiasm. Possible means: (i) Dig Deeper, (ii) Broadcast good news
    • Practice appreciation: no matter what or whom, always appreciate
    • Practice calling people by their names
    • Ask yourself every day, “What can I do today to make the day better and add to my scientific career?”
  • Get the Action Habit 
    • Don’t wait for perfect conditions. Got an idea? Work on it as soon as possible
    • To fight fear, act. To increase fear – wait, postpone, put-off
    • Action cures fear
    • Take pencil and paper; and start writing, figuring your ideas on paper
    • Benjamin Franklin: Don’t put off until tomorrow what you can do today
  • How to turn defeat into action 
    • Persistence alone does not guarantee victory, but combining it with experimentation makes things happen. Possible approach: apply, fail, and re-learn
    • Study your setbacks to make your future bright
    • Stop blaming luck. Blaming luck will never help you to reach your full potential
    • Find the good side of each situation
  • Use goals to Help you Grow 
    • Goal should be clear and crisp, what you are going to do
    • Without setting goals, people work on hazy things without knowing where they spend time
    • Goals are as essential to success as air is to life
    • Success requires heart and soul effort, and you can put both of them only into what you really desire. So, always work on the things you like, no matter what other people think
    • Energy increases and even multiplies when you set a desired goal  and resolve to work towards that goal.
    • Use goals to live longer
    • Achieve your goals one at a time. Build 30-day goals
    • Invest in yourself. Purchase things that build your mental power and efficiency.
  • How to think like a leader 
    • Think progress, believe in progress and push for progress
    • Over time, we learn different ways to do things from experiences and from colleagues. Copy only high standards in your daily routine. Be sure that the master carbon copy is worth duplicating
    • Take time from every day/every weekend to confer with yourself, to introspect and to tap  your supreme thinking. Spend some time alone  every day just for thinking.

The book concludes with the following lines: A wise man will be a master of his mind. A fool will be its slave.

Box Cox Transformation

Most of the time, while dealing with data, I assume that the underlying distribution is normal. I have also found that most common statistical measures assume normally distributed data. But we know that data distributions are not always normal. In simple words, we should always plot the data to check the underlying distribution. With plotting, we sometimes also find that a small transformation (like x^2 or log(x)) results in a normal distribution. This means that data transformations can make our life simpler and allow us to use statistical measures intended for normally distributed data.

Box Cox Transformation: George Box and Sir David Cox proposed a power transformation controlled by a parameter \lambda. Different values of \lambda between -5 and 5 are tried, and the value under which the transformed data is closest to normal is selected.

x'_{\lambda} = \frac{x^{\lambda} - 1}{\lambda}

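For \lambda = 0 the formula above is undefined, so the transformation is taken to be log(x), which is its limit as \lambda approaches 0. The transformation applies only to positive data, and it is not guaranteed that the data will always get transformed to a normal distribution.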
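A small Python sketch of the transformation and of the search over \lambda is given below. The selection criterion is an assumption on my part: it scores each candidate with the usual Box-Cox profile log-likelihood, (\lambda - 1) \sum log(x) - (n/2) log(var(x')), and keeps the best candidate from a grid between -5 and 5. The function names and the log-normal test data are hypothetical.

```python
import numpy as np


def box_cox(x, lam):
    """Box-Cox transform of positive data; log(x) is used when lam is (nearly) 0."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-8:
        return np.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_log_likelihood(x, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data."""
    x = np.asarray(x, dtype=float)
    y = box_cox(x, lam)
    return (lam - 1.0) * np.log(x).sum() - 0.5 * len(x) * np.log(y.var())


def best_lambda(x, candidates=np.linspace(-5.0, 5.0, 101)):
    """Grid search over lam in [-5, 5] in steps of 0.1."""
    scores = [box_cox_log_likelihood(x, lam) for lam in candidates]
    return float(candidates[int(np.argmax(scores))])


# Hypothetical data: log-normal values, for which lam close to 0 (log) should win.
rng = np.random.default_rng(0)
sample = np.exp(rng.normal(0.0, 1.0, size=1000))
print(best_lambda(sample))
```

After choosing \lambda, it is still worth plotting the transformed data, since the best candidate on the grid may not make the data normal.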

 

Reference:

  1. https://www.youtube.com/watch?v=EJ6EhfenqNs
  2. https://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/

ARIMA, Time-Series Forecasting

We use ARIMA (Auto-Regressive Integrated Moving Average) to model time-series data for forecasting. ARIMA uses three basic concepts:

  1. Auto-Regressive: The term itself points to regression, i.e., it predicts a new value using regression over the previous (lagged) values of the same series. The number of lags used defines its order
  2. Integrated: This concept is used to remove trend (a continuously increasing or decreasing level) from the time series. This is done by differencing consecutive values of the time series.
  3. Moving Average: Here we perform regression on the forecast error terms at various lags. The number of lags used defines its order

ARIMA works only on stationary data. If the input data is not stationary (detected via automated tests, i.e., unit root tests such as the well-known Dickey-Fuller test), then stationarity is achieved by differencing. The ARIMA forecasting equation for a stationary time series is a regression-type equation in which the predictors consist of previous response values at different lags, together with forecast errors at different lags.

Predicted y = C + weighted sum of previous y and previous errors at various lags

Auto-regressive models and exponential smoothing are both special cases of ARIMA models.
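To make the Auto-Regressive and Integrated parts concrete, here is a small NumPy sketch. It is deliberately simplified and rests on my own assumptions: the series is differenced exactly once, the moving-average (error) terms are left out, the weights are fitted by ordinary least squares, and p is at least 1. The function names and the toy series are hypothetical; a real ARIMA fit would normally come from a statistics library.

```python
import numpy as np


def fit_ar_on_differences(series, p):
    """Least-squares fit of an AR(p) model (p >= 1) to the first differences."""
    z = np.diff(np.asarray(series, dtype=float))  # "Integrated" step, one difference
    # Each row holds [1, z[t-1], ..., z[t-p]]; the regression target is z[t].
    design = np.array([np.r_[1.0, z[t - p:t][::-1]] for t in range(p, len(z))])
    coef, *_ = np.linalg.lstsq(design, z[p:], rcond=None)
    return coef  # [C, weight of lag 1, ..., weight of lag p]


def forecast_next(series, coef):
    """One-step forecast: predict the next difference, then undo the differencing."""
    y = np.asarray(series, dtype=float)
    z = np.diff(y)
    p = len(coef) - 1
    next_diff = coef[0] + coef[1:] @ z[-p:][::-1]
    return y[-1] + next_diff


# Hypothetical series whose differences follow z[t] = 1 + 0.5 * z[t-1] exactly.
z = [0.0]
for _ in range(29):
    z.append(1.0 + 0.5 * z[-1])
y = np.concatenate(([10.0], 10.0 + np.cumsum(z)))

coef = fit_ar_on_differences(y, p=1)
print(coef, forecast_next(y, coef))
```

Because the toy series has no noise, the fitted constant and lag weight should come out very close to the values used to generate it.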

 

References:

  1. http://ucanalytics.com/blogs/arima-models-manufacturing-case-study-example-part-3/
  2. http://people.duke.edu/~rnau/411arim.htm