Frequently Asked Questions

Running the software

  • Can I run the 1305 executables on my computer? Yes, you can download them from our website and run them on your computer. The previous version of the 1305 library could be used only on a cloud machine, but that model has been abandoned.
  • What is the Amazon EC2 cloud? How is this different from using my computer? Amazon EC2 is a service that provides machines on demand. You can turn on one or more machines, connect to them via ssh and run programs. When you are done you switch them off. Amazon charges you by the hour. The cheapest machine costs $0.085/hour. So the Amazon cloud is not any different from your machine, except that it is remote and you should not leave it running when you are not using it, because you pay for every hour it is on.
  • Will 1305 ever be released as a library so that I can develop my own code and link to it? You now have the option to buy the source code.
  • What operating systems do you support? We currently support Linux (Ubuntu and Debian) and Windows.
  • What kind of machines do you support on the Amazon cloud? For more information on this, see Running 1305 on the Amazon EC2 cloud.
  • Does Analytics 1305 support other clouds? We currently support only Amazon EC2, but if there is demand we will support more.
  • Will the software always be free of charge? We commit that academic institutions will not be charged for the single-thread executables. The parallel versions, though, will have an added cost on top of the Amazon EC2 charge. Academic institutions will always have generous discounts.

Algorithms/Executables

  • Which algorithms does 1305 include? We currently support
    1. Neighbor search: All-neighbor search with several options. For more information see Neighbors (Nearest, Farthest, Range, k, Classification)
    2. Kernel Density Estimation (KDE): Computation of density estimates using one of the most popular nonparametric methods, the kernel density estimator. For more information see Kernel Density Estimation and Non-parametric Bayes Classifier
    3. K-means: The well-known clustering algorithm in a very efficient implementation. For more information see K-Means
    4. Linear Regression, see Linear Regression.
    5. Non-Negative Matrix Factorization, see Non-Negative Matrix Factorization
    6. Support Vector Machine, see Support Vector Machines
  • Are all the methods super scalable? For most of our methods, like nearest neighbor, KDE, linear regression and k-means, we have algorithmic improvements that make us much faster than the typical textbook algorithm. For some others, like support vector machines, we don’t have any algorithmic improvement compared to what is out there. There we try to be as efficient as possible through our software techniques.
  • What about the other machine learning algorithms? We are actively developing a set of state-of-the-art machine learning algorithms that will be coming up soon. Feel free to contact us at support@analytics1305.com with requests and suggestions. There is also a list of algorithms that are implemented in our internal codebase and will be released soon.
  • Are there any parallel versions of the 1305 algorithms? We already have a parallel version of support vector machines.
  • Does the software use parallel computing, especially for matrix operations, such as Math Kernel Library from Intel? The speed of the library is based on algorithmic improvements and approximations. We have used math kernel libraries that use multithreading and we have seen speedups. All these libraries, though, are proprietary and we are not allowed to distribute them. If you want to place a custom order we can link against these libraries and ship our product to you.
  • I’m curious how much faster your SVM implementation ran, when compared to LibSVM (or more specifically LibLinear). Any published details? Our SVM implementation is not significantly faster than LibSVM, since we don’t have any algorithmic improvement there yet, although we are more careful when it comes to memory. We do have a version of our SVM that runs in bagging (aka bootstrap) mode with the help of Hadoop. As you would expect, this one is more scalable. If you really need a scalable classifier with good performance, I would recommend our Nonparametric Bayes classifier (aka kernel discriminant analysis), which is very fast and performs as well as SVM, sometimes better (see KDE). You can also use the classic nearest neighbor classifier, which is faster but slightly less accurate. For more information see Neighbors (Nearest, Farthest, Range, k, Classification). Examples on real datasets can be found in Examples on Datasets.
  • What is the advantage of 1305? The goal of 1305 is to provide scalable machine learning solutions for business and scientific applications. Nearest neighbor, KDE and k-means run much faster than the existing solutions. Our implementations are at least 2x faster than R and Weka, and in many cases orders of magnitude faster, see Benchmarks
  • In what language is 1305 written? The whole library is written in C++.
  • What is a multi-dimensional tree? A multi-dimensional tree is a very efficient structure for grouping nearby data points together. It accelerates the computation of nearest neighbors as well as other methods. For more information read here.
  • What kind of multi-dimensional trees does 1305 use? We currently support kd-trees and metric trees (also known as ball trees). We will soon release cover trees, depending on the requests.
  • An executable is crashing or has a bug, what should I do? You can visit our forum and post the problem, or send an email to support@analytics1305.com. Somebody from the company will get back to you soon.
  • If I have questions or requests, what do I do? Does 1305 provide support? Visit our forum or send an email to support@analytics1305.com and one of our engineers will respond to you.
  • What is the progressive mode in the algorithms? Most machine learning algorithms are iterative, especially when they deal with optimization. As some of them may take a long time to reach their goal, we offer the option to run them for a specified number of iterations. Although this might not sound innovative, there are algorithms like KDE and k-nearest neighbors where a progressive mode is not trivial.
  • What are the key concepts that make 1305 fast? The following features make 1305 fast:
    • C++ template meta-programming
    • Memory efficient. Contrary to other libraries, the user has the ability to use the ones he thinks are appropriate and necessary.
    • Use of multidimensional trees.
  • I have invented a highly scalable machine learning algorithm. What can you do about it? We value the effort of the scientist to invent a new algorithm. Talk to us (email) if you think you have something innovative and scalable. We will happily implement it with your help and share revenues with you from its usage on the cloud.
  • Does 1305 run in parallel? The current release runs on a single thread. Very soon bootstrap versions are going to be released as well as parallel versions for some algorithms.
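
To illustrate why a multi-dimensional tree accelerates nearest neighbor search, here is a minimal sketch of a kd-tree in Python. It is not the 1305 implementation (which is written in C++) and it does not use any 1305 API. The names build_kdtree, nearest and brute_force_nearest are hypothetical and chosen only for this example. The tree splits the points on alternating coordinates, and the search skips a whole subtree whenever the splitting plane is farther away than the best match found so far.

```python
import math
import random


def build_kdtree(points, depth=0):
    """Recursively split the points at the median of alternating coordinates."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) for the nearest neighbor of query in the tree."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    # Search the side of the split that contains the query first.
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the
    # best match so far; otherwise that whole subtree is pruned.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def brute_force_nearest(points, query):
    """Textbook baseline: compute the distance to every point."""
    return min((math.dist(p, query), p) for p in points)


# Small demonstration on random 2-D data (illustrative only).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
tree = build_kdtree(points)
query = (0.5, 0.5)
tree_result = nearest(tree, query)
brute_result = brute_force_nearest(points, query)
```

Both functions find a neighbor at the same distance, but the tree search typically inspects only a small fraction of the points, while the brute-force baseline always inspects all of them.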

File Formats

  • What kind of file formats does 1305 support? We use our own format, which is text based. More information about the spec can be found in 1305 File Format. We provide scripts for exporting Matlab matrices to our format. We also have scripts that convert SVMlight files and CSV files to the 1305 format. Very soon we will release a script that converts the Weka ARFF format.
  • Why do I have to convert my data to the 1305 format? The existing popular formats lack precision information. Representing the data in the right precision is critical, as it can save a lot of memory, which in turn determines performance in large-scale computations. If you have data in any other format, send us an email (support@analytics1305.com) and we will provide a script for converting it.
  • I have my data in a relational database, how can I use 1305? An obvious way would be to export the data to a CSV file and then run the 1305 executables. We do, however, have an experimental implementation that can perform computations in place. Contact us (support@analytics1305.com) for more information.
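
As a rough illustration of why precision information matters for memory, the following Python sketch compares the storage needed for the same number of values at different precisions. It uses only the standard array module and says nothing about how the 1305 format itself encodes precision. The helper name storage_bytes is hypothetical.

```python
from array import array


def storage_bytes(values, typecode):
    """Bytes needed to hold the values in a packed array of the given C type."""
    packed = array(typecode, values)
    return packed.itemsize * len(packed)


# 1000 real values stored as 64-bit and as 32-bit floating point numbers.
values = [0.5 * i for i in range(1000)]
bytes_double = storage_bytes(values, "d")
bytes_float = storage_bytes(values, "f")

# 1000 small integers (0..255) stored as 8-bit unsigned integers.
small_ints = [i % 256 for i in range(1000)]
bytes_uint8 = storage_bytes(small_ints, "B")
```

On common platforms the 32-bit array takes half the memory of the 64-bit one, and the 8-bit array takes one eighth. A format that does not record the intended precision forces a reader to assume the widest type.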

Company

  • Where does the name 1305 come from? The founders of the company are alumni of the Georgia Tech Fastlab, which is located in room 1305 of the Klaus building. In recognition of the lab’s value to our studies, we named the company after the room number.
  • Are there other products from Analytics 1305? The company also provides custom machine learning/optimization solutions for clients.
  • Has anybody else used 1305 software before? 1305 software is part of the LogicBlox Datalog engine, where it is used for solving large-scale retail problems. Other undisclosed clients in the area of commodity trading use it too.

1305 and the other Machine Learning Libraries

  • Why another machine learning library? There is a plethora of machine learning libraries. Most of them are open source and can be found at www.mloss.org. Unfortunately most of them are not scalable. Our library is much more scalable than the current ones, and we keep improving it.
  • Why not open source? We value the contribution of the open source community, as we ourselves use open source software in our everyday life. Versions of our software will be given for free to individuals, but not for commercial use. We will accept contributions from researchers and share with them potential revenues from the cloud usage.
  • What is the relationship of Fastlib and 1305? The founders of Analytics 1305 have been members of the development team of Fastlib. 1305 has been written from scratch and is intended for industrial use in production systems. Fastlib is an open source academic project. It is a good starting point for machine learning research and a good prototyping tool. We can easily rewrite your code from Fastlib to 1305 if we both see that it can be useful for industrial use.