
Six Frequently Used Terminologies in Big Data

With the deluge of data being added to systems, it is getting difficult to collect, curate, and process information. A growing middle-class population, widespread penetration of mobile devices, and rapid technology adoption are all contributing to an exponential rise in the quantum of data. This is commonly termed the Big Data problem. In this blog, I have listed and defined six frequently used terminologies in big data.

1.) Big Data: Uri Friedman, in his article at Foreign Policy, charted the timeline of Big Data's evolution. In 1997, NASA researchers Michael Cox and David Ellsworth used the term "big data" for the first time to describe a familiar challenge of the 1990s: supercomputers generating massive amounts of information -- in Cox and Ellsworth's case, simulations of airflow around aircraft -- that could not be processed and visualized. "[D]ata sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk," they write. "We call this the problem of big data."

Wikipedia defines big data in information technology as a large and complex collection of data sets that is difficult to process using on-hand database management tools or traditional data processing applications.


2.) Hadoop: Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license.

3.) MapReduce: MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
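To make the programming model concrete, here is a toy, single-process word-count sketch in Python. The function names (map_phase, shuffle, reduce_phase) are my own illustrative choices rather than Hadoop or Google APIs; in a real MapReduce job the map and reduce steps run in parallel across many machines.

    # Toy illustration of the MapReduce model: word count in one process.
    from collections import defaultdict

    def map_phase(document):
        # Map: emit a (word, 1) pair for every word in the input.
        for word in document.split():
            yield (word.lower(), 1)

    def shuffle(pairs):
        # Shuffle: group all emitted values by key, as the framework would.
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(key, values):
        # Reduce: sum the counts for one word.
        return key, sum(values)

    documents = ["big data is big", "data about data"]
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}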

4.) Cluster Analysis: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
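As a quick intuition-builder, here is a hand-rolled k-means sketch in Python on one-dimensional numbers. It is deliberately naive (real work would use a library such as scikit-learn and far more data); the point is just to show the "group similar things together" idea.

    # Naive k-means clustering on 1-D points, for intuition only.
    def kmeans(points, k, iterations=10):
        centroids = points[:k]  # naive initialisation: first k points
        for _ in range(iterations):
            # Assignment step: each point joins its nearest centroid's cluster.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its cluster.
            centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        return clusters, centroids

    # Two obvious groups: values near 1 and values near 10.
    clusters, centroids = kmeans([1.0, 1.2, 0.8, 10.0, 10.5, 9.7], k=2)
    print(clusters)   # [[1.0, 1.2, 0.8], [10.0, 10.5, 9.7]]
    print(centroids)  # [1.0, ~10.07]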

5.) Predictive Modelling: Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome.
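A minimal sketch of the idea, assuming scikit-learn is installed and using invented data (weekly usage hours versus whether a customer churned): fit a model on past outcomes, then ask it for the probability of a new one.

    # Toy predictive model: estimate the probability that a customer churns.
    from sklearn.linear_model import LogisticRegression

    usage_hours = [[1], [2], [3], [8], [9], [10]]  # feature: weekly usage hours
    churned     = [1, 1, 1, 0, 0, 0]               # outcome: 1 = churned, 0 = stayed

    model = LogisticRegression()
    model.fit(usage_hours, churned)

    # Probability of churn for a customer with 4 hours of weekly usage.
    print(model.predict_proba([[4]])[0][1])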

6.) In-Memory Data Computing: Real or near-real-time information delivery is one of the defining characteristics of big data analytics, so latency is avoided whenever and wherever possible. Data in memory is good; data on a spinning disk at the other end of an FC SAN connection is not. The cost of a SAN at the scale needed for analytics applications is very much higher than other storage techniques. Source: Wikipedia.
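A small Python sketch of why this matters: read the same file from disk repeatedly versus keeping it in memory after one read. This is only a toy comparison; the operating system's own caching softens the gap, and real in-memory analytics platforms work at a completely different scale.

    # Toy latency comparison: repeated disk reads vs. data kept in memory.
    import os, tempfile, time

    path = os.path.join(tempfile.gettempdir(), "bigdata_demo.txt")
    with open(path, "w") as f:
        f.write("some,sample,record\n" * 500_000)  # ~10 MB throwaway file

    start = time.perf_counter()
    for _ in range(20):
        with open(path) as f:       # goes back to the filesystem every time
            len(f.read())
    disk_time = time.perf_counter() - start

    with open(path) as f:
        cached = f.read()           # load once, keep in RAM
    start = time.perf_counter()
    for _ in range(20):
        len(cached)                 # served straight from memory
    memory_time = time.perf_counter() - start

    print(f"disk: {disk_time:.4f}s   in-memory: {memory_time:.6f}s")
    os.remove(path)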

This is not a comprehensive list by any means. I would appreciate your feedback on what more could be added to make it more complete.


Comments

  1. The most prominent problem businesses face following the Sarbanes-Oxley Act is constantly shrinking storage space. The Act requires that all financial documents be saved, and that includes email correspondence.

  2. Information is stored and analyzed on a large number of high-performance servers. Activewizards.com recommends Hadoop, the key open-source technology here.
    Since the amount of information will only increase over time, the difficulty is not in getting the data but in processing it to maximum benefit. In general, working with Big Data involves collecting information, structuring it, creating insights and context, and developing recommendations for action. Even before the first stage, it is important to clearly define the purpose of the work: what exactly the data is for, for example defining the target audience for a product. Otherwise, there is a risk of gathering a lot of information without understanding how it can actually be used.

