Here is a list of essential notes, in PDF format, for thorough preparation or revision of the B. Tech First Year course programme across various branches. Textbooks and reference books are sources that guide and impart a thorough knowledge and comprehension of the course subjects. They are a rich source of well-researched information, and candidates should consult books that provide an excellent conceptual background.
The article on B. Tech First Year Notes lists the best and most useful textbooks and reference books for the B. Tech First Year. Students can refer to the article and read through the B. Tech First Year notes for each subject. The list of recommended textbooks and reference books for B. Tech First Year preparation is given below, and candidates can select the books that suit their level of knowledge and prepare accordingly. The syllabus is a practical course-planning tool that presents an initial idea and a brief overview of the updated B. Tech First Year syllabus, and the B. Tech 1st Year Notes provide a detailed and comprehensive view of the course.
Thus, the B. Tech 1st Year course syllabus presents students with a clear and concise outline of what to study, how to learn, and how to break down the chapters and topics. The B. Tech I Year Lecture Notes list a semester-wise division of the crucial subjects under each semester for a better preparation or revision process.
Students should cover all the essential topics before attempting the B. Tech First Year exam so that the paper feels reasonably easy and answerable. Candidates should stay aware of the Bachelor of Technology 1st Year syllabus, as it prevents them from squandering time on redundant topics. Candidates pursuing a Bachelor of Technology (B. Tech) can refer to this article for the latest Python syllabus PDF download.
The material provided here is more up to date than the regular B. Tech notes available online. Students can achieve better grades by using these Python handwritten notes (free PDF download), as they support thorough preparation. Python is a high-level, object-oriented programming language. Since Python runs on an interpreter, code can be executed as soon as it is written. Students pursuing a Bachelor of Technology will find the Python programming notes in this article more useful than many other notes available online, because they are kept updated.
A reduce repeatedly applies a given operation to the elements of an array until only a single result remains. As for outer products, any ufunc can compute the output of all pairs of two different inputs using its outer method. Another extremely useful feature of ufuncs is the ability to operate between arrays of different sizes and shapes, a set of operations known as broadcasting.
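As a rough illustration of these ufunc methods (reduce, accumulate, and outer), here is a minimal sketch on a small example array that is not taken from the original notes:

```python
import numpy as np

x = np.arange(1, 6)          # array([1, 2, 3, 4, 5])

# reduce: repeatedly apply the operation until one value remains
np.add.reduce(x)             # 15, same as x.sum()
np.multiply.reduce(x)        # 120, the product of all elements

# accumulate: keep the intermediate results of the reduction
np.add.accumulate(x)         # array([ 1,  3,  6, 10, 15])

# outer: apply the operation to all pairs of elements from two inputs
np.multiply.outer(x, x)      # a 5x5 multiplication table
```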
A quick example of aggregation is computing the sum of all values in an array, which NumPy provides through np.sum. For multidimensional aggregates, one common type of operation is an aggregate along a row or column.
Similarly, we can find the maximum value within each row with M.max(axis=1). The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. Note that some of the NaN-safe versions of these functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.
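A small sketch of these aggregations on a made-up two-dimensional array (the array M here is an assumption, not the one used in the original text):

```python
import numpy as np

M = np.random.random((3, 4))   # a hypothetical 3x4 array of random values

M.sum()            # sum of every element in the array
M.min(axis=0)      # minimum of each column (axis 0 is collapsed)
M.max(axis=1)      # maximum of each row (axis 1 is collapsed)

# NaN-safe variants ignore missing values instead of propagating them
data = np.array([1.0, np.nan, 3.0])
np.nansum(data)    # 4.0
```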
NumPy provides a number of other useful aggregation functions. We may also wish to compute quantiles, for example with np.percentile. Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on arrays of different sizes and shapes. We can similarly extend this to arrays of higher dimension. While these examples are relatively easy to understand, more complicated cases can involve broadcasting of both arrays.
The geometry of these examples is visualized in the source text's figure "Visualization of NumPy broadcasting". The light boxes in that figure represent the broadcast values: this extra memory is not actually allocated in the course of the operation, but it can be useful conceptually to imagine that it is.
As an example, consider an operation between a two-dimensional array M and a shorter one-dimensional array a whose shapes are not compatible under the rules above. You might imagine making them compatible by, say, padding a's shape with ones on the right rather than the left. But this is not how the broadcasting rules work!
That sort of flexibility might be useful in some cases, but it would lead to potential areas of ambiguity. As a practical example of broadcasting, consider centering an array. In the previous section, we saw that ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops; broadcasting extends this ability. Imagine you have an array of 10 observations, each of which consists of 3 values.
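A minimal sketch of centering such an array with broadcasting (the random data here is illustrative, not the dataset from the original text):

```python
import numpy as np

X = np.random.random((10, 3))   # 10 observations, 3 values each

Xmean = X.mean(axis=0)          # mean of each of the 3 features, shape (3,)
X_centered = X - Xmean          # the (3,) mean is broadcast across all 10 rows

# the centered array should now have a mean near zero in each column
print(X_centered.mean(axis=0))
```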
One place that broadcasting is very useful is in displaying images based on two-dimensional functions. In NumPy, Boolean masking is often the most efficient way to extract, modify, or count values in an array based on some criterion. As an example of counting rainy days, imagine you have a series of data that represents the amount of precipitation each day for a year in a given city. What is the average precipitation on those rainy days?
How many days were there with more than half an inch of rain? One approach would be to answer these questions by hand: loop through the data, incrementing a counter each time we see values in some desired range. A better approach uses NumPy's comparison operators, whose result is always an array with a Boolean data type. Given such a Boolean array, there are a host of useful operations you can do: np.count_nonzero counts the True entries, and another way to get at this information is to use np.sum, where False is interpreted as 0 and True as 1.
For example, np.all(x < 8, axis=1) asks whether all values in each row are less than 8. Python's built-in sum, any, and all functions have a different syntax than the NumPy versions, and in particular will fail or produce unintended results when used on multidimensional arrays; be sure that you are using np.sum, np.any, and np.all here. But what if we want to know about all days with rain less than four inches and greater than one inch? We can address this sort of compound question by combining comparisons with the bitwise logic operators & and |, and we can then compute results such as the number of days without rain by combining masking with aggregations.
A more powerful pattern is to use Boolean arrays as masks, to select particular subsets of the data themselves. We are then free to operate on these values as we wish.
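A short sketch of comparisons, counting, and masking together, using made-up precipitation values rather than the original rainfall dataset:

```python
import numpy as np

# hypothetical daily rainfall (inches) for one year; many days are dry
rng = np.random.default_rng(0)
inches = rng.exponential(scale=0.2, size=365) * (rng.random(365) < 0.4)

rainy = inches > 0                      # Boolean array: did it rain at all?
np.sum(inches > 0.5)                    # days with more than half an inch of rain
np.sum((inches > 0.5) & (inches < 1))   # compound condition with bitwise &

# Boolean array as a mask: select only the rainy days, then aggregate
print("median precip on rainy days:", np.median(inches[rainy]))
```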
When would you use the keywords and/or versus the operators &/|? The keywords evaluate the truth of an entire object, while the operators compare the bits or Boolean elements within each object; in Python, all nonzero integers will evaluate as True. For Boolean NumPy arrays, the element-wise operators are nearly always the desired operation. Turning to fancy indexing: in the previous sections, we saw how to access and modify portions of arrays using simple indices (e.g., arr[0]), slices (e.g., arr[:5]), and Boolean masks (e.g., arr[arr > 0]). Fancy indexing also obeys broadcasting rules, so pairing a column vector of row indices with a row vector of column indices, as in row[:, np.newaxis] combined with col, produces a two-dimensional result.
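A minimal sketch of fancy indexing, including the broadcasting behavior just mentioned (the arrays are illustrative):

```python
import numpy as np

x = np.arange(10, 20)
ind = [3, 7, 4]
x[ind]                          # array([13, 17, 14]) -- select several elements at once

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]                     # elements (0, 2), (1, 1), (2, 3)

# pairing a column vector of row indices with a row vector of column
# indices broadcasts to a 3x3 block of results
X[row[:, np.newaxis], col]
```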
Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array. Repeated indices can produce surprising behavior: assigning x[[0, 0]] = [4, 6], for example, leaves x[0] containing the value 6, because the assignments happen one after another. You might then expect that an augmented assignment such as x[i] += 1 with repeated indices would increment those entries multiple times, but it does not: it is not the augmentation that happens multiple times, but the assignment, which leads to the rather nonintuitive result.
So what if you want the other behavior, where the operation is repeated? For this, you can use the at method of ufuncs, available in NumPy 1.8 and later. Another method that is similar in spirit is the reduceat method of ufuncs, which you can read about in the NumPy documentation. As an example of binning data, you can use these ideas to efficiently bin data to create a histogram by hand. For example, imagine we have 1,000 values and would like to quickly find where they fall within an array of bins.
We could compute the bin counts using ufunc.at, for instance np.add.at(counts, i, 1), which adds 1 to the count of each bin index, repeats included. This gives a histogram computed by hand. Of course, it would be silly to have to do this each time you want to plot a histogram; this is why Matplotlib provides the plt.hist routine. To compute the binning, Matplotlib uses the np.histogram function.
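A rough sketch of binning data by hand with np.searchsorted and np.add.at; the data and bin choices here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1000)          # 1,000 illustrative values

bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

i = np.searchsorted(bins, x)           # find the appropriate bin for each value
np.add.at(counts, i, 1)                # add 1 to each of these bins, counting repeats

# counts now holds the histogram; plt.plot(bins, counts, drawstyle='steps') would draw it
```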
Interestingly, for small arrays a hand-rolled approach like this can even be faster than np.histogram; if you dig into the np.histogram source code, you will see that it is quite a bit more involved than the simple search-and-count shown here, because NumPy's algorithm is more flexible and is designed for better performance when the number of data points becomes large. Moving on to sorting arrays: up to this point we have been concerned mainly with tools to access and operate on array data with NumPy. This section covers algorithms related to sorting values in NumPy arrays. Sorting algorithms (insertion sort, selection sort, merge sort, quicksort, and so on) are all means of accomplishing a similar task: sorting the values in a list or array. Fortunately, Python contains built-in sorting algorithms that are much more efficient than the simplistic ones. For fast sorting, NumPy provides np.sort and np.argsort; by default, np.sort uses a quicksort algorithm, though mergesort and heapsort are also available.
To return a sorted version of the array without modifying the input, you can use np.sort; to sort the array in place, use the array's own sort method instead. A related function, np.argsort, returns the indices of the sorted elements rather than the sorted values themselves.
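A small sketch of np.sort and np.argsort on an illustrative array:

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

np.sort(x)        # array([1, 2, 3, 4, 5]) -- returns a sorted copy
x.sort()          # sorts x in place

i = np.argsort([2, 1, 4, 3, 5])   # indices that would sort the array
# i is array([1, 0, 3, 2, 4]); these indices can be used with fancy indexing
```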
Sometimes we do not need a full sort but only the smallest K values in an array; np.partition takes an array and a number K and returns a new array with the smallest K values to the left of the partition point and the remaining values to the right. Within the two partitions, the elements have arbitrary order. Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array, for example np.partition(X, 2, axis=1). Finally, just as there is a np.argsort that computes indices of the sort, there is a np.argpartition that computes indices of the partition. In the nearest-neighbors example, with the pairwise squared distances computed, we can use np.argsort to sort along each row, or, since we only care about the nearest k neighbors, do it more efficiently with the np.argpartition function.
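A brief sketch of np.partition and np.argpartition; the values are illustrative:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])

np.partition(x, 3)      # the smallest 3 values come first, in arbitrary order
np.argpartition(x, 3)   # the indices of that partition

X = np.random.random((4, 6))
np.partition(X, 2, axis=1)   # partition each row around its 2 smallest values
```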
In the resulting visualization of the neighbors of each point, each point in the plot has lines drawn to its two nearest neighbors. At first glance, it might seem strange that some of the points have more than two lines coming out of them: this is due to the fact that if point A is one of the two nearest neighbors of point B, this does not necessarily imply that point B is one of the two nearest neighbors of point A.
Although the broadcasting and row-wise sorting of this approach might seem less straightforward than writing a loop, it turns out to be a very efficient way of operating on this data in Python. You might be tempted to do the same type of operation by manually looping through the data and sorting each set of neighbors individually, but this would almost certainly lead to a slower algorithm than the vectorized version we used. Big-O Notation Big-O notation is a means of describing how the number of operations required for an algorithm scales as the input grows in size.
Far more common in the data science world is a less rigid use of big-O notation: as a general if imprecise description of the scaling of an algorithm.
Big-O notation, in this loose sense, tells you how much time your algorithm will take as you increase the amount of data. For our purposes, N will usually indicate some aspect of the size of the dataset (the number of points, the number of dimensions, etc.).
Notice that big-O notation by itself tells you nothing about the actual wall-clock time of a computation, but only about its scaling as you change N; for small datasets in particular, the algorithm with better scaling might not be faster. Turning to creating structured arrays, structured array data types can be specified in a number of ways. One is the dictionary method, passing a dict with 'names' and 'formats' keys to np.dtype.
In the shorthand string format codes, the first (optional) character is < or >, which indicates the byte ordering; the next character specifies the type of data (characters, bytes, ints, floating points, and so on); and the last character or characters represent the size of the object in bytes. It is also possible to define compound types: for example, you can create a type where each element contains an array or matrix of values. Why would you use this rather than a simple multidimensional array, or perhaps a Python dictionary?
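A short sketch of defining and using a structured array; the field names and values are illustrative:

```python
import numpy as np

# dictionary method: field names plus NumPy format codes
# ('U10' = unicode string of length 10, 'i4' = 4-byte int, 'f8' = 8-byte float)
dtype = np.dtype({'names': ('name', 'age', 'weight'),
                  'formats': ('U10', 'i4', 'f8')})

data = np.zeros(3, dtype=dtype)
data['name'] = ['Alice', 'Bob', 'Cathy']
data['age'] = [25, 45, 37]
data['weight'] = [55.0, 85.5, 68.0]

data['name']                      # access a whole field
data[data['age'] < 40]['name']    # Boolean masking works on structured arrays too
```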
This section on structured and record arrays is purposely placed at the end of the NumPy material, because it leads so well into the next package we will cover: Pandas. Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs. In this chapter, we will focus on the mechanics of using Series, DataFrame, and related structures effectively. We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus.
Details on installation can be found in the Pandas documentation. If you use the Anaconda stack, you already have Pandas installed. Once Pandas is installed, you can import it and check the version with import pandas followed by pandas.__version__. Just as we generally import NumPy under the alias np, we import Pandas under the alias pd; then, to display all the contents of the pandas namespace, you can type pd.<TAB> in IPython. At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are. A Pandas Series is a one-dimensional array of indexed data, created for example from a list such as [0.25, 0.5, 0.75, 1.0]. The values are simply a familiar NumPy array, available through the values attribute of the Series. The index need not be an integer, but can consist of values of any desired type. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it much more efficient than Python dictionaries for certain operations.
In general, a Series can be constructed as pd.Series(data, index=index), where index is an optional argument and data can be one of many entities. For example, data can be a list or NumPy array, in which case index defaults to an integer sequence, as in pd.Series([2, 4, 6]).
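A brief sketch of the main ways to construct a Series; the values are illustrative:

```python
import pandas as pd

# from a list (index defaults to 0, 1, 2, ...)
pd.Series([2, 4, 6])

# with an explicit index of any desired type
pd.Series([0.25, 0.5, 0.75], index=['a', 'b', 'c'])

# from a dictionary: the keys become the index
pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127})
```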
The DataFrame can be thought of as a generalized NumPy array: if a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names. Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series objects. Similarly, we can also think of a DataFrame as a specialization of a dictionary: where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data.
For a DataFrame, data['col0'] will return the first column. A DataFrame can be constructed in several ways: from a single Series object, since a DataFrame is a collection of Series and a single-column DataFrame can be built from one of them; from a list of dictionaries, since any list of dictionaries can be made into a DataFrame; and even if some keys in a dictionary are missing, Pandas will fill them in with NaN (i.e., "Not a Number") values. As we saw before, a DataFrame can be constructed from a dictionary of Series objects as well. Finally, given a two-dimensional array of data, we can create a DataFrame with any specified column and index names.
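A rough sketch of these construction methods; the column names and values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name='data')
pd.DataFrame(s)                                           # from a single Series

pd.DataFrame([{'a': i, 'b': 2 * i} for i in range(3)])    # from a list of dicts

# missing keys are filled with NaN
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# from a dictionary of Series objects
area = pd.Series({'CA': 423967, 'TX': 695662})
pop = pd.Series({'CA': 38332521, 'TX': 26448193})
pd.DataFrame({'area': area, 'population': pop})

# from a two-dimensional NumPy array with explicit column and index names
pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
```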
If the column and index names are omitted, an integer index will be used for each, as in pd.DataFrame(np.random.rand(3, 2)). The Index object is an interesting structure in itself, and it can be thought of either as an immutable array or as an ordered set (technically a multiset, as Index objects may contain repeated values). These views have some interesting consequences for the operations available on Index objects. Viewed as an ordered set, Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
These included indexing (e.g., arr[2, 1]), slicing (e.g., arr[:, 1:5]), masking, fancy indexing, and combinations thereof. For data selection in a Series: as we saw in the previous section, a Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. For example, with a Series indexed by the letters 'a' through 'd', we can slice by explicit index with data['a':'c'] or by implicit integer index with data[0:2]. Notice that when you are slicing with an explicit index (i.e., data['a':'c']), the final index is included in the slice, while when you are slicing with an implicit index (i.e., data[0:2]), the final index is excluded.
These slicing and indexing conventions can be a source of confusion, which is why Pandas provides the special indexer attributes loc, iloc, and ix. For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit indices, while a slicing operation like data[1:3] will use the implicit Python-style indices.
First, the loc attribute allows indexing and slicing that always references the explicit index, as in data.loc[1] or data.loc[1:3]; the iloc attribute, by contrast, always references the implicit Python-style index. The purpose of the ix indexer will become more apparent in the context of DataFrame objects, which we will discuss in a moment.
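A small sketch contrasting loc and iloc on a Series with an explicit integer index (the values are illustrative):

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

data.loc[1]      # 'a'  -- explicit index label 1
data.loc[1:3]    # explicit labels 1 through 3, final label included

data.iloc[1]     # 'b'  -- implicit (positional) index 1
data.iloc[1:3]   # positions 1 and 2, final position excluded
```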
For data selection in a DataFrame, recall that a DataFrame acts in many ways like a two-dimensional or structured array, and in other ways like a dictionary of Series structures sharing the same index. These analogies can be helpful to keep in mind as we explore data selection within this structure. The first analogy we will consider is the DataFrame as a dictionary of related Series objects: individual columns can be accessed via dictionary-style indexing (data['col']) or, for string column names, via attribute-style access (data.col). However, if the column names are not strings, or if the column names conflict with methods of the DataFrame, this attribute-style access is not possible.
As mentioned previously, we can also view the DataFrame as an enhanced two-dimensional array. We can examine the raw underlying data array using the values attribute, i.e., data.values.
For example, we can transpose the full DataFrame to swap rows and columns using data.T. In particular, passing a single index to the underlying array accesses a row, as in data.values[0].
Here Pandas again uses the loc, iloc, and ix indexers mentioned earlier. Using the iloc indexer, we can index the underlying array as if it were a simple NumPy array (using the implicit Python-style index), but the DataFrame index and column labels are maintained in the result, as in data.iloc[:3, :2].
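A short sketch of these DataFrame indexers; the DataFrame and column names here are assumptions for illustration:

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297],
                     'pop':  [38332521, 26448193, 19651127]},
                    index=['California', 'Texas', 'New York'])

data.values          # the raw underlying 2D array
data.T               # transposed DataFrame

data.iloc[:2, :1]              # first two rows, first column, by position
data.loc[:'Texas', :'area']    # the same selection by explicit labels
```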
In the loc indexer, we can also combine masking and fancy indexing, as in data.loc[mask, ['col1', 'col2']], where mask is a Boolean Series and the list selects columns. Turning to operating on data in Pandas: because Pandas is built on NumPy, any NumPy ufunc will work on Pandas Series and DataFrame objects. Pandas includes a couple of useful twists, however: for unary operations like negation and trigonometric functions, these ufuncs will preserve index and column labels in the output, and for binary operations such as addition and multiplication, Pandas will automatically align indices when passing the objects to the ufunc.
We will additionally see that there are well-defined operations between one-dimensional Series structures and two-dimensional DataFrame structures, demonstrated on a Series and a DataFrame of random integers (e.g., pd.Series(rng.randint(0, 10, 4)) and pd.DataFrame(rng.randint(0, 10, (3, 4))) for a NumPy random state rng). For binary operations on two Series or DataFrame objects, Pandas will align indices in the process of performing the operation.
For example, calling A.add(B) is equivalent to A + B, but allows optional explicit specification of the fill value to use for any elements whose indices appear in only one of the two objects. Operations between a DataFrame and a Series are similar to operations between a two-dimensional and a one-dimensional NumPy array.
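A small sketch of index alignment and the fill_value option; the Series contents are illustrative:

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

A + B                      # indices 0 and 3 appear in only one object -> NaN there
A.add(B, fill_value=0)     # treat missing entries as 0 instead of producing NaN
```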
Turning to handling missing data: the difference between data found in many tutorials and data in the real world is that real-world data is rarely clean and homogeneous. In particular, many interesting datasets will have some amount of data missing. There are trade-offs among the conventions for representing missing data: a number of schemes have been developed to indicate the presence of missing data in a table or DataFrame.
Generally, they revolve around one of two strategies: using a mask that globally indicates missing values, or choosing a sentinel value that indicates a missing entry. In the masking approach, the mask might be an entirely separate Boolean array, or it may involve appropriation of one bit in the data representation to locally indicate the null status of a value.
In the sentinel approach, the sentinel value could be some data-specific convention, such as indicating a missing integer value with -9999 or some rare bit pattern, or it could be a more global convention, such as indicating a missing floating-point value with NaN (Not a Number), a special value that is part of the IEEE floating-point specification.
None of these approaches is without trade-offs: use of a separate mask array requires allocation of an additional Boolean array, which adds overhead in both storage and computation. Common special values like NaN are not available for all data types. As in most cases where no universally optimal choice exists, different languages and systems use different conventions. The way in which Pandas handles missing values is constrained by its reliance on the NumPy package, which does not have a built-in notion of NA values for non-floating-point data types.
While R contains four basic data types, NumPy supports far more than this: for example, while R has a single integer type, NumPy supports fourteen basic integer types once you account for available precisions, signedness, and endianness of the encoding. Further, for the smaller data types (such as 8-bit integers), sacrificing a bit to use as a mask would significantly reduce the range of values it can represent. With these constraints in mind, Pandas chose to use sentinels for missing data, and further chose to use two already-existing Python null values: the special floating-point NaN value, and the Python None object.
This choice has some side effects, as we will see, but in practice ends up being a good compromise in most cases of interest. The first sentinel value used by Pandas is None, a Python singleton object that is often used for missing data in Python code.
You should be aware that NaN is a bit like a data virus: it infects any other object it touches. In Pandas, NaN and None both have their place, and Pandas is built to handle the two of them nearly interchangeably, converting between them where appropriate, as in pd.Series([1, np.nan, 2, None]). For example, if we set a value in an integer array to np.nan, the array is silently upcast to a floating-point type to accommodate it. Be aware that there is a proposal to add a native integer NA to Pandas in the future; as of this writing, it has not been included. Pandas follows a set of upcasting conventions when NA values are introduced.
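A small sketch of this upcasting behavior; the values are illustrative, and the behavior matches the description in the text (newer Pandas versions may warn before upcasting):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 2, None])   # None is converted to NaN
print(s.dtype)                        # float64

x = pd.Series(range(2), dtype=int)
x[0] = None                           # setting an NA value upcasts the integer Series
print(x)                              # 0    NaN
                                      # 1    1.0
```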
To facilitate this convention, there are several useful methods for detecting, removing, and replacing null values in Pandas data structures: isnull (generate a Boolean mask indicating missing values), notnull (the opposite of isnull), dropna (return a filtered version of the data), and fillna (return a copy of the data with missing values filled or imputed). We will conclude this section with a brief exploration and demonstration of these routines. Pandas data structures have two useful methods for detecting null data: isnull and notnull.
Either one will return a Boolean mask over the data. For dropping null values, in addition to the masking used before, there are the convenience methods dropna (which removes NA values) and fillna (which fills in NA values).
For a Series, the result is straightforward: data.dropna() simply removes the null entries. For a DataFrame, built for example from something like pd.DataFrame([[1, np.nan, 2], ...]), we cannot drop single values, only full rows or full columns. Depending on the application, you might want one or the other, so dropna gives a number of options for a DataFrame.
By default, dropna will drop all rows in which any null value is present, as in df.dropna(); alternatively, you can drop columns by passing axis='columns'. Finer behavior can be specified through the how or thresh parameters, which allow fine control of the number of nulls to allow through. Sometimes, rather than dropping NA values, you would prefer to replace them with a valid value. This value might be a single number like zero, or it might be some sort of imputation or interpolation from the good values. You could do this in place using the isnull method as a mask, but because it is such a common operation, Pandas provides the fillna method, which returns a copy of the array with the null values replaced.
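A short sketch of dropna and fillna on an illustrative DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])

df.dropna()                 # drop any row containing a null value
df.dropna(axis='columns')   # drop any column containing a null value
df.dropna(thresh=3)         # keep only rows with at least 3 non-null values

df.fillna(0)                # replace nulls with a single value
df.ffill()                  # forward-fill from the previous valid value
```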
Often it is useful to go beyond one- and two-dimensional data and store higher-dimensional data, that is, data indexed by more than one or two keys. Pandas handles this through hierarchical indexing (also known as multi-indexing), which incorporates multiple index levels within a single index; in this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects. For concreteness, we will consider a series of data where each point has a character and a numerical key. A naive (bad) way to do this: suppose you would like to track data about states from two different years, using Python tuples as keys.
Notice that some entries are missing in the first column: in this multi-index representation, any blank entry indicates the same value as the line above it. This syntax is much more convenient, and the operation is much more efficient, than the tuple-based approach. Thinking of the MultiIndex as an extra dimension, you might notice something else here: we could easily have stored the same data using a simple DataFrame with index and column labels. In fact, Pandas is built with this equivalence in mind. Each extra level in a multi-index represents an extra dimension of data; taking advantage of this property gives us much more flexibility in the types of data we can represent.
The most straightforward way to construct a multiply indexed Series or DataFrame is to simply pass a list of two or more index arrays to the constructor. For more flexibility in how the index is constructed, you can instead use the explicit class-method constructors available on pd.MultiIndex.
For example, as we did before, you can construct the MultiIndex from a simple list of arrays, giving the index values within each level, with pd.MultiIndex.from_arrays. Sometimes it is convenient to name the levels of the MultiIndex. You can accomplish this by passing the names argument to any of the above MultiIndex constructors, or by setting the names attribute of the index after the fact, e.g., pop.index.names = ['state', 'year'].
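A short sketch of constructing, naming, and indexing a MultiIndex; the state/year data is illustrative:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('California', 2000), ('California', 2010),
     ('Texas', 2000), ('Texas', 2010)],
    names=['state', 'year'])

pop = pd.Series([33871648, 37253956, 20851820, 25145561], index=index)

pop['California']        # partial indexing: all years for one state
pop[:, 2010]             # all states for one year
pop.unstack()            # convert to a conventional DataFrame with years as columns
```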
A MultiIndex can also be used for columns: in a DataFrame, the rows and columns are completely symmetric, and just as the rows can have multiple levels of indices, the columns can have multiple levels as well. This is fundamentally four-dimensional data, where the dimensions are the subject, the measurement type, the year, and the visit number.
Indexing and slicing on a MultiIndex are designed to be intuitive, and it helps if you think about the indices as added dimensions.
One of the keys to working with multiply indexed data is knowing how to effectively rearrange and transform it. Regarding sorted and unsorted indices: earlier, we briefly mentioned a caveat, but we should emphasize it more here. Many of the MultiIndex slicing operations will fail if the index is not sorted, so it is often necessary to sort the index first (for example with sort_index).
We have previously seen Pandas data aggregation methods such as mean, sum, and max. For hierarchically indexed data, these can be passed a level parameter that controls which subset of the data the aggregate is computed on. On the topic of panel data, Pandas has a few other fundamental data structures that we have not yet discussed, namely the pd.Panel and pd.Panel4D objects. These can be thought of, respectively, as three-dimensional and four-dimensional generalizations of the one-dimensional Series and two-dimensional DataFrame structures. Once you are familiar with indexing and manipulation of data in a Series and DataFrame, Panel and Panel4D are relatively straightforward to use.
Additionally, panel data is fundamentally a dense data representation, while multi-indexing is fundamentally a sparse data representation. As the number of dimensions increases, the dense representation can become very inefficient for the majority of real-world datasets. Combining Datasets: Concat and Append Some of the most interesting studies of data come from combining different data sources.
Series and DataFrames are built with this type of operation in mind, and Pandas includes functions and methods that make this sort of data wrangling fast and straightforward. Like np.concatenate for NumPy arrays, Pandas provides pd.concat for simple concatenation of Series or DataFrame objects.
Regarding duplicate indices, one important difference between np.concatenate and pd.concat is that Pandas concatenation preserves indices, even if the result has duplicate index values. While this is valid within DataFrames, the outcome is often undesirable. To catch the repeats as an error, you can pass the verify_integrity flag to pd.concat; with this set to True, the concatenation will raise an exception if there are duplicate indices. Sometimes the index itself does not matter, and you would prefer it to simply be ignored; for that, there is the ignore_index flag. With this set to True, the concatenation will create a new integer index for the resulting Series, as in pd.concat([x, y], ignore_index=True).
Another alternative is to use the keys option to specify a label for the data sources; the result will be a hierarchically indexed series containing the data, as in pd.concat([x, y], keys=['x', 'y']).
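A small sketch of these pd.concat options; the Series x and y are illustrative stand-ins for the original data:

```python
import pandas as pd

x = pd.Series(['A', 'B'], index=[0, 1])
y = pd.Series(['C', 'D'], index=[0, 1])   # note: same index values as x

pd.concat([x, y])                         # duplicate index values are preserved
pd.concat([x, y], ignore_index=True)      # build a fresh 0..3 integer index
pd.concat([x, y], keys=['x', 'y'])        # hierarchical index labeling each source

# asking Pandas to treat the overlap as an error instead:
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)
```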
For concatenation with joins: in the simple examples we just looked at, we were mainly concatenating DataFrames with shared column names. In practice, data from different sources might have different column sets; consider the concatenation of two DataFrames that have some (but not all!) columns in common. As for the append method: because direct array concatenation is so common, Series and DataFrame objects have an append method that can accomplish the same thing in fewer keystrokes. For example, rather than calling pd.concat([df1, df2]), you can simply call df1.append(df2). Keep in mind that, unlike the append and extend methods of Python lists, the Pandas append method does not modify the original object; it creates a new object with the combined data. It also is not a very efficient method, because it involves creation of a new index and data buffer.
Thus, if you plan to do multiple append operations, it is generally better to build a list of DataFrames and pass them all at once to the concat function. Moving on to combining datasets with merge and join: one essential feature offered by Pandas is its high-performance, in-memory join and merge operations. If you have ever worked with databases, you should be familiar with this type of data interaction.
The main interface for this is the pd.merge function. The behavior implemented in pd.merge is a subset of what is known as relational algebra, a formal set of rules for manipulating relational data that forms the conceptual foundation of the operations available in most databases. Pandas implements several of these fundamental building blocks in the pd.merge function and the related join method of Series and DataFrames. As we will see, these let you efficiently link data from different sources.
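A brief sketch of pd.merge; the employee and hire-date DataFrames are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                    'group': ['Accounting', 'Engineering', 'Engineering']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake'],
                    'hire_date': [2004, 2008, 2012]})

# one-to-one join on the shared 'employee' column
pd.merge(df1, df2)

# being explicit about the key column and the join type
pd.merge(df1, df2, on='employee', how='inner')
```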
Regarding categories of joins, the pd.merge function implements several types: one-to-one, many-to-one, and many-to-many. All three types of joins are accessed via an identical call to the pd.merge interface; the type of join performed depends on the form of the input data. Stepping back to reporting: Pandas is excellent at manipulating large amounts of data and summarizing it in multiple text and visual representations. Where things get more difficult is if you want to combine multiple pieces of data into one document. For example, if you want to put two DataFrames on one Excel sheet, you need to use the Excel libraries to manually construct your output.
It is certainly possible, but not simple. This article describes one method to combine multiple pieces of information into an HTML template and then convert it to a standalone PDF document using Jinja templates and WeasyPrint. Before going too far through this article, I would recommend that you review the previous articles on Pandas pivot tables and the follow-on article on generating Excel reports from these tables.
They explain the data set I am using and how to work with pivot tables. As shown in the reporting article, it is very convenient to use Pandas to output data into multiple sheets in an Excel file or to create multiple Excel files from pandas DataFrames. However, if you would like to combine multiple pieces of information into a single file, there are not many simple ways to do it straight from Pandas. Fortunately, the Python environment has many options to help us out. The nice thing about this approach is that you can substitute your own tools into this workflow.
Plug in mako or your templating tool of choice. If you want to use another type of markup outside of HTML, go for it. First, I decided to use HTML as the templating language because it is probably the simplest way to generate structured data and allows for relatively rich formatting.
I also think everyone knows, or can figure out, enough HTML to generate a simple report. However, if you choose to use other markup languages, the flow should work the same. There are certainly other options out there, so feel free to experiment.
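A minimal sketch of this workflow, assuming a Jinja template file named myreport.html exists alongside the script; the DataFrame contents and template variable names are assumptions for illustration:

```python
import pandas as pd
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

# an illustrative summary table
df = pd.DataFrame({'Manager': ['Debra', 'Fred'], 'Total Sales': [125000, 98000]})

# render the DataFrame's HTML into a hypothetical template containing
# {{ title }} and {{ table }} placeholders
env = Environment(loader=FileSystemLoader('.'))
template = env.get_template('myreport.html')
html_out = template.render(title='Quarterly Sales Report',
                           table=df.to_html(index=False))

# convert the rendered HTML string to a standalone PDF
HTML(string=html_out).write_pdf('report.pdf')
```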