knock-knock-who-there-file-compression-TALK-FINAL-4.tar.trz.bz2.gz
By Irina Shestak
Hey, everyone, my name is Irina.
I write code over at Scripto.
And today I'll be talking about compression.
So when I start preparing talks, I kind of go into this research mode and read
a bunch of papers about stuff.
So I was reading certain papers on compression,
on minifying your code and being faster on the Internet, and found this quote.
Rules of JavaScript are beyond... it was cute to find it in a paper.
A bunch of people said we do so much JavaScript, and JavaScript takes up so much of our space
on the Internet.
It kept taking up more and more space.
So we're writing a ton of JavaScript.
And in fact, by using compression, we were saving about 50% of all that JavaScript traffic.
So that's why I wanted to talk about compression.
I know I lured you in here by Phil Collins, but we won't actually do any Phil Collins.
It's just HTTP compression.
I'm really sorry.
So what is compression?
You start off with a file, for example index.html, that will look something like this.
Right?
You compress it.
And hopefully it works.
Compress it, compress it, compress it.
And then at some point you end up with a much smaller output.
So if you were to look at the compressed file, it looks wonkier, much different from the first
one we had to begin with.
Cool.
And if you think about it, if this happens over HTTP, you have the client and the server
talking to each other, and you want to pass the information along.
What the client says is, I'm going to accept some types of compression,
these and these, and that goes in the Accept-Encoding header.
The server says, cool, I'm reading that through, and I'll send it based on whatever you can accept.
So if the client says I'll do Deflate or gzip, the server will send whichever it feels is appropriate.
What are these names?
They're a little bit confusing.
Let's kind of work through these as we go through the talk.
So if you were to actually look at your headers in your browser's console,
you can see Accept-Encoding in the request headers, and then
if you look at the response headers, you'll see Content-Encoding, using one of those same names.
And that's what the page sends you back.
In the end, it's chosen based on Accept-Encoding.
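A minimal sketch of the server's side of this negotiation, in Node since that's what the demo uses. The helper name `negotiateEncoding` and its preference order are assumptions for illustration, not part of any library. It reads the comma-separated Accept-Encoding list, honors `q` weights (where `q=0` means "never send this"), and returns the value to put in Content-Encoding.

```javascript
// Hypothetical helper (not from any library): choose a Content-Encoding
// from a request's Accept-Encoding header, e.g. "gzip, deflate, br;q=0.5".
function negotiateEncoding(acceptEncoding, supported = ['br', 'gzip', 'deflate']) {
  // Map each listed encoding to its weight; a missing q means q=1.
  const weights = new Map();
  for (const part of (acceptEncoding || '').split(',')) {
    const [rawName, ...params] = part.toLowerCase().split(';');
    const name = rawName.trim();
    if (!name) continue;
    let q = 1;
    for (const param of params) {
      const [key, value] = param.trim().split('=');
      if (key === 'q') q = Number(value);
    }
    weights.set(name, q);
  }
  // Walk the server's own preference order; "*" covers anything unlisted.
  let best = null;
  let bestQ = 0;
  for (const encoding of supported) {
    const q = weights.has(encoding) ? weights.get(encoding) : (weights.get('*') ?? 0);
    if (q > bestQ) {
      best = encoding;
      bestQ = q;
    }
  }
  return best; // null: send the body uncompressed ("identity")
}

console.log(negotiateEncoding('gzip, deflate')); // gzip
console.log(negotiateEncoding('gzip;q=0, deflate')); // deflate
console.log(negotiateEncoding('br;q=0.5, gzip;q=0.8')); // gzip
```

A real server would also send `Vary: Accept-Encoding`, so caches keep the compressed and uncompressed variants apart.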
Cool.
So a lot of these compression formats started off with something called Deflate.
That's kind of the basic algorithm that most of these use,
things like gzip.
Deflate is a combination of two algorithms: Huffman coding and LZ77.
These are compression algorithms that take in a bunch of data, look at what repeats
and how often, and record the distances between the repeated pieces.
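To make "distances between repeated pieces" concrete, here is a toy, brute-force LZ77 in Node. It's an illustrative sketch only: real Deflate encoders such as zlib use hash chains to find matches quickly, but the output has the same shape, literals plus (distance, length) pointers back into a sliding window. The defaults (32 KB window, match lengths 3 to 258) mirror Deflate's limits; the function names are hypothetical.

```javascript
// Toy LZ77 (greedy, brute-force search), for illustration only.
// Each token is either a literal character or a pointer
// { distance, length }: "go back `distance` characters and copy `length` of them".
function lz77Tokens(text, windowSize = 32768, minMatch = 3, maxMatch = 258) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    let bestLength = 0;
    let bestDistance = 0;
    // Look for the longest earlier match that starts inside the window.
    for (let j = Math.max(0, i - windowSize); j < i; j++) {
      let length = 0;
      // A match may run past position i (overlap), e.g. "aaaa" -> 'a' + pointer.
      while (length < maxMatch && i + length < text.length && text[j + length] === text[i + length]) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestDistance = i - j;
      }
    }
    if (bestLength >= minMatch) {
      tokens.push({ distance: bestDistance, length: bestLength });
      i += bestLength;
    } else {
      tokens.push(text[i]);
      i += 1;
    }
  }
  return tokens;
}

// Decoding copies one character at a time, so overlapping pointers work.
function lz77Decode(tokens) {
  let out = '';
  for (const token of tokens) {
    if (typeof token === 'string') {
      out += token;
    } else {
      for (let k = 0; k < token.length; k++) out += out[out.length - token.distance];
    }
  }
  return out;
}

console.log(lz77Tokens('abcabcabc')); // [ 'a', 'b', 'c', { distance: 3, length: 6 } ]
console.log(lz77Decode(lz77Tokens('abcabcabc'))); // abcabcabc
```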
I won't talk too much about them because it's a little bit boring.
So I have a bunch of sketched notes on those two in particular so you can look at those
later.
But I will talk about Deflate.
Deflate is the basis for everything else to begin with.
So Deflate is used quite primarily with PNGs and gzip.
Those are the ones we're used to seeing to begin with.
What it is is a stream of blocks.
All of these compression formats are streams.
With Deflate, it's a stream of blocks,
and each individual block starts with a header made up of three bits.
The first one indicates whether or not there will be more data coming
through in your stream.
One means this is the last block, we can close out that stream.
And zero means there will be more blocks coming in.
The next two bits give the block type.
00 means the block passes on the actual raw data, uncompressed.
01 means it's compressed with a fixed, predefined Huffman table.
10 is a little bit more dynamic: dynamic blocks carry their own Huffman table, which the rest of the block references.
And 11 is reserved, a little bit that you don't touch.
So that's kind of what a single individual Deflate block looks like.
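The three header bits can be read straight out of a compressed buffer. A small sketch, assuming Node's `zlib.deflateRawSync` (which emits a bare Deflate stream with no zlib or gzip wrapper); `firstBlockHeader` is a hypothetical helper name. Deflate packs bits starting from the least significant bit of each byte, so the last-block flag is bit 0 and the type is bits 1 and 2.

```javascript
const zlib = require('node:zlib');

// Read the 3-bit header of the first block in a raw Deflate stream (RFC 1951).
// Deflate packs bits starting at the least significant bit of each byte.
function firstBlockHeader(rawDeflate) {
  const byte = rawDeflate[0];
  const bfinal = byte & 0b1; // 1 = last block, 0 = more blocks follow
  const btype = (byte >> 1) & 0b11; // block type, the next two bits
  const kinds = ['stored (raw)', 'fixed Huffman', 'dynamic Huffman', 'reserved'];
  return { bfinal, btype, kind: kinds[btype] };
}

// deflateRawSync() emits a bare Deflate stream with no zlib/gzip wrapper.
console.log(firstBlockHeader(zlib.deflateRawSync('hello')));
// { bfinal: 1, btype: 1, kind: 'fixed Huffman' }
console.log(firstBlockHeader(zlib.deflateRawSync('hello', { level: 0 })));
// { bfinal: 1, btype: 0, kind: 'stored (raw)' }
```

For a tiny input like "hello", a dynamic table would cost more than it saves, so zlib picks the fixed table; larger, varied inputs usually get dynamic blocks.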
Awesome.
And the mention that I will make of LZ77 and Huffman is that the compression works with
pointers, and the pointers are handled by LZ77.
And Huffman is where we talk about weighted symbols, and symbols in general, and how a symbol
can be a lot more than a letter A; it has a lot more meaning.
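"Weighted symbols" means that symbols which show up more often get shorter bit strings. A toy Huffman coder in Node to illustrate the idea; it is a sketch, not Deflate's exact scheme, which uses canonical, length-limited codes (at most 15 bits). `huffmanCodes` is a hypothetical name.

```javascript
// Toy Huffman coder: frequent ("heavier") symbols get shorter bit strings.
// Deflate actually uses canonical, length-limited Huffman codes;
// this sketch shows only the core idea.
function huffmanCodes(text) {
  // Count how often each symbol appears; those counts are the weights.
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  const nodes = [...freq].map(([symbol, weight]) => ({ symbol, weight }));
  if (nodes.length === 0) return new Map();
  // A single distinct symbol still needs one bit per occurrence.
  if (nodes.length === 1) return new Map([[nodes[0].symbol, '0']]);

  // Repeatedly merge the two lightest nodes into one parent node.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }

  // Walk the tree: left edges are '0', right edges are '1'.
  const codes = new Map();
  const walk = (node, prefix) => {
    if (node.symbol !== undefined) {
      codes.set(node.symbol, prefix);
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  };
  walk(nodes[0], '');
  return codes;
}

const message = 'aaaaaaaabbbc';
const table = huffmanCodes(message);
console.log(table); // Map(3) { 'c' => '00', 'b' => '01', 'a' => '1' }
const bits = [...message].map((ch) => table.get(ch)).join('');
console.log(bits.length); // 16 bits, versus 96 bits as 8-bit characters
```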
So that's Deflate.
And all of these other ones are based on that.
And the one we hear a lot more about than Deflate is gzip compression.
That's what most of the content on the Internet actually moves through.
So Gzip is kind of cool.
And an implementation of that, probably the most-used software library out there, maybe
after SQLite, is zlib.
It's the one that Node actually has an implementation of.
That's the one we'll talk about in a bit.
So you get a little bit more control over processing and memory.
So that means that you can make a tradeoff as to whether you want more compressed
content or you want it to be faster.
That's how you get around it.
And it comes with ten levels, zero through nine, that you can use to pick that kind of compression.
So on level zero you don't do any compression at all, and level one is the fastest.
But on level nine you are able to compress content a lot further, because zlib searches harder
for earlier content it can copy over and point back to.
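A quick way to see the levels trade off, sketched with Node's built-in `zlib.deflateSync` and its `level` option. Node's zlib options also expose `memLevel` (how much memory the compressor's internal state uses) and `windowBits` (how far back matches may point), which are the "processing and memory" knobs; they're left at their defaults here. The sample input is made up.

```javascript
const zlib = require('node:zlib');

// Some repetitive, HTML-ish input (about 6.8 KB).
const input = Buffer.from('<li class="item">knock knock</li>\n'.repeat(200));

// Level 0 stores the data uncompressed (plus a few bytes of framing);
// 1 is the fastest, 9 tries hardest. Node's default is 6.
for (const level of [0, 1, 6, 9]) {
  const output = zlib.deflateSync(input, { level });
  console.log(`level ${level}: ${input.length} -> ${output.length} bytes`);
}
```

On input this repetitive, every level above 0 shrinks it dramatically; the gap between levels tends to matter more on larger, less regular files.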
So what would be cool is if we do a little bit of work with zlib.
So I mostly write Node.
And zlib is a built-in API in Node.
So what I want to do is actually work through it and see what we can
do in terms of compressing content as we serve our files over to the client.
So let's actually do that.
Awesome.
So I kind of started us off... yeah.
There's a [inaudible] file in case I forget.
So I started us off by requiring http.
We'll have fs, because we want to create a stream
when we serve stuff over.
And I've started us off with actually having a server.
Cool.
And so when I was mentioning previously what the client does is it says, I'm going to accept
certain types of encoding.
And based on those types of encoding, we should be able to work with certain types of data.
So I said we'll be working with zlib.
So actually, let's get zlib in here.
And we'll be using streams.
So I'm just going to require pump.
And pump is a non-broken version of pipe.
[ Laughter ] Whoops.
Did I say that out loud?
I was just getting there... cool.
So what we want to be able to do is we want to have a source.
So this will be our stream.
So we'll have a stream that takes in our index.html.
And I should have an index.html in here.
Cool.
And what we want to be able to do, based on this Accept-Encoding header that we get, is
serve different types of content.
So this would be a basic match for gzip, and I'll get the index in a bit because I forgot.
So we'll be working with either Gzip or Deflate.
Now we'll actually copy the matches over because it's a little bit easier.
And then based on that particular match, if it's Deflate, I want to be able to create a
Deflate-compressed stream.
And what pump does is it takes your source, your compression operation, and your
output, which is the response.
And I also have a callback that does the error handling for us.
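pump is a small npm package; since Node 10 the built-in `stream.pipeline` does the same job (source, transform, destination, then a callback), so this sketch swaps it in. It gzips a file on disk; `gzipFile` and the paths are hypothetical.

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream');

// Gzip a file: source -> transform -> destination, like pump(source, op, output, cb).
// With a.pipe(b).pipe(c), an error in one stream isn't forwarded and the other
// streams can be left open; pipeline() destroys them all and reports the
// first error once, in the callback.
function gzipFile(src, dest, done) {
  pipeline(
    fs.createReadStream(src),
    zlib.createGzip(),
    fs.createWriteStream(dest),
    done
  );
}

// Usage (hypothetical paths):
// gzipFile('index.html', 'index.html.gz', (err) => {
//   if (err) console.error('compression failed:', err);
// });
```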
And then what else I want is to make sure that the response
that comes out is labeled correctly.
So our Content-Encoding here is... I'm stuck.
Oh.
I'm on a different keyboard.
I'm sorry.
Persian is not going to work in... neither will Russian.
Cool.
So I'll copy these over as well to go a little bit faster.
And then if we have gzip content, we will do the same thing.
zlib has a gzip implementation, and so does Node's zlib API.
And do the same thing.
Pump it over.
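Putting the live-coded pieces together, here is a sketch of what the demo server roughly looks like. Assumptions: it uses the built-in `stream.pipeline` in place of pump, serves a hypothetical `index.html`, and, as in the demo, checks for deflate before gzip with a simple regex that ignores q-values. `createCompressingServer` and `pickEncoding` are made-up names. Node lowercases incoming header names, so the lookup is `req.headers['accept-encoding']`.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream');

// Simplified match, like the demo: looks for the token and ignores q-values.
function pickEncoding(acceptEncoding = '') {
  if (/\bdeflate\b/.test(acceptEncoding)) return 'deflate';
  if (/\bgzip\b/.test(acceptEncoding)) return 'gzip';
  return null;
}

function createCompressingServer(filePath) {
  return http.createServer((req, res) => {
    const source = fs.createReadStream(filePath);
    // Node lowercases incoming header names.
    const encoding = pickEncoding(req.headers['accept-encoding']);
    // pipeline() destroys every stream on failure; we only need to log.
    const onDone = (err) => {
      if (err) console.error('response failed:', err.message);
    };

    // Tell caches that the body depends on Accept-Encoding.
    res.setHeader('Vary', 'Accept-Encoding');
    res.setHeader('Content-Type', 'text/html; charset=utf-8');

    if (encoding === 'deflate') {
      res.writeHead(200, { 'Content-Encoding': 'deflate' });
      pipeline(source, zlib.createDeflate(), res, onDone);
    } else if (encoding === 'gzip') {
      res.writeHead(200, { 'Content-Encoding': 'gzip' });
      pipeline(source, zlib.createGzip(), res, onDone);
    } else {
      res.writeHead(200);
      pipeline(source, res, onDone);
    }
  });
}

// Usage (hypothetical file and port):
// createCompressingServer('index.html').listen(8080);
// then: curl -s -H 'Accept-Encoding: gzip' localhost:8080 | wc -c
```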
Okay.
Awesome.
So let's actually test this out.
Because we want to be able to see whether the content will actually get compressed.
So let's start up a Node server.
And what we want to do is curl it.
And we need to pass in headers,
because that's basically what we're looking for.
So the header is Accept-Encoding,
and that would be gzip.
Let's start off with gzip and localhost.
And I've done this literally every time: the flag is --header.
Yeah.
Funny story, my track pad stopped working yesterday.
Okay.
We'll type it out again.
And we'll actually get the information right away to be able to see this.
So the header is Accept-Encoding,
and it's gzip, again,
and localhost.
Cool.
That actually did not get gzipped.
That's not cool.
But what we can see is it's going through okay.
Well, at least we get the file back.
So let's go back to our editor.
So the header name should be lowercased; that's the issue.
And if we go back.
No.
That's still not working.
Cool.
Yeah.
Probably a good idea.
Ha.
Cool.
[ Applause ]
And what would be kind of interesting to see
here is how much space this takes.
So what we can do is... huh?
Okay, you can't see the bottom.
Cool.
That would be important.
I think my keyboard also stopped working.
Awesome.
So what we also want to do is... maybe just see some output.
Okay.
We'll silence everything else, and then we'll still pass in the header,
which is Accept-Encoding: gzip, and then we'll pipe it over to wc and just
look at the count.
Which is 235 with gzip.
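The curl-and-wc measurement can also be done from Node. A sketch, with `countEncodedBytes` as a hypothetical helper: `http.get` does not decompress responses, so adding up the body chunks gives the bytes that actually went over the wire for a given Accept-Encoding, the same figure `curl -s ... | wc -c` reports.

```javascript
const http = require('node:http');

// Request `url` with the given Accept-Encoding and count the body bytes as sent.
// http.get() doesn't decompress, so this is the on-the-wire size.
function countEncodedBytes(url, acceptEncoding, done) {
  const options = { headers: { 'Accept-Encoding': acceptEncoding }, agent: false };
  http
    .get(url, options, (res) => {
      let bytes = 0;
      res.on('data', (chunk) => {
        bytes += chunk.length;
      });
      res.on('end', () => {
        done(null, { encoding: res.headers['content-encoding'] || 'identity', bytes });
      });
    })
    .on('error', done);
}

// Usage (hypothetical URL):
// countEncodedBytes('http://localhost:8080/', 'gzip', (err, result) => console.log(err || result));
```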
I'm going to minimize this a bit.
Just for a second.
And then what else we wanted to do is compare it to what a Deflate output would look like
instead.
And kind of do the same thing.
So it will also compress, but the count should be a little bit different.
There it's 305.
So gzip is smaller than the Deflate-compatible output.
Which is kind of interesting.
So that was kind of the Zlib example I wanted to show you, which is neat.
And then let's look at other methods that you could use.
So we looked at, like, 305 for Deflate and 235 for gzip, just so you keep those in mind.
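Both the "deflate" and "gzip" content codings wrap the same Deflate data; they differ only in the header and checksum around it. A quick sketch to check that with Node's zlib, using the same default settings for all three calls (the sample input is made up). With identical input and settings, the gzip output should come out 12 bytes larger than the deflate one, so a much larger gap in either direction is worth double-checking: plain `wc` prints lines, words, and bytes, while `wc -c` prints only the byte count.

```javascript
const zlib = require('node:zlib');

const input = Buffer.from('<p>knock knock, who is there?</p>\n'.repeat(100));

const raw = zlib.deflateRawSync(input); // bare Deflate data (RFC 1951)
const zlibWrapped = zlib.deflateSync(input); // HTTP "deflate": zlib format (RFC 1950)
const gzipped = zlib.gzipSync(input); // HTTP "gzip": gzip format (RFC 1952)

console.log({ raw: raw.length, deflate: zlibWrapped.length, gzip: gzipped.length });
// zlib adds a 2-byte header and a 4-byte Adler-32 checksum;
// gzip adds a 10-byte header and an 8-byte trailer (CRC-32 plus input size).
console.log(zlibWrapped.length - raw.length); // 6
console.log(gzipped.length - raw.length); // 18
```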
Cool.
Another one I wanted to talk about is Brotli.
It's been out in the wild for about a year or two.
So it's a fairly new way of compressing.
And it was made specifically to handle HTTP compression.
So it's made specifically for web development.
Which is kind of cool.
If we go back to the stream of blocks that Deflate uses:
a single block, if you don't think about it in the bits that I was talking
about before,
has literals, plus lengths and distances.
So the pointers, and then the weighted symbols.
That's Deflate.
And for Brotli, a single block is actually a sequence of commands.
You can see them at the bottom.
So instead of just these literals and pointers, it's a sequence of commands.
And each command has an insert length, a copy length, a distance, and the actual inserted literals.
And what's interesting about this is that normally you can only point back to content
within a specific window, which is 32 KB for Deflate.
With Brotli you can go back a lot further, with a window of up to 16 MB,
and reuse that content.
Which makes for a much smaller compressed file.
Which is neat.
But it also comes with a built-in dictionary of commonly used words and HTML terms, which, again, was specifically
made for HTTP compression, so it makes for a much smaller file.
You can tweak it,
but at its default, highest-quality setting it's much slower, and you have to wait a little bit longer for the compression
to happen.
So a lot of the time it gets used for static content rather than on the
fly.
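Since the slow part is the compression itself, one common approach is to precompress static files once, at build time. A sketch of that, with one swap: it uses Node's built-in Brotli support (`zlib.createBrotliCompress`, added in Node 11.7 and 10.16) instead of the iltorb package from the demo. `precompressBrotli` and the file names are hypothetical.

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream');

// Precompress a static asset once (e.g. at build time), so the slow,
// high-quality Brotli pass never runs per request.
function precompressBrotli(src, dest, done) {
  const brotli = zlib.createBrotliCompress({
    params: {
      // Hint that the input is UTF-8 text (HTML, CSS, JS).
      [zlib.constants.BROTLI_PARAM_MODE]: zlib.constants.BROTLI_MODE_TEXT,
      // 0 (fastest) to 11 (smallest output, slowest).
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
      // Telling the encoder the input size up front can help it.
      [zlib.constants.BROTLI_PARAM_SIZE_HINT]: fs.statSync(src).size,
    },
  });
  pipeline(fs.createReadStream(src), brotli, fs.createWriteStream(dest), done);
}

// Usage (hypothetical paths): build app.js.br next to app.js, then serve it
// with "Content-Encoding: br" to clients whose Accept-Encoding lists br.
// precompressBrotli('app.js', 'app.js.br', (err) => { if (err) console.error(err); });
```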
And what I wanted to show you now is how to use this library I found called
iltorb, which is Brotli backwards, because I can pronounce that.
And so you should be able to see the difference between the two.
Okay.
So let's start up another file.
And what will... oh.
What we'll need is kind of the same deal.
It's just copied over.
We need zlib, and I'll just call it compress.
Okay, backwards.
Tr... no.
Oh, so we'll use pump, and we'll still use this.
And we'll still do it on the fly rather than compressing static files.
Just so you see the difference.
Okay.
Let's not worry about the headers.
We'll still need a source.
Let's not worry about having these.
What we'll have, then, instead of zlib's compression, is a stream created with this
library.
I think it's createStream?
Ah.
Maybe.
That is correct.
It's called this... createStream.
Awesome.
Let's again start up a server.
Yeah.
And let's look at... I mean, the compressed file is going to look the same.
I want to look at what the word count is like.
So we'll do the same thing.
We kind of don't need to send in headers, I think, but it doesn't matter.
And also look at the word count.
Oh.
Well, the word count didn't work.
createStream is not a function.
Okay.
There's a bunch of different methods, so you can do it asynchronously.
I like this stuff.
It's compressStream... cool.
That would make sense.
You're compressing.
Okay.
And we need to start the server again.
Which is like 148, which is neat.
Yeah.
So there's that.
And that's how Brotli works.
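For a quick side-by-side like the one in the demo, a sketch using Node's built-in `zlib.brotliCompressSync` in place of iltorb. `compareSizes` is a hypothetical helper and the sample page is made up; which format wins, and by how much, depends on the input, so no numbers are claimed here.

```javascript
const zlib = require('node:zlib');

// Compress the same bytes with gzip (level 9) and Brotli (quality 11, which is
// also Brotli's default in Node) and report the sizes.
function compareSizes(input) {
  const gzip = zlib.gzipSync(input, { level: 9 });
  const br = zlib.brotliCompressSync(input, {
    params: {
      [zlib.constants.BROTLI_PARAM_MODE]: zlib.constants.BROTLI_MODE_TEXT,
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
    },
  });
  return { original: input.length, gzip: gzip.length, br: br.length };
}

const page = Buffer.from(
  '<!doctype html><html><head><title>Knock knock</title></head>' +
    '<body><p>Who is there?</p></body></html>'
);
console.log(compareSizes(page));
```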
I think it's super neat to be able to work with it for
static content: you're probably going to have a ton of images,
and a ton of static assets you want to share.
If they're taking up your space, it's nice to use it.
And most major browsers will accept it.
The big ones are all on board:
Edge, Firefox, and Chrome all accept it.
I'm not sure about Safari.
So you'll have that.
And then when you're working with your usual bundles of JavaScript and CSS, work
with something like gzip if you're not already doing that.
We went through the example.
So I wanted to show you what it's like to work with HTTP compression.
I hope you learned something about zlib and I hope you get to use it in your day-to-day
work.
Thanks.
[ Applause ]